# SQANTI3 5.1.0

{{< admonition tip "Conda" true >}}
See the 'activating the conda environment' section below to access this software.
{{< /admonition >}}

![SQANTI3 logo](../images/sq3-logo.png)

## SQANTI3-5.1.0

SQANTI3 is the newest version of the [SQANTI tool](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5848618/) that merges features from [SQANTI](https://github.com/ConesaLab/SQANTI) and [SQANTI2](https://github.com/Magdoll/SQANTI2), together with new additions. SQANTI3 will continue as an integrated development aiming to provide the best characterization for your new long read-defined transcriptome.

SQANTI3 is the first module of the [Functional IsoTranscriptomics (FIT)](https://tappas.org/) framework, which also includes IsoAnnot and tappAS.

___________

### New features in SQANTI3 v5.1 [LATEST]:

#### Major changes:

* Implemented new **rescue strategy** to recover transcriptome diversity lost after filtering (see details at the [SQ rescue wiki](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-rescue)).
* Updated **conda environment** to include rescue dependencies. We recommend creating the environment again in order for SQANTI3 to run without error.
* Fixed behavior of **mono-exon transcripts** during **ML filter**:
    - FSM now undergo intra-priming evaluation if they are mono-exons.
    - Corrected ML filter output when `--force_multi_exon` option is supplied: mono-exon transcripts will now be labeled as Artifacts.
* Fixed reasons file output by **rules filter**: the table now includes correct filtering reasons for **mono-exon transcripts**.
* Added an option to rules filter to control for mono-exon transcripts (previously available in ML filter).
* Modified the **output of SQANTI3 QC** to incorporate the creation of a complete `params.txt` file, i.e. including all arguments and the full paths of all supplied files.
#### Minor fixes/enhancements:

- Fixed output path for IsoAnnotLite GFF3 that prevented writing the file to the correct output directory when the `--gff3` option was not used.
- Set temporary file dir for HTML report creation (fixes Singularity container error).

___________

### New features in SQANTI3 v5.0:

#### Major changes:

* Implemented new **machine learning-based filter**.
* Updated **rules filter**: users can now define their own set of rules using a JSON file. By default, the rules filter applies the same set of rules that were implemented in the old `sqanti3_RulesFilter.py` script.
* The `sqanti3_RulesFilter.py` script is now deprecated and has been replaced by `sqanti3_filter.py`, which works as a wrapper for both filters (see details in the [documentation](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-filter)).
* IsoAnnotLite updated to version 2.7.3.
* Substantial modification of the SQANTI3 directory structure, with `utilities` folder now being divided into subfolders that group the scripts by their function.
* Added a column in the classification file to indicate whether a polyA motif was found, which adds to the existing column detailing the detected motif (details [here](https://github.com/ConesaLab/SQANTI3/issues/138)).
* Changed CAGE argument and CAGE/polyA columns to capital letters (for consistency across columns and arguments).
* The `example` folder now includes sample commands and output files for SQANTI3 QC, rules filter and machine learning filter.
* Added new supported transcript model (STM) plots to the SQANTI3 QC report.

#### Minor fixes/enhancements:

* Included cython (cDNA_cupcake dependency) as a dependency in the SQANTI3 conda environment.
* pip installed in conda environment.
* When the `sqanti3_qc.py` output files are supplied, the new `sqanti3_filter.py` filters them using the filter result (rules or ML). This was not previously done by `sqanti3_RulesFilter.py`.
* Antisense vs intergenic bug: fixed inconsistencies in classification of isoforms across the two categories.
* Fixed deprecation warnings in calculation of ratioTSS.
* Minor report updates.

## Documentation

For detailed documentation, please visit [the SQANTI3 wiki](https://github.com/ConesaLab/SQANTI3/wiki).

### Wiki contents:

* [Introduction to SQANTI3](https://github.com/ConesaLab/SQANTI3/wiki/Introduction-to-SQANTI3)
* [Dependencies and installation](https://github.com/ConesaLab/SQANTI3/wiki/Dependencies-and-installation)
* [Isoform classification: categories and subcategories](https://github.com/ConesaLab/SQANTI3/wiki/SQANTI3-isoform-classification:-categories-and-subcategories)
* [Running SQANTI3 quality control](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-Quality-Control)
* [Understanding the output of SQANTI3 QC](https://github.com/ConesaLab/SQANTI3/wiki/Understanding-the-output-of-SQANTI3-QC)
* [IsoAnnotLite](https://github.com/ConesaLab/SQANTI3/wiki/IsoAnnotLite)
* [Running SQANTI3 filter](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-filter)
* [Running SQANTI3 rescue](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-rescue)
* [Tutorial: running SQANTI3 on an example dataset](https://github.com/ConesaLab/SQANTI3/wiki/Tutorial:-running-SQANTI3-on-an-example-dataset)

Please note that we are currently updating and expanding the wiki to provide as much information as possible and enhance the SQANTI3 user experience. Pages under construction, or where information is still missing, will be indicated where appropriate. Thank you for your patience!

## How to cite SQANTI3

The SQANTI3 paper is currently in preparation. In the meantime, when using SQANTI3 in your research, please cite the [original SQANTI paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5848618/) as well as this repository:

- Tardaguila M, de la Fuente L, Marti C, et al.
  SQANTI: extensive characterization of long-read transcript sequences for quality control in full-length transcriptome identification and quantification. *Genome Res*, 2018. **28**(3):396-411. doi:10.1101/gr.222976.117

## Activating the conda environment

Check out a node with `qrsh` and then run these commands:

```console
bash
source /local/cluster/SQANTI3/activate.sh
```

To use SQANTI3 in an SGE job, write a bash script containing the `source` line above, followed by the SQANTI3 commands you wish to run.

## Location and version

```console
$ which sqanti3_qc.py
/local/cluster/SQANTI3/bin/sqanti3_qc.py

$ sqanti3_qc.py --version
R scripting front-end version 4.1.3 (2022-03-10)
SQANTI3 5.0
```

## Help message

```console
$ sqanti3_qc.py --help
R scripting front-end version 4.1.3 (2022-03-10)
usage: sqanti3_qc.py [-h] [--min_ref_len MIN_REF_LEN] [--force_id_ignore]
                     [--aligner_choice {minimap2,deSALT,gmap,uLTRA}]
                     [--CAGE_peak CAGE_PEAK]
                     [--polyA_motif_list POLYA_MOTIF_LIST]
                     [--polyA_peak POLYA_PEAK] [--phyloP_bed PHYLOP_BED]
                     [--skipORF] [--is_fusion] [--orf_input ORF_INPUT]
                     [--fasta] [-e EXPRESSION] [-x GMAP_INDEX] [-t CPUS]
                     [-n CHUNKS] [-o OUTPUT] [-d DIR] [-c COVERAGE]
                     [-s SITES] [-w WINDOW] [--genename] [-fl FL_COUNT] [-v]
                     [--saturation] [--report {html,pdf,both,skip}]
                     [--isoAnnotLite] [--gff3 GFF3]
                     [--short_reads SHORT_READS] [--SR_bam SR_BAM]
                     isoforms annotation genome

Structural and Quality Annotation of Novel Transcript Isoforms

positional arguments:
  isoforms              Isoforms (FASTA/FASTQ) or GTF format. It is
                        recommended to provide them in GTF format, but if it
                        is needed to map the sequences to the genome use a
                        FASTA/FASTQ file with the --fasta option.
  annotation            Reference annotation file (GTF format)
  genome                Reference genome (Fasta format)

optional arguments:
  -h, --help            show this help message and exit
  --min_ref_len MIN_REF_LEN
                        Minimum reference transcript length (default: 200 bp)
  --force_id_ignore     Allow the usage of transcript IDs non related with
                        PacBio's nomenclature (PB.X.Y)
  --aligner_choice {minimap2,deSALT,gmap,uLTRA}
  --CAGE_peak CAGE_PEAK
                        FANTOM5 Cage Peak (BED format, optional)
  --polyA_motif_list POLYA_MOTIF_LIST
                        Ranked list of polyA motifs (text, optional)
  --polyA_peak POLYA_PEAK
                        PolyA Peak (BED format, optional)
  --phyloP_bed PHYLOP_BED
                        PhyloP BED for conservation score (BED, optional)
  --skipORF             Skip ORF prediction (to save time)
  --is_fusion           Input are fusion isoforms, must supply GTF as input
  --orf_input ORF_INPUT
                        Input fasta to run ORF on. By default, ORF is run on
                        genome-corrected fasta - this overrides it. If input
                        is fusion (--is_fusion), this must be provided for
                        ORF prediction.
  --fasta               Use when running SQANTI by using as input a
                        FASTA/FASTQ with the sequences of isoforms
  -e EXPRESSION, --expression EXPRESSION
                        Expression matrix (supported: Kallisto tsv)
  -x GMAP_INDEX, --gmap_index GMAP_INDEX
                        Path and prefix of the reference index created by
                        gmap_build. Mandatory if using GMAP unless -g option
                        is specified.
  -t CPUS, --cpus CPUS  Number of threads used during alignment by aligners.
                        (default: 10)
  -n CHUNKS, --chunks CHUNKS
                        Number of chunks to split SQANTI3 analysis in for
                        speed up (default: 1).
  -o OUTPUT, --output OUTPUT
                        Prefix for output files.
  -d DIR, --dir DIR     Directory for output files. Default: Directory where
                        the script was run.
  -c COVERAGE, --coverage COVERAGE
                        Junction coverage files (provide a single file,
                        comma-delmited filenames, or a file pattern, ex:
                        "mydir/*.junctions").
  -s SITES, --sites SITES
                        Set of splice sites to be considered as canonical
                        (comma-separated list of splice sites). Default:
                        GTAG,GCAG,ATAC.
  -w WINDOW, --window WINDOW
                        Size of the window in the genomic DNA screened for
                        Adenine content downstream of TTS
  --genename            Use gene_name tag from GTF to define genes. Default:
                        gene_id used to define genes
  -fl FL_COUNT, --fl_count FL_COUNT
                        Full-length PacBio abundance file
  -v, --version         Display program version number.
  --saturation          Include saturation curves into report
  --report {html,pdf,both,skip}
                        select report format --html --pdf --both --skip
  --isoAnnotLite        Run isoAnnot Lite to output a tappAS-compatible gff3
                        file
  --gff3 GFF3           Precomputed tappAS species specific GFF3 file. It will
                        serve as reference to transfer functional attributes
  --short_reads SHORT_READS
                        File Of File Names (fofn, space separated) with paths
                        to FASTA or FASTQ from Short-Read RNA-Seq. If
                        expression or coverage files are not provided,
                        Kallisto (just for pair-end data) and STAR,
                        respectively, will be run to calculate them.
  --SR_bam SR_BAM       Directory or fofn file with the sorted bam files of
                        Short Reads RNA-Seq mapped against the genome
```
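
As a rough illustration of the SGE note in "Activating the conda environment", the snippet below writes a minimal job script and checks its syntax. It is only a sketch. The script name `sqanti3_qc_job.sh`, the job name, the log file names, the input files (`isoforms.gtf`, `annotation.gtf`, `genome.fasta`) and the output prefix and directory are all placeholders chosen for this example, not values taken from the cluster or from SQANTI3. The `sqanti3_qc.py` options used are the ones listed in the help message above.

```shell
# Write a hypothetical SGE job script; every file name below is a placeholder.
cat > sqanti3_qc_job.sh <<'EOF'
#!/bin/bash
#$ -cwd                 # run the job from the submission directory
#$ -N sqanti3_qc        # job name (arbitrary, chosen for this example)
#$ -o sqanti3_qc.out    # file for standard output
#$ -e sqanti3_qc.err    # file for standard error

# Activate the SQANTI3 conda environment (the line from the section above)
source /local/cluster/SQANTI3/activate.sh

# Placeholder inputs: replace with your own isoforms, annotation and genome
sqanti3_qc.py isoforms.gtf annotation.gtf genome.fasta \
    -o sample1 -d sqanti3_out -t 1 --report html
EOF

# Check the generated script for syntax errors without executing it
bash -n sqanti3_qc_job.sh

# Submit it on the cluster (not run here):
#   qsub sqanti3_qc_job.sh
```

The sketch uses `-t 1` because no parallel environment is requested. Parallel environment names are site-specific, so if you raise `-t`, add the matching slot request for your cluster.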