# SQANTI3 5.0.0

![SQANTI3 logo](../images/sq3-logo.png)

SQANTI3 is the newest version of the [SQANTI tool](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5848618/) that merges features from [SQANTI](https://github.com/ConesaLab/SQANTI) and [SQANTI2](https://github.com/Magdoll/SQANTI2), together with new additions. SQANTI3 will continue as an integrated development aiming to provide the best characterization for your new long read-defined transcriptome.

SQANTI3 is the first module of the [Functional IsoTranscriptomics (FIT)](https://tappas.org/) framework, which also includes IsoAnnot and tappAS.

## Latest updates

The latest SQANTI3 release (01/06/2022) is **version 5.0**.

**WARNING:** v5.0 constitutes a major release of the SQANTI3 software. **Versions of SQANTI3 >= 5.0 will not have backward compatibility** with previous releases and their output (v4.3 and earlier). Users who wish to apply any of the new functionalities in v5.0 to output files from older versions will therefore need to re-run SQANTI3 QC.

New features implemented in SQANTI3 v5.0:

* Implemented new **machine learning-based filter**.
* Updated **rules filter**: users can now define their own set of rules using a JSON file. By default, the rules filter applies the same set of rules that were implemented in the old `sqanti3_RulesFilter.py` script.
* The `sqanti3_RulesFilter.py` script is now deprecated and has been replaced by `sqanti3_filter.py`, which works as a wrapper for both filters (see details in the [documentation](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-filter)).
* IsoAnnotLite updated to version 2.7.3.
* Substantial modification of the SQANTI3 directory structure, with the `utilities` folder now being divided into subfolders that group the scripts by their function.
* Added a column in the classification file to indicate whether a polyA motif was found, which adds to the existing column detailing the detected motif (details [here](https://github.com/ConesaLab/SQANTI3/issues/138)).
* Changed CAGE argument and CAGE/polyA columns to capital letters (for consistency across columns and arguments).
* The `example` folder now includes sample commands and output files for SQANTI3 QC, rules filter and machine learning filter.
* Added new supported transcript model (STM) plots to the SQANTI3 QC report.
* Minor fixes/enhancements:
    * Included cython (cDNA_cupcake dependency) as a dependency in the SQANTI3 conda environment.
    * pip installed in conda environment.
    * When the `sqanti3_qc.py` output files are supplied, the new `sqanti3_filter.py` filters them using the filter result (rules or ML). This was not previously done by `sqanti3_RulesFilter.py`.
    * Antisense vs intergenic bug: fixed inconsistencies in classification of isoforms across the two categories.
    * Fixed deprecation warnings in calculation of ratioTSS.
    * Minor report updates.

## Documentation

For detailed documentation, please visit [the SQANTI3 wiki](https://github.com/ConesaLab/SQANTI3/wiki).
### Wiki contents:

* [Introduction to SQANTI3](https://github.com/ConesaLab/SQANTI3/wiki/Introduction-to-SQANTI3)
* [Dependencies and installation](https://github.com/ConesaLab/SQANTI3/wiki/Dependencies-and-installation)
* [Isoform classification: categories and subcategories](https://github.com/ConesaLab/SQANTI3/wiki/SQANTI3-isoform-classification:-categories-and-subcategories)
* [Running SQANTI3 quality control](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-Quality-Control)
* [Understanding the output of SQANTI3 QC](https://github.com/ConesaLab/SQANTI3/wiki/Understanding-the-output-of-SQANTI3-QC)
* [Running SQANTI3 filter](https://github.com/ConesaLab/SQANTI3/wiki/Running-SQANTI3-filter)
* [Tutorial: running SQANTI3 on an example dataset](https://github.com/ConesaLab/SQANTI3/wiki/Tutorial:-running-SQANTI3-on-an-example-dataset)

## Activating the conda env

Check out a node with `qrsh` and then run these commands:

```console
bash
source /local/cluster/SQANTI3/activate.sh
```

To use in SGE, generate a bash script with the source activate line above and then the SQANTI3 commands you wish to run.
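
As a rough illustration of the SGE workflow described above, the sketch below writes a small job script that sources the activation line and then runs one SQANTI3 command. The job name, the input file names (`isoforms.gtf`, `annotation.gtf`, `genome.fasta`), the output prefix and the thread count are all placeholders chosen for this example, and the `sqanti3_qc.py` options are taken from the help output reproduced on this page, so check `sqanti3_qc.py --help` before relying on them.

```shell
#!/usr/bin/env bash
# Sketch: generate an SGE job script for SQANTI3 QC.
# All file names, the job name and the thread count are placeholders.

job_script="sqanti3_qc_job.sh"

# The quoted 'EOF' keeps the here-document literal (no variable expansion).
cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#$ -S /bin/bash
#$ -cwd
#$ -N sqanti3_qc
# If you raise -t below, also request matching slots with your site's
# parallel environment; its name is site-specific and not filled in here.

# Activate the SQANTI3 conda environment (line from this page).
source /local/cluster/SQANTI3/activate.sh

# Stop on the first failing command from here on.
set -eo pipefail

# Options as listed in the help output on this page; verify for your version.
sqanti3_qc.py --gtf isoforms.gtf annotation.gtf genome.fasta \
    -o sample -d sqanti3_out -t 4
EOF

echo "Wrote $job_script; submit it with: qsub $job_script"
```

Strict mode is enabled only after the `source` line because environment activation scripts sometimes reference unset variables or return non-zero from harmless checks.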
## Location and version

```console
$ which sqanti3_qc.py
/local/cluster/SQANTI3/bin/sqanti3_qc.py
$ sqanti3_qc.py --version
R scripting front-end version 3.6.1 (2019-07-05)
SQANTI3 2.0.0
```

## Help message

```console
$ sqanti3_qc.py --help
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_qc.py [-h] [--min_ref_len MIN_REF_LEN] [--force_id_ignore]
                     [--aligner_choice {minimap2,deSALT,gmap}]
                     [--cage_peak CAGE_PEAK]
                     [--polyA_motif_list POLYA_MOTIF_LIST]
                     [--polyA_peak POLYA_PEAK] [--phyloP_bed PHYLOP_BED]
                     [--skipORF] [--is_fusion] [--orf_input ORF_INPUT] [-g]
                     [-e EXPRESSION] [-x GMAP_INDEX] [-t CPUS] [-n CHUNKS]
                     [-o OUTPUT] [-d DIR] [-c COVERAGE] [-s SITES] [-w WINDOW]
                     [--genename] [-fl FL_COUNT] [-v] [--isoAnnotLite]
                     [--gff3 GFF3]
                     isoforms annotation genome

Structural and Quality Annotation of Novel Transcript Isoforms

positional arguments:
  isoforms              Isoforms (FASTA/FASTQ) or GTF format. Recommend
                        provide GTF format with the --gtf option.
  annotation            Reference annotation file (GTF format)
  genome                Reference genome (Fasta format)

optional arguments:
  -h, --help            show this help message and exit
  --min_ref_len MIN_REF_LEN
                        Minimum reference transcript length (default: 200 bp)
  --force_id_ignore     Allow the usage of transcript IDs non related with
                        PacBio's nomenclature (PB.X.Y)
  --aligner_choice {minimap2,deSALT,gmap}
  --cage_peak CAGE_PEAK
                        FANTOM5 Cage Peak (BED format, optional)
  --polyA_motif_list POLYA_MOTIF_LIST
                        Ranked list of polyA motifs (text, optional)
  --polyA_peak POLYA_PEAK
                        PolyA Peak (BED format, optional)
  --phyloP_bed PHYLOP_BED
                        PhyloP BED for conservation score (BED, optional)
  --skipORF             Skip ORF prediction (to save time)
  --is_fusion           Input are fusion isoforms, must supply GTF as input
                        using --gtf
  --orf_input ORF_INPUT
                        Input fasta to run ORF on. By default, ORF is run on
                        genome-corrected fasta - this overrides it. If input
                        is fusion (--is_fusion), this must be provided for ORF
                        prediction.
  -g, --gtf             Use when running SQANTI by using as input a gtf of
                        isoforms
  -e EXPRESSION, --expression EXPRESSION
                        Expression matrix (supported: Kallisto tsv)
  -x GMAP_INDEX, --gmap_index GMAP_INDEX
                        Path and prefix of the reference index created by
                        gmap_build. Mandatory if using GMAP unless -g option
                        is specified.
  -t CPUS, --cpus CPUS  Number of threads used during alignment by aligners.
                        (default: 10)
  -n CHUNKS, --chunks CHUNKS
                        Number of chunks to split SQANTI3 analysis in for
                        speed up (default: 1).
  -o OUTPUT, --output OUTPUT
                        Prefix for output files.
  -d DIR, --dir DIR     Directory for output files. Default: Directory where
                        the script was run.
  -c COVERAGE, --coverage COVERAGE
                        Junction coverage files (provide a single file, comma-
                        delmited filenames, or a file pattern, ex:
                        "mydir/*.junctions").
  -s SITES, --sites SITES
                        Set of splice sites to be considered as canonical
                        (comma-separated list of splice sites). Default:
                        GTAG,GCAG,ATAC.
  -w WINDOW, --window WINDOW
                        Size of the window in the genomic DNA screened for
                        Adenine content downstream of TTS
  --genename            Use gene_name tag from GTF to define genes. Default:
                        gene_id used to define genes
  -fl FL_COUNT, --fl_count FL_COUNT
                        Full-length PacBio abundance file
  -v, --version         Display program version number.
  --isoAnnotLite        Run isoAnnot Lite to output a tappAS-compatible gff3
                        file
  --gff3 GFF3           Precomputed tappAS species specific GFF3 file. It will
                        serve as reference to transfer functional attributes

$ sqanti3_RulesFilter.py
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_RulesFilter.py [-h] [--sam SAM] [--faa FAA] [-a INTRAPRIMING]
                              [-r RUNALENGTH] [-m MAX_DIST_TO_KNOWN_END]
                              [-c MIN_COV] [--filter_mono_exonic] [--skipGTF]
                              [--skipFaFq] [--skipJunction] [-v]
                              sqanti_class isoforms gtf_file
sqanti3_RulesFilter.py: error: the following arguments are required: sqanti_class, isoforms, gtf_file

$ sqanti3_RulesFilter.py -h
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_RulesFilter.py [-h] [--sam SAM] [--faa FAA] [-a INTRAPRIMING]
                              [-r RUNALENGTH] [-m MAX_DIST_TO_KNOWN_END]
                              [-c MIN_COV] [--filter_mono_exonic] [--skipGTF]
                              [--skipFaFq] [--skipJunction] [-v]
                              sqanti_class isoforms gtf_file

Filtering of Isoforms based on SQANTI3 attributes

positional arguments:
  sqanti_class          SQANTI classification output file.
  isoforms              fasta/fastq isoform file to be filtered by SQANTI3
  gtf_file              GTF of the input fasta/fastq

optional arguments:
  -h, --help            show this help message and exit
  --sam SAM             (Optional) SAM alignment of the input fasta/fastq
  --faa FAA             (Optional) ORF prediction faa file to be filtered by
                        SQANTI3
  -a INTRAPRIMING, --intrapriming INTRAPRIMING
                        Adenine percentage at genomic 3' end to flag an
                        isoform as intra-priming (default: 0.6)
  -r RUNALENGTH, --runAlength RUNALENGTH
                        Continuous run-A length at genomic 3' end to flag an
                        isoform as intra-priming (default: 6)
  -m MAX_DIST_TO_KNOWN_END, --max_dist_to_known_end MAX_DIST_TO_KNOWN_END
                        Maximum distance to an annotated 3' end to preserve as
                        a valid 3' end and not filter out (default: 50bp)
  -c MIN_COV, --min_cov MIN_COV
                        Minimum junction coverage for each isoform (only used
                        if min_cov field is not 'NA'), default: 3
  --filter_mono_exonic  Filter out all mono-exonic transcripts (default: OFF)
  --skipGTF             Skip output of GTF
  --skipFaFq            Skip output of isoform fasta/fastq
  --skipJunction        Skip output of junctions file
  -v, --version         Display program version number.
```
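
To show how the options in the help output above might be combined, here is a minimal dry-run sketch that assembles a QC command followed by a rules filter command. It uses only flags that appear in the help text on this page, which may differ from the 5.0 `sqanti3_filter.py` interface described in the release notes. The input file names, prefix and output directory are placeholders, and the assumption that QC writes `<prefix>_classification.txt`, `<prefix>_corrected.fasta` and `<prefix>_corrected.gtf` into the output directory should be checked against your own run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholder inputs and output locations; replace with your own files.
isoforms="isoforms.gtf"
annotation="annotation.gtf"
genome="genome.fasta"
prefix="sample"
outdir="sqanti3_out"

# QC command, built only from options listed in the help text above.
qc_cmd=(sqanti3_qc.py --gtf "$isoforms" "$annotation" "$genome"
        -o "$prefix" -d "$outdir" -t 4)

# Rules filter command. The three input paths are ASSUMED output names
# from the QC step; the thresholds shown are the documented defaults.
filter_cmd=(sqanti3_RulesFilter.py
            "$outdir/${prefix}_classification.txt"
            "$outdir/${prefix}_corrected.fasta"
            "$outdir/${prefix}_corrected.gtf"
            -a 0.6 -r 6 -c 3)

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

run "${qc_cmd[@]}"
run "${filter_cmd[@]}"
```

Running the script as is only prints the two commands, which makes it easy to review them before submitting a real job with `RUN=1`.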