SQANTI3
SQANTI3 is the newest version of the SQANTI tool that merges features from
SQANTI and SQANTI2, together with new additions.
SQANTI3 will continue as an integrated development aiming to provide the best
characterization for your new long read-defined transcriptome.
SQANTI3 is the first module of the Functional IsoTranscriptomics
(FIT) framework, which also includes IsoAnnot and
tappAS.
Latest updates
Latest SQANTI3 release (01/06/2022) is version 5.0.
WARNING: v5.0 constitutes a major release of the SQANTI3 software.
Versions of SQANTI3 >= 5.0 will not have backward compatibility with
previous releases and their output (v4.3 and earlier). Users who wish to
apply any of the new functionalities in v5.0 to output files from older
versions will therefore need to re-run SQANTI3 QC.
New features implemented in SQANTI3 v5.0:
- Implemented new machine learning-based filter.
- Updated rules filter: users can now define their own set of rules using
a JSON file. By default, the rules filter applies the same set of rules that
were implemented in the old sqanti3_RulesFilter.py script.
- The sqanti3_RulesFilter.py script is now deprecated and has been replaced
by sqanti3_filter.py, which works as a wrapper for both filters (see details
in the documentation).
- IsoAnnotLite updated to version 2.7.3.
- Substantial modification of the SQANTI3 directory structure, with the
utilities folder now being divided into subfolders that group the scripts
by their function.
- Added a column in the classification file to indicate whether a polyA motif
was found, which adds to the existing column detailing the detected motif
(details here).
- Changed CAGE argument and CAGE/polyA columns to capital letters (for
consistency across columns and arguments).
- The example folder now includes sample commands and output files for
SQANTI3 QC, rules filter and machine learning filter.
- Added new supported transcript model (STM) plots to the SQANTI3 QC report.
- Minor fixes/enhancements:
  - Included cython (cDNA_cupcake dependency) as a dependency in the SQANTI3
    conda environment.
  - pip installed in conda environment.
  - When supplied, the new sqanti3_filter.py filters the sqanti3_qc.py output
    files using the filter result (rules or ML). This was not previously done
    by sqanti3_RulesFilter.py.
  - Antisense vs intergenic bug: fixed inconsistencies in classification of
    isoforms across the two categories.
  - Fixed deprecation warnings in calculation of ratioTSS.
  - Minor report updates.
Documentation
For detailed documentation, please visit the SQANTI3 wiki.
Activating the conda env
Check out a node with qrsh and then run these commands:
bash
source /local/cluster/SQANTI3/activate.sh
To use in SGE, generate a bash script with the source activate line above and
then the SQANTI3 commands you wish to run.
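As a rough illustration of the SGE workflow described above, the following sketch writes a job script and prints the submission command. The input file names (isoforms.gtf, annotation.gtf, genome.fa), the job name, the output prefix and the `#$` directives are placeholders chosen for this example, not part of the cluster's documented setup; adjust them to your data and queue.

```shell
#!/usr/bin/env bash
# Sketch: generate an SGE job script that activates SQANTI3 and runs QC.
# All input file names, the job name and the output prefix are placeholders.
set -euo pipefail

job_script="sqanti3_job.sh"

# Quoted heredoc delimiter ('EOF') so nothing is expanded while writing the file.
cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#$ -N sqanti3_qc
#$ -cwd
#$ -S /bin/bash

# Activation line taken from this page.
source /local/cluster/SQANTI3/activate.sh

# Positional order follows the help text: isoforms, annotation, genome.
# --gtf marks the isoforms input as a GTF file.
sqanti3_qc.py --gtf isoforms.gtf annotation.gtf genome.fa \
    -o my_sample -d sqanti3_out -t 1
EOF

echo "Wrote $job_script; submit it with: qsub $job_script"
```

The example passes `-t 1` because the help text lists a default of 10 threads; if you request more slots from the scheduler, raise `-t` to match.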
Location and version
$ which sqanti3_qc.py
/local/cluster/SQANTI3/bin/sqanti3_qc.py
$ sqanti3_qc.py --version
R scripting front-end version 3.6.1 (2019-07-05)
SQANTI3 2.0.0
Help message
$ sqanti3_qc.py --help
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_qc.py [-h] [--min_ref_len MIN_REF_LEN] [--force_id_ignore]
[--aligner_choice {minimap2,deSALT,gmap}]
[--cage_peak CAGE_PEAK]
[--polyA_motif_list POLYA_MOTIF_LIST]
[--polyA_peak POLYA_PEAK] [--phyloP_bed PHYLOP_BED]
[--skipORF] [--is_fusion] [--orf_input ORF_INPUT] [-g]
[-e EXPRESSION] [-x GMAP_INDEX] [-t CPUS] [-n CHUNKS]
[-o OUTPUT] [-d DIR] [-c COVERAGE] [-s SITES] [-w WINDOW]
[--genename] [-fl FL_COUNT] [-v] [--isoAnnotLite]
[--gff3 GFF3]
isoforms annotation genome
Structural and Quality Annotation of Novel Transcript Isoforms
positional arguments:
isoforms Isoforms (FASTA/FASTQ) or GTF format. Recommend
provide GTF format with the --gtf option.
annotation Reference annotation file (GTF format)
genome Reference genome (Fasta format)
optional arguments:
-h, --help show this help message and exit
--min_ref_len MIN_REF_LEN
Minimum reference transcript length (default: 200 bp)
--force_id_ignore Allow the usage of transcript IDs non related with
PacBio's nomenclature (PB.X.Y)
--aligner_choice {minimap2,deSALT,gmap}
--cage_peak CAGE_PEAK
FANTOM5 Cage Peak (BED format, optional)
--polyA_motif_list POLYA_MOTIF_LIST
Ranked list of polyA motifs (text, optional)
--polyA_peak POLYA_PEAK
PolyA Peak (BED format, optional)
--phyloP_bed PHYLOP_BED
PhyloP BED for conservation score (BED, optional)
--skipORF Skip ORF prediction (to save time)
--is_fusion Input are fusion isoforms, must supply GTF as input
using --gtf
--orf_input ORF_INPUT
Input fasta to run ORF on. By default, ORF is run on
genome-corrected fasta - this overrides it. If input
is fusion (--is_fusion), this must be provided for ORF
prediction.
-g, --gtf Use when running SQANTI by using as input a gtf of
isoforms
-e EXPRESSION, --expression EXPRESSION
Expression matrix (supported: Kallisto tsv)
-x GMAP_INDEX, --gmap_index GMAP_INDEX
Path and prefix of the reference index created by
gmap_build. Mandatory if using GMAP unless -g option
is specified.
-t CPUS, --cpus CPUS Number of threads used during alignment by aligners.
(default: 10)
-n CHUNKS, --chunks CHUNKS
Number of chunks to split SQANTI3 analysis in for
speed up (default: 1).
-o OUTPUT, --output OUTPUT
Prefix for output files.
-d DIR, --dir DIR Directory for output files. Default: Directory where
the script was run.
-c COVERAGE, --coverage COVERAGE
Junction coverage files (provide a single file, comma-
delmited filenames, or a file pattern, ex:
"mydir/*.junctions").
-s SITES, --sites SITES
Set of splice sites to be considered as canonical
(comma-separated list of splice sites). Default:
GTAG,GCAG,ATAC.
-w WINDOW, --window WINDOW
Size of the window in the genomic DNA screened for
Adenine content downstream of TTS
--genename Use gene_name tag from GTF to define genes. Default:
gene_id used to define genes
-fl FL_COUNT, --fl_count FL_COUNT
Full-length PacBio abundance file
-v, --version Display program version number.
--isoAnnotLite Run isoAnnot Lite to output a tappAS-compatible gff3
file
--gff3 GFF3 Precomputed tappAS species specific GFF3 file. It will
serve as reference to transfer functional attributes
$ sqanti3_RulesFilter.py
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_RulesFilter.py [-h] [--sam SAM] [--faa FAA] [-a INTRAPRIMING]
[-r RUNALENGTH] [-m MAX_DIST_TO_KNOWN_END]
[-c MIN_COV] [--filter_mono_exonic] [--skipGTF]
[--skipFaFq] [--skipJunction] [-v]
sqanti_class isoforms gtf_file
sqanti3_RulesFilter.py: error: the following arguments are required: sqanti_class, isoforms, gtf_file
$ sqanti3_RulesFilter.py -h
R scripting front-end version 3.6.1 (2019-07-05)
usage: sqanti3_RulesFilter.py [-h] [--sam SAM] [--faa FAA] [-a INTRAPRIMING]
[-r RUNALENGTH] [-m MAX_DIST_TO_KNOWN_END]
[-c MIN_COV] [--filter_mono_exonic] [--skipGTF]
[--skipFaFq] [--skipJunction] [-v]
sqanti_class isoforms gtf_file
Filtering of Isoforms based on SQANTI3 attributes
positional arguments:
sqanti_class SQANTI classification output file.
isoforms fasta/fastq isoform file to be filtered by SQANTI3
gtf_file GTF of the input fasta/fastq
optional arguments:
-h, --help show this help message and exit
--sam SAM (Optional) SAM alignment of the input fasta/fastq
--faa FAA (Optional) ORF prediction faa file to be filtered by
SQANTI3
-a INTRAPRIMING, --intrapriming INTRAPRIMING
Adenine percentage at genomic 3' end to flag an
isoform as intra-priming (default: 0.6)
-r RUNALENGTH, --runAlength RUNALENGTH
Continuous run-A length at genomic 3' end to flag an
isoform as intra-priming (default: 6)
-m MAX_DIST_TO_KNOWN_END, --max_dist_to_known_end MAX_DIST_TO_KNOWN_END
Maximum distance to an annotated 3' end to preserve as
a valid 3' end and not filter out (default: 50bp)
-c MIN_COV, --min_cov MIN_COV
Minimum junction coverage for each isoform (only used
if min_cov field is not 'NA'), default: 3
--filter_mono_exonic Filter out all mono-exonic transcripts (default: OFF)
--skipGTF Skip output of GTF
--skipFaFq Skip output of isoform fasta/fastq
--skipJunction Skip output of junctions file
-v, --version Display program version number.
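The two help messages above can be combined into a short QC-then-filter run. The following is a dry-run sketch: it only prints the commands unless `RUN=1` is set in the environment. The input file names are placeholders, and the output names passed to the filter (`<prefix>_classification.txt`, `<prefix>_corrected.fasta`, `<prefix>_corrected.gtf`) are an assumption about what sqanti3_qc.py writes, so check your output directory before relying on them. The options follow the help text of the installed version shown here, not necessarily v5.0, where sqanti3_RulesFilter.py is deprecated.

```shell
#!/usr/bin/env bash
# Sketch: chain SQANTI3 QC and the rules filter using the options from the
# help messages on this page. File names and the prefix are placeholders.
set -euo pipefail

isoforms="isoforms.gtf"
annotation="annotation.gtf"
genome="genome.fa"
prefix="my_sample"
outdir="sqanti3_out"

# Print a command; also execute it only when RUN=1 is set.
run() {
    printf '%s\n' "$*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

# QC step: --gtf marks the isoforms input as GTF; -o and -d set prefix and directory.
run sqanti3_qc.py --gtf "$isoforms" "$annotation" "$genome" \
    -o "$prefix" -d "$outdir" -t 4

# Filter step: classification file, isoform fasta, GTF (output names are assumed).
# -a, -r and -c are given their documented defaults explicitly.
run sqanti3_RulesFilter.py \
    "$outdir/${prefix}_classification.txt" \
    "$outdir/${prefix}_corrected.fasta" \
    "$outdir/${prefix}_corrected.gtf" \
    -a 0.6 -r 6 -c 3
```

Printing first makes it easy to review the exact command lines before spending cluster time; run the script with `RUN=1` once they look right.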
software ref: https://github.com/ConesaLab/SQANTI3
research ref: https://doi.org/10.1101/gr.222976.117
research ref: https://github.com/ConesaLab/SQANTI3#how-to-cite-sqanti3