# Trinity 2.13.2

{{< admonition tip "Conda" true >}}
See the 'activating the conda environment' section below to access this software.
{{< /admonition >}}

## trinity-2.13.2

RNA-Seq De novo Assembly Using Trinity

![TrinityCompositeLogo](https://raw.githubusercontent.com/wiki/trinityrnaseq/trinityrnaseq/images/TrinityCompositeLogo.png)

### Quick Guide for the Impatient

Trinity assembles transcript sequences from Illumina RNA-Seq data.

Assemble RNA-Seq data like so:

    Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G

Find assembled transcripts as: 'trinity_out_dir/Trinity.fasta'

Use the documentation links in the right-sidebar to navigate this documentation, and contact our [Google group for technical support](#contact_us).

### Intro to Trinity

Trinity, developed at the [Broad Institute](http://www.broadinstitute.org) and the [Hebrew University of Jerusalem](http://www.cs.huji.ac.il), represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, the process works like so:

- *Inchworm* assembles the RNA-seq data into the unique sequences of transcripts, often generating full-length transcripts for a dominant isoform, but then reports just the unique portions of alternatively spliced transcripts.
- *Chrysalis* clusters the Inchworm contigs into clusters and constructs complete de Bruijn graphs for each cluster. Each cluster represents the full transcriptional complexity for a given gene (or sets of genes that share sequences in common).
  Chrysalis then partitions the full read set among these disjoint graphs.
- *Butterfly* then processes the individual graphs in parallel, tracing the paths that reads and pairs of reads take within the graph, ultimately reporting full-length transcripts for alternatively spliced isoforms, and teasing apart transcripts that correspond to paralogous genes.

### Trinity Publications

Trinity was published in [Nature Biotechnology](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571712/). Our protocol for transcriptome assembly and downstream analysis is published in [Nature Protocols](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/), although we always have the most current instructional material available here at the Trinity website.

-------------------------------------------------------------------------------

## Activating the conda environment

To use Trinity, check out a node with `qrsh` and run:

```console
bash
source /local/cluster/trinity/activate.sh
```

To use Trinity over SGE, include the `source` command above before your `Trinity` commands in a shell script.

## Location and version

```console
$ which Trinity
/local/cluster/trinity-2.13.2/bin/Trinity
$ Trinity --version
Trinity version: Trinity-v2.13.2
** NOTE: Latest version of Trinity is Trinity-v2.14.0, and can be obtained at:
	https://github.com/trinityrnaseq/trinityrnaseq/releases
```

## Help message

```console
$ Trinity --help

###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.13.2

#
#
# Required:
#
#  --seqType                :type of reads: ('fa' or 'fq')
#
#  --max_memory             :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left               :left reads, one or more file names (separated by commas, no spaces)
#      --right              :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single             :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file       tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --include_supertranscripts  :yield supertranscripts fasta and gtf files as outputs.
#
#  --SS_lib_type            :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU                    :number of CPUs to use, default: 2
#  --min_contig_length      :minimum assembled contig length to report
#                            (def=200)
#
#  --long_reads             :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                            (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam      :genome guided mode, provide path to coordinate-sorted bam file.
#                            (see genome-guided param section under --show_full_usage_info)
#
#  --long_reads_bam         :long reads to include for genome-guided Trinity
#                            (bam file consists of error-corrected or circular consensus (CCS) pac bio read aligned to the genome)
#
#  --jaccard_clip           :option, set if you have paired reads and
#                            you expect high gene density with UTR
#                            overlap (use FASTQ input file format
#                            for reads).
#                            (note: jaccard_clip is an expensive
#                            operation, so avoid using it unless
#                            necessary due to finding excessive fusion
#                            transcripts w/o it.)
#
#  --trimmomatic            :run Trimmomatic to quality trim reads
#                            see '--quality_trimming_params' under full usage info for tailored settings.
#
#  --output                 :name of directory for output (will be
#                            created if it doesn't already exist)
#                            default( your current working directory: "/nfs4/core/scratch/davised/code/trinityrnaseq.wiki/trinity_out_dir"
#                            note: must include 'trinity' in the name as a safety precaution! )
#
#  --full_cleanup           :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                   :show the Trinity literature citation
#
#  --verbose                :provide additional job status info during the run.
#
#  --version                :reports Trinity version (Trinity-v2.13.2) and exits.
#
#  --show_full_usage_info   :show the many many more options available for running Trinity (expert usage).
#
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
#
#            (if you have multiple samples, use --samples_file ... see above for details)
#
#    and for Genome-guided Trinity, provide a coordinate-sorted bam:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#     see: /local/cluster/trinity-2.13.2/opt/trinity-2.13.2/sample_data/test_Trinity_Assembly/
#          for sample data and 'runMe.sh' for example Trinity execution
#
#     For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
```
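
The note under "Activating the conda environment" about running Trinity over SGE could look roughly like the sketch below. It is a sketch under stated assumptions, not a site-endorsed template: the job name `trinity_assembly`, the parallel environment name `thread`, the script name `trinity_job.sh`, and the read file names are hypothetical placeholders. The `source` path and the Trinity options are taken from this page. The snippet only writes the job script and prints the submit command; it does not submit anything.

```shell
#!/usr/bin/env bash
# Sketch: write an SGE job script that activates the Trinity environment
# before calling Trinity. All names marked "hypothetical" are assumptions.
set -euo pipefail

job_script="trinity_job.sh"   # hypothetical file name

# Quoted heredoc delimiter: the body is written verbatim, with no expansion.
cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#$ -S /bin/bash
#$ -N trinity_assembly
#$ -cwd
#$ -pe thread 6
# "-pe thread 6" is an assumption: parallel environment names are
# cluster-specific. Keep the slot count equal to --CPU below.

# Activate the Trinity environment first, as described on this page.
source /local/cluster/trinity/activate.sh

# reads_1.fq and reads_2.fq are placeholder input files.
Trinity --seqType fq \
    --left reads_1.fq --right reads_2.fq \
    --CPU 6 --max_memory 20G
EOF

echo "Wrote ${job_script}; submit with: qsub ${job_script}"
```

Before submitting, replace the `-pe` line with a parallel environment that your cluster actually defines (`qconf -spl` lists them on SGE), and adjust `--CPU` and `--max_memory` to what the job requests.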