# Trinity 2.13.2


{{< admonition tip "Conda" true >}}
See the 'activating the conda environment' section below to access this
software.
{{< /admonition >}}

## trinity-2.13.2 RNA-Seq De novo Assembly Using Trinity

![TrinityCompositeLogo](https://raw.githubusercontent.com/wiki/trinityrnaseq/trinityrnaseq/images/TrinityCompositeLogo.png)

### Quick Guide for the Impatient

Trinity assembles transcript sequences from Illumina RNA-Seq data.

Assemble RNA-Seq data like so:

```console
$ Trinity --seqType fq --left reads_1.fq --right reads_2.fq --CPU 6 --max_memory 20G
```

The assembled transcripts are written to `trinity_out_dir/Trinity.fasta`.

Full documentation is available on the [Trinity
wiki](https://github.com/trinityrnaseq/trinityrnaseq/wiki).

### Intro to Trinity

Trinity, developed at the [Broad Institute](http://www.broadinstitute.org) and
the [Hebrew University of Jerusalem](http://www.cs.huji.ac.il), is a method for
efficient and robust de novo reconstruction of transcriptomes from RNA-seq
data. Trinity combines three independent software modules, Inchworm,
Chrysalis, and Butterfly, which are applied sequentially to process large
volumes of RNA-seq reads. Trinity partitions the sequence data into many
individual de Bruijn graphs, each representing the transcriptional complexity
at a given gene or locus, and then processes each graph independently to
extract full-length splicing isoforms and to tease apart transcripts derived
from paralogous genes. Briefly, the process works as follows:

- *Inchworm* assembles the RNA-seq data into the unique sequences of
  transcripts, often generating full-length transcripts for a dominant
  isoform, but then reports just the unique portions of alternatively spliced
  transcripts.

- *Chrysalis* groups the Inchworm contigs into clusters and constructs a
  complete de Bruijn graph for each cluster. Each cluster represents the
  full transcriptional complexity of a given gene (or a set of genes that share
  sequence). Chrysalis then partitions the full read set among these disjoint
  graphs.

- *Butterfly* then processes the individual graphs in parallel, tracing the
  paths that reads and pairs of reads take within each graph, ultimately
  reporting full-length transcripts for alternatively spliced isoforms and
  teasing apart transcripts that correspond to paralogous genes.
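The de Bruijn graph idea can be illustrated with a toy script. This is a sketch, not Trinity code, and far simpler than Inchworm, Chrysalis, or Butterfly: it splits two hypothetical reads into k-mers (k = 4, an arbitrary choice) and records each k-mer as an edge from its (k-1)-mer prefix to its (k-1)-mer suffix. Because the two reads share k-mers, their edges join into a single graph, loosely analogous to how sequence from one gene ends up in one graph. It needs bash 4 or later for associative arrays.

```shell
# Toy illustration only (not Trinity code): build de Bruijn graph edges from reads.
k=4                        # k-mer length (arbitrary choice for this example)
reads=(ATGGCGT GGCGTGC)    # hypothetical reads that overlap by several bases

declare -A edges           # associative array used as a set of edges (bash >= 4)
for read in "${reads[@]}"; do
  # Slide a window of length k across the read.
  for (( i = 0; i + k <= ${#read}; i++ )); do
    kmer=${read:i:k}
    # Each k-mer is an edge from its (k-1)-mer prefix to its (k-1)-mer suffix;
    # the same edge seen in different reads collapses onto one key.
    edges["${kmer:0:k-1} -> ${kmer:1}"]=1
  done
done

# Print the unique edges in sorted order.
printf '%s\n' "${!edges[@]}" | sort
```

This prints six edges. `GGC -> GCG` and `GCG -> CGT` come from both reads but are stored once, which is what links the two reads into one connected graph.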

### Trinity Publications

Trinity was published in [Nature
Biotechnology](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571712/).  Our
protocol for transcriptome assembly and downstream analysis is published in
[Nature Protocols](http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3875132/),
although the most current instructional material is always available on the
Trinity website.

-------------------------------------------------------------------------------

## Activating the conda environment

To use the Trinity software, check out a node with `qrsh`, start a `bash`
shell, and source the activation script:

```console
$ bash
$ source /local/cluster/trinity/activate.sh
```

To run Trinity through SGE, add the `source` command above to your job script
before any `Trinity` commands.
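A minimal sketch of such a job script is below; it writes the script to a file named `trinity_job.sh` (a hypothetical name) with a here-document. The job name, the parallel environment name `thread`, the slot count, and the read file names are assumptions for illustration, so check which parallel environments your cluster provides (for example with `qconf -spl`) before submitting.

```shell
# Write a minimal SGE job script for Trinity (file name is hypothetical).
cat > trinity_job.sh <<'EOF'
#!/bin/bash
#$ -S /bin/bash
#$ -N trinity_asm
#$ -cwd
#$ -pe thread 6
# Assumption: "thread" is a placeholder parallel environment name;
# list the ones your cluster provides with `qconf -spl`.

# Activate the Trinity conda environment before calling Trinity.
source /local/cluster/trinity/activate.sh

# NSLOTS is set by SGE to the number of granted slots (falls back to 6).
Trinity --seqType fq --left reads_1.fq --right reads_2.fq \
        --CPU "${NSLOTS:-6}" --max_memory 20G
EOF
```

Submit the script with `qsub trinity_job.sh`. Passing `${NSLOTS}`, which SGE sets to the number of slots granted to the job, to `--CPU` keeps Trinity's thread count in line with the `-pe` request.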

## Location and version

```console
$ which Trinity
/local/cluster/trinity-2.13.2/bin/Trinity
$ Trinity --version
Trinity version: Trinity-v2.13.2
** NOTE: Latest version of Trinity is Trinity-v2.14.0, and can be obtained at:
	https://github.com/trinityrnaseq/trinityrnaseq/releases
```

## Help message

```console
$ Trinity --help


###############################################################################
#

     ______  ____   ____  ____   ____  ______  __ __
    |      ||    \ |    ||    \ |    ||      ||  |  |
    |      ||  D  ) |  | |  _  | |  | |      ||  |  |
    |_|  |_||    /  |  | |  |  | |  | |_|  |_||  ~  |
      |  |  |    \  |  | |  |  | |  |   |  |  |___, |
      |  |  |  .  \ |  | |  |  | |  |   |  |  |     |
      |__|  |__|\_||____||__|__||____|  |__|  |____/

    Trinity-v2.13.2


#
#
# Required:
#
#  --seqType <string>      :type of reads: ('fa' or 'fq')
#
#  --max_memory <string>      :suggested max memory to use by Trinity where limiting can be enabled. (jellyfish, sorting, etc)
#                            provided in Gb of RAM, ie.  '--max_memory 10G'
#
#  If paired reads:
#      --left  <string>    :left reads, one or more file names (separated by commas, no spaces)
#      --right <string>    :right reads, one or more file names (separated by commas, no spaces)
#
#  Or, if unpaired reads:
#      --single <string>   :single reads, one or more file names, comma-delimited (note, if single file contains pairs, can use flag: --run_as_paired )
#
#  Or,
#      --samples_file <string>         tab-delimited text file indicating biological replicate relationships.
#                                   ex.
#                                        cond_A    cond_A_rep1    A_rep1_left.fq    A_rep1_right.fq
#                                        cond_A    cond_A_rep2    A_rep2_left.fq    A_rep2_right.fq
#                                        cond_B    cond_B_rep1    B_rep1_left.fq    B_rep1_right.fq
#                                        cond_B    cond_B_rep2    B_rep2_left.fq    B_rep2_right.fq
#
#                      # if single-end instead of paired-end, then leave the 4th column above empty.
#
####################################
##  Misc:  #########################
#
#  --include_supertranscripts      :yield supertranscripts fasta and gtf files as outputs.
#
#  --SS_lib_type <string>          :Strand-specific RNA-Seq read orientation.
#                                   if paired: RF or FR,
#                                   if single: F or R.   (dUTP method = RF)
#                                   See web documentation.
#
#  --CPU <int>                     :number of CPUs to use, default: 2
#  --min_contig_length <int>       :minimum assembled contig length to report
#                                   (def=200)
#
#  --long_reads <string>           :fasta file containing error-corrected or circular consensus (CCS) pac bio reads
#                                   (** note: experimental parameter **, this functionality continues to be under development)
#
#  --genome_guided_bam <string>    :genome guided mode, provide path to coordinate-sorted bam file.
#                                   (see genome-guided param section under --show_full_usage_info)
#
#  --long_reads_bam <string>       :long reads to include for genome-guided Trinity
#                                  (bam file consists of error-corrected or circular consensus (CCS) pac bio read aligned to the genome)
#
#  --jaccard_clip                  :option, set if you have paired reads and
#                                   you expect high gene density with UTR
#                                   overlap (use FASTQ input file format
#                                   for reads).
#                                   (note: jaccard_clip is an expensive
#                                   operation, so avoid using it unless
#                                   necessary due to finding excessive fusion
#                                   transcripts w/o it.)
#
#  --trimmomatic                   :run Trimmomatic to quality trim reads
#                                        see '--quality_trimming_params' under full usage info for tailored settings.
#
#  --output <string>               :name of directory for output (will be
#                                   created if it doesn't already exist)
#                                   default( your current working directory: "/nfs4/core/scratch/davised/code/trinityrnaseq.wiki/trinity_out_dir"
#                                    note: must include 'trinity' in the name as a safety precaution! )
#
#  --full_cleanup                  :only retain the Trinity fasta file, rename as ${output_dir}.Trinity.fasta
#
#  --cite                          :show the Trinity literature citation
#
#  --verbose                       :provide additional job status info during the run.
#
#  --version                       :reports Trinity version (Trinity-v2.13.2) and exits.
#
#  --show_full_usage_info          :show the many many more options available for running Trinity (expert usage).
#
#
###############################################################################
#
#  *Note, a typical Trinity command might be:
#
#        Trinity --seqType fq --max_memory 50G --left reads_1.fq  --right reads_2.fq --CPU 6
#
#            (if you have multiple samples, use --samples_file ... see above for details)
#
#    and for Genome-guided Trinity, provide a coordinate-sorted bam:
#
#        Trinity --genome_guided_bam rnaseq_alignments.csorted.bam --max_memory 50G
#                --genome_guided_max_intron 10000 --CPU 6
#
#     see: /local/cluster/trinity-2.13.2/opt/trinity-2.13.2/sample_data/test_Trinity_Assembly/
#          for sample data and 'runMe.sh' for example Trinity execution
#
#     For more details, visit: http://trinityrnaseq.github.io
#
###############################################################################
```
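The `--samples_file` layout shown in the help text above can be written from the shell with `printf`, which reuses its format string until all arguments are consumed. This is a sketch: the condition names, replicate names, and FASTQ file names are the placeholders from the help text, and the output name `samples.txt` is hypothetical.

```shell
# Build a tab-delimited samples file: condition, replicate, left reads, right reads.
# File names are placeholders; substitute the paths to your own reads.
printf '%s\t%s\t%s\t%s\n' \
  cond_A cond_A_rep1 A_rep1_left.fq A_rep1_right.fq \
  cond_A cond_A_rep2 A_rep2_left.fq A_rep2_right.fq \
  cond_B cond_B_rep1 B_rep1_left.fq B_rep1_right.fq \
  cond_B cond_B_rep2 B_rep2_left.fq B_rep2_right.fq \
  > samples.txt
```

The file is then passed as `--samples_file samples.txt` in place of `--left`/`--right`, for example `Trinity --seqType fq --samples_file samples.txt --max_memory 20G --CPU 6`. For single-end reads, leave the fourth column empty as the help text describes.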

software ref: <https://github.com/trinityrnaseq/trinityrnaseq>  
software ref: <https://github.com/trinityrnaseq/trinityrnaseq/wiki>  
research ref: <https://doi.org/10.1038/nbt.1883>