# STAR 2.7.9a

## STAR 2.7.9a - Spliced Transcripts Alignment to a Reference

### Abstract

**Motivation:** Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases.

**Results:** To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80-90% success rate, corroborating the high precision of the STAR mapping strategy.

Location and version:

```console
$ which STAR
/local/cluster/bin/STAR
$ STAR --version
2.7.9a
```

Help message (abbreviated):

```console
$ STAR --help
Usage: STAR [options]... --genomeDir /path/to/genome/index/ --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2020

STAR version=2.7.9a
STAR compilation time,server,dir=2021-05-04T09:43:56-0400 vega:/home/dobin/data/STAR/STARcode/STAR.master/source
For more details see:

### versions
versionGenome                   2.7.4a
    string: earliest genome index version compatible with this STAR release. Please do not change this value!

### Parameter Files
parametersFiles                 -
    string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.

### System
sysShell                        -
    string: path to the shell binary, preferably bash, e.g. /bin/bash.
                                - ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.

### Run Parameters
runMode                         alignReads
    string: type of the run.
                                alignReads             ... map reads
                                genomeGenerate         ... generate genome files
                                inputAlignmentsFromBAM ... input alignments from BAM. Presently only works with --outWigType and --bamRemoveDuplicates.
                                liftOver               ... lift-over of GTF files (--sjdbGTFfile) between genome assemblies using chain file(s) from --genomeChainFiles.
                                soloCellFiltering      ... STARsolo cell filtering ("calling") without remapping, followed by the path to raw count directory and output (filtered) prefix
```

STAR is useful for spliced alignments of RNA-Seq reads to an assembled genome.

software ref:

research ref:
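The `runMode` values listed in the help output suggest a two-step workflow: build an index with `genomeGenerate`, then map reads with `alignReads`. The following is a minimal sketch of that workflow, not a site-specific recipe. All file and directory names (`genome_index`, `genome.fa`, `annotation.gtf`, `R1.fq`, `R2.fq`) and the helper functions `run`, `star_index` and `star_align` are hypothetical placeholders. The options `--runThreadN` and `--genomeFastaFiles` come from the full STAR help rather than the abbreviated excerpt above. The script defaults to a dry run that only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: every file name and helper function below is a hypothetical placeholder.
set -eu

STAR_BIN="${STAR_BIN:-STAR}"   # path to the STAR binary
DRY_RUN="${DRY_RUN:-1}"        # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# star_index GENOME_DIR FASTA GTF THREADS
# --runMode genomeGenerate writes the genome index files into GENOME_DIR.
star_index() {
    run "$STAR_BIN" --runMode genomeGenerate \
        --runThreadN "$4" \
        --genomeDir "$1" \
        --genomeFastaFiles "$2" \
        --sjdbGTFfile "$3"
}

# star_align GENOME_DIR READS_1 READS_2 THREADS
# alignReads is the default run mode; it is spelled out here for clarity.
star_align() {
    run "$STAR_BIN" --runMode alignReads \
        --runThreadN "$4" \
        --genomeDir "$1" \
        --readFilesIn "$2" "$3"
}

star_index genome_index genome.fa annotation.gtf 12
star_align genome_index R1.fq R2.fq 12
```

Keeping the dry run as the default makes it easy to review the exact command lines before committing cluster time, since index generation for a large genome can take a long time and a lot of memory.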