1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
|
$ STAR --help
Usage: STAR [options]... --genomeDir /path/to/genome/index/ --readFilesIn R1.fq R2.fq
Spliced Transcripts Alignment to a Reference (c) Alexander Dobin, 2009-2020
STAR version=2.7.10a
STAR compilation time,server,dir=2022-01-14T18:50:00-05:00 :/home/dobin/data/STAR/STARcode/STAR.master/source
For more details see:
<https://github.com/alexdobin/STAR>
<https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf>
### versions
versionGenome 2.7.4a
string: earliest genome index version compatible with this STAR release. Please do not change this value!
### Parameter Files
parametersFiles -
string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.
### System
sysShell -
string: path to the shell binary, preferably bash, e.g. /bin/bash.
- ... the default shell is executed, typically /bin/sh. This was reported to fail on some Ubuntu systems - then you need to specify path to bash.
### Run Parameters
runMode alignReads
string: type of the run.
alignReads ... map reads
genomeGenerate ... generate genome files
inputAlignmentsFromBAM ... input alignments from BAM. Presently only works with --outWigType and --bamRemoveDuplicates options.
liftOver ... lift-over of GTF files (--sjdbGTFfile) between genome assemblies using chain file(s) from --genomeChainFiles.
soloCellFiltering </path/to/raw/count/dir/> </path/to/output/prefix> ... STARsolo cell filtering ("calling") without remapping, followed by the path to raw count directory and output (filtered) prefix
runThreadN 1
int: number of threads to run STAR
runDirPerm User_RWX
string: permissions for the directories created at the run-time.
User_RWX ... user-read/write/execute
All_RWX ... all-read/write/execute (same as chmod 777)
runRNGseed 777
int: random number generator seed.
### Genome Parameters
genomeDir ./GenomeDir/
string: path to the directory where genome files are stored (for --runMode alignReads) or will be generated (for --runMode generateGenome)
genomeLoad NoSharedMemory
string: mode of shared memory usage for the genome files. Only used with --runMode alignReads.
LoadAndKeep ... load genome into shared and keep it in memory after run
LoadAndRemove ... load genome into shared but remove it after run
LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs
Remove ... do not map anything, just remove loaded genome from memory
NoSharedMemory ... do not use shared memory, each job will have its own private copy ofthe genome
genomeFastaFiles -
string(s): path(s) to the fasta files with the genome sequences, separated by spaces. These files should be plain text FASTA files, they *cannot* be zipped.
Required for the genome generation (--runMode genomeGenerate). Can also be used in themapping (--runMode alignReads) to add extra (new) sequences to the genome (e.g. spike-ins).
genomeChainFiles -
string: chain files for genomic liftover. Only used with --runMode liftOver .
genomeFileSizes 0
uint(s)>0: genome files exact sizes in bytes. Typically, this should not be defined by the user.
genomeTransformOutput None
string(s) which output to transform back to original genome
SAM ... SAM/BAM alignments
SJ ... splice junctions (SJ.out.tab)
None ... no transformation of the output
genomeChrSetMitochondrial chrM M MT
string(s) names of the mitochondrial chromosomes. Presently only used for STARsolo statisics output/
### Genome Indexing Parameters - only used with --runMode genomeGenerate
genomeChrBinNbits 18
int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins. For a genome with large number of contigs, it is recommended to scale this parameter as min(18, log2[max(GenomeLength/NumberOfReferences,ReadLength)]).
genomeSAindexNbases 14
int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches. For small genomes, the parameter --genomeSAindexNbases must be scaled down to min(14, log2(GenomeLength)/2 - 1).
genomeSAsparseD 1
int>0: suffux array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction
genomeSuffixLengthMax -1
int: maximum length of the suffixes, has to be longer than read length. -1 = infinite.
genomeTransformType None
string: type of genome transformation
None ... no transformation
Haploid ... replace reference alleles with alternative alleles from VCF file (e.g. consensus allele)
Diploid ... create two haplotypes for each chromosome listed in VCF file, for genotypes 1|2, assumes perfect phasing (e.g. personal genome)
genomeTransformVCF -
string: path to VCF file for genome transformation
#####UnderDevelopment_begin : not supported - do not use
genomeType Full
string: type of genome to generate
Full ... full (normal) genome
Transcriptome ... genome consists of transcript sequences
SuperTransriptome ... genome consists of superTranscript sequences
#####UnderDevelopment_end
# DEPRECATED: please use --genomeTransformVCF and --genomeTransformType options instead.
#genomeConsensusFile -
# string: VCF file with consensus SNPs (i.e. alternative allele is the major (AF>0.5) allele)
# DEPRECATED
### Splice Junctions Database
sjdbFileChrStartEnd -
string(s): path to the files with genomic coordinates (chr <tab> start <tab> end <tab> strand) for the splice junction introns. Multiple files can be supplied wand will be concatenated.
sjdbGTFfile -
string: path to the GTF file with annotations
sjdbGTFchrPrefix -
string: prefix for chromosome names in a GTF file (e.g. 'chr' for using ENSMEBL annotations with UCSC genomes)
sjdbGTFfeatureExon exon
string: feature type in GTF file to be used as exons for building transcripts
sjdbGTFtagExonParentTranscript transcript_id
string: GTF attribute name for parent transcript ID (default "transcript_id" works for GTF files)
sjdbGTFtagExonParentGene gene_id
string: GTF attribute name for parent gene ID (default "gene_id" works for GTF files)
sjdbGTFtagExonParentGeneName gene_name
string(s): GTF attrbute name for parent gene name
sjdbGTFtagExonParentGeneType gene_type gene_biotype
string(s): GTF attrbute name for parent gene type
sjdbOverhang 100
int>0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
sjdbScore 2
int: extra alignment score for alignments that cross database junctions
sjdbInsertSave Basic
string: which files to save when sjdb junctions are inserted on the fly at the mapping step
Basic ... only small junction / transcript files
All ... all files including big Genome, SA and SAindex - this will create a complete genome directory
### Variation parameters
varVCFfile -
string: path to the VCF file that contains variation data. The 10th column should contain the genotype information, e.g. 0/1
### Input Files
inputBAMfile -
string: path to BAM input file, to be used with --runMode inputAlignmentsFromBAM
### Read Parameters
readFilesType Fastx
string: format of input read files
Fastx ... FASTA or FASTQ
SAM SE ... SAM or BAM single-end reads; for BAM use --readFilesCommand samtools view
SAM PE ... SAM or BAM paired-end reads; for BAM use --readFilesCommand samtools view
readFilesSAMattrKeep All
string(s): for --readFilesType SAM SE/PE, which SAM tags to keep in the output BAM, e.g.: --readFilesSAMtagsKeep RG PL
All ... keep all tags
None ... do not keep any tags
readFilesIn Read1 Read2
string(s): paths to files that contain input read1 (and, if needed, read2)
readFilesManifest -
string: path to the "manifest" file with the names of read files. The manifest file should contain 3 tab-separated columns:
paired-end reads: read1_file_name $tab$ read2_file_name $tab$ read_group_line.
single-end reads: read1_file_name $tab$ - $tab$ read_group_line.
Spaces, but not tabs are allowed in file names.
If read_group_line does not start with ID:, it can only contain one ID field, and ID: will be added toit.
If read_group_line starts with ID:, it can contain several fields separated by $tab$, and all fields will be be copied verbatim into SAM @RG header line.
readFilesPrefix -
string: prefix for the read files names, i.e. it will be added in front of the strings in --readFilesIn
readFilesCommand -
string(s): command line to execute for each of the input file. This command should generate FASTA or FASTQ text and send it to stdout
For example: zcat - to uncompress .gz files, bzcat - to uncompress .bz2 files, etc.
readMapNumber -1
int: number of reads to map from the beginning of the file
-1: map all reads
readMatesLengthsIn NotEqual
string: Equal/NotEqual - lengths of names,sequences,qualities for both mates are the same / not the same. NotEqual is safe in all situations.
readNameSeparator /
string(s): character(s) separating the part of the read names that will be trimmed in output (read name after space is always trimmed)
readQualityScoreBase 33
int>=0: number to be subtracted from the ASCII code to get Phred quality score
### Read Clipping
clipAdapterType Hamming
string: adapter clipping type
Hamming ... adapter clipping based on Hamming distance, with the number of mismatches controlled by --clip5pAdapterMMp
CellRanger4 ... 5p and 3p adapter clipping similar to CellRanger4. Utilizes Opal package by Martin Šošić: https://github.com/Martinsos/opal
None ... no adapter clipping, all other clip* parameters are disregarded
clip3pNbases 0
int(s): number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.
clip3pAdapterSeq -
string(s): adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the samefor both mates.
polyA ... polyA sequence with the length equal to read length
clip3pAdapterMMp 0.1
double(s): max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it willbe assumed the same for both mates.
clip3pAfterAdapterNbases 0
int(s): number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates.
clip5pNbases 0
int(s): number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates.
#####UnderDevelopment_begin : not supported - do not use
clip5pAdapterSeq -
string(s): adapter sequences to clip from 5p of each mate, separated by space.
clip5pAdapterMMp 0.1
double(s): max proportion of mismatches for 5p adapter clipping for each mate, separated by space
clip5pAfterAdapterNbases 0
int(s): number of bases to clip from 5p of each mate after the adapter clipping, separated by space.
#####UnderDevelopment_end
### Limits
limitGenomeGenerateRAM 31000000000
int>0: maximum available RAM (bytes) for genome generation
limitIObufferSize 30000000 50000000
int>0: max available buffers size (bytes) for input/output, per thread
limitOutSAMoneReadBytes 100000
int>0: max size of the SAM record (bytes) for one read. Recommended value: >(2*(LengthMate1+LengthMate2+100)*outFilterMultimapNmax
limitOutSJoneRead 1000
int>0: max number of junctions for one read (including all multi-mappers)
limitOutSJcollapsed 1000000
int>0: max number of collapsed junctions
limitBAMsortRAM 0
int>=0: maximum available RAM (bytes) for sorting BAM. If =0, it will be set to the genome index size. 0 valuecan only be used with --genomeLoad NoSharedMemory option.
limitSjdbInsertNsj 1000000
int>=0: maximum number of junction to be inserted to the genome on the fly at the mapping stage, including those from annotations and those detected in the 1st step of the 2-pass run
limitNreadsSoft -1
int: soft limit on the number of reads
### Output: general
outFileNamePrefix ./
string: output files name prefix (including full or relative path). Can only be defined on the command line.
outTmpDir -
string: path to a directory that will be used as temporary by STAR. All contents of this directory will be removed!
- the temp directory will default to outFileNamePrefix_STARtmp
outTmpKeep None
string: whether to keep the tempporary files after STAR runs is finished
None ... remove all temporary files
All .. keep all files
outStd Log
string: which output will be directed to stdout (standard out)
Log ... log messages
SAM ... alignments in SAM format (which normally are output to Aligned.out.sam file), normal standard output will go into Log.std.out
BAM_Unsorted ... alignments in BAM format, unsorted. Requires --outSAMtype BAM Unsorted
BAM_SortedByCoordinate ... alignments in BAM format, sorted by coordinate. Requires --outSAMtype BAM SortedByCoordinate
BAM_Quant ... alignments to transcriptome in BAM format, unsorted. Requires --quantMode TranscriptomeSAM
outReadsUnmapped None
string: output of unmapped and partially mapped (i.e. mapped only one mate of a paired end read) reads in separate file(s).
None ... no output
Fastx ... output in separate fasta/fastq files, Unmapped.out.mate1/2
outQSconversionAdd 0
int: add this number to the quality score (e.g. to convert from Illumina to Sanger, use -31)
outMultimapperOrder Old_2.4
string: order of multimapping alignments in the output files
Old_2.4 ... quasi-random order used before 2.5.0
Random ... random order of alignments for each multi-mapper. Read mates (pairs) are always adjacent, all alignment for each read stay together. This option will become default in the future releases.
### Output: SAM and BAM
outSAMtype SAM
strings: type of SAM/BAM output
1st word:
BAM ... output BAM without sorting
SAM ... output SAM without sorting
None ... no SAM/BAM output
2nd, 3rd:
Unsorted ... standard unsorted
SortedByCoordinate ... sorted by coordinate. This option will allocate extra memory for sorting which can be specified by --limitBAMsortRAM.
outSAMmode Full
string: mode of SAM output
None ... no SAM output
Full ... full SAM output
NoQS ... full SAM but without quality scores
outSAMstrandField None
string: Cufflinks-like strand field flag
None ... not used
intronMotif ... strand derived from the intron motif. This option changes the output alignments: reads with inconsistent and/or non-canonical introns are filtered out.
outSAMattributes Standard
string: a string of desired SAM attributes, in the order desired for the output SAM. Tags can be listed in anycombination/order.
***Presets:
None ... no attributes
Standard ... NH HI AS nM
All ... NH HI AS nM NM MD jM jI MC ch
***Alignment:
NH ... number of loci the reads maps to: =1 for unique mappers, >1 for multimappers. Standard SAM tag.
HI ... multiple alignment index, starts with --outSAMattrIHstart (=1 by default). Standard SAM tag.
AS ... local alignment score, +1/-1 for matches/mismateches, score* penalties for indels and gaps. For PE reads, total score for two mates. Stadnard SAM tag.
nM ... number of mismatches. For PE reads, sum over two mates.
NM ... edit distance to the reference (number of mismatched + inserted + deleted bases) for each mate. Standard SAM tag.
MD ... string encoding mismatched and deleted reference bases (see standard SAM specifications). Standard SAM tag.
jM ... intron motifs for all junctions (i.e. N in CIGAR): 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT. If splice junctions database is used, and a junctionis annotated, 20 is added to its motif value.
jI ... start and end of introns for all junctions (1-based).
XS ... alignment strand according to --outSAMstrandField.
MC ... mate's CIGAR string. Standard SAM tag.
ch ... marks all segment of all chimeric alingments for --chimOutType WithinBAM output.
cN ... number of bases clipped from the read ends: 5' and 3'
***Variation:
vA ... variant allele
vG ... genomic coordinate of the variant overlapped by the read.
vW ... 1 - alignment passes WASP filtering; 2,3,4,5,6,7 - alignment does not pass WASP filtering. Requires --waspOutputMode SAMtag.
***STARsolo:
CR CY UR UY ... sequences and quality scores of cell barcodes and UMIs for the solo* demultiplexing.
GX GN ... gene ID and gene name for unique-gene reads.
gx gn ... gene IDs and gene names for unique- and multi-gene reads.
CB UB ... error-corrected cell barcodes and UMIs for solo* demultiplexing. Requires --outSAMtype BAM SortedByCoordinate.
sM ... assessment of CB and UMI.
sS ... sequence of the entire barcode (CB,UMI,adapter).
sQ ... quality of the entire barcode.
***Unsupported/undocumented:
ha ... haplotype (1/2) when mapping to the diploid genome. Requires genome generated with --genomeTransformType Diploid .
rB ... alignment block read/genomic coordinates.
vR ... read coordinate of the variant.
outSAMattrIHstart 1
int>=0: start value for the IH attribute. 0 may be required by some downstream software, such as Cufflinks or StringTie.
outSAMunmapped None
string(s): output of unmapped reads in the SAM format
1st word:
None ... no output
Within ... output unmapped reads within the main SAM file (i.e. Aligned.out.sam)
2nd word:
KeepPairs ... record unmapped mate for each alignment, and, in case of unsorted output, keep it adjacent to its mapped mate. Only affects multi-mapping reads.
outSAMorder Paired
string: type of sorting for the SAM output
Paired: one mate after the other for all paired alignments
PairedKeepInputOrder: one mate after the other for all paired alignments, the order is kept the same as in the input FASTQ files
outSAMprimaryFlag OneBestScore
string: which alignments are considered primary - all others will be marked with 0x100 bit in the FLAG
OneBestScore ... only one alignment with the best score is primary
AllBestScore ... all alignments with the best score are primary
outSAMreadID Standard
string: read ID record type
Standard ... first word (until space) from the FASTx read ID line, removing /1,/2 from the end
Number ... read number (index) in the FASTx file
outSAMmapqUnique 255
int: 0 to 255: the MAPQ value for unique mappers
outSAMflagOR 0
int: 0 to 65535: sam FLAG will be bitwise OR'd with this value, i.e. FLAG=FLAG | outSAMflagOR. This is appliedafter all flags have been set by STAR, and after outSAMflagAND. Can be used to set specific bits that are not set otherwise.
outSAMflagAND 65535
int: 0 to 65535: sam FLAG will be bitwise AND'd with this value, i.e. FLAG=FLAG & outSAMflagOR. This is applied after all flags have been set by STAR, but before outSAMflagOR. Can be used to unset specific bits that are not set otherwise.
outSAMattrRGline -
string(s): SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z".
xxx will be added as RG tag to each output alignment. Any spaces in the tag values have to be double quoted.
Comma separated RG lines correspons to different (comma separated) input files in --readFilesIn. Commas have to be surrounded by spaces, e.g.
--outSAMattrRGline ID:xxx , ID:zzz "DS:z z" , ID:yyy DS:yyyy
outSAMheaderHD -
strings: @HD (header) line of the SAM header
outSAMheaderPG -
strings: extra @PG (software) line of the SAM header (in addition to STAR)
outSAMheaderCommentFile -
string: path to the file with @CO (comment) lines of the SAM header
outSAMfilter None
string(s): filter the output into main SAM/BAM files
KeepOnlyAddedReferences ... only keep the reads for which all alignments are to the extra reference sequences added with --genomeFastaFiles at the mapping stage.
KeepAllAddedReferences ... keep all alignments to the extra reference sequences added with --genomeFastaFiles at the mapping stage.
outSAMmultNmax -1
int: max number of multiple alignments for a read that will be output to the SAM/BAM files. Note that if this value is not equal to -1, the top scoring alignment will be output first
-1 ... all alignments (up to --outFilterMultimapNmax) will be output
outSAMtlen 1
int: calculation method for the TLEN field in the SAM/BAM files
1 ... leftmost base of the (+)strand mate to rightmost base of the (-)mate. (+)sign for the (+)strand mate
2 ... leftmost base of any mate to rightmost base of any mate. (+)sign for the mate with the leftmost base. This is different from 1 for overlapping mates with protruding ends
outBAMcompression 1
int: -1 to 10 BAM compression level, -1=default compression (6?), 0=no compression, 10=maximum compression
outBAMsortingThreadN 0
int: >=0: number of threads for BAM sorting. 0 will default to min(6,--runThreadN).
outBAMsortingBinsN 50
int: >0: number of genome bins fo coordinate-sorting
### BAM processing
bamRemoveDuplicatesType -
string: mark duplicates in the BAM file, for now only works with (i) sorted BAM fed with inputBAMfile, and (ii) for paired-end alignments only
- ... no duplicate removal/marking
UniqueIdentical ... mark all multimappers, and duplicate unique mappers. The coordinates, FLAG, CIGAR must be identical
UniqueIdenticalNotMulti ... mark duplicate unique mappers but not multimappers.
bamRemoveDuplicatesMate2basesN 0
int>0: number of bases from the 5' of mate 2 to use in collapsing (e.g. for RAMPAGE)
### Output Wiggle
outWigType None
string(s): type of signal output, e.g. "bedGraph" OR "bedGraph read1_5p". Requires sorted BAM: --outSAMtype BAM SortedByCoordinate .
1st word:
None ... no signal output
bedGraph ... bedGraph format
wiggle ... wiggle format
2nd word:
read1_5p ... signal from only 5' of the 1st read, useful for CAGE/RAMPAGE etc
read2 ... signal from only 2nd read
outWigStrand Stranded
string: strandedness of wiggle/bedGraph output
Stranded ... separate strands, str1 and str2
Unstranded ... collapsed strands
outWigReferencesPrefix -
string: prefix matching reference names to include in the output wiggle file, e.g. "chr", default "-" - include all references
outWigNorm RPM
string: type of normalization for the signal
RPM ... reads per million of mapped reads
None ... no normalization, "raw" counts
### Output Filtering
outFilterType Normal
string: type of filtering
Normal ... standard filtering using only current alignment
BySJout ... keep only those reads that contain junctions that passed filtering into SJ.out.tab
outFilterMultimapScoreRange 1
int: the score range below the maximum score for multimapping alignments
outFilterMultimapNmax 10
int: maximum number of loci the read is allowed to map to. Alignments (all of them) will be output only if theread maps to no more loci than this value.
Otherwise no alignments will be output, and the read will be counted as "mapped to too many loci" in the Log.final.out .
outFilterMismatchNmax 10
int: alignment will be output only if it has no more mismatches than this value.
outFilterMismatchNoverLmax 0.3
real: alignment will be output only if its ratio of mismatches to *mapped* length is less than or equal to this value.
outFilterMismatchNoverReadLmax 1.0
real: alignment will be output only if its ratio of mismatches to *read* length is less than or equal to this value.
outFilterScoreMin 0
int: alignment will be output only if its score is higher than or equal to this value.
outFilterScoreMinOverLread 0.66
real: same as outFilterScoreMin, but normalized to read length (sum of mates' lengths for paired-end reads)
outFilterMatchNmin 0
int: alignment will be output only if the number of matched bases is higher than or equal to this value.
outFilterMatchNminOverLread 0.66
real: sam as outFilterMatchNmin, but normalized to the read length (sum of mates' lengths for paired-end reads).
outFilterIntronMotifs None
string: filter alignment using their motifs
None ... no filtering
RemoveNoncanonical ... filter out alignments that contain non-canonical junctions
RemoveNoncanonicalUnannotated ... filter out alignments that contain non-canonical unannotated junctions when using annotated splice junctions database. The annotated non-canonical junctions will be kept.
outFilterIntronStrands RemoveInconsistentStrands
string: filter alignments
RemoveInconsistentStrands ... remove alignments that have junctions with inconsistent strands
None ... no filtering
### Output splice junctions (SJ.out.tab)
outSJtype Standard
string: type of splice junction output
Standard ... standard SJ.out.tab output
None ... no splice junction output
### Output Filtering: Splice Junctions
outSJfilterReads All
string: which reads to consider for collapsed splice junctions output
All ... all reads, unique- and multi-mappers
Unique ... uniquely mapping reads only
outSJfilterOverhangMin 30 12 12 12
4 integers: minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
does not apply to annotated junctions
outSJfilterCountUniqueMin 3 1 1 1
4 integers: minimum uniquely mapping read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied
does not apply to annotated junctions
outSJfilterCountTotalMin 3 1 1 1
4 integers: minimum total (multi-mapping+unique) read count per junction for: (1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif. -1 means no output for that motif
Junctions are output if one of outSJfilterCountUniqueMin OR outSJfilterCountTotalMin conditions are satisfied
does not apply to annotated junctions
outSJfilterDistToOtherSJmin 10 0 5 10
4 integers>=0: minimum allowed distance to other junctions' donor/acceptor
does not apply to annotated junctions
outSJfilterIntronMaxVsReadN 50000 100000 200000
N integers>=0: maximum gap allowed for junctions supported by 1,2,3,,,N reads
i.e. by default junctions supported by 1 read can have gaps <=50000b, by 2 reads: <=100000b, by 3 reads: <=200000. by >=4 reads any gap <=alignIntronMax
does not apply to annotated junctions
### Scoring
scoreGap 0
int: splice junction penalty (independent on intron motif)
scoreGapNoncan -8
int: non-canonical junction penalty (in addition to scoreGap)
scoreGapGCAG -4
GC/AG and CT/GC junction penalty (in addition to scoreGap)
scoreGapATAC -8
AT/AC and GT/AT junction penalty (in addition to scoreGap)
scoreGenomicLengthLog2scale -0.25
extra score logarithmically scaled with genomic length of the alignment: scoreGenomicLengthLog2scale*log2(genomicLength)
scoreDelOpen -2
deletion open penalty
scoreDelBase -2
deletion extension penalty per base (in addition to scoreDelOpen)
scoreInsOpen -2
insertion open penalty
scoreInsBase -2
insertion extension penalty per base (in addition to scoreInsOpen)
scoreStitchSJshift 1
maximum score reduction while searching for SJ boundaries in the stitching step
### Alignments and Seeding
seedSearchStartLmax 50
int>0: defines the search start point through the read - the read is split into pieces no longer than this value
seedSearchStartLmaxOverLread 1.0
real: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)
seedSearchLmax 0
int>=0: defines the maximum length of the seeds, if =0 seed length is not limited
seedMultimapNmax 10000
int>0: only pieces that map fewer than this value are utilized in the stitching procedure
seedPerReadNmax 1000
int>0: max number of seeds per read
seedPerWindowNmax 50
int>0: max number of seeds per window
seedNoneLociPerWindow 10
int>0: max number of one seed loci per window
seedSplitMin 12
int>0: min length of the seed sequences split by Ns or mate gap
seedMapMin 5
int>0: min length of seeds to be mapped
alignIntronMin 21
minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it is considered Deletion
alignIntronMax 0
maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins
alignMatesGapMax 0
maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins
alignSJoverhangMin 5
int>0: minimum overhang (i.e. block size) for spliced alignments
alignSJstitchMismatchNmax 0 -1 0 0
4*int>=0: maximum number of mismatches for stitching of the splice junctions (-1: no limit).
(1) non-canonical motifs, (2) GT/AG and CT/AC motif, (3) GC/AG and CT/GC motif, (4) AT/AC and GT/AT motif.
alignSJDBoverhangMin 3
int>0: minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments
alignSplicedMateMapLmin 0
int>0: minimum mapped length for a read mate that is spliced
alignSplicedMateMapLminOverLmate 0.66
real>0: alignSplicedMateMapLmin normalized to mate length
alignWindowsPerReadNmax 10000
int>0: max number of windows per read
alignTranscriptsPerWindowNmax 100
int>0: max number of transcripts per window
alignTranscriptsPerReadNmax 10000
int>0: max number of different alignments per read to consider
alignEndsType Local
string: type of read ends alignment
Local ... standard local alignment with soft-clipping allowed
EndToEnd ... force end-to-end read alignment, do not soft-clip
Extend5pOfRead1 ... fully extend only the 5p of the read1, all other ends: local alignment
Extend5pOfReads12 ... fully extend only the 5p of the both read1 and read2, all other ends: local alignment
alignEndsProtrude 0 ConcordantPair
int, string: allow protrusion of alignment ends, i.e. start (end) of the +strand mate downstream of thestart (end) of the -strand mate
1st word: int: maximum number of protrusion bases allowed
2nd word: string:
ConcordantPair ... report alignments with non-zero protrusion as concordant pairs
DiscordantPair ... report alignments with non-zero protrusion as discordant pairs
alignSoftClipAtReferenceEnds Yes
string: allow the soft-clipping of the alignments past the end of the chromosomes
Yes ... allow
No ... prohibit, useful for compatibility with Cufflinks
alignInsertionFlush None
string: how to flush ambiguous insertion positions
None ... insertions are not flushed
Right ... insertions are flushed to the right
### Paired-End reads
peOverlapNbasesMin 0
int>=0: minimum number of overlap bases to trigger mates merging and realignment. Specify >0 valueto switch on the "merginf of overlapping mates" algorithm.
peOverlapMMp 0.01
real, >=0 & <1: maximum proportion of mismatched bases in the overlap area
### Windows, Anchors, Binning
winAnchorMultimapNmax 50
int>0: max number of loci anchors are allowed to map to
winBinNbits 16
int>0: =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins.
winAnchorDistNbins 9
int>0: max number of bins between two anchors that allows aggregation of anchors into one window
winFlankNbins 4
int>0: log2(winFlank), where win Flank is the size of the left and right flanking regions for each window
winReadCoverageRelativeMin 0.5
real>=0: minimum relative coverage of the read sequence by the seeds in a window, for STARlong algorithm only.
winReadCoverageBasesMin 0
int>0: minimum number of bases covered by the seeds in a window , for STARlong algorithm only.
### Chimeric Alignments
chimOutType Junctions
string(s): type of chimeric output
Junctions ... Chimeric.out.junction
SeparateSAMold ... output old SAM into separate Chimeric.out.sam file
WithinBAM ... output into main aligned BAM files (Aligned.*.bam)
WithinBAM HardClip ... (default) hard-clipping in the CIGAR for supplemental chimericalignments (default if no 2nd word is present)
WithinBAM SoftClip ... soft-clipping in the CIGAR for supplemental chimeric alignments
chimSegmentMin 0
int>=0: minimum length of chimeric segment length, if ==0, no chimeric output
chimScoreMin 0
int>=0: minimum total (summed) score of the chimeric segments
chimScoreDropMax 20
int>=0: max drop (difference) of chimeric score (the sum of scores of all chimeric segments) from the read length
chimScoreSeparation 10
int>=0: minimum difference (separation) between the best chimeric score and the next one
chimScoreJunctionNonGTAG -1
int: penalty for a non-GT/AG chimeric junction
chimJunctionOverhangMin 20
int>=0: minimum overhang for a chimeric junction
chimSegmentReadGapMax 0
int>=0: maximum gap in the read sequence between chimeric segments
chimFilter banGenomicN
string(s): different filters for chimeric alignments
None ... no filtering
banGenomicN ... Ns are not allowed in the genome sequence around the chimeric junction
chimMainSegmentMultNmax 10
int>=1: maximum number of multi-alignments for the main chimeric segment. =1 will prohibit multimapping main segments.
chimMultimapNmax 0
int>=0: maximum number of chimeric multi-alignments
0 ... use the old scheme for chimeric detection which only considered unique alignments
chimMultimapScoreRange 1
int>=0: the score range for multi-mapping chimeras below the best chimeric score. Only works with --chimMultimapNmax > 1
chimNonchimScoreDropMin 20
int>=0: to trigger chimeric detection, the drop in the best non-chimeric alignment score with respect to the read length has to be greater than this value
chimOutJunctionFormat 0
int: formatting type for the Chimeric.out.junction file
0 ... no comment lines/headers
1 ... comment lines at the end of the file: command line and Nreads: total, unique/multi-mapping
### Quantification of Annotations
quantMode -
string(s): types of quantification requested
- ... none
TranscriptomeSAM ... output SAM/BAM alignments to transcriptome into a separate file
GeneCounts ... count reads per gene
quantTranscriptomeBAMcompression 1 1
int: -2 to 10 transcriptome BAM compression level
-2 ... no BAM output
-1 ... default compression (6?)
0 ... no compression
10 ... maximum compression
quantTranscriptomeBan IndelSoftclipSingleend
string: prohibit various alignment type
IndelSoftclipSingleend ... prohibit indels, soft clipping and single-end alignments -compatible with RSEM
Singleend ... prohibit single-end alignments
### 2-pass Mapping
twopassMode None
string: 2-pass mapping mode.
None ... 1-pass mapping
Basic ... basic 2-pass mapping, with all 1st pass junctions inserted into the genome indices on the fly
twopass1readsN -1
int: number of reads to process for the 1st step. Use very large number (or default -1) to map all reads in the first step.
### WASP parameters
waspOutputMode None
string: WASP allele-specific output type. This is re-implementation of the original WASP mappability filteringby Bryce van de Geijn, Graham McVicker, Yoav Gilad & Jonathan K Pritchard. Please cite the original WASP paper: Nature Methods 12, 1061–1063 (2015), https://www.nature.com/articles/nmeth.3582 .
SAMtag ... add WASP tags to the alignments that pass WASP filtering
### STARsolo (single cell RNA-seq) parameters
soloType None
string(s): type of single-cell RNA-seq
CB_UMI_Simple ... (a.k.a. Droplet) one UMI and one Cell Barcode of fixed length in read2, e.g. Drop-seq and 10X Chromium.
CB_UMI_Complex ... multiple Cell Barcodes of varying length, one UMI of fixed length and one adapter sequence of fixed length are allowed in read2 only (e.g. inDrop, ddSeq).
CB_samTagOut ... output Cell Barcode as CR and/or CB SAm tag. No UMI counting. --readFilesIn cDNA_read1 [cDNA_read2 if paired-end] CellBarcode_read . Requires --outSAMtype BAM Unsorted [and/or SortedByCoordinate]
SmartSeq ... Smart-seq: each cell in a separate FASTQ (paired- or single-end), barcodes are corresponding read-groups, no UMI sequences, alignments deduplicated according to alignment start and end (after extending soft-clipped bases)
soloCBwhitelist -
string(s): file(s) with whitelist(s) of cell barcodes. Only --soloType CB_UMI_Complex allows more than one whitelist file.
None ... no whitelist: all cell barcodes are allowed
soloCBstart 1
int>0: cell barcode start base
soloCBlen 16
int>0: cell barcode length
soloUMIstart 17
int>0: UMI start base
soloUMIlen 10
int>0: UMI length
soloBarcodeReadLength 1
int: length of the barcode read
1 ... equal to sum of soloCBlen+soloUMIlen
0 ... not defined, do not check
soloBarcodeMate 0
int: identifies which read mate contains the barcode (CB+UMI) sequence
0 ... barcode sequence is on separate read, which should always be the last file in the --readFilesIn listed
1 ... barcode sequence is a part of mate 1
2 ... barcode sequence is a part of mate 2
soloCBposition -
strings(s) position of Cell Barcode(s) on the barcode read.
Presently only works with --soloType CB_UMI_Complex, and barcodes are assumed to be onRead2.
Format for each barcode: startAnchor_startPosition_endAnchor_endPosition
start(end)Anchor defines the Anchor Base for the CB: 0: read start; 1: read end; 2: adapter start; 3: adapter end
start(end)Position is the 0-based position with of the CB start(end) with respect to the Anchor Base
String for different barcodes are separated by space.
Example: inDrop (Zilionis et al, Nat. Protocols, 2017):
--soloCBposition 0_0_2_-1 3_1_3_8
soloUMIposition -
string position of the UMI on the barcode read, same as soloCBposition
Example: inDrop (Zilionis et al, Nat. Protocols, 2017):
--soloCBposition 3_9_3_14
soloAdapterSequence -
string: adapter sequence to anchor barcodes. Only one adapter sequence is allowed.
soloAdapterMismatchesNmax 1
int>0: maximum number of mismatches allowed in adapter sequence.
soloCBmatchWLtype 1MM_multi
string: matching the Cell Barcodes to the WhiteList
Exact ... only exact matches allowed
1MM ... only one match in whitelist with 1 mismatched baseallowed. Allowed CBs have to have at least one read with exact match.
1MM_multi ... multiple matches in whitelist with 1 mismatched base allowed, posterior probability calculation is used choose one of the matches.
Allowed CBs have to have at least one read with exact match. This option matches best with CellRanger 2.2.0
1MM_multi_pseudocounts ... same as 1MM_Multi, but pseudocounts of 1 are addedto all whitelist barcodes.
1MM_multi_Nbase_pseudocounts ... same as 1MM_multi_pseudocounts, multimatching to WL is allowed for CBs with N-bases. This option matches best with CellRanger >= 3.0.0
EditDist_2 ... allow up to edit distance of 3 fpr each of the barcodes. May include one deletion + one insertion. Only works with --soloType CB_UMI_Complex. Matches to multiple passlist barcdoes are not allowed. Similar to ParseBio Split-seq pipeline.
soloInputSAMattrBarcodeSeq -
string(s): when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributesmark the barcode sequence (in proper order).
For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeSeq CRUR .
This parameter is required when running STARsolo with input from SAM.
soloInputSAMattrBarcodeQual -
string(s): when inputting reads from a SAM file (--readsFileType SAM SE/PE), these SAM attributesmark the barcode qualities (in proper order).
For instance, for 10X CellRanger or STARsolo BAMs, use --soloInputSAMattrBarcodeQual CY UY .
If this parameter is '-' (default), the quality 'H' will be assigned to all bases.
soloStrand Forward
string: strandedness of the solo libraries:
Unstranded ... no strand information
Forward ... read strand same as the original RNA molecule
Reverse ... read strand opposite to the original RNA molecule
soloFeatures Gene
string(s): genomic features for which the UMI counts per Cell Barcode are collected
Gene ... genes: reads match the gene transcript
SJ ... splice junctions: reported in SJ.out.tab
GeneFull ... full gene (pre-mRNA): count all reads overlapping genes' exons andintrons
GeneFull_ExonOverIntron ... full gene (pre-mRNA): count all reads overlapping genes' exons and introns: prioritize 100% overlap with exons
GeneFull_Ex50pAS ... full gene (pre-RNA): count all reads overlapping genes' exons and introns: prioritize >50% overlap with exons. Do not count reads with 100% exonic overlap in the antisense direction.
#####UnderDevelopment_begin : not supported - do not use
Transcript3p ... quantification of transcript for 3' protocols
#####UnderDevelopment_end
soloMultiMappers Unique
string(s): counting method for reads mapping to multiple genes
Unique ... count only reads that map to unique genes
Uniform ... uniformly distribute multi-genic UMIs to all genes
Rescue ... distribute UMIs proportionally to unique+uniform counts (~ first iteration of EM)
PropUnique ... distribute UMIs proportionally to unique mappers, if present, and uniformly if not.
EM ... multi-gene UMIs are distributed using Expectation Maximization algorithm
soloUMIdedup 1MM_All
string(s): type of UMI deduplication (collapsing) algorithm
1MM_All ... all UMIs with 1 mismatch distance to each other are collapsed (i.e. counted once).
1MM_Directional_UMItools ... follows the "directional" method from the UMI-tools bySmith, Heger and Sudbery (Genome Research 2017).
1MM_Directional ... same as 1MM_Directional_UMItools, but with more stringent criteria for duplicate UMIs
Exact ... only exactly matching UMIs are collapsed.
NoDedup ... no deduplication of UMIs, count all reads.
1MM_CR ... CellRanger2-4 algorithm for 1MM UMI collapsing.
soloUMIfiltering -
string(s) type of UMI filtering (for reads uniquely mapping to genes)
- ... basic filtering: remove UMIs with N and homopolymers (similar to CellRanger 2.2.0).
MultiGeneUMI ... basic + remove lower-count UMIs that map to more than one gene.
MultiGeneUMI_All ... basic + remove all UMIs that map to more than one gene.
MultiGeneUMI_CR ... basic + remove lower-count UMIs that map to more than one gene,matching CellRanger > 3.0.0 .
Only works with --soloUMIdedup 1MM_CR
soloOutFileNames Solo.out/ features.tsv barcodes.tsv matrix.mtx
string(s) file names for STARsolo output:
file_name_prefix gene_names barcode_sequences cell_feature_count_matrix
soloCellFilter CellRanger2.2 3000 0.99 10
string(s): cell filtering type and parameters
None ... do not output filtered cells
TopCells ... only report top cells by UMI count, followed by the exact number of cells
CellRanger2.2 ... simple filtering of CellRanger 2.2.
Can be followed by numbers: number of expected cells, robust maximum percentile for UMI count, maximum to minimum ratio for UMI count
The harcoded values are from CellRanger: nExpectedCells=3000; maxPercentile=0.99; maxMinRatio=10
EmptyDrops_CR ... EmptyDrops filtering in CellRanger flavor. Please cite the original EmptyDrops paper: A.T.L Lun et al, Genome Biology, 20, 63 (2019): https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1662-y
Can be followed by 10 numeric parameters: nExpectedCells maxPercentile maxMinRatio indMin indMax umiMin umiMinFracMedian candMaxN FDR simN
The harcoded values are from CellRanger: 3000 0.99 10 45000 90000 500 0.01 20000 0.01 10000
soloOutFormatFeaturesGeneField3 "Gene Expression"
string(s): field 3 in the Gene features.tsv file. If "-", then no 3rd field is output.
soloCellReadStats None
string: Output reads statistics for each CB
Standard ... standard output
#####UnderDevelopment_begin : not supported - do not use
soloClusterCBfile -
string: file containing the cluster information for cell barcodes, two columns: CB cluster_index. Only used with --soloFeatures Transcript3p
#####UnderDevelopment_end
|