Installed
This software should be available with no extra configuration.
vsearch-2.22.1
The aim of this project is to create an alternative to the
USEARCH tool developed by Robert C. Edgar
(2010). The new tool should:
- have open source code with an appropriate open source license
- be free of charge, gratis
- have a 64-bit design that handles very large databases and much more than 4GB of memory
- be at least as accurate as usearch
- be at least as fast as usearch
We have implemented a tool called VSEARCH which supports de novo and
reference based chimera detection, clustering, full-length and prefix
dereplication, rereplication, reverse complementation, masking, all-vs-all
pairwise global alignment, exact and global alignment searching, shuffling,
subsampling and sorting. It also supports FASTQ file analysis, filtering,
conversion and merging of paired-end reads.
VSEARCH stands for vectorized search, as the tool takes advantage of
parallelism in the form of SIMD vectorization as well as multiple threads to
perform accurate alignments at high speed. VSEARCH uses an optimal global
aligner (full dynamic programming Needleman-Wunsch), in contrast to USEARCH
which by default uses a heuristic seed and extend aligner. This usually
results in more accurate alignments and overall improved sensitivity (recall)
with VSEARCH, especially for alignments with gaps.
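To illustrate what "full dynamic programming Needleman-Wunsch" means, here is a minimal sketch of an optimal global aligner in Python. It is not VSEARCH's implementation: it is neither vectorized nor threaded, and it assumes a single linear gap penalty (`gap=-2`, a value chosen for illustration), whereas VSEARCH uses separate gap opening and extension penalties. The match and mismatch scores default to the values shown in the help message (2 and -4).

```python
def needleman_wunsch(a, b, match=2, mismatch=-4, gap=-2):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Simplification: a linear gap penalty is assumed for illustration.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Fill the whole matrix: every cell is computed, no heuristics.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(4, "ACGT", "A-GT")`: three matches and one gap. Because every cell of the matrix is evaluated, the result is optimal under the chosen scoring scheme, at a cost proportional to the product of the two sequence lengths.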
VSEARCH binaries are
provided for GNU/Linux on three 64-bit processor architectures: x86-64, POWER8
(ppc64le) and ARMv8 (aarch64). Binaries are also provided for MacOS (version
10.9 Mavericks or later) on Intel (x86-64) and Apple Silicon (ARMv8), as well
as Windows (64-bit, version 7 or higher, on x86_64). VSEARCH contains
dedicated SIMD code for the three processor architectures (SSE2/SSSE3,
AltiVec/VMX/VSX, Neon).
CPU \ OS | GNU/Linux | MacOS | Windows |
x86_64 | ✔ | ✔ | ✔ |
ARMv8 | ✔ | ✔ | |
POWER8 | ✔ | | |
VSEARCH can directly read input query and database files that are compressed
using gzip and bzip2 (.gz and .bz2) if the zlib and bzip2 libraries are
available.
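If you post-process the same compressed files in your own scripts, a similar transparent-open helper can be sketched in Python with the standard `gzip` and `bz2` modules. The function name `open_maybe_compressed` is hypothetical, and detecting the format from the leading magic bytes is an assumption made for this sketch, not a description of how VSEARCH decides.

```python
import bz2
import gzip

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
BZIP2_MAGIC = b"BZh"       # first three bytes of any bzip2 stream


def open_maybe_compressed(path):
    """Open `path` for text reading, handling gzip and bzip2 transparently."""
    # Peek at the first bytes to decide which opener to use.
    with open(path, "rb") as handle:
        head = handle.read(3)
    if head.startswith(GZIP_MAGIC):
        return gzip.open(path, "rt")
    if head.startswith(BZIP2_MAGIC):
        return bz2.open(path, "rt")
    return open(path, "r")
```

A caller can then iterate over lines in the usual way, for example `with open_maybe_compressed("database.fsa.gz") as fh: ...`, without caring how the file was stored.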
Most of the nucleotide based commands and options in USEARCH version 7 are
supported, as well as some in version 8. The same option names as in USEARCH
version 7 have been used in order to make VSEARCH an almost drop-in
replacement. VSEARCH does not support amino acid sequences or local
alignments. These features may be added in the future.
Getting Help
If you can’t find an answer in the VSEARCH documentation, please visit the
VSEARCH Web Forum to post a question or start a discussion.
Example
In the example below, VSEARCH will identify sequences in the file database.fsa
that are at least 90% identical on the plus strand to the query sequences in
the file queries.fsa and write the results to the file alnout.txt.
./vsearch --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt
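When the same search is launched from a script, building the argument list explicitly avoids shell quoting problems. The sketch below assembles the command shown above; the helper name `build_usearch_global_cmd` and its parameters are hypothetical, while the options themselves (`--usearch_global`, `--db`, `--id`, `--alnout`) are taken from the example.

```python
def build_usearch_global_cmd(queries, db, identity, alnout, executable="vsearch"):
    """Return the argument list for a global alignment search.

    `identity` is a fraction between 0 and 1, matching the --id option.
    """
    if not 0.0 <= identity <= 1.0:
        raise ValueError("identity must be between 0 and 1")
    return [
        executable,
        "--usearch_global", queries,  # query sequences
        "--db", db,                   # database to search
        "--id", str(identity),        # minimum identity to accept a hit
        "--alnout", alnout,           # human-readable alignment output
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where `vsearch` is on the `PATH`.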
Location and version
$ which vsearch
/local/cluster/bin/vsearch
$ vsearch --version
vsearch v2.22.1_linux_x86_64, 995.4GB RAM, 64 cores
https://github.com/torognes/vsearch
Rognes T, Flouri T, Nichols B, Quince C, Mahe F (2016)
VSEARCH: a versatile open source tool for metagenomics
PeerJ 4:e2584 doi: 10.7717/peerj.2584 https://doi.org/10.7717/peerj.2584
Compiled with support for gzip-compressed files, and the library is loaded.
zlib version 1.2.8, compile flags a9
Compiled with support for bzip2-compressed files, and the library is loaded.
Help message
$ vsearch --help
vsearch v2.22.1_linux_x86_64, 995.4GB RAM, 64 cores
https://github.com/torognes/vsearch
Rognes T, Flouri T, Nichols B, Quince C, Mahe F (2016)
VSEARCH: a versatile open source tool for metagenomics
PeerJ 4:e2584 doi: 10.7717/peerj.2584 https://doi.org/10.7717/peerj.2584
Usage: vsearch [OPTIONS]
General options
--bzip2_decompress decompress input with bzip2 (required if pipe)
--fasta_width INT width of FASTA seq lines, 0 for no wrap (80)
--gzip_decompress decompress input with gzip (required if pipe)
--help | -h display help information
--log FILENAME write messages, timing and memory info to file
--maxseqlength INT maximum sequence length (50000)
--minseqlength INT min seq length (clust/derep/search: 32, other:1)
--no_progress do not show progress indicator
--notrunclabels do not truncate labels at first space
--quiet output just warnings and fatal errors to stderr
--threads INT number of threads to use, zero for all cores (0)
--version | -v display version information
Chimera detection
--uchime_denovo FILENAME detect chimeras de novo
--uchime2_denovo FILENAME detect chimeras de novo in denoised amplicons
--uchime3_denovo FILENAME detect chimeras de novo in denoised amplicons
--uchime_ref FILENAME detect chimeras using a reference database
Data
--db FILENAME reference database for --uchime_ref
Parameters
--abskew REAL minimum abundance ratio (2.0, 16.0 for uchime3)
--dn REAL 'no' vote pseudo-count (1.4)
--mindiffs INT minimum number of differences in segment (3) *
--mindiv REAL minimum divergence from closest parent (0.8) *
--minh REAL minimum score (0.28) * ignored in uchime2/3
--sizein propagate abundance annotation from input
--self exclude identical labels for --uchime_ref
--selfid exclude identical sequences for --uchime_ref
--xn REAL 'no' vote weight (8.0)
Output
--alignwidth INT width of alignment in uchimealn output (80)
--borderline FILENAME output borderline chimeric sequences to file
--chimeras FILENAME output chimeric sequences to file
--fasta_score include chimera score in fasta output
--nonchimeras FILENAME output non-chimeric sequences to file
--relabel STRING relabel nonchimeras with this prefix string
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel with md5 digest of normalized sequence
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel with sha1 digest of normalized sequence
--sizeout include abundance information when relabelling
--uchimealns FILENAME output chimera alignments to file
--uchimeout FILENAME output to chimera info to tab-separated file
--uchimeout5 make output compatible with uchime version 5
--xsize strip abundance information in output
Clustering
--cluster_fast FILENAME cluster sequences after sorting by length
--cluster_size FILENAME cluster sequences after sorting by abundance
--cluster_smallmem FILENAME cluster already sorted sequences (see -usersort)
--cluster_unoise FILENAME denoise Illumina amplicon reads
Parameters (most searching options also apply)
--cons_truncate do not ignore terminal gaps in MSA for consensus
--id REAL reject if identity lower, accepted values: 0-1.0
--iddef INT id definition, 0-4=CD-HIT,all,int,MBL,BLAST (2)
--qmask none|dust|soft mask seqs with dust, soft or no method (dust)
--sizein propagate abundance annotation from input
--strand plus|both cluster using plus or both strands (plus)
--usersort indicate sequences not pre-sorted by length
--minsize INT minimum abundance (unoise only) (8)
--unoise_alpha REAL alpha parameter (unoise only) (2.0)
Output
--biomout FILENAME filename for OTU table output in biom 1.0 format
--centroids FILENAME output centroid sequences to FASTA file
--clusterout_id add cluster id info to consout and profile files
--clusterout_sort order msaout, consout, profile by decr abundance
--clusters STRING output each cluster to a separate FASTA file
--consout FILENAME output cluster consensus sequences to FASTA file
--mothur_shared_out FN filename for OTU table output in mothur format
--msaout FILENAME output multiple seq. alignments to FASTA file
--otutabout FILENAME filename for OTU table output in classic format
--profile FILENAME output sequence profile of each cluster to file
--relabel STRING relabel centroids with this prefix string
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel with md5 digest of normalized sequence
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel with sha1 digest of normalized sequence
--sizeorder sort accepted centroids by abundance, AGC
--sizeout write cluster abundances to centroid file
--uc FILENAME specify filename for UCLUST-like output
--xsize strip abundance information in output
Convert SFF to FASTQ
--sff_convert FILENAME convert given SFF file to FASTQ format
Parameters
--sff_clip clip ends of sequences as indicated in file (no)
--fastq_asciiout INT FASTQ output quality score ASCII base char (33)
--fastq_qmaxout INT maximum base quality value for FASTQ output (41)
--fastq_qminout INT minimum base quality value for FASTQ output (0)
Output
--fastqout FILENAME output converted sequences to given FASTQ file
Dereplication and rereplication
--derep_fulllength FILENAME dereplicate sequences in the given FASTA file
--derep_id FILENAME dereplicate using both identifiers and sequences
--derep_prefix FILENAME dereplicate sequences in file based on prefixes
--derep_smallmem FILENAME dereplicate sequences in file using less memory
--fastx_uniques FILENAME dereplicate sequences in the FASTA/FASTQ file
--rereplicate FILENAME rereplicate sequences in the given FASTA file
Parameters
--maxuniquesize INT maximum abundance for output from dereplication
--minuniquesize INT minimum abundance for output from dereplication
--sizein propagate abundance annotation from input
--strand plus|both dereplicate plus or both strands (plus)
Output
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmaxout INT maximum base quality value for FASTQ output (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--fastq_qminout INT minimum base quality value for FASTQ output (0)
--fastaout FILENAME output FASTA file (for fastx_uniques)
--fastqout FILENAME output FASTQ file (for fastx_uniques)
--output FILENAME output FASTA file (not for fastx_uniques)
--relabel STRING relabel with this prefix string
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel with md5 digest of normalized sequence
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel with sha1 digest of normalized sequence
--sizeout write abundance annotation to output
--tabbedout FILENAME write cluster info to tsv file for fastx_uniques
--topn INT output only n most abundant sequences after derep
--uc FILENAME filename for UCLUST-like dereplication output
--xsize strip abundance information in derep output
FASTA to FASTQ conversion
--fasta2fastq FILENAME convert from FASTA to FASTQ, fake quality scores
Parameters
--fastq_asciiout INT FASTQ output quality score ASCII base char (33)
--fastq_qmaxout INT fake quality score for FASTQ output (41)
Output
--fastqout FILENAME FASTQ output filename for converted sequences
FASTQ format conversion
--fastq_convert FILENAME convert between FASTQ file formats
Parameters
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_asciiout INT FASTQ output quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmaxout INT maximum base quality value for FASTQ output (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--fastq_qminout INT minimum base quality value for FASTQ output (0)
Output
--fastqout FILENAME FASTQ output filename for converted sequences
FASTQ format detection and quality analysis
--fastq_chars FILENAME analyse FASTQ file for version and quality range
Parameters
--fastq_tail INT min length of tails to count for fastq_chars (4)
FASTQ quality statistics
--fastq_stats FILENAME report statistics on FASTQ file
--fastq_eestats FILENAME quality score and expected error statistics
--fastq_eestats2 FILENAME expected error and length cutoff statistics
Parameters
--ee_cutoffs REAL,... fastq_eestats2 expected error cutoffs (0.5,1,2)
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--length_cutoffs INT,INT,INT fastq_eestats2 length (min,max,incr) (50,*,50)
Output
--log FILENAME output file for fastq_stats statistics
--output FILENAME output file for fastq_eestats(2) statistics
Masking (new)
--fastx_mask FILENAME mask sequences in the given FASTA or FASTQ file
Parameters
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--hardmask mask by replacing with N instead of lower case
--max_unmasked_pct max unmasked % of sequences to keep (100.0)
--min_unmasked_pct min unmasked % of sequences to keep (0.0)
--qmask none|dust|soft mask seqs with dust, soft or no method (dust)
Output
--fastaout FILENAME output to specified FASTA file
--fastqout FILENAME output to specified FASTQ file
Masking (old)
--maskfasta FILENAME mask sequences in the given FASTA file
Parameters
--hardmask mask by replacing with N instead of lower case
--qmask none|dust|soft mask seqs with dust, soft or no method (dust)
Output
--output FILENAME output to specified FASTA file
Orient sequences in forward or reverse direction
--orient FILENAME orient sequences in given FASTA/FASTQ file
Data
--db FILENAME database of sequences in correct orientation
--dbmask none|dust|soft mask db seqs with dust, soft or no method (dust)
--qmask none|dust|soft mask query with dust, soft or no method (dust)
--wordlength INT length of words used for matching 3-15 (12)
Output
--fastaout FILENAME FASTA output filename for oriented sequences
--fastqout FILENAME FASTQ output filenamr for oriented sequences
--notmatched FILENAME output filename for undetermined sequences
--tabbedout FILENAME output filename for result information
Paired-end reads joining
--fastq_join FILENAME join paired-end reads into one sequence with gap
Data
--reverse FILENAME specify FASTQ file with reverse reads
--join_padgap STRING sequence string used for padding (NNNNNNNN)
--join_padgapq STRING quality string used for padding (IIIIIIII)
Output
--fastaout FILENAME FASTA output filename for joined sequences
--fastqout FILENAME FASTQ output filename for joined sequences
Paired-end reads merging
--fastq_mergepairs FILENAME merge paired-end reads into one sequence
Data
--reverse FILENAME specify FASTQ file with reverse reads
Parameters
--fastq_allowmergestagger allow merging of staggered reads
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_maxdiffpct REAL maximum percentage diff. bases in overlap (100.0)
--fastq_maxdiffs INT maximum number of different bases in overlap (10)
--fastq_maxee REAL maximum expected error value for merged sequence
--fastq_maxmergelen maximum length of entire merged sequence
--fastq_maxns INT maximum number of N's
--fastq_minlen INT minimum input read length after truncation (1)
--fastq_minmergelen minimum length of entire merged sequence
--fastq_minovlen minimum length of overlap between reads (10)
--fastq_nostagger disallow merging of staggered reads (default)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmaxout INT maximum base quality value for FASTQ output (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--fastq_qminout INT minimum base quality value for FASTQ output (0)
--fastq_truncqual INT base quality value for truncation
Output
--eetabbedout FILENAME output error statistics to specified file
--fastaout FILENAME FASTA output filename for merged sequences
--fastaout_notmerged_fwd FN FASTA filename for non-merged forward sequences
--fastaout_notmerged_rev FN FASTA filename for non-merged reverse sequences
--fastq_eeout include expected errors (ee) in FASTQ output
--fastqout FILENAME FASTQ output filename for merged sequences
--fastqout_notmerged_fwd FN FASTQ filename for non-merged forward sequences
--fastqout_notmerged_rev FN FASTQ filename for non-merged reverse sequences
--label_suffix STRING suffix to append to label of merged sequences
--xee remove expected errors (ee) info from output
Pairwise alignment
--allpairs_global FILENAME perform global alignment of all sequence pairs
Output (most searching options also apply)
--alnout FILENAME filename for human-readable alignment output
--acceptall output all pairwise alignments
Restriction site cutting
--cut FILENAME filename of FASTA formatted input sequences
Parameters
--cut_pattern STRING pattern to match with ^ and _ at cut sites
Output
--fastaout FILENAME FASTA filename for fragments on forward strand
--fastaout_rev FILENAME FASTA filename for fragments on reverse strand
--fastaout_discarded FN FASTA filename for non-matching sequences
--fastaout_discarded_rev FN FASTA filename for non-matching, reverse compl.
Reverse complementation
--fastx_revcomp FILENAME reverse-complement seqs in FASTA or FASTQ file
Parameters
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
Output
--fastaout FILENAME FASTA output filename
--fastqout FILENAME FASTQ output filename
--label_suffix STRING label to append to identifier in the output
Searching
--search_exact FILENAME filename of queries for exact match search
--usearch_global FILENAME filename of queries for global alignment search
Data
--db FILENAME name of UDB or FASTA database for search
Parameters
--dbmask none|dust|soft mask db with dust, soft or no method (dust)
--fulldp full dynamic programming alignment (always on)
--gapext STRING penalties for gap extension (2I/1E)
--gapopen STRING penalties for gap opening (20I/2E)
--hardmask mask by replacing with N instead of lower case
--id REAL reject if identity lower
--iddef INT id definition, 0-4=CD-HIT,all,int,MBL,BLAST (2)
--idprefix INT reject if first n nucleotides do not match
--idsuffix INT reject if last n nucleotides do not match
--lca_cutoff REAL fraction of matching hits required for LCA (1.0)
--leftjust reject if terminal gaps at alignment left end
--match INT score for match (2)
--maxaccepts INT number of hits to accept and show per strand (1)
--maxdiffs INT reject if more substitutions or indels
--maxgaps INT reject if more indels
--maxhits INT maximum number of hits to show (unlimited)
--maxid REAL reject if identity higher
--maxqsize INT reject if query abundance larger
--maxqt REAL reject if query/target length ratio higher
--maxrejects INT number of non-matching hits to consider (32)
--maxsizeratio REAL reject if query/target abundance ratio higher
--maxsl REAL reject if shorter/longer length ratio higher
--maxsubs INT reject if more substitutions
--mid REAL reject if percent identity lower, ignoring gaps
--mincols INT reject if alignment length shorter
--minqt REAL reject if query/target length ratio lower
--minsizeratio REAL reject if query/target abundance ratio lower
--minsl REAL reject if shorter/longer length ratio lower
--mintsize INT reject if target abundance lower
--minwordmatches INT minimum number of word matches required (12)
--mismatch INT score for mismatch (-4)
--pattern STRING option is ignored
--qmask none|dust|soft mask query with dust, soft or no method (dust)
--query_cov REAL reject if fraction of query seq. aligned lower
--rightjust reject if terminal gaps at alignment right end
--sizein propagate abundance annotation from input
--self reject if labels identical
--selfid reject if sequences identical
--slots INT option is ignored
--strand plus|both search plus or both strands (plus)
--target_cov REAL reject if fraction of target seq. aligned lower
--weak_id REAL include aligned hits with >= id; continue search
--wordlength INT length of words for database index 3-15 (8)
Output
--alnout FILENAME filename for human-readable alignment output
--biomout FILENAME filename for OTU table output in biom 1.0 format
--blast6out FILENAME filename for blast-like tab-separated output
--dbmatched FILENAME FASTA file for matching database sequences
--dbnotmatched FILENAME FASTA file for non-matching database sequences
--fastapairs FILENAME FASTA file with pairs of query and target
--lcaout FILENAME output LCA of matching sequences to file
--matched FILENAME FASTA file for matching query sequences
--mothur_shared_out FN filename for OTU table output in mothur format
--notmatched FILENAME FASTA file for non-matching query sequences
--otutabout FILENAME filename for OTU table output in classic format
--output_no_hits output non-matching queries to output files
--rowlen INT width of alignment lines in alnout output (64)
--samheader include a header in the SAM output file
--samout FILENAME filename for SAM format output
--sizeout write abundance annotation to dbmatched file
--top_hits_only output only hits with identity equal to the best
--uc FILENAME filename for UCLUST-like output
--uc_allhits show all, not just top hit with uc output
--userfields STRING fields to output in userout file
--userout FILENAME filename for user-defined tab-separated output
Shuffling and sorting
--shuffle FILENAME shuffle order of sequences in FASTA file randomly
--sortbylength FILENAME sort sequences by length in given FASTA file
--sortbysize FILENAME abundance sort sequences in given FASTA file
Parameters
--maxsize INT maximum abundance for sortbysize
--minsize INT minimum abundance for sortbysize
--randseed INT seed for PRNG, zero to use random data source (0)
--sizein propagate abundance annotation from input
Output
--output FILENAME output to specified FASTA file
--relabel STRING relabel sequences with this prefix string
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel with md5 digest of normalized sequence
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel with sha1 digest of normalized sequence
--sizeout include abundance information when relabelling
--topn INT output just first n sequences
--xsize strip abundance information in output
Subsampling
--fastx_subsample FILENAME subsample sequences from given FASTA/FASTQ file
Parameters
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--randseed INT seed for PRNG, zero to use random data source (0)
--sample_pct REAL sampling percentage between 0.0 and 100.0
--sample_size INT sampling size
--sizein consider abundance info from input, do not ignore
Output
--fastaout FILENAME output subsampled sequences to FASTA file
--fastaout_discarded FILE output non-subsampled sequences to FASTA file
--fastqout FILENAME output subsampled sequences to FASTQ file
--fastqout_discarded output non-subsampled sequences to FASTQ file
--relabel STRING relabel sequences with this prefix string
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel with md5 digest of normalized sequence
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel with sha1 digest of normalized sequence
--sizeout update abundance information in output
--xsize strip abundance information in output
Taxonomic classification
--sintax FILENAME classify sequences in given FASTA/FASTQ file
Parameters
--db FILENAME taxonomic reference db in given FASTA or UDB file
--sintax_cutoff REAL confidence value cutoff level (0.0)
Output
--tabbedout FILENAME write results to given tab-delimited file
Trimming and filtering
--fastx_filter FILENAME trim and filter sequences in FASTA/FASTQ file
--fastq_filter FILENAME trim and filter sequences in FASTQ file
--reverse FILENAME FASTQ file with other end of paired-end reads
Parameters
--fastq_ascii INT FASTQ input quality score ASCII base char (33)
--fastq_maxee REAL discard if expected error value is higher
--fastq_maxee_rate REAL discard if expected error rate is higher
--fastq_maxlen INT discard if length of sequence is longer
--fastq_maxns INT discard if number of N's is higher
--fastq_minlen INT discard if length of sequence is shorter
--fastq_qmax INT maximum base quality value for FASTQ input (41)
--fastq_qmin INT minimum base quality value for FASTQ input (0)
--fastq_stripleft INT delete given number of bases from the 5' end
--fastq_stripright INT delete given number of bases from the 3' end
--fastq_truncee REAL truncate to given maximum expected error
--fastq_trunclen INT truncate to given length (discard if shorter)
--fastq_trunclen_keep INT truncate to given length (keep if shorter)
--fastq_truncqual INT truncate to given minimum base quality
--maxsize INT discard if abundance of sequence is above
--minsize INT discard if abundance of sequence is below
Output
--eeout include expected errors in output
--fastaout FN FASTA filename for passed sequences
--fastaout_discarded FN FASTA filename for discarded sequences
--fastaout_discarded_rev FN FASTA filename for discarded reverse sequences
--fastaout_rev FN FASTA filename for passed reverse sequences
--fastqout FN FASTQ filename for passed sequences
--fastqout_discarded FN FASTQ filename for discarded sequences
--fastqout_discarded_rev FN FASTQ filename for discarded reverse sequences
--fastqout_rev FN FASTQ filename for passed reverse sequences
--relabel STRING relabel filtered sequences with given prefix
--relabel_keep keep the old label after the new when relabelling
--relabel_md5 relabel filtered sequences with md5 digest
--relabel_self relabel with the sequence itself as label
--relabel_sha1 relabel filtered sequences with sha1 digest
--sizeout include abundance information when relabelling
--xee remove expected errors (ee) info from output
--xsize strip abundance information in output
UDB files
--makeudb_usearch FILENAME make UDB file from given FASTA file
--udb2fasta FILENAME output FASTA file from given UDB file
--udbinfo FILENAME show information about UDB file
--udbstats FILENAME report statistics about indexed words in UDB file
Parameters
--dbmask none|dust|soft mask db with dust, soft or no method (dust)
--hardmask mask by replacing with N instead of lower case
--wordlength INT length of words for database index 3-15 (8)
Output
--output FILENAME UDB or FASTA output file
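Several filtering and merging options in the help message above, such as `--fastq_maxee`, refer to an "expected error" value. Under the standard Phred convention, a base with quality Q has error probability 10^(-Q/10), and the expected number of errors in a read is the sum of those probabilities. The sketch below computes that sum in Python; the function names are hypothetical, the default offset of 33 mirrors the `--fastq_ascii` default, and this is an illustration of the usual definition rather than VSEARCH's own code.

```python
def expected_errors(quality_string, ascii_base=33):
    """Sum of per-base error probabilities for a FASTQ quality string."""
    return sum(10 ** (-(ord(ch) - ascii_base) / 10) for ch in quality_string)


def passes_maxee(quality_string, maxee, ascii_base=33):
    """Keep a read unless its expected error value is higher than `maxee`."""
    return expected_errors(quality_string, ascii_base) <= maxee
```

For instance, a read whose quality string is `IIII` (four bases at Q40) has an expected error of about 0.0004, while twelve bases at Q10 (`+`) sum to about 1.2 and would be discarded with a threshold of 1.0.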
software ref: https://github.com/torognes/vsearch
research ref: https://doi.org/10.7717/peerj.2584