MMseqs2: ultra fast and sensitive sequence search and clustering suite
MMseqs2 (Many-against-Many sequence searching) is a software suite to search
and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source
GPL-licensed software implemented in C++ for Linux, macOS, and (as a beta
version, via Cygwin) Windows. The software is designed to run on multiple
cores and servers and exhibits very good scalability. MMseqs2 can run 10000
times faster than BLAST. At 100 times its speed it achieves almost the same
sensitivity. It can perform profile searches with the same sensitivity as
PSI-BLAST at over 400 times its speed.
Documentation
The MMseqs2 user guide is available in our GitHub Wiki or as a PDF file
(thanks to pandoc!). The wiki also contains tutorials to learn how to use
MMseqs2 with real data. For questions, please open an issue on GitHub or ask
in our chat.
Keep posted about MMseqs2/Linclust updates by following Martin on
Twitter.
Location and version:
[Linux@chrom1 bin]$ which mmseqs
/local/cluster/bin/mmseqs
[Linux@chrom1 bin]$ mmseqs version
75af0c82edf34587548bacc865cfa1d2261a9696
help message:
$ mmseqs
MMseqs2 (Many against Many sequence searching) is an open-source software suite for very fast,
parallelized protein sequence searches and clustering of huge protein sequence data sets.
Please cite: M. Steinegger and J. Soding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi:10.1038/nbt.3988 (2017).
MMseqs2 Version: 75af0c82edf34587548bacc865cfa1d2261a9696
© Martin Steinegger (martin.steinegger@snu.ac.kr)
usage: mmseqs <command> [<args>]
Easy workflows for plain text input/output
easy-search Sensitive homology search
easy-cluster Slower, sensitive clustering
easy-linclust Fast linear time cluster, less sensitive clustering
easy-taxonomy Taxonomic classification
easy-rbh Find reciprocal best hit
Main workflows for database input/output
search Sensitive homology search
map Map nearly identical sequences
rbh Reciprocal best hit search
linclust Fast, less sensitive clustering
cluster Slower, sensitive clustering
clusterupdate Update previous clustering with new sequences
taxonomy Taxonomic classification
Input database creation
databases List and download databases
createdb Convert FASTA/Q file(s) to a sequence DB
createindex Store precomputed index on disk to reduce search overhead
convertmsa Convert Stockholm/PFAM MSA file to a MSA DB
msa2profile Convert a MSA DB to a profile DB
Format conversion for downstream processing
convertalis Convert alignment DB to BLAST-tab, SAM or custom format
createtsv Convert result DB to tab-separated flat file
convert2fasta Convert sequence DB to FASTA format
taxonomyreport Create a taxonomy report in Kraken or Krona format
An extended list of all modules can be obtained by calling 'mmseqs -h'.
Bash completion for modules and parameters can be installed by adding "source MMSEQS_HOME/util/bash-completion.sh" to your "$HOME/.bash_profile".
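
As a rough illustration of how the easy and main workflows listed above relate, the following sketch runs one search both ways. The file names (query.fasta, target.fasta, result.m8, result_db.m8), the database names (queryDB, targetDB, resultDB) and the tmp directory are hypothetical placeholders, not part of this installation; the positional argument order is assumed from the MMseqs2 user guide, so check 'mmseqs <command> -h' before relying on it.

```shell
#!/bin/sh
# Sketch only: all file and database names below are hypothetical placeholders.
QUERY="query.fasta"
TARGET="target.fasta"
TMP_DIR="tmp"

# Easy workflow: FASTA files in, tab-separated alignment results out.
run_easy_search() {
    mmseqs easy-search "$QUERY" "$TARGET" result.m8 "$TMP_DIR"
}

# Main workflow: the same search split into explicit database steps.
run_db_search() {
    mmseqs createdb "$QUERY" queryDB &&
    mmseqs createdb "$TARGET" targetDB &&
    mmseqs search queryDB targetDB resultDB "$TMP_DIR" &&
    mmseqs convertalis queryDB targetDB resultDB result_db.m8
}

# Run only when the binary and both input files are present.
if command -v mmseqs >/dev/null 2>&1 && [ -f "$QUERY" ] && [ -f "$TARGET" ]; then
    run_easy_search
    STATUS="ran"
else
    STATUS="skipped"
    echo "skipped: mmseqs or input files not found"
fi
```

The easy workflow is usually the simpler choice for a one-off search; the database workflow may be preferable when the same target set is searched repeatedly, since the target database (and an optional createindex step) can be reused.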
Publications
Steinegger M and Söding J. MMseqs2 enables sensitive protein sequence
searching for the analysis of massive data sets. Nature Biotechnology,
doi: 10.1038/nbt.3988 (2017).
Steinegger M and Söding J. Clustering huge protein sequence sets in linear
time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018).
Mirdita M, Steinegger M and Söding J. MMseqs2 desktop and local web server
app for fast, interactive sequence searches. Bioinformatics,
doi: 10.1093/bioinformatics/bty1057 (2019).
Mirdita M, Steinegger M, Breitwieser F, Söding J and Levy Karin E. Fast and
sensitive taxonomic assignment to metagenomic contigs. Bioinformatics,
doi: 10.1093/bioinformatics/btab184 (2021).
software ref: https://github.com/soedinglab/mmseqs2
software ref: https://github.com/soedinglab/MMseqs2/wiki
research ref: See above