MMseqs2 75af0

2022-01-13 559 words 3 minutes

MMseqs2: ultra fast and sensitive sequence search and clustering suite

MMseqs2 (Many-against-Many sequence searching) is a software suite to search and cluster huge protein and nucleotide sequence sets. MMseqs2 is open source GPL-licensed software implemented in C++ for Linux, MacOS, and (as beta version, via cygwin) Windows. The software is designed to run on multiple cores and servers and exhibits very good scalability. MMseqs2 can run 10000 times faster than BLAST. At 100 times its speed it achieves almost the same sensitivity. It can perform profile searches with the same sensitivity as PSI-BLAST at over 400 times its speed.

Documentation

The MMseqs2 user guide is available in our GitHub Wiki or as a PDF file (Thanks to pandoc!). The wiki also contains tutorials to learn how to use MMseqs2 with real data. For questions please open an issue on GitHub or ask in our chat.

Keep posted about MMseqs2/Linclust updates by following Martin on Twitter.

Location and version:

1
2
3
4


[Linux@chrom1 bin]$ which mmseqs
/local/cluster/bin/mmseqs
[Linux@chrom1 bin]$ mmseqs version
75af0c82edf34587548bacc865cfa1d2261a9696

help message:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43


$ mmseqs
MMseqs2 (Many against Many sequence searching) is an open-source software suite for very fast,
parallelized protein sequence searches and clustering of huge protein sequence data sets.

Please cite: M. Steinegger and J. Soding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi:10.1038/nbt.3988 (2017).

MMseqs2 Version: 75af0c82edf34587548bacc865cfa1d2261a9696
© Martin Steinegger (martin.steinegger@snu.ac.kr)

usage: mmseqs <command> [<args>]

Easy workflows for plain text input/output
  easy-search       	Sensitive homology search
  easy-cluster      	Slower, sensitive clustering
  easy-linclust     	Fast linear time cluster, less sensitive clustering
  easy-taxonomy     	Taxonomic classification
  easy-rbh          	Find reciprocal best hit

Main workflows for database input/output
  search            	Sensitive homology search
  map               	Map nearly identical sequences
  rbh               	Reciprocal best hit search
  linclust          	Fast, less sensitive clustering
  cluster           	Slower, sensitive clustering
  clusterupdate     	Update previous clustering with new sequences
  taxonomy          	Taxonomic classification

Input database creation
  databases         	List and download databases
  createdb          	Convert FASTA/Q file(s) to a sequence DB
  createindex       	Store precomputed index on disk to reduce search overhead
  convertmsa        	Convert Stockholm/PFAM MSA file to a MSA DB
  msa2profile       	Convert a MSA DB to a profile DB

Format conversion for downstream processing
  convertalis       	Convert alignment DB to BLAST-tab, SAM or custom format
  createtsv         	Convert result DB to tab-separated flat file
  convert2fasta     	Convert sequence DB to FASTA format
  taxonomyreport    	Create a taxonomy report in Kraken or Krona format

An extended list of all modules can be obtained by calling 'mmseqs -h'.

Bash completion for modules and parameters can be installed by adding "source MMSEQS_HOME/util/bash-completion.sh" to your "$HOME/.bash_profile".

Publications

Steinegger M and Soeding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nature Biotechnology, doi: 10.1038/nbt.3988 (2017).

Steinegger M and Soeding J. Clustering huge protein sequence sets in linear time. Nature Communications, doi: 10.1038/s41467-018-04964-5 (2018).

Mirdita M, Steinegger M and Soeding J. MMseqs2 desktop and local web server app for fast, interactive sequence searches. Bioinformatics, doi: 10.1093/bioinformatics/bty1057 (2019).

Mirdita M, Steinegger M, Breitwieser F, Soding J, Levy Karin E: Fast and sensitive taxonomic assignment to metagenomic contigs. Bioinformatics, doi: 10.1093/bioinformatics/btab184 (2021).

software ref: https://github.com/soedinglab/mmseqs2
software ref: https://github.com/soedinglab/MMseqs2/wiki
research ref: See above