mummer 4

2022-07-18

MUMmer

MUMmer is a system for rapidly aligning DNA and protein sequences. The nucmer aligner in the current version (release 4.x) can align two mammalian genomes in about 3 hours on a typical workstation with 32+ cores and 64+ GB of RAM; smaller genomes, such as those of bacteria or small eukaryotes, are aligned in seconds or minutes. The promer utility generates alignments from the six-frame translations of both input sequences. This lets promer align genomes whose proteins are similar even when their DNA sequences are too divergent for similarity to be detected at the nucleotide level. See the nucmer and promer README files in the “docs/” subdirectory for more details.

MUMmer is open source, and we ask that you cite our most recent paper in any publications that use this system:

(Version 4.x citation, the latest)

Marçais G, Delcher AL, Phillippy AM, Coston R, Salzberg SL, Zimin A. MUMmer4: A fast and versatile genome alignment system. PLoS Computational Biology. 2018 Jan 26;14(1):e1005944.

(Version 3.x citation)

Kurtz S, Phillippy A, Delcher AL, Smoot M, Shumway M, Antonescu C, Salzberg SL. Versatile and open software for comparing large genomes. Genome Biology. 2004 Jan 1;5(2):R12.

(Version 2.1 citation)

Delcher AL, Phillippy A, Carlton J, Salzberg SL. Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research. 2002 Jun 1;30(11):2478-83.

(Version 1.0 citation)

Delcher AL, Kasif S, Fleischmann RD, Peterson J, White O, Salzberg SL. Alignment of whole genomes. Nucleic Acids Research. 1999 Jan 1;27(11):2369-76.
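The six-frame idea behind promer, described above, can be illustrated with a short Python sketch. This is not promer's code: it assumes the standard genetic code (NCBI translation table 1), translates frames +1 to +3 on the input strand and -1 to -3 on its reverse complement, writes stop codons as `*` and any codon containing a non-ACGT character as `X`. The function names are hypothetical.

```python
# Minimal sketch of six-frame translation (the idea behind promer).
# Assumes the standard genetic code; this is NOT MUMmer's implementation.

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with non-ACGT characters become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translate frames +1..+3 (input strand) and -1..-3 (reverse complement)."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCATTGTAATGGGCCGCTGA").items():
        print(frame, protein)
```

Running it prints one protein string per frame; for the example sequence, frame +1 reads `MAIVMGR*`. Comparing sequences in this translated space is what lets protein-level similarity survive synonymous DNA changes.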

Note

MUMmer 4 is a significant departure from MUMmer version 3.

The version 3 executables are available in /local/cluster/MUMmer3.23/

Location and version



$ which mummer
/local/cluster/mummer/bin/mummer
$ mummer --version
4.0.0rc1

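Help message

mummer has no --help option; as the transcript below shows, passing an unrecognized option makes it print its usage summary.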



$ mummer --help
mummer: unrecognized option '--help'
Invalid parameters.
Usage: mummer [options] <reference-file> <query file1> . . . [query file32]
Implemented MUMmer v3 options:
-mum           compute maximal matches that are unique in both sequences
-mumreference  compute maximal matches that are unique in the reference-
               sequence but not necessarily in the query-sequence (default)
-mumcand       same as -mumreference
-maxmatch      compute all maximal matches regardless of their uniqueness
-l             set the minimum length of a match
               if not set, the default value is 20
-b             compute forward and reverse complement matches
-F             force 4 column output format regardless of the number of
               reference sequence inputs
-n             match only the characters a, c, g, or t
-L             print length of query sequence in header of matches
-r             compute only reverse complement matches
-s             print first 53 characters of the matching substring
-c             Report the query position of a reverse complement match relative to the forward strand of the querysequence

Additional options:
-k             sampled suffix positions (one by default)
-threads       number of threads to use for -maxmatch, only valid k > 1
-qthreads      number of threads to use for queries
-suflink       use suffix links (1=yes or 0=no) in the index and during search [auto]
-child         use child table (1=yes or 0=no) in the index and during search [auto]
-skip          sparsify the MEM-finding algorithm even more, performing jumps of skip*k [auto (l-10)/k]
               this is a performance parameter that trade-offs SA traversal with checking of right-maximal MEMs
-kmer          use kmer table containing sa-intervals (speeds up searching first k characters) in the index and during search [int value, auto]
-save (string) save index to file to use again later (string)
-load (string) load index from file

Example usage:

./mummer -maxmatch -l 20 -b -n -k 3 -threads 3 ref.fa query.fa
Find all maximal matches on forward and reverse strands
of length 20 or greater, matching only a, c, t, or g.
Index every 3rd position in the ref.fa and use 3 threads to find MEMs.
Fastest method for one long query sequence.

./mummer -maxmatch -l 20 -b -n -k 3 -qthreads 3 ref.fa query.fa
Same as above, but now use a single thread for every query sequence in
query.fa. Fastest for many small query sequences.
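To make the difference between -mum, -mumreference (the default) and -maxmatch in the help text concrete, here is a brute-force Python sketch of the three definitions. It is only an illustration under stated assumptions: mummer itself searches a (sparse) suffix array index, whereas this sketch compares every pair of start positions and is quadratic, so it suits toy inputs only. It handles the forward strand only (no -b, -r or -c), is case-sensitive, ignores -n, and reports 1-based (reference start, query start, length) triples; `min_len` plays the role of -l. All function names are hypothetical.

```python
# Brute-force sketch of maximal exact matches and the -mum / -mumreference /
# -maxmatch filters. Forward strand only; NOT how mummer is implemented.

MODES = ("mum", "mumreference", "maxmatch")


def count_occurrences(text: str, pattern: str) -> int:
    """Count possibly overlapping occurrences of pattern in text."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def maximal_matches(ref: str, query: str, min_len: int = 20) -> list[tuple[int, int, int]]:
    """All maximal exact matches of length >= min_len as 1-based triples."""
    matches = []
    for i in range(len(ref)):
        for j in range(len(query)):
            # Left-maximal: skip starts that could be extended one step left.
            if i > 0 and j > 0 and ref[i - 1] == query[j - 1]:
                continue
            # Right-maximal: extend until a mismatch or the end of either string.
            length = 0
            while (i + length < len(ref) and j + length < len(query)
                   and ref[i + length] == query[j + length]):
                length += 1
            if length >= min_len:
                matches.append((i + 1, j + 1, length))
    return matches


def find_matches(ref: str, query: str, min_len: int = 20,
                 mode: str = "mumreference") -> list[tuple[int, int, int]]:
    """Keep the maximal matches allowed by the chosen uniqueness mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    selected = []
    for r, q, length in maximal_matches(ref, query, min_len):
        substring = ref[r - 1:r - 1 + length]
        unique_in_ref = count_occurrences(ref, substring) == 1
        unique_in_query = count_occurrences(query, substring) == 1
        if (mode == "maxmatch"
                or (mode == "mumreference" and unique_in_ref)
                or (mode == "mum" and unique_in_ref and unique_in_query)):
            selected.append((r, q, length))
    return selected


if __name__ == "__main__":
    # Toy sequences; "N" and "X" are separators that never match each other.
    ref = "GATTACA" + "N" + "CCGGTC" + "N" + "TGGAAT" + "N" + "TGGAAT"
    query = "TGGAAT" + "X" + "CCGGTC" + "X" + "GATTACA" + "X" + "CCGGTC"
    for mode in MODES:
        print(mode, find_matches(ref, query, min_len=5, mode=mode))
```

In the toy example, GATTACA occurs once in each sequence, so it is a MUM; CCGGTC is unique in the reference but occurs twice in the query, so only the -mumreference and -maxmatch filters keep it (once per query occurrence); TGGAAT occurs twice in the reference, so only -maxmatch keeps it.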

software ref: https://github.com/mummer4/mummer
research ref: https://doi.org/10.1371/journal.pcbi.1005944