MEGAHIT
MEGAHIT is an ultra-fast and memory-efficient NGS assembler. It is optimized
for metagenomes, but also works well on generic single genome assembly (small
or mammalian size) and single-cell assembly.
Usage
Basic usage
megahit -1 pe_1.fq -2 pe_2.fq -o out # 1 paired-end library
megahit --12 interleaved.fq -o out # one paired & interleaved paired-end library
megahit -1 a1.fq,b1.fq,c1.fq -2 a2.fq,b2.fq,c2.fq -r se1.fq,se2.fq -o out # 3 paired-end libraries + 2 SE libraries
megahit_core contig2fastg 119 out/intermediate_contigs/k119.contig.fa > k119.fastg # get FASTG from the intermediate contigs of k=119
The contigs can be found in final.contigs.fa in the output directory.
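When there are several libraries, the comma-separated lists for -1, -2 and -r can be built from individual file names. The sketch below is one way to do that; the helper name join_by_comma is hypothetical and the file names are the placeholders from the examples above. It only prints the command, so it can be inspected before running.

```shell
# Hypothetical helper: join all arguments with commas, the list format
# that the -1, -2 and -r options accept.
# The function body runs in a subshell so the IFS change does not leak out.
join_by_comma() (
    IFS=,
    printf '%s\n' "$*"
)

# Placeholder file names taken from the usage examples.
r1=$(join_by_comma a1.fq b1.fq c1.fq)
r2=$(join_by_comma a2.fq b2.fq c2.fq)

# Build the command as a string and print it for inspection.
cmd="megahit -1 $r1 -2 $r2 -o out"
echo "$cmd"
```

The files in the two lists must be given in the same order, because each file in -1 is paired with the file at the same position in -2.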
Advanced usage
--kmin-1pass: if sequencing depth is low and too much memory is used when building the graph of k_min
--presets meta-large: if the metagenome is complex (i.e., biodiversity is high, for example soil metagenomes)
--cleaning-rounds 1 --disconnect-ratio 0: get a less pruned assembly (usually shorter contigs)
--continue -o out: resume an interrupted job from out
To see the full manual, run megahit without parameters or with -h.
Also, our wiki may be helpful.
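The --continue option lends itself to a small wrapper in job scripts that may be restarted. The following is a minimal sketch, not part of MEGAHIT: the function name choose_megahit_cmd is hypothetical, the input file names are placeholders, and it assumes that an existing output directory always means an interrupted run of the same job.

```shell
# Hypothetical helper: print the MEGAHIT command to use for an output directory.
choose_megahit_cmd() {
    out_dir=$1
    if [ -d "$out_dir" ]; then
        # Directory exists: assume an earlier run was interrupted and resume it.
        printf 'megahit --continue -o %s\n' "$out_dir"
    else
        # No earlier run: start a new assembly (input names are placeholders).
        printf 'megahit -1 pe_1.fq -2 pe_2.fq -o %s\n' "$out_dir"
    fi
}

# Show which command would be used for the directory "out".
choose_megahit_cmd out
```

Printing the command instead of executing it keeps the sketch safe to try; replace the printf calls with the real invocation once the choice looks right.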
Location and version:
$ which megahit
/local/cluster/bin/megahit
$ megahit --version
MEGAHIT v1.2.9
help message:
$ megahit --help
MEGAHIT v1.2.9
contact: Dinghua Li <voutcn@gmail.com>
Usage:
megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]
Input options that can be specified for multiple times (supporting plain text and gz/bz2 extensions)
-1 <pe1> comma-separated list of fasta/q paired-end #1 files, paired with filesin <pe2>
-2 <pe2> comma-separated list of fasta/q paired-end #2 files, paired with filesin <pe1>
--12 <pe12> comma-separated list of interleaved fasta/q paired-end files
-r/--read <se> comma-separated list of fasta/q single-end files
Optional Arguments:
Basic assembly options:
--min-count <int> minimum multiplicity for filtering (k_min+1)-mers [2]
--k-list <int,int,..> comma-separated list of kmer size
all must be odd, in the range 15-255, increment <= 28)
[21,29,39,59,79,99,119,141]
Another way to set --k-list (overrides --k-list if one of them set):
--k-min <int> minimum kmer size (<= 255), must be odd number [21]
--k-max <int> maximum kmer size (<= 255), must be odd number [141]
--k-step <int> increment of kmer size of each iteration (<= 28), must be even number [12]
Advanced assembly options:
--no-mercy do not add mercy kmers
--bubble-level <int> intensity of bubble merging (0-2), 0 to disable [2]
--merge-level <l,s> merge complex bubbles of length <= l*kmer_size and similarity >= s [20,0.95]
--prune-level <int> strength of low depth pruning (0-3) [2]
--prune-depth <int> remove unitigs with avg kmer depth less than this value [2]
--disconnect-ratio <float> disconnect unitigs if its depth is less than this ratio times
the total depth of itself and its siblings [0.1]
--low-local-ratio <float> remove unitigs if its depth is less than this ratio times
the average depth of the neighborhoods [0.2]
--max-tip-len <int> remove tips less than this value [2*k]
--cleaning-rounds <int> number of rounds for graph cleanning [5]
--no-local disable local assembly
--kmin-1pass use 1pass mode to build SdBG of k_min
Presets parameters:
--presets <str> override a group of parameters; possible values:
meta-sensitive: '--min-count 1 --k-list 21,29,39,49,...,129,141'
meta-large: '--k-min 27 --k-max 127 --k-step 10'
(large & complex metagenomes, like soil)
Hardware options:
-m/--memory <float> max memory in byte to be used in SdBG construction
(if set between 0-1, fraction of the machine's total memory) [0.9]
--mem-flag <int> SdBG builder memory mode. 0: minimum; 1: moderate;
others: use all memory specified by '-m/--memory' [1]
-t/--num-cpu-threads <int> number of CPU threads [# of logical processors]
--no-hw-accel run MEGAHIT without BMI2 and POPCNT hardware instructions
Output options:
-o/--out-dir <string> output directory [./megahit_out]
--out-prefix <string> output prefix (the contig file will be OUT_DIR/OUT_PREFIX.contigs.fa)
--min-contig-len <int> minimum length of contigs to output [200]
--keep-tmp-files keep all temporary files
--tmp-dir <string> set temp directory
Other Arguments:
--continue continue a MEGAHIT run from its last available check point.
please set the output directory correctly when using this option.
--test run MEGAHIT on a toy test dataset
-h/--help print the usage message
-v/--version print version
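The help text states three constraints on --k-list: every value must be odd, lie in the range 15-255, and differ from the previous value by at most 28. A custom list can be checked against these rules before a job is submitted. The sketch below is an illustration only; the function name check_k_list is hypothetical, and it additionally assumes the list must be strictly increasing, which the help text implies but does not state.

```shell
# Hypothetical helper: succeed (exit status 0) if the comma-separated k-mer
# list in $1 satisfies the constraints from the help text, fail otherwise.
check_k_list() {
    prev=""
    for k in $(printf '%s' "$1" | tr ',' ' '); do
        # Each value must be an integer in the range 15-255.
        [ "$k" -ge 15 ] 2>/dev/null && [ "$k" -le 255 ] || return 1
        # Each value must be odd.
        [ $((k % 2)) -eq 1 ] || return 1
        if [ -n "$prev" ]; then
            # Assumption: values increase; the increment must be <= 28.
            [ $((k - prev)) -gt 0 ] && [ $((k - prev)) -le 28 ] || return 1
        fi
        prev=$k
    done
    # An empty list is rejected.
    [ -n "$prev" ]
}

# Check the default list shown in the help message.
if check_k_list 21,29,39,59,79,99,119,141; then
    echo "k-list OK"
else
    echo "k-list violates the documented constraints"
fi
```

MEGAHIT performs its own validation at startup; a check like this only helps catch mistakes earlier, for example before a job waits in a cluster queue.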
software ref: https://github.com/voutcn/megahit
research ref: https://doi.org/10.1093/bioinformatics/btv033