wfmash 0.6

2021-08-17

wfmash - A DNA sequence read mapper based on mash distances and the wavefront alignment algorithm

wfmash is a fork of MashMap that adds base-level alignment using the wavefront alignment algorithm (WFA), via the wflign tiled wavefront global alignment algorithm. It extends MashMap with a high-performance alignment module capable of computing base-level alignments for very large sequences.
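
The mapping and alignment stages described above can also be run separately, using options that appear in the help message further down. The following is a minimal dry-run sketch, not a tested pipeline: it only prints the two commands. The file names (ref.fa, query.fa, approx.paf, aln.paf) and the thread count are hypothetical placeholders, and it assumes that the PAF written by --approx-map is accepted by --input-paf in this version.

```shell
#!/bin/sh
# Dry-run sketch: print the two wfmash invocations instead of running them.
# All file names below are hypothetical placeholders.
target=ref.fa
query=query.fa
threads=4

# Stage 1: mapping only (--approx-map skips base-level alignment).
map_cmd="wfmash $target $query --threads=$threads --approx-map"

# Stage 2: derive base-level alignments for the mappings from stage 1.
aln_cmd="wfmash $target $query --threads=$threads --input-paf=approx.paf"

echo "$map_cmd > approx.paf"
echo "$aln_cmd > aln.paf"
```

Omitting both --approx-map and --input-paf runs mapping and alignment in a single call, which is the default behaviour according to the help text.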

Location:

$ which wfmash
/local/cluster/bin/wfmash

Help message:

$ wfmash -h
  wfmash [target] [queries...] {OPTIONS}

    wfmash: base-accurate alignments using mashmap2 and the wavefront algorithm

  OPTIONS:

      -h, --help                        display this help menu
      -t[N], --threads=[N]              use this many threads during parallel
                                        steps
      target                            alignment target or reference sequence
                                        file
      -L[targets],
      --target-file-list=[targets]      alignment target file list
      queries...                        query sequences
      -Q[queries],
      --query-file-list=[queries]       alignment query file list
      -s[N], --segment-length=[N]       segment length for mapping (1k = 1000,
                                        1m = 10^6, 1g = 10^9) [default: 5000]
      -l[N], --block-length-min=[N]     keep mappings with at least this block
                                        length (1k = 1000, 1m = 10^6, 1g = 10^9)
                                        [default: 3*segment-length]
      -k[N], --kmer=[N]                 kmer size <= 16 [default: 16]
      -N, --no-split                    disable splitting of input sequences
                                        during mapping [enabled by default]
      -p[%], --map-pct-id=[%]           use this percent identity in the mashmap
                                        step [default: 95]
      -K, --keep-low-map-id             keep mappings with estimated identity
                                        below --map-pct-id=%
      -O, --keep-low-align-id           keep alignments with gap-compressed
                                        identity below --map-pct-id=%
      -f, --no-filter                   disable mapping filtering
      -n[N],
      --num-mappings-for-segment=[N]    number of mappings to retain for each
                                        segment [default: 1]
      -S[N],
      --num-mappings-for-short-seq=[N]  number of mappings to retain for each
                                        sequence shorter than segment length
                                        [default: 1]
      -X, --skip-self                   skip self mappings when the query and
                                        target name is the same (for all-vs-all
                                        mode)
      -Y[C], --skip-prefix=[C]          skip mappings when the query and target
                                        have the same prefix before the given
                                        character C
      -m, --approx-map                  skip base-level alignment, producing an
                                        approximate mapping in PAF
      -M, --no-merge                    don't merge consecutive segment-level
                                        mappings
      -w[N], --window-size=[N]          window size for sketching. If 0, it
                                        computes the best window size applying 0
                                        as p-value cutoff [default:
                                        automatically computed applying 1e-120
                                        as p-value cutoff]
      -e[spaced-seed],
      --spaced-seed=[spaced-seed]       Params to generate spaced seeds
                                        <weight_of_seed> <number_of_seeds>
                                        <similarity> <region_length> e.g "10 5
                                        0.75 20"
      -i[FILE], --input-paf=[FILE]      derive precise alignments for this input
                                        PAF
      -W[N], --wflamda-segment=[N]      wflambda segment length: size (in bp) of
                                        segment mapped in hierarchical WFA
                                        problem [default: 256]
      -A[N], --wflamda-min=[N]          minimum wavefront length (width) to
                                        trigger reduction [default: 100]
      -D[N], --wflambda-diff=[N]        maximum distance that a wavefront may be
                                        behind the best wavefront [default:
                                        100000]
      -C[N], --max-patch-major=[N]      maximum length to patch in the major
                                        axis [default: 512*segment-length]
      -F[N], --max-patch-minor=[N]      maximum length to patch in the minor
                                        axis [default: 128*segment-length]
      -E[N], --erode-math-mismatch=[N]  maximum length of match/mismatch islands
                                        to erode before patching [default: 13]
      -d, --md-tag                      output the MD tag
      -a, --sam-format                  output in the SAM format (PAF by
                                        default)
      -B[PATH], --tmp-base=[PATH]       base name for temporary files [default:
                                        `pwd`]
      -T, --keep-temp                   keep intermediate files generated during
                                        mapping and alignment
      "--" can be used to terminate flag options and force all following
      arguments to be treated as positional options
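
Several defaults in the help text are expressed relative to the segment length, and sizes may be given with k, m, or g suffixes. The sketch below works through that arithmetic. The helper to_bp is hypothetical and written for illustration only; it is not part of wfmash and handles only the lowercase suffixes shown in the help text.

```shell
#!/bin/sh
# to_bp: hypothetical helper that converts a size such as 5000, 10k, 1m or 1g
# to a plain integer, following the suffixes listed for --segment-length.
to_bp() {
  case "$1" in
    *k) echo $(( ${1%?} * 1000 )) ;;
    *m) echo $(( ${1%?} * 1000000 )) ;;
    *g) echo $(( ${1%?} * 1000000000 )) ;;
    *)  echo "$1" ;;
  esac
}

# Segment length: first argument, or the documented default of 5000.
seg=$(to_bp "${1:-5000}")

echo "segment-length:   $seg"
echo "block-length-min: $(( 3 * seg ))"    # default for -l
echo "max-patch-major:  $(( 512 * seg ))"  # default for -C
echo "max-patch-minor:  $(( 128 * seg ))"  # default for -F
```

Run without an argument, it reports the values for the default segment length of 5000: 15000, 2560000, and 640000. Passing a value such as 10k shows how the dependent defaults scale.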

software ref: https://github.com/ekg/wfmash
research ref: https://doi.org/10.1093/bioinformatics/btaa777