# minimap2 2.23

## Minimap2

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rates up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.

For ~10 kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100 bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the [minimap2 paper](https://doi.org/10.1093/bioinformatics/bty191) or the [preprint](https://arxiv.org/abs/1708.01492).

Location and version:

```console
$ which minimap2
/local/cluster/bin/minimap2
$ minimap2 --version
2.23-r1111
```

Help message:

```console
$ minimap2
Usage: minimap2 [options] | [query.fa] [...]
Options:
  Indexing:
    -H           use homopolymer-compressed k-mer (preferrable for PacBio)
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minimizer window size [10]
    -I NUM       split index for every ~NUM input bases [4G]
    -d FILE      dump index to FILE []
  Mapping:
    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [0.0002]
    -g NUM       stop chain enlongation if there are no minimizers in INT-bp [5000]
    -G NUM       max intron length (effective with -xsplice; changing -r) [200k]
    -F NUM       max fragment length (effective with -xsr or in the fragment mode) [800]
    -r NUM[,NUM] chaining/alignment bandwidth and long-join bandwidth [500,20000]
    -n INT       minimal number of minimizers on a chain [3]
    -m INT       minimal chaining score (matching bases minus log gap penalty) [40]
    -X           skip self and dual mappings (for the all-vs-all mode)
    -p FLOAT     min secondary-to-primary score ratio [0.8]
    -N INT       retain at most INT secondary alignments [5]
  Alignment:
    -A INT       matching score [2]
    -B INT       mismatch penalty (larger value for lower divergence) [4]
    -O INT[,INT] gap open penalty [4,24]
    -E INT[,INT] gap extension penalty; a k-long gap costs min{O1+k*E1,O2+k*E2} [2,1]
    -z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
    -s INT       minimal peak DP alignment score [80]
    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]
  Input/Output:
    -a           output in the SAM format (PAF by default)
    -o FILE      output alignments to FILE [stdout]
    -L           write CIGAR with >65535 ops at the CG tag
    -R STR       SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
    -c           output CIGAR in PAF
    --cs[=STR]   output the cs tag; STR is 'short' (if absent) or 'long' [none]
    --MD         output the MD tag
    --eqx        write =/X CIGAR operators
    -Y           use soft clipping for supplementary alignments
    -t INT       number of threads [3]
    -K NUM       minibatch size for mapping [500M]
    --version    show version number
  Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []
                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - ava-pb/ava-ont - PacBio/Nanopore read overlap
                 - asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment
                 - sr - genomic short-read mapping

See `man ./minimap2.1' for detailed description of these and other advanced command-line options.
```

Software ref:

Research ref:
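The presets listed in the help message map onto the use cases described at the top of this page. The invocations below are a sketch that follows the usage patterns in the upstream minimap2 README, using only flags shown in the help text; all file names (`ref.fa`, `ont-reads.fq.gz`, `ref.mmi`, and so on) are placeholders, and `-t 8` is an assumed thread count that you would match to the cores allocated to your job.

```console
$ minimap2 -ax map-ont -t 8 ref.fa ont-reads.fq.gz > aln.sam        # Nanopore genomic reads, SAM output
$ minimap2 -ax map-hifi -t 8 ref.fa hifi-reads.fq.gz > aln.sam      # PacBio HiFi genomic reads
$ minimap2 -ax sr -t 8 ref.fa read1.fq.gz read2.fq.gz > aln.sam     # Illumina paired-end reads
$ minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq.gz > aln.sam    # Nanopore Direct RNA, spliced
$ minimap2 -cx asm5 asm1.fa asm2.fa > aln.paf                       # assembly-to-assembly, PAF with CIGAR
$ minimap2 -x ava-ont reads.fq.gz reads.fq.gz > overlaps.paf        # Nanopore all-vs-all overlaps
$ minimap2 -x map-ont -d ref.mmi ref.fa                             # build and save an index
$ minimap2 -ax map-ont ref.mmi ont-reads.fq.gz > aln.sam            # map against the saved index
```

Without `-a` the output is PAF, as the help text notes. Indexing parameters such as `-k` and `-w` are fixed when an index is written with `-d`, so it is safest to build a saved index with the same preset you intend to map with.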
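The help text for `-E` states that a k-long gap costs min{O1+k*E1,O2+k*E2}, a two-piece affine gap penalty. The shell function below is a small illustrative sketch (`gap_cost` is a hypothetical helper, not part of minimap2) that evaluates this formula with the default `-O 4,24 -E 2,1` values, so you can see where the second piece takes over.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of minimap2): cost of a k-long gap under
# the two-piece affine model from the -E help text,
#   cost(k) = min(O1 + k*E1, O2 + k*E2)
# Arguments: k [O1 [E1 [O2 [E2]]]]; defaults match -O 4,24 -E 2,1.
gap_cost() {
  local k=$1 o1=${2:-4} e1=${3:-2} o2=${4:-24} e2=${5:-1}
  local a=$(( o1 + k * e1 ))   # first piece: cheap to open, steep per base
  local b=$(( o2 + k * e2 ))   # second piece: costly to open, shallow per base
  echo $(( a < b ? a : b ))    # minimap2 charges the cheaper of the two
}

# Print the cost for a few gap lengths using the default penalties.
for k in 1 10 20 50; do
  printf 'k=%-3d cost=%d\n' "$k" "$(gap_cost "$k")"
done
```

With the defaults, a 1 bp gap costs 6 and a 10 bp gap costs 24. The two pieces cost the same (44) at k = 20, and for longer gaps the second piece is cheaper, e.g. 74 instead of 104 for a 50 bp gap, so long indels are penalized far less steeply than a single affine model with O1 and E1 alone would imply.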