# minimap2 2.26

{{< admonition success "Installed" true >}}
This software should be available with no extra configuration.
{{< /admonition >}}

## minimap2-2.26

Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database. Typical use cases include: (1) mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2) finding overlaps between long reads with error rate up to ~15%; (3) splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads against a reference genome; (4) aligning Illumina single- or paired-end reads; (5) assembly-to-assembly alignment; (6) full-genome alignment between two closely related species with divergence below ~15%.

For ~10kb noisy read sequences, minimap2 is tens of times faster than mainstream long-read mappers such as BLASR, BWA-MEM, NGMLR and GMAP. It is more accurate on simulated long reads and produces biologically meaningful alignments ready for downstream analyses. For >100bp Illumina short reads, minimap2 is three times as fast as BWA-MEM and Bowtie2, and as accurate on simulated data. Detailed evaluations are available from the [minimap2 paper](https://doi.org/10.1093/bioinformatics/bty191) or the [preprint](https://arxiv.org/abs/1708.01492).
### Getting started

```console
# long sequences against a reference genome
./minimap2 -a test/MT-human.fa test/MT-orang.fa > test.sam
# create an index first and then map
./minimap2 -x map-ont -d MT-human-ont.mmi test/MT-human.fa
./minimap2 -a MT-human-ont.mmi test/MT-orang.fa > test.sam
# use presets (no test data)
./minimap2 -ax map-pb ref.fa pacbio.fq.gz > aln.sam       # PacBio CLR genomic reads
./minimap2 -ax map-ont ref.fa ont.fq.gz > aln.sam         # Oxford Nanopore genomic reads
./minimap2 -ax map-hifi ref.fa pacbio-ccs.fq.gz > aln.sam # PacBio HiFi/CCS genomic reads (v2.19 or later)
./minimap2 -ax asm20 ref.fa pacbio-ccs.fq.gz > aln.sam    # PacBio HiFi/CCS genomic reads (v2.18 or earlier)
./minimap2 -ax sr ref.fa read1.fa read2.fa > aln.sam      # short genomic paired-end reads
./minimap2 -ax splice ref.fa rna-reads.fa > aln.sam       # spliced long reads (strand unknown)
./minimap2 -ax splice -uf -k14 ref.fa reads.fa > aln.sam  # noisy Nanopore Direct RNA-seq
./minimap2 -ax splice:hq -uf ref.fa query.fa > aln.sam    # Final PacBio Iso-seq or traditional cDNA
./minimap2 -ax splice --junc-bed anno.bed12 ref.fa query.fa > aln.sam # prioritize on annotated junctions
./minimap2 -cx asm5 asm1.fa asm2.fa > aln.paf             # intra-species asm-to-asm alignment
./minimap2 -x ava-pb reads.fa reads.fa > overlaps.paf     # PacBio read overlap
./minimap2 -x ava-ont reads.fa reads.fa > overlaps.paf    # Nanopore read overlap
```

-------------------------------------------------------------------------------

## Location and version

```console
$ which minimap2
/local/cluster/bin/minimap2
$ minimap2 --version
2.26-r1175
```

## Help message

```console
$ minimap2 --help
Usage: minimap2 [options] <target.fa>|<target.idx> [query.fa] [...]
Options:
  Indexing:
    -H           use homopolymer-compressed k-mer (preferrable for PacBio)
    -k INT       k-mer size (no larger than 28) [15]
    -w INT       minimizer window size [10]
    -I NUM       split index for every ~NUM input bases [8G]
    -d FILE      dump index to FILE []
  Mapping:
    -f FLOAT     filter out top FLOAT fraction of repetitive minimizers [0.0002]
    -g NUM       stop chain enlongation if there are no minimizers in INT-bp [5000]
    -G NUM       max intron length (effective with -xsplice; changing -r) [200k]
    -F NUM       max fragment length (effective with -xsr or in the fragment mode) [800]
    -r NUM[,NUM] chaining/alignment bandwidth and long-join bandwidth [500,20000]
    -n INT       minimal number of minimizers on a chain [3]
    -m INT       minimal chaining score (matching bases minus log gap penalty) [40]
    -X           skip self and dual mappings (for the all-vs-all mode)
    -p FLOAT     min secondary-to-primary score ratio [0.8]
    -N INT       retain at most INT secondary alignments [5]
  Alignment:
    -A INT       matching score [2]
    -B INT       mismatch penalty (larger value for lower divergence) [4]
    -O INT[,INT] gap open penalty [4,24]
    -E INT[,INT] gap extension penalty; a k-long gap costs min{O1+k*E1,O2+k*E2} [2,1]
    -z INT[,INT] Z-drop score and inversion Z-drop score [400,200]
    -s INT       minimal peak DP alignment score [80]
    -u CHAR      how to find GT-AG. f:transcript strand, b:both strands, n:don't match GT-AG [n]
    -J INT       splice mode. 0: original minimap2 model; 1: miniprot model [1]
  Input/Output:
    -a           output in the SAM format (PAF by default)
    -o FILE      output alignments to FILE [stdout]
    -L           write CIGAR with >65535 ops at the CG tag
    -R STR       SAM read group line in a format like '@RG\tID:foo\tSM:bar' []
    -c           output CIGAR in PAF
    --cs[=STR]   output the cs tag; STR is 'short' (if absent) or 'long' [none]
    --MD         output the MD tag
    --eqx        write =/X CIGAR operators
    -Y           use soft clipping for supplementary alignments
    -t INT       number of threads [3]
    -K NUM       minibatch size for mapping [500M]
    --version    show version number
  Preset:
    -x STR       preset (always applied before other options; see minimap2.1 for details) []
                 - map-pb/map-ont - PacBio CLR/Nanopore vs reference mapping
                 - map-hifi - PacBio HiFi reads vs reference mapping
                 - ava-pb/ava-ont - PacBio/Nanopore read overlap
                 - asm5/asm10/asm20 - asm-to-ref mapping, for ~0.1/1/5% sequence divergence
                 - splice/splice:hq - long-read/Pacbio-CCS spliced alignment
                 - sr - genomic short-read mapping

See `man ./minimap2.1' for detailed description of these and other advanced command-line options.
```

software ref:

research ref:
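
Choosing the right `-x` preset for a given read type is the step most likely to go wrong when scripting minimap2. The sketch below is one possible way to keep that choice in a single place. The function names `preset_for` and `mapping_command` and the read-type labels (`ont`, `clr`, `hifi`, `short`, `rna`) are hypothetical and not part of minimap2; only the preset names and the `-t`, `-a` and `-x` options come from the help message above. The helper prints the command instead of running it, so it can be reviewed before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a read-type label to a minimap2 preset.
# The labels are invented for this sketch; the preset names are the
# ones listed under "Preset:" in the help message.
preset_for() {
    case "$1" in
        ont)   echo "map-ont" ;;   # Oxford Nanopore genomic reads
        clr)   echo "map-pb" ;;    # PacBio CLR genomic reads
        hifi)  echo "map-hifi" ;;  # PacBio HiFi/CCS genomic reads
        short) echo "sr" ;;        # short genomic reads
        rna)   echo "splice" ;;    # spliced long reads
        *)     echo "unknown read type: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) a SAM-producing mapping command.
# Arguments: read type, reference FASTA, reads file, thread count.
# The thread count falls back to 3, the default shown for -t in the help.
mapping_command() {
    local preset
    preset=$(preset_for "$1") || return 1
    echo "minimap2 -t ${4:-3} -ax ${preset} $2 $3"
}

# Example: prints "minimap2 -t 8 -ax map-ont ref.fa ont.fq.gz"
mapping_command ont ref.fa ont.fq.gz 8
```

Printing the command keeps the sketch safe to try without input data; once the output looks right, it can be redirected to a file such as `aln.sam` in the same way as the Getting started examples.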