# Kraken2 2.1.2


## Kraken2 taxonomic sequence classification system

Kraken is a taxonomic sequence classifier that assigns taxonomic labels to DNA
sequences. Kraken examines the k-mers within a query sequence and uses the
information within those k-mers to query a database. That database maps k-mers
to the lowest common ancestor (LCA) of all genomes known to contain a given
k-mer.
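The k-mer to LCA mapping described above can be illustrated with a minimal Python sketch. It assumes a tiny hypothetical taxonomy (the taxids and the `PARENT` table are made up). It stores every full k-mer rather than minimizers and ignores reverse complements, so it reflects the original Kraken 1 idea more closely than Kraken 2's compact hash table. The function names are illustrative, not part of Kraken's code.

```python
# Hypothetical toy taxonomy: child taxid -> parent taxid; 1 is the root and
# its parent 0 marks "above the root".
PARENT = {1: 0, 2: 1, 10: 2, 11: 2, 20: 1}


def lineage(taxid):
    """Taxids from taxid up to the root, inclusive."""
    path = []
    while taxid != 0:
        path.append(taxid)
        taxid = PARENT[taxid]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxids in the toy taxonomy."""
    ancestors_of_a = set(lineage(a))
    for taxid in lineage(b):
        if taxid in ancestors_of_a:
            return taxid
    return 0  # unreachable while every taxid descends from root 1


def kmers(seq, k):
    """All overlapping substrings of length k (no reverse complements)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(genomes, k):
    """Map each k-mer to the LCA of every genome (taxid -> sequence) containing it."""
    db = {}
    for taxid, seq in genomes.items():
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxid) if kmer in db else taxid
    return db


def query(db, read, k):
    """LCA taxid for each k-mer of a read; None if the k-mer is absent."""
    return [db.get(kmer) for kmer in kmers(read, k)]


db = build_db({10: "ACGTACGGT", 11: "ACGTTTGCA", 20: "GGGCCCAAA"}, k=4)
print(query(db, "ACGTAC", k=4))  # [2, 10, 10]
```

Here `ACGT` occurs in the genomes of both taxa 10 and 11, so the database stores their LCA, taxon 2, for it. The other two k-mers of the read are unique to taxon 10.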

The first version of Kraken used a large indexed and sorted list of k-mer/LCA
pairs as its database. While this approach was fast, its large memory
requirements posed problems for users, and Kraken 2 was created to address
them.

Kraken 2 differs from Kraken 1 in several important ways:

1. Only minimizers of the k-mers in the query sequences are used as database
   queries. Similarly, only minimizers of the k-mers in the reference sequences
   in the database's genomic library are stored in the database. We will also
   refer to the minimizers as ℓ-mers, where ℓ ≤ k. All k-mers are considered to
   have the same LCA as their minimizer's database LCA value.
2. Kraken 2 uses a compact hash table that is a probabilistic data structure.
   This means that, occasionally, a database query will fail, either by
   returning the wrong LCA or by returning an LCA for a queried minimizer that
   was never actually stored in the database (rather than reporting that it
   was not found). By accepting the risk of these false positives in the data
   structure, Kraken 2 achieves faster speeds and lower memory requirements.
   Users should be aware that such database false positives occur in less than
   1% of queries and can be compensated for by using confidence score
   thresholds (`--confidence`).
3. Kraken 2 has the ability to build a database from amino acid sequences and
   perform a translated search of the query sequences against that database.
4. Kraken 2 utilizes spaced seeds in the storage and querying of minimizers to
   improve classification accuracy.
5. Kraken 2 provides support for "special" databases that are not based on
   NCBI's taxonomy. These are currently limited to three popular 16S databases.
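Item 1 can be made concrete with a small Python sketch of minimizer selection. This is a simplification, not Kraken 2's code. It picks the lexicographically smallest ℓ-mer of each k-mer, whereas Kraken 2 also uses canonical ℓ-mers, the spaced seeds from item 4, and a different ordering of ℓ-mers. The function names and the example sequence are hypothetical.

```python
def minimizer(kmer, l):
    """Lexicographically smallest l-mer inside a k-mer.

    Simplification: Kraken 2 does not use plain string order, canonical
    l-mers are ignored here, and no spaced-seed mask is applied.
    """
    return min(kmer[i:i + l] for i in range(len(kmer) - l + 1))


def minimizers(seq, k, l):
    """Minimizer of every k-mer in seq, in order of k-mer start position."""
    return [minimizer(seq[i:i + k], l) for i in range(len(seq) - k + 1)]


mins = minimizers("GATTACAGATTACA", k=6, l=3)
print(mins)  # ['ATT', 'ACA', 'ACA', 'ACA', 'ACA', 'AGA', 'AGA', 'ATT', 'ACA']
print(len(set(mins)))  # 3
```

In this example, the seven distinct 6-mers collapse to three distinct minimizers, so a database keyed on minimizers needs far fewer entries. The trade-off is that every k-mer sharing a minimizer is assigned that minimizer's single LCA value.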


Location and version:

```console
$ which kraken2
/local/cluster/bin/kraken2
$ kraken2 --version
Kraken version 2.1.2
Copyright 2013-2021, Derrick Wood (dwood@cs.jhu.edu)
```

Help message:

```console
$ kraken2 --help
Usage: kraken2 [options] <filename(s)>

Options:
  --db NAME               Name for Kraken 2 DB
                          (default: none)
  --threads NUM           Number of threads (default: 1)
  --quick                 Quick operation (use first hit or hits)
  --unclassified-out FILENAME
                          Print unclassified sequences to filename
  --classified-out FILENAME
                          Print classified sequences to filename
  --output FILENAME       Print output to filename (default: stdout); "-" will
                          suppress normal output
  --confidence FLOAT      Confidence score threshold (default: 0.0); must be
                          in [0, 1].
  --minimum-base-quality NUM
                          Minimum base quality used in classification (def: 0,
                          only effective with FASTQ input).
  --report FILENAME       Print a report with aggregrate counts/clade to file
  --use-mpa-style         With --report, format report output like Kraken 1's
                          kraken-mpa-report
  --report-zero-counts    With --report, report counts for ALL taxa, even if
                          counts are zero
  --report-minimizer-data With --report, report minimizer and distinct minimizer
                          count information in addition to normal Kraken report
  --memory-mapping        Avoids loading database into RAM
  --paired                The filenames provided have paired-end reads
  --use-names             Print scientific names instead of just taxids
  --gzip-compressed       Input files are compressed with gzip
  --bzip2-compressed      Input files are compressed with bzip2
  --minimum-hit-groups NUM
                          Minimum number of hit groups (overlapping k-mers
                          sharing the same minimizer) needed to make a call
                          (default: 2)
  --help                  Print this message
  --version               Print version information

If none of the *-compressed flags are specified, and the filename provided
is a regular file, automatic format detection is attempted.

```
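The `--confidence` threshold can be sketched in Python. As we understand the Kraken 2 manual, a read's confidence score is C/Q. Here C is the number of k-mers whose LCA lies in the clade rooted at the read's label, and Q is the number of k-mers that lack an ambiguous nucleotide. When the score falls below the threshold, the label moves up to its parent until the threshold is met. The taxonomy, the function names, and the choice to return 0 (unclassified) when even the root falls short are assumptions of this sketch, not kraken2 internals.

```python
# Hypothetical toy taxonomy: child taxid -> parent taxid; root 1 points to 0,
# which this sketch uses to mean "unclassified".
PARENT = {1: 0, 2: 1, 10: 2, 11: 2, 20: 1}


def in_clade(taxid, clade_root):
    """True if taxid lies in the subtree rooted at clade_root."""
    while taxid != 0:
        if taxid == clade_root:
            return True
        taxid = PARENT[taxid]
    return False


def apply_confidence(label, kmer_taxids, threshold):
    """Move label toward the root until C/Q meets the threshold.

    kmer_taxids holds one LCA taxid per non-ambiguous k-mer (0 = no hit),
    so Q = len(kmer_taxids) and C counts the entries in the label's clade.
    Assumption: returns 0 (unclassified) if even the root falls short.
    """
    q = len(kmer_taxids)
    while label != 0:
        c = sum(in_clade(t, label) for t in kmer_taxids)
        if q and c / q >= threshold:
            return label
        label = PARENT[label]
    return 0


hits = [10, 10, 2, 0, 20]
print(apply_confidence(10, hits, 0.5))  # 2
```

With a threshold of 0.5, taxon 10 scores only 2/5, so the label moves to its parent, taxon 2, which scores 3/5 and is kept. Higher thresholds push the call further toward the root.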

Databases will be maintained in: `/nfs1/CGRB/databases/kraken2/current/`

[Link to manual](../docs/kraken2/MANUAL.html)

software ref: <https://github.com/DerrickWood/kraken2>  
software ref: <https://github.com/DerrickWood/kraken2/wiki>  
research ref: <https://doi.org/10.1186/s13059-019-1891-0>