Kraken2 2.1.2
Kraken2 taxonomic sequence classification system
Kraken is a taxonomic sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor (LCA) of all genomes known to contain a given k-mer.
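To make the k-mer-to-LCA idea concrete, here is a minimal Python sketch of an exact, Kraken 1-style k-mer dictionary. The toy taxonomy, the genomes, and the function names (`build_db`, `lca`, `classify`) are hypothetical and are not part of Kraken's code or command-line interface. Each k-mer maps to the LCA of every genome that contains it, and a read is labeled with the hit taxon whose root-to-leaf path collects the most k-mer hits. This is a simplified take on Kraken's scoring: tie-breaking and confidence thresholds are omitted.

```python
from collections import Counter

# Hypothetical toy taxonomy: taxid -> parent taxid (1 is the root).
PARENT = {1: 1, 2: 1, 10: 2, 11: 2, 100: 10, 101: 10}


def path_to_root(taxid):
    """List of taxids from `taxid` up to and including the root."""
    path = [taxid]
    while PARENT[taxid] != taxid:
        taxid = PARENT[taxid]
        path.append(taxid)
    return path


def lca(a, b):
    """Lowest common ancestor: first taxon on b's path that is also on a's path."""
    ancestors = set(path_to_root(a))
    return next(t for t in path_to_root(b) if t in ancestors)


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_db(genomes, k):
    """Map every k-mer to the LCA of all genomes that contain it."""
    db = {}
    for taxid, seq in genomes:
        for kmer in kmers(seq, k):
            db[kmer] = lca(db[kmer], taxid) if kmer in db else taxid
    return db


def classify(db, read, k):
    """Label a read with the hit taxon whose root-to-leaf path has the most hits."""
    hits = Counter(db[kmer] for kmer in kmers(read, k) if kmer in db)
    if not hits:
        return None  # no k-mer found in the database: unclassified
    return max(hits, key=lambda t: sum(hits[a] for a in path_to_root(t)))


GENOMES = [(100, "ACGTACGTTT"), (101, "ACGTACGAAA")]
DB = build_db(GENOMES, k=4)
print(DB["ACGT"])                    # shared by both genomes -> LCA 10
print(classify(DB, "TACGTTT", k=4))  # -> 100
```

In this toy run, the read's k-mers hit both the shared ancestor (taxid 10) and genome 100, and the path through 100 accumulates the hits from both levels, so the read is assigned to 100.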
The first version of Kraken used a large, indexed, and sorted list of k-mer/LCA pairs as its database. While fast, this database had large memory requirements that posed problems for some users, so Kraken 2 was created to address them.
Kraken 2 differs from Kraken 1 in several important ways:
- Only minimizers of the k-mers in the query sequences are used as database queries. Similarly, only minimizers of the k-mers in the reference sequences in the database’s genomic library are stored in the database. We will also refer to the minimizers as ℓ-mers, where ℓ ≤ k. Each k-mer is assigned the LCA value that the database stores for its minimizer.
- Kraken 2 uses a compact hash table, which is a probabilistic data structure. This means that database queries will occasionally fail, either by returning the wrong LCA or by returning an LCA for a minimizer that was never actually stored in the database (instead of reporting a search failure). By accepting the risk of these false positives in the data structure, Kraken 2 achieves faster speeds and lower memory requirements. Users should be aware that database false positive errors occur in less than 1% of queries and can be compensated for by using confidence scoring thresholds.
- Kraken 2 has the ability to build a database from amino acid sequences and perform a translated search of the query sequences against that database.
- Kraken 2 utilizes spaced seeds in the storage and querying of minimizers to improve classification accuracy.
- Kraken 2 provides support for “special” databases that are not based on NCBI’s taxonomy. These are currently limited to three popular 16S databases: Greengenes, SILVA, and RDP.
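The minimizer and spaced-seed ideas from the list above can be illustrated with a small Python sketch. It assumes lexicographic order on canonical ℓ-mers (the smaller of an ℓ-mer and its reverse complement). Kraken 2 itself compares 2-bit encoded ℓ-mers after XORing them with a constant, and its default k, ℓ, and spaced-seed mask differ from the tiny hypothetical values used here. The function names are illustrative only. Masked positions are replaced by `*`, so ℓ-mers that differ only at masked positions collapse to the same database key.

```python
# Complement table for A/C/G/T (other characters are left unchanged).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(lmer):
    """Smaller of an l-mer and its reverse complement, so both strands agree."""
    return min(lmer, lmer.translate(COMPLEMENT)[::-1])


def apply_mask(lmer, mask):
    """Spaced seed: positions marked '0' in `mask` are replaced by '*' (ignored)."""
    return "".join(c if m == "1" else "*" for c, m in zip(lmer, mask))


def minimizers(seq, k, ell, mask=None):
    """Minimizer of each k-mer in `seq`, with consecutive repeats collapsed.

    Each k-mer's minimizer is the smallest masked, canonical ell-mer inside it.
    """
    mask = mask or "1" * ell
    result = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        best = min(apply_mask(canonical(kmer[j:j + ell]), mask)
                   for j in range(k - ell + 1))
        if not result or result[-1] != best:
            result.append(best)  # a run of identical minimizers needs one lookup
    return result


print(minimizers("ACGTTGCA", k=5, ell=3))              # ['AAC', 'CAA']
print(minimizers("ACGTTGCA", k=5, ell=3, mask="101"))  # ['A*C', 'C*A']
```

Here four overlapping k-mers reduce to two distinct minimizers. Because neighboring k-mers usually share a minimizer, storing and querying minimizers shrinks both the database and the number of lookups.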
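The false positives described for the compact hash table can be reproduced with a toy open-addressing table that stores only a short hash fingerprint next to each value, never the key itself. This is an illustrative sketch, not Kraken 2's implementation. The class name, the use of BLAKE2b as the hash, the `(fingerprint, value)` tuple cells (instead of bits packed into fixed-width cells), and the overwrite-on-match rule are all assumptions. A real database build would combine colliding values (for example by LCA) rather than overwrite them.

```python
import hashlib


def hash64(key):
    """64-bit hash of a string key (BLAKE2b here; an arbitrary choice for the sketch)."""
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


class ToyCompactHashTable:
    """Open-addressing table that keeps only a short hash fingerprint per key."""

    def __init__(self, capacity, fingerprint_bits):
        self.capacity = capacity
        self.fp_mask = (1 << fingerprint_bits) - 1
        self.cells = [None] * capacity  # each cell: (fingerprint, value) or None

    def _slot_and_fingerprint(self, key):
        h = hash64(key)
        # Low bits choose the starting slot; high bits form the stored fingerprint.
        return h % self.capacity, (h >> 32) & self.fp_mask

    def insert(self, key, value):
        slot, fp = self._slot_and_fingerprint(key)
        for _ in range(self.capacity):  # linear probing
            cell = self.cells[slot]
            if cell is None or cell[0] == fp:
                # A fingerprint match is treated as "same key" and overwritten.
                self.cells[slot] = (fp, value)
                return
            slot = (slot + 1) % self.capacity
        raise RuntimeError("table is full")

    def get(self, key):
        slot, fp = self._slot_and_fingerprint(key)
        for _ in range(self.capacity):
            cell = self.cells[slot]
            if cell is None:
                return None     # empty cell: key was never stored
            if cell[0] == fp:
                return cell[1]  # fingerprint match: key is *probably* stored
            slot = (slot + 1) % self.capacity
        return None


table = ToyCompactHashTable(capacity=64, fingerprint_bits=2)
for i in range(40):
    table.insert(f"stored-{i}", i)
false_hits = sum(table.get(f"absent-{i}") is not None for i in range(200))
print(f"{false_hits} of 200 never-stored keys returned a value")
```

In this toy, shrinking `fingerprint_bits` or filling the table more densely makes both kinds of error more frequent: never-stored keys that return a value, and stored keys whose value was overwritten by a colliding key.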
Location and version:
help message:
Databases will be maintained in: /nfs1/CGRB/databases/kraken2/current/
software ref: https://github.com/DerrickWood/kraken2
software ref: https://github.com/DerrickWood/kraken2/wiki
research ref: https://doi.org/10.1186/s13059-019-1891-0