KAT 2.4.2

KAT - The K-mer Analysis Toolkit

KAT is a suite of tools that analyse jellyfish hashes or sequence files (fasta or fastq) using K-mer counts. The following tools are currently available in KAT:

  • hist: Creates a histogram of K-mer occurrences from a sequence file. Adds metadata to the output for easy plotting.
  • gcp: K-mer GC Processor. Creates a matrix of the number of K-mers found given a GC count and a K-mer count.
  • comp: K-mer comparison tool. Creates a matrix of shared K-mers between two (or three) sequence files or hashes.
  • sect: SEquence Coverage estimator Tool. Estimates the coverage of each sequence in a file using K-mers from another sequence file.
  • cold: Given reads and an assembly, calculates both the read and assembly K-mer coverage along with GC% for each sequence in the assembly.
  • filter: Filtering tools. Contains tools for filtering k-mer hashes and FastQ/A files:
    • kmer: Produces a k-mer hash containing only k-mers within specified coverage and GC tolerances.
    • seq: Filters a sequence file based on whether or not the sequences contain k-mers within a provided hash.
  • plot: Plotting tools. Contains several plotting tools to visualise and compare K-mer distributions. The following plot tools are available:
    • density: Creates a density plot from a matrix created with the “comp” tool. Typically this is used to compare two K-mer hashes produced from different NGS read sets.
    • profile: Creates a K-mer coverage plot for a single sequence. Takes as input the FASTA coverage output from the “sect” tool.
    • spectra-cn: Creates a stacked histogram using a matrix created with the “comp” tool. Typically this is used to compare a jellyfish hash produced from a read set to a jellyfish hash produced from an assembly. The plot shows the number of distinct read K-mers absent from the assembly, as well as the copy number variation present within the assembly.
    • spectra-hist: Creates a K-mer spectra plot for a set of K-mer histograms produced either by jellyfish histo or kat hist.
    • spectra-mx: Creates a K-mer spectra plot for a set of K-mer histograms that are derived from selected rows or columns in a matrix produced by the “comp” tool.
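
The tools above are often chained into a reads-versus-assembly check: a K-mer spectrum of the reads (hist), a shared K-mer matrix (comp), then a copy-number plot (plot spectra-cn). The following is a minimal sketch, not an official KAT pipeline. The function names (`reads_spectrum`, `reads_vs_asm`, `plot_cn`), file names and default thread count are hypothetical, and only the `-t` (threads) and `-o` (output) options are used. Setting `KAT=echo` prints each command instead of running it, which is a cheap way to check a job script before submitting it.

```bash
#!/usr/bin/env bash
# Minimal sketch of a reads-vs-assembly K-mer check with KAT.
# Hypothetical: function names, file names and the default thread count.
set -euo pipefail

KAT="${KAT:-kat}"          # set KAT=echo to print the commands instead of running them
THREADS="${THREADS:-16}"   # match this to the cores requested for the job

# hist: K-mer spectrum (histogram of K-mer occurrences) of one read set.
reads_spectrum() {   # usage: reads_spectrum <out_prefix> <reads>...
  local prefix=$1
  shift
  "$KAT" hist -t "$THREADS" -o "$prefix" "$@"
}

# comp: matrix of K-mers shared between the reads and the assembly.
# A quoted glob such as 'reads_R?.fq' reaches KAT unexpanded; KAT is assumed
# to pool the matching files into one input, as in its online documentation.
reads_vs_asm() {     # usage: reads_vs_asm <out_prefix> <reads_or_glob> <assembly>
  "$KAT" comp -t "$THREADS" -o "$1" "$2" "$3"
}

# plot spectra-cn: stacked copy-number histogram from the comp matrix.
plot_cn() {          # usage: plot_cn <matrix_file> <output_image>
  "$KAT" plot spectra-cn -o "$2" "$1"
}
```

For example, `reads_vs_asm pe_vs_asm 'reads_R?.fq' assembly.fa` followed by `plot_cn pe_vs_asm-main.mx pe_vs_asm.png`. The `-main.mx` matrix name is an assumption based on KAT's online documentation, so check which files `kat comp` actually writes before plotting.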

In addition, KAT contains a Python script for analysing the mathematical distributions present in the K-mer spectra in order to determine how much content is present in each peak.

This README only contains some brief details of how to install and use KAT. For more extensive documentation please visit: https://kat.readthedocs.org/en/latest/

To activate:

source /local/cluster/kat/activate.sh

Location and version:

$ which kat
/local/cluster/kat/bin/kat
(/local/cluster/kat)
$ kat --version
kat 2.4.2
(/local/cluster/kat)
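
Because other KAT releases may behave differently, a job script can confirm that the activated `kat` is the version shown above before doing any work. This is a minimal sketch, assuming the first line of `kat --version` has the form `kat 2.4.2` as in the output above; the function names `kat_version` and `require_kat_version` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: stop if the active KAT is not the version documented on this page.
set -euo pipefail

KAT="${KAT:-kat}"
EXPECTED_KAT_VERSION="2.4.2"   # version reported by `kat --version` above

# Print the second field of the first line of `kat --version` ("kat 2.4.2" -> "2.4.2").
kat_version() {
  "$KAT" --version | awk 'NR == 1 { print $2 }'
}

# Return non-zero (with a message on stderr) when the version does not match.
require_kat_version() {
  local found
  found=$(kat_version)
  if [ "$found" != "$EXPECTED_KAT_VERSION" ]; then
    echo "expected kat $EXPECTED_KAT_VERSION but found '${found}'" >&2
    return 1
  fi
}
```

Call `require_kat_version` right after `source /local/cluster/kat/activate.sh`; with `set -e` in effect, a mismatch stops the script.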

Help message:

$ kat --help
The K-mer Analysis Toolkit (KAT) contains a number of tools that analyse jellyfish K-mer hashes.

The First argument should be the tool/mode you wish to use:

   * hist:   Create an histogram of k-mer occurrences from a sequence file.  Similar to
             jellyfish histogram sub command but adds metadata in output for easy plotting,
             also actually runs multi-threaded.
   * gcp:    K-mer GC Processor.  Creates a matrix of the number of K-mers found given a GC
             count and a K-mer count.
   * comp:   K-mer comparison tool.  Creates a matrix of shared K-mers between two (or three)
             sequence files.
   * sect:   SEquence Coverage estimator Tool.  Estimates the coverage of each sequence in
             a file using K-mers from another sequence file.
   * cold:   Given, reads and an assembly, calculates both the read and assembly K-mer
             coverage along with GC% for each sequence in the assembly.
             a file using K-mers from another sequence file.
   * filter: Filtering tools.  Contains tools for filtering k-mers and sequences based on
             user-defined GC and coverage limits.
   * plot:   Plotting tools.  Contains several plotting tools to visualise K-mer and compare
             distributions.

Options:
  -v [ --verbose ]      Print extra information
  --version             Print version string
  --help                Produce help message
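
The `gcp` and `sect` tools listed in the help text can be scripted the same way. This is a minimal sketch with hypothetical function and file names and default thread count. The argument order for `sect` (the sequence file to annotate first, then the file(s) supplying the K-mers) is an assumption, so confirm it against the tool's own help output before relying on it.

```bash
#!/usr/bin/env bash
# Minimal sketch for the gcp and sect tools.
# Hypothetical: function names, file names and the default thread count.
set -euo pipefail

KAT="${KAT:-kat}"          # set KAT=echo for a dry run
THREADS="${THREADS:-16}"

# gcp: matrix of the number of K-mers found for each GC count and K-mer count.
gc_matrix() {      # usage: gc_matrix <out_prefix> <reads>...
  local prefix=$1
  shift
  "$KAT" gcp -t "$THREADS" -o "$prefix" "$@"
}

# sect: coverage of each sequence in <assembly>, estimated from K-mers in <reads>.
# Assumed argument order: file to annotate first, then the K-mer source(s).
seq_coverage() {   # usage: seq_coverage <out_prefix> <assembly> <reads>...
  local prefix=$1 assembly=$2
  shift 2
  "$KAT" sect -t "$THREADS" -o "$prefix" "$assembly" "$@"
}
```

For example, `KAT=echo seq_coverage cov assembly.fa reads.fq` prints the `kat sect` command it would run without starting any work.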

software ref: https://github.com/TGAC/KAT
research ref: https://doi.org/10.1093/bioinformatics/btw663