Bakta: Rapid & standardized annotation of bacterial genomes & plasmids
Bakta is a tool for the rapid & standardized annotation of bacterial genomes
& plasmids. It provides dbxref-rich and sORF-including annotations in
machine-readable JSON & bioinformatics standard file formats for automatic
downstream analysis.
To activate the environment, first qrsh to an available node, then run:
bash
source /local/cluster/bakta/activate.sh
Location (of binary and database) and version:
$ which bakta
/local/cluster/bakta/bin/bakta
$ bakta --version
bakta 1.0
$ echo $BAKTA_DB
/nfs1/CGRB/databases/bakta/latest
Help message:
$ bakta --help
usage: bakta [--db DB] [--min-contig-length MIN_CONTIG_LENGTH] [--prefix PREFIX] [--output OUTPUT] [--genus GENUS] [--species SPECIES] [--strain STRAIN] [--plasmid PLASMID] [--complete] [--prodigal-tf PRODIGAL_TF]
[--translation-table {11,4}] [--gram {+,-,?}] [--locus LOCUS] [--locus-tag LOCUS_TAG] [--keep-contig-headers] [--replicons REPLICONS] [--skip-trna] [--skip-tmrna] [--skip-rrna] [--skip-ncrna]
[--skip-ncrna-region] [--skip-crispr] [--skip-cds] [--skip-sorf] [--skip-gap] [--skip-ori] [--help] [--verbose] [--threads THREADS] [--tmp-dir TMP_DIR] [--version]
<genome>
Rapid & standardized annotation of bacterial genomes & plasmids.
positional arguments:
<genome> Genome sequences in (zipped) fasta format
Input / Output:
--db DB, -d DB Database path (default = <bakta_path>/db). Can also be provided as BAKTA_DB environment variable.
--min-contig-length MIN_CONTIG_LENGTH, -m MIN_CONTIG_LENGTH
Minimum contig size (default = 1)
--prefix PREFIX, -p PREFIX
Prefix for output files
--output OUTPUT, -o OUTPUT
Output directory (default = current working directory)
Organism:
--genus GENUS Genus name
--species SPECIES Species name
--strain STRAIN Strain name
--plasmid PLASMID Plasmid name
Annotation:
--complete All sequences are complete replicons (chromosome/plasmid[s])
--prodigal-tf PRODIGAL_TF
Path to existing Prodigal training file to use for CDS prediction
--translation-table {11,4}
Translation table: 11/4 (default = 11)
--gram {+,-,?} Gram type: +/-/? (default = '?')
--locus LOCUS Locus prefix (default = 'contig')
--locus-tag LOCUS_TAG
Locus tag prefix (default = autogenerated)
--keep-contig-headers
Keep original contig headers
--replicons REPLICONS, -r REPLICONS
Replicon information table (tsv/csv)
Workflow:
--skip-trna Skip tRNA detection & annotation
--skip-tmrna Skip tmRNA detection & annotation
--skip-rrna Skip rRNA detection & annotation
--skip-ncrna Skip ncRNA detection & annotation
--skip-ncrna-region Skip ncRNA region detection & annotation
--skip-crispr Skip CRISPR array detection & annotation
--skip-cds Skip CDS detection & annotation
--skip-sorf Skip sORF detection & annotation
--skip-gap Skip gap detection & annotation
--skip-ori Skip oriC/oriT detection & annotation
General:
--help, -h Show this help message and exit
--verbose, -v Print verbose information
--threads THREADS, -t THREADS
Number of threads to use (default = number of available CPUs)
--tmp-dir TMP_DIR Location for temporary files (default = system dependent auto detection)
--version show program's version number and exit
Citation:
Schwengers O., Goesmann A. (2021)
Bakta: comprehensive and rapid annotation of bacterial genomes.
GitHub https://github.com/oschwengers/bakta
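A typical invocation, pieced together from the options in the help message above, might look like the following sketch. The input file `genome.fasta`, the output directory `bakta_out`, the prefix `sample1` and the thread count are placeholder values, not site defaults. The `run_bakta` wrapper and its `DRY_RUN` switch are hypothetical conveniences of this sketch, not Bakta features. The database location is assumed to come from the `BAKTA_DB` environment variable set by the activation script.

```shell
#!/bin/sh
# Sketch: annotate one genome with Bakta, assuming the environment is active.
# All file names and values passed in are placeholders.

run_bakta() {
    genome="$1"; outdir="$2"; prefix="$3"; threads="${4:-4}"
    # Assemble the command from flags listed in `bakta --help`.
    # The database is picked up from $BAKTA_DB, so --db is not passed here.
    set -- bakta --output "$outdir" --prefix "$prefix" --threads "$threads" "$genome"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Hypothetical dry-run switch (not a Bakta option): print the command only.
        echo "$*"
    else
        "$@"
    fi
}

# Example with placeholder names:
#   DRY_RUN=1 run_bakta genome.fasta bakta_out sample1 8
```

Setting `--threads` explicitly may be worthwhile on a shared node, since the help text says the default is the number of available CPUs.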
FAQ
- AMRFinder fails. If AMRFinder keeps crashing even on a fresh setup and
Bakta’s database was downloaded manually, AMRFinder needs to set up its
own internal database. This is required only once:
amrfinder_update --force_update --database <bakta-db>/amrfinderplus-db
Alternatively, use Bakta’s built-in database download, which takes care of
this automatically:
bakta_db download --output <bakta-db>
- Nice, but I’m missing XYZ… Bakta is quite new and we’re keen to constantly
improve it and further expand its feature set. In case there’s anything
missing, please do not hesitate to open an issue and ask for it!
- Bakta is running too long without CPU load… why? Bakta relies on an
SQLite DB, which causes high storage I/O load. If this DB is stored on a
remote / network volume, the lookup of IPS/PSC annotations might take a
long time. In these cases, please consider moving the DB to a local volume
or hard drive.
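One way to follow the last suggestion is to copy the database to node-local storage and point Bakta at the copy with `--db`. The sketch below is only an illustration: the `stage_bakta_db` helper is hypothetical, and the destination directory is a placeholder that should be replaced with a local volume that has enough free space for the database.

```shell
#!/bin/sh
# Sketch: stage the Bakta database on node-local storage.
# The helper name and the destination path are assumptions, not site settings.

stage_bakta_db() {
    src="$1"; dest="$2"
    mkdir -p "$dest" || return 1
    # -R copies recursively; -L follows symlinks (e.g. a "latest" link).
    cp -RL "$src"/. "$dest"/ || return 1
    # Print the local path so the caller can pass it to --db.
    echo "$dest"
}

# Example with a placeholder destination:
#   local_db=$(stage_bakta_db "$BAKTA_DB" /path/to/local/scratch/bakta_db)
#   bakta --db "$local_db" genome.fasta
```

Exporting `BAKTA_DB` with the new path should have the same effect as `--db`, since the help text states the database path can also be provided through that variable.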
software ref: https://github.com/oschwengers/bakta
research ref: https://github.com/oschwengers/bakta#citation