# Bakta 1.0

## Bakta: Rapid & standardized annotation of bacterial genomes & plasmids

Bakta is a tool for the rapid & standardized annotation of bacterial genomes & plasmids. It provides dbxref-rich and sORF-including annotations in machine-readable JSON & bioinformatics standard file formats for automatic downstream analysis.

First `qrsh` to an available node, then activate the environment:

```console
bash
source /local/cluster/bakta/activate.sh
```

Location (of binary and database) and version:

```console
$ which bakta
/local/cluster/bakta/bin/bakta
$ bakta --version
bakta 1.0
$ echo $BAKTA_DB
/nfs1/CGRB/databases/bakta/latest
```

Help message:

```console
$ bakta --help
usage: bakta [--db DB] [--min-contig-length MIN_CONTIG_LENGTH]
             [--prefix PREFIX] [--output OUTPUT] [--genus GENUS]
             [--species SPECIES] [--strain STRAIN] [--plasmid PLASMID]
             [--complete] [--prodigal-tf PRODIGAL_TF]
             [--translation-table {11,4}] [--gram {+,-,?}] [--locus LOCUS]
             [--locus-tag LOCUS_TAG] [--keep-contig-headers]
             [--replicons REPLICONS] [--skip-trna] [--skip-tmrna]
             [--skip-rrna] [--skip-ncrna] [--skip-ncrna-region]
             [--skip-crispr] [--skip-cds] [--skip-sorf] [--skip-gap]
             [--skip-ori] [--help] [--verbose] [--threads THREADS]
             [--tmp-dir TMP_DIR] [--version]

Rapid & standardized annotation of bacterial genomes & plasmids.

positional arguments:
  Genome sequences in (zipped) fasta format

Input / Output:
  --db DB, -d DB        Database path (default = /db). Can also be provided
                        as BAKTA_DB environment variable.
  --min-contig-length MIN_CONTIG_LENGTH, -m MIN_CONTIG_LENGTH
                        Minimum contig size (default = 1)
  --prefix PREFIX, -p PREFIX
                        Prefix for output files
  --output OUTPUT, -o OUTPUT
                        Output directory (default = current working directory)

Organism:
  --genus GENUS         Genus name
  --species SPECIES     Species name
  --strain STRAIN       Strain name
  --plasmid PLASMID     Plasmid name

Annotation:
  --complete            All sequences are complete replicons
                        (chromosome/plasmid[s])
  --prodigal-tf PRODIGAL_TF
                        Path to existing Prodigal training file to use for
                        CDS prediction
  --translation-table {11,4}
                        Translation table: 11/4 (default = 11)
  --gram {+,-,?}        Gram type: +/-/? (default = '?')
  --locus LOCUS         Locus prefix (default = 'contig')
  --locus-tag LOCUS_TAG
                        Locus tag prefix (default = autogenerated)
  --keep-contig-headers
                        Keep original contig headers
  --replicons REPLICONS, -r REPLICONS
                        Replicon information table (tsv/csv)

Workflow:
  --skip-trna           Skip tRNA detection & annotation
  --skip-tmrna          Skip tmRNA detection & annotation
  --skip-rrna           Skip rRNA detection & annotation
  --skip-ncrna          Skip ncRNA detection & annotation
  --skip-ncrna-region   Skip ncRNA region detection & annotation
  --skip-crispr         Skip CRISPR array detection & annotation
  --skip-cds            Skip CDS detection & annotation
  --skip-sorf           Skip sORF detection & annotation
  --skip-gap            Skip gap detection & annotation
  --skip-ori            Skip oriC/oriT detection & annotation

General:
  --help, -h            Show this help message and exit
  --verbose, -v         Print verbose information
  --threads THREADS, -t THREADS
                        Number of threads to use (default = number of
                        available CPUs)
  --tmp-dir TMP_DIR     Location for temporary files (default = system
                        dependent auto detection)
  --version             show program's version number and exit

Citation:
Schwengers O., Goesmann A. (2021)
Bakta: comprehensive and rapid annotation of bacterial genomes.
GitHub https://github.com/oschwengers/bakta
```

### FAQ

* __AMRFinder fails__
  If AMRFinder constantly crashes even on fresh setups and Bakta's database was downloaded manually, then AMRFinder needs to set up its own internal database. This is required only once: `amrfinder_update --force_update --database /amrfinderplus-db`. You could also try Bakta's internal database download logic, which takes care of this automatically: `bakta_db download --output `

* __Nice, but I'm missing XYZ...__
  Bakta is quite new and we're keen to constantly improve it and further expand its feature set. In case there's anything missing, please do not hesitate to open an issue and ask for it!

* __Bakta is running too long without CPU load... why?__
  Bakta takes advantage of an SQLite DB, which results in high storage IO loads. If this DB is stored on a remote / network volume, the lookup of IPS/PSC annotations might take a long time. In these cases, please consider moving the DB to a local volume or hard drive.

software ref:

research ref:
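
The last FAQ entry (slow runs when the SQLite DB sits on a network volume) can be checked before starting a long job. The following is a minimal sketch, not part of Bakta: it assumes GNU coreutils `stat` is available, and the function names (`is_network_fs`, `fs_type_of`, `check_bakta_db`) as well as the non-exhaustive list of network filesystem types are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: report whether the Bakta database directory is on a network filesystem.
# Assumptions: GNU coreutils `stat`; all function names here are hypothetical helpers.

# Return 0 if the given filesystem type name looks like a network filesystem.
# The pattern list is a non-exhaustive guess; extend it for your site.
is_network_fs() {
    case "$1" in
        nfs*|cifs|smb*|lustre|gpfs|afs|ceph) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the filesystem type name of the filesystem holding a path (GNU stat).
fs_type_of() {
    stat -f -c %T -- "$1"
}

# Check the directory given as $1, falling back to $BAKTA_DB.
# Returns 2 if no usable directory is found.
check_bakta_db() {
    local db="${1:-${BAKTA_DB:-}}"
    if [ -z "$db" ] || [ ! -d "$db" ]; then
        echo "database directory not found: ${db:-<unset>}" >&2
        return 2
    fi
    local fstype
    fstype="$(fs_type_of "$db")"
    if is_network_fs "$fstype"; then
        echo "network ($fstype): consider copying $db to a local disk"
    else
        echo "local ($fstype): $db"
    fi
}
```

After sourcing this file in an activated environment, `check_bakta_db` with no argument inspects `$BAKTA_DB`; pass a path to check another location. If the database is copied to local storage, the copy can be selected with the `--db` option or the `BAKTA_DB` environment variable, both described in the help message above.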