Bactopia 2.1.0

Bactopia

Bactopia is a flexible pipeline for complete analysis of bacterial genomes. The goal of Bactopia is to process your data with a broad set of tools, so that you can get to the fun part of analyses quicker!

Bactopia can be split into three main parts: Bactopia Datasets, Bactopia Analysis Pipeline, and Bactopia Tools.

../images/bactopia/bactopia-overview.png

Bactopia Datasets provide a framework for including many existing public datasets, as well as private datasets, in your analysis. The process of downloading, building, and/or configuring these datasets for Bactopia has been automated.

Bactopia Analysis Pipeline is the main per-isolate workflow in Bactopia. It is built with Nextflow, and input FASTQs (local or available from SRA/ENA) are put through numerous analyses, including quality control, assembly, annotation, reference mapping, variant calling, minmer sketch queries, BLAST alignments, insertion site prediction, sequence typing, and more. The Bactopia Analysis Pipeline automatically selects which analyses to include based on the available Bactopia Datasets.

../images/bactopia/bactopia-workflow.png

Bactopia Tools are a set of independent workflows for comparative analyses. The comparative analyses may include summary reports, pan-genome analysis, or phylogenetic tree construction. Using the predictable output structure of Bactopia, you can pick and choose which samples to include for processing with a Bactopia Tool.

Bactopia was inspired by Staphopia, a workflow we (Tim Read and myself) released that targets Staphylococcus aureus genomes. Using what we learned from Staphopia and user feedback, Bactopia was developed from scratch with usability, portability, and speed in mind from the start.

Documentation

Documentation for Bactopia is available at https://bactopia.github.io/. The documentation includes a tutorial replicating Staphopia and a complete overview of Bactopia. I highly encourage you to check it out!

Activating the conda env

bash
source /local/cluster/bactopia/activate.sh

To use Bactopia in an SGE job, add the source line above to your shell script before the bactopia commands.
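
As a minimal sketch of the SGE advice above, the helper below writes a job script in which the source line comes before the bactopia command. The function name `write_bactopia_job`, the job name, the `#$` directives chosen, and the `SAMPLE_out` output directory are assumptions for illustration; queue and parallel-environment settings are site-specific and are not shown. It also assumes `$BACTOPIADB` is defined once the environment is activated.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Bactopia): write an SGE job script that
# activates the environment before calling bactopia.
write_bactopia_job() {
  local job_file="$1" sample="$2" r1="$3" r2="$4"
  # Unquoted heredoc: ${sample}, ${r1}, ${r2} expand now; \$ stays literal.
  cat > "$job_file" <<EOF
#!/usr/bin/env bash
#\$ -N bactopia_${sample}
#\$ -cwd
#\$ -S /bin/bash

# Activate the environment first, then run bactopia.
source /local/cluster/bactopia/activate.sh

bactopia --R1 "${r1}" --R2 "${r2}" --sample "${sample}" \\
         --datasets "\$BACTOPIADB" --outdir "${sample}_out"
EOF
}
```

A job written with `write_bactopia_job run_S1.sh S1 S1_R1.fastq.gz S1_R2.fastq.gz` could then be submitted with `qsub run_S1.sh`.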

Accessing base databases at the CQLS

The latest databases can be found at:

$ echo $BACTOPIADB
/nfs1/CGRB/databases/bactopia/latest

To use these, pass the path to bactopia with --datasets, e.g. bactopia --datasets $BACTOPIADB ...

Alternatively, generate your own datasets for your projects.
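
The choice between the shared databases and a project-specific copy can be sketched as a small shell function. This is only an illustration: `resolve_datasets_dir` is a hypothetical name, and the rule it applies (prefer an existing project directory, otherwise fall back to `$BACTOPIADB`) is an assumption about how you might want to organize things, not something Bactopia does for you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose the directory to pass to --datasets.
# Prefers an existing project-specific directory, else falls back to $BACTOPIADB.
resolve_datasets_dir() {
  local project_dir="${1:-}"
  if [ -n "$project_dir" ] && [ -d "$project_dir" ]; then
    printf '%s\n' "$project_dir"
  elif [ -n "${BACTOPIADB:-}" ]; then
    printf '%s\n' "$BACTOPIADB"
  else
    echo "No datasets directory found; set BACTOPIADB or build your own." >&2
    return 1
  fi
}
```

It might be used as `bactopia --datasets "$(resolve_datasets_dir datasets/)" ...`, with the remaining arguments as in the Quick Start.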

Quick Start

bactopia datasets

# Paired-end
bactopia --R1 R1.fastq.gz --R2 R2.fastq.gz --sample SAMPLE_NAME \
         --datasets datasets/ --outdir OUTDIR

# Single-End
bactopia --SE SAMPLE.fastq.gz --sample SAMPLE --datasets datasets/ --outdir OUTDIR

# Multiple Samples
bactopia prepare MY-FASTQS/ > fastqs.txt
bactopia --samples fastqs.txt --datasets datasets --outdir OUTDIR

# Single ENA/SRA Experiment
bactopia --accession SRX000000 --datasets datasets --outdir OUTDIR

# Multiple ENA/SRA Experiments
bactopia search "staphylococcus aureus" > accessions.txt
bactopia --accessions accessions.txt --datasets datasets --outdir OUTDIR
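
If you would rather submit one job per sample than use bactopia prepare, the paired-end command above can be generated for each sample. The following is a dry-run sketch only: `print_paired_commands` is a hypothetical helper, and it assumes files are named `SAMPLE_R1.fastq.gz` and `SAMPLE_R2.fastq.gz`, which may not match your data. It prints commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print one paired-end bactopia command per sample.
# Assumes the naming convention SAMPLE_R1.fastq.gz / SAMPLE_R2.fastq.gz.
print_paired_commands() {
  local fastq_dir="$1" datasets="$2" outdir="$3"
  local r1 r2 sample
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue   # glob matched nothing, pattern stayed literal
    r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
    sample="$(basename "$r1" _R1.fastq.gz)"
    if [ ! -e "$r2" ]; then
      echo "skipping $sample: missing $r2" >&2
      continue
    fi
    echo "bactopia --R1 $r1 --R2 $r2 --sample $sample --datasets $datasets --outdir $outdir"
  done
}
```

Calling `print_paired_commands MY-FASTQS datasets OUTDIR` lists the commands for review. Samples without a matching R2 file are reported on stderr and skipped.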

Location and version

$ which bactopia
/local/cluster/bactopia/bin/bactopia
$ bactopia --version
bactopia 2.1.0
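
A script that depends on this particular version could check it before running. The sketch below assumes the version line has the form `bactopia X.Y.Z`, as in the output above; `parse_bactopia_version` is a hypothetical helper name, and any other lines in the output are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read version output on stdin and print the X.Y.Z part
# of the first line that looks like "bactopia X.Y.Z".
parse_bactopia_version() {
  awk '$1 == "bactopia" && $2 ~ /^[0-9]+\.[0-9]+\.[0-9]+$/ { print $2; exit }'
}
```

For example, `[ "$(bactopia --version | parse_bactopia_version)" = "2.1.0" ]` succeeds only when the expected version is on the path.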

Help message

$ bactopia --help
N E X T F L O W  ~  version 22.04.0
Launching `/local/cluster/bactopia/share/bactopia-2.1.x/main.nf` [backstabbing_visvesvaraya] DSL2 - revision: bd75c553f9


---------------------------------------------
   _                _              _
  | |__   __ _  ___| |_ ___  _ __ (_) __ _
  | '_ \ / _` |/ __| __/ _ \| '_ \| |/ _` |
  | |_) | (_| | (__| || (_) | |_) | | (_| |
  |_.__/ \__,_|\___|\__\___/| .__/|_|\__,_|
                            |_|
  bactopia v2.1.0
  Bactopia is a flexible pipeline for complete analysis of bacterial genomes.
---------------------------------------------
Typical pipeline command:

  bactopia --fastqs samples.txt --datasets datasets/ --species 'Staphylococcus aureus' -profile singularity

Required Parameters
  ### For Procesessing Multiple Samples
  --samples                           [string]  A FOFN (via bactopia prepare) with sample names and paths to FASTQ/FASTAs to process

  ### For Processing A Single Sample
  --R1                                [string]  First set of compressed (gzip) paired-end FASTQ reads (requires --R2 and --sample)
  --R2                                [string]  Second set of compressed (gzip) paired-end FASTQ reads (requires --R1 and --sample)
  --SE                                [string]  Compressed (gzip) single-end FASTQ reads  (requires --sample)
  --ont                               [boolean] Treat `--SE` or `--accession` as long reads for analysis. (requires --sample if using --SE)
  --hybrid                            [boolean] Treat `--SE` as long reads for hybrid assembly.  (requires --R1, --R2, --SE and --sample)
  --short_polish                      [boolean] Treat `--SE` as long reads for long-read assembly and short read polishing.  (requires --R1, --R2, --SE and
                                                --sample)
  --sample                            [string]  Sample name to use for the input sequences

  ### For Downloading from SRA/ENA or NCBI Assembly
  **Note: Downloaded assemblies will have error free Illumina reads simulated for processing.**
  --accessions                        [string]  A file containing ENA/SRA Experiment accessions or NCBI Assembly accessions to processed
  --accession                         [string]  Sample name to use for the input sequences

  ### For Processing an Assembly
  **Note: Assemblies will have error free Illumina reads simulated for processing.**
  --assembly                          [string]  A assembled genome in compressed FASTA format. (requires --sample)
  --check_samples                     [boolean] Validate the input FOFN provided by --samples

Dataset Parameters
  --datasets                          [string]  The path to datasets that have already been set up
  --species                           [string]  Name of species for species-specific dataset to use
  --ask_merlin                        [boolean] Ask Merlin to execute species specific Bactopia tools based on Mash distances
  --coverage                          [integer] Reduce samples to a given coverage [default: 100]
  --genome_size                       [string]  Expected genome size (bp) for all samples, a value of '0' will disable read error correction and read
                                                subsampling, otherwise estimate with Mash [default: 0]

Annotate Genome Parameters
  --use_bakta                         [boolean] Use Bakta for genome annotation (requires --bakta_db)

Optional Parameters
  --outdir                            [string]  Base directory to write results to [default: ./]
  --run_name                          [string]  Name of the directory to hold results [default: bactopia]

Helpful Parameters
  --wf                                [string]  Specify which workflow or Bactopia Tool to execute [default: bactopia]
  --list_wfs                          [boolean] List the available workflows and Bactopia Tools to use with '--wf'
  --help_all                          [boolean] An alias for --help --show_hidden_params
  --version                           [boolean] Display version text.

!! Hiding 149 params, use --show_hidden_params (or --help_all) to show them !!
--------------------------------------------------------------------
If you use bactopia for your analysis please cite:

* Bactopia
  https://doi.org/10.1128/mSystems.00190-20

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://bactopia.github.io/acknowledgements/
--------------------------------------------------------------------
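
The "requires" notes for single-sample input in the help text above can be checked before a job is submitted. This is a rough sketch, not Bactopia's own validation: `check_single_sample_args` is a hypothetical function, and it covers only the rules stated for --R1, --R2, --SE, and --sample.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the single-sample rules in the help text:
#   --R1/--R2 require each other and --sample; --SE requires --sample.
check_single_sample_args() {
  local r1="" r2="" se="" sample=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --R1) r1="${2:-}" ;;
      --R2) r2="${2:-}" ;;
      --SE) se="${2:-}" ;;
      --sample) sample="${2:-}" ;;
    esac
    shift
  done
  if [ -n "$r1" ] || [ -n "$r2" ]; then
    if [ -z "$r1" ] || [ -z "$r2" ] || [ -z "$sample" ]; then
      echo "--R1 and --R2 must be given together, along with --sample" >&2
      return 1
    fi
  fi
  if [ -n "$se" ] && [ -z "$sample" ]; then
    echo "--SE requires --sample" >&2
    return 1
  fi
  return 0
}
```

For instance, `check_single_sample_args --R1 R1.fastq.gz --sample S1` returns a non-zero status because --R2 is missing.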

software ref: https://github.com/bactopia/bactopia
software ref: https://bactopia.github.io/
research ref: https://doi.org/10.1128/mSystems.00190-20