Bactopia
Bactopia is a flexible pipeline for complete analysis of bacterial genomes.
The goal of Bactopia is to process your data with a broad set of tools, so
that you can get to the fun part of analyses quicker!
Bactopia can be split into three main parts: Bactopia Datasets, Bactopia
Analysis Pipeline, and Bactopia Tools.
Bactopia Datasets provide a framework for including many existing public
datasets, as well as private datasets, in your analysis. The process of
downloading, building, and/or configuring these datasets for Bactopia has
been automated.
Bactopia Analysis Pipeline is the main per-isolate workflow in Bactopia.
Built with Nextflow, it puts input FASTQs (local or available from SRA/ENA)
through numerous analyses, including quality control, assembly, annotation,
reference mapping, variant calling, minmer sketch queries, BLAST alignments,
insertion site prediction, sequence typing, and more. The Bactopia Analysis
Pipeline automatically selects which analyses to include based on the
available Bactopia Datasets.
Bactopia Tools are a set of independent workflows for comparative analyses.
The comparative analyses may include summary reports, pan-genome analysis, or
phylogenetic tree construction. Using the predictable output structure of
Bactopia, you can pick and choose which samples to include for processing with
a Bactopia Tool.
Bactopia was inspired by Staphopia, a workflow we (Tim Read and I) released
that targets Staphylococcus aureus genomes. Using what we learned from
Staphopia and user feedback, Bactopia was developed from scratch with
usability, portability, and speed in mind.
Documentation
Documentation for Bactopia is available at https://bactopia.github.io/. The
documentation includes a tutorial replicating Staphopia and a complete
overview of Bactopia. I highly encourage you to check it out!
Activating the conda env
```bash
source /local/cluster/bactopia/activate.sh
```
To use Bactopia in an SGE job, add the source line above to your job script
before any bactopia commands.
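As a minimal sketch of that ordering, the snippet below writes an SGE job script in which the source line comes first and the bactopia call follows. The #$ lines are standard SGE directives. The job name, file name, sample name and FASTQ paths are hypothetical placeholders. Any parallel-environment or memory requests your queue needs are left out because they are site-specific. The datasets path falls back to the CQLS location listed on this page in case $BACTOPIADB is not set inside the job.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal SGE job script for one paired-end sample.
# Job name, sample name, and FASTQ paths are hypothetical placeholders.
set -euo pipefail

job_script="bactopia_job.sh"

# Quoted 'EOF' keeps $BACTOPIADB unexpanded until the job actually runs
cat > "$job_script" <<'EOF'
#!/bin/bash
#$ -N bactopia_sample1
#$ -cwd
#$ -S /bin/bash

# Activate the Bactopia conda environment before any bactopia command
source /local/cluster/bactopia/activate.sh

# Fall back to the CQLS path if $BACTOPIADB is not set inside the job
bactopia --R1 sample1_R1.fastq.gz --R2 sample1_R2.fastq.gz \
    --sample sample1 \
    --datasets "${BACTOPIADB:-/nfs1/CGRB/databases/bactopia/latest}" \
    --outdir results
EOF

echo "Wrote $job_script; submit it with: qsub $job_script"
```

Submitting the generated file with qsub bactopia_job.sh runs the activation and the analysis in the same job shell. That is why the source line has to live in the script rather than only in your login session.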
Accessing base databases at the CQLS
The latest dbs can be found at:
```
$ echo $BACTOPIADB
/nfs1/CGRB/databases/bactopia/latest
```
To use these, run Bactopia with: bactopia --datasets $BACTOPIADB ...
Alternatively, build your own datasets for your projects with bactopia datasets.
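The snippet below lets the same script use the shared CQLS datasets when they are available and fall back to datasets you built yourself otherwise. It is a minimal sketch. It assumes a project-local directory named datasets/ (a hypothetical path). choose_datasets is an illustrative helper, not part of Bactopia.

```shell
#!/usr/bin/env bash
# Sketch: pick the datasets directory for a run.
# Prefer the shared CQLS datasets in $BACTOPIADB; otherwise fall back to a
# local directory (hypothetical default "datasets") built with `bactopia datasets`.
set -euo pipefail

choose_datasets() {
    local shared="${BACTOPIADB:-}"
    if [[ -n "$shared" && -d "$shared" ]]; then
        printf '%s\n' "$shared"
    else
        printf '%s\n' "${1:-datasets}"
    fi
}

DATASETS="$(choose_datasets datasets)"
echo "Using datasets from: $DATASETS"
# Then, for example:
# bactopia --samples fastqs.txt --datasets "$DATASETS" --outdir OUTDIR
```

Checking that the directory exists, and not only that the variable is set, avoids passing a stale path if $BACTOPIADB points somewhere that is not mounted on the current node.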
Quick Start
```bash
# Build Bactopia Datasets
bactopia datasets

# Paired-end
bactopia --R1 R1.fastq.gz --R2 R2.fastq.gz --sample SAMPLE_NAME \
    --datasets datasets/ --outdir OUTDIR

# Single-end
bactopia --SE SAMPLE.fastq.gz --sample SAMPLE --datasets datasets/ --outdir OUTDIR

# Multiple Samples
bactopia prepare MY-FASTQS/ > fastqs.txt
bactopia --samples fastqs.txt --datasets datasets --outdir OUTDIR

# Single ENA/SRA Experiment
bactopia --accession SRX000000 --datasets datasets --outdir OUTDIR

# Multiple ENA/SRA Experiments
bactopia search "staphylococcus aureus" > accessions.txt
bactopia --accessions accessions.txt --datasets datasets --outdir OUTDIR
```
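bactopia prepare plus --samples is the route for processing many samples in one run. Sometimes you want one independent run per sample instead, for example one SGE job each. In that case a loop can print the paired-end command for every sample. This is a sketch. It assumes, hypothetically, that reads are named SAMPLE_R1.fastq.gz and SAMPLE_R2.fastq.gz. paired_commands is an illustrative helper, not a Bactopia subcommand.

```shell
#!/usr/bin/env bash
# Sketch: print one paired-end bactopia command per sample in a directory.
# Assumes (hypothetically) reads named SAMPLE_R1.fastq.gz / SAMPLE_R2.fastq.gz.
set -euo pipefail

paired_commands() {
    local fastq_dir="$1" datasets="$2" outdir="$3"
    local r1 r2 sample
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [[ -e "$r1" ]] || continue   # no matches: the glob stays literal
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [[ ! -e "$r2" ]]; then
            echo "skipping $r1: no matching R2" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"
        # %q shell-quotes each value so odd characters survive copy/paste
        printf 'bactopia --R1 %q --R2 %q --sample %q --datasets %q --outdir %q\n' \
            "$r1" "$r2" "$sample" "$datasets" "$outdir"
    done
}
```

For example, paired_commands MY-FASTQS datasets OUTDIR > commands.txt writes one line per complete pair. Each line can then go into its own job script. Unpaired R1 files are reported on stderr and skipped.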
Location and version
```
$ which bactopia
/local/cluster/bactopia/bin/bactopia
$ bactopia --version
bactopia 2.1.0
```
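A script can fail early when the environment has not been activated by using a small guard like this one. It is a sketch. It assumes bactopia --version prints a line of the form bactopia 2.1.0, as in the output above. The default expected version is simply the one installed when this page was written. check_bactopia is an illustrative name.

```shell
#!/usr/bin/env bash
# Sketch: verify bactopia is on PATH and report its version.
# Returns 1 if bactopia is missing, 2 if the version differs from the
# expected one (default 2.1.0, the version shown on this page).
check_bactopia() {
    local expected="${1:-2.1.0}" found
    if ! command -v bactopia >/dev/null 2>&1; then
        echo "bactopia not found; run: source /local/cluster/bactopia/activate.sh" >&2
        return 1
    fi
    # Assumes `bactopia --version` prints a line like "bactopia 2.1.0"
    found="$(bactopia --version 2>/dev/null | awk '/^bactopia /{print $2; exit}')"
    if [[ "$found" != "$expected" ]]; then
        echo "warning: found bactopia ${found:-<unknown>}, expected $expected" >&2
        return 2
    fi
    echo "bactopia $found at $(command -v bactopia)"
}
```

In a job script, check_bactopia || exit 1 placed right after the source line stops the job before Nextflow starts if the wrong installation is picked up.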
Help message
```
$ bactopia --help
N E X T F L O W ~ version 22.04.0
Launching `/local/cluster/bactopia/share/bactopia-2.1.x/main.nf` [backstabbing_visvesvaraya] DSL2 - revision: bd75c553f9
---------------------------------------------
 _                _              _
| |__   __ _  ___| |_ ___  _ __ (_) __ _
| '_ \ / _` |/ __| __/ _ \| '_ \| |/ _` |
| |_) | (_| | (__| || (_) | |_) | | (_| |
|_.__/ \__,_|\___|\__\___/| .__/|_|\__,_|
                          |_|
bactopia v2.1.0
Bactopia is a flexible pipeline for complete analysis of bacterial genomes.
---------------------------------------------
Typical pipeline command:
bactopia --fastqs samples.txt --datasets datasets/ --species 'Staphylococcus aureus' -profile singularity
Required Parameters
### For Procesessing Multiple Samples
--samples [string] A FOFN (via bactopia prepare) with sample names and paths to FASTQ/FASTAs to process
### For Processing A Single Sample
--R1 [string] First set of compressed (gzip) paired-end FASTQ reads (requires --R2 and --sample)
--R2 [string] Second set of compressed (gzip) paired-end FASTQ reads (requires --R1 and --sample)
--SE [string] Compressed (gzip) single-end FASTQ reads (requires --sample)
--ont [boolean] Treat `--SE` or `--accession` as long reads for analysis. (requires --sample if using --SE)
--hybrid [boolean] Treat `--SE` as long reads for hybrid assembly. (requires --R1, --R2, --SE and --sample)
--short_polish [boolean] Treat `--SE` as long reads for long-read assembly and short read polishing. (requires --R1, --R2, --SE and --sample)
--sample [string] Sample name to use for the input sequences
### For Downloading from SRA/ENA or NCBI Assembly
**Note: Downloaded assemblies will have error free Illumina reads simulated for processing.**
--accessions [string] A file containing ENA/SRA Experiment accessions or NCBI Assembly accessions to processed
--accession [string] Sample name to use for the input sequences
### For Processing an Assembly
**Note: Assemblies will have error free Illumina reads simulated for processing.**
--assembly [string] A assembled genome in compressed FASTA format. (requires --sample)
--check_samples [boolean] Validate the input FOFN provided by --samples
Dataset Parameters
--datasets [string] The path to datasets that have already been set up
--species [string] Name of species for species-specific dataset to use
--ask_merlin [boolean] Ask Merlin to execute species specific Bactopia tools based on Mash distances
--coverage [integer] Reduce samples to a given coverage [default: 100]
--genome_size [string] Expected genome size (bp) for all samples, a value of '0' will disable read error correction and read subsampling, otherwise estimate with Mash [default: 0]
Annotate Genome Parameters
--use_bakta [boolean] Use Bakta for genome annotation (requires --bakta_db)
Optional Parameters
--outdir [string] Base directory to write results to [default: ./]
--run_name [string] Name of the directory to hold results [default: bactopia]
Helpful Parameters
--wf [string] Specify which workflow or Bactopia Tool to execute [default: bactopia]
--list_wfs [boolean] List the available workflows and Bactopia Tools to use with '--wf'
--help_all [boolean] An alias for --help --show_hidden_params
--version [boolean] Display version text.
!! Hiding 149 params, use --show_hidden_params (or --help_all) to show them !!
--------------------------------------------------------------------
If you use bactopia for your analysis please cite:
* Bactopia
https://doi.org/10.1128/mSystems.00190-20
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://bactopia.github.io/acknowledgements/
--------------------------------------------------------------------
```
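Some flags in the help output are meant to be combined. For example, --use_bakta requires --bakta_db. When scripting runs, one way to combine them is to build the command in a bash array. This keeps values with spaces, such as species names, intact and makes optional flags easy to toggle. This is a sketch that uses only flags named in the help text above. The function name and argument order are illustrative, not part of Bactopia.

```shell
#!/usr/bin/env bash
# Sketch: assemble a bactopia command in an array so optional flags can be
# toggled and values containing spaces (e.g. species names) stay intact.
build_bactopia_cmd() {
    local samples="$1" datasets="$2" species="${3:-}" bakta_db="${4:-}"
    local -a cmd=(bactopia --samples "$samples" --datasets "$datasets")
    if [[ -n "$species" ]]; then
        cmd+=(--species "$species")
    fi
    if [[ -n "$bakta_db" ]]; then
        # Per the help text, --use_bakta requires --bakta_db
        cmd+=(--use_bakta --bakta_db "$bakta_db")
    fi
    # Print a shell-quoted version of the command (a dry run); to execute
    # it instead, replace the two lines below with: "${cmd[@]}"
    local line
    printf -v line '%q ' "${cmd[@]}"
    echo "${line% }"
}
```

For example, build_bactopia_cmd fastqs.txt "$BACTOPIADB" 'Staphylococcus aureus' prints a command with the species name escaped as a single argument.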
software ref: https://github.com/bactopia/bactopia
software ref: https://bactopia.github.io/
research ref: https://doi.org/10.1128/mSystems.00190-20