# Bactopia 2.1.0

![Bactopia Logo](../images/bactopia/bactopia-logo.png)

# Bactopia

Bactopia is a flexible pipeline for complete analysis of bacterial genomes. The goal of Bactopia is to process your data with a broad set of tools, so that you can get to the fun part of analyses quicker!

Bactopia can be split into three main parts: [Bactopia Datasets](https://bactopia.github.io/datasets/), [Bactopia Analysis Pipeline](https://bactopia.github.io/#bactopia-workflow), and [Bactopia Tools](https://bactopia.github.io/bactopia-tools/).

![Bactopia Overview](../images/bactopia/bactopia-overview.png)

Bactopia Datasets provide a framework for including many existing public datasets, as well as private datasets, into your analysis. The process of downloading, building, and/or configuring these datasets for Bactopia has been automated.

Bactopia Analysis Pipeline is the main *per-isolate* workflow in Bactopia. Built with [Nextflow](https://www.nextflow.io/), it puts input FASTQs (local or available from SRA/ENA) through numerous analyses, including quality control, assembly, annotation, reference mapping, variant calling, minmer sketch queries, BLAST alignments, insertion site prediction, sequence typing, and more. The Bactopia Analysis Pipeline automatically selects which analyses to include based on the available Bactopia Datasets.

![Bactopia Overview](../images/bactopia/bactopia-workflow.png)

Bactopia Tools are a set of independent workflows for comparative analyses. The comparative analyses may include summary reports, pan-genome analysis, or phylogenetic tree construction. Using the [predictable output structure](https://bactopia.github.io/output-overview/) of Bactopia, you can pick and choose which samples to include for processing with a Bactopia Tool.

Bactopia was inspired by [Staphopia](https://staphopia.emory.edu/), a workflow we (Tim Read and myself) released that targets *Staphylococcus aureus* genomes.
Using what we learned from Staphopia and user feedback, Bactopia was developed from scratch with usability, portability, and speed in mind from the start.

# Documentation

Documentation for Bactopia is available at https://bactopia.github.io/. The documentation includes a tutorial replicating [Staphopia](https://staphopia.emory.edu) and a complete overview of Bactopia. I highly encourage you to check it out!

# Activating the conda env

```console
bash
source /local/cluster/bactopia/activate.sh
```

To use in SGE, add the source line above to your shell script before the bactopia commands.

# Accessing base databases at the CQLS

The latest databases can be found at:

```console
$ echo $BACTOPIADB
/nfs1/CGRB/databases/bactopia/latest
```

To use these, specify `bactopia --datasets $BACTOPIADB ...`. Alternatively, generate your own datasets for your own projects.

# Quick Start

```console
bactopia datasets

# Paired-end
bactopia --R1 R1.fastq.gz --R2 R2.fastq.gz --sample SAMPLE_NAME \
         --datasets datasets/ --outdir OUTDIR

# Single-End
bactopia --SE SAMPLE.fastq.gz --sample SAMPLE --datasets datasets/ --outdir OUTDIR

# Multiple Samples
bactopia prepare MY-FASTQS/ > fastqs.txt
bactopia --samples fastqs.txt --datasets datasets/ --outdir OUTDIR

# Single ENA/SRA Experiment
bactopia --accession SRX000000 --datasets datasets/ --outdir OUTDIR

# Multiple ENA/SRA Experiments
bactopia search "staphylococcus aureus" > accessions.txt
bactopia --accessions accessions.txt --datasets datasets/ --outdir OUTDIR
```

# Location and version

```console
$ which bactopia
/local/cluster/bactopia/bin/bactopia

$ bactopia --version
bactopia 2.1.0
```

# Help message

```console
$ bactopia --help
N E X T F L O W  ~  version 22.04.0
Launching `/local/cluster/bactopia/share/bactopia-2.1.x/main.nf` [backstabbing_visvesvaraya] DSL2 - revision: bd75c553f9

---------------------------------------------
 _                _              _
| |__   __ _  ___| |_ ___  _ __ (_) __ _
| '_ \ / _` |/ __| __/ _ \| '_ \| |/ _` |
| |_) | (_| | (__| || (_) | |_) | | (_| |
|_.__/ \__,_|\___|\__\___/| .__/|_|\__,_|
                          |_|
bactopia v2.1.0
Bactopia is a flexible pipeline for complete analysis of bacterial genomes.
---------------------------------------------
Typical pipeline command:

  bactopia --fastqs samples.txt --datasets datasets/ --species 'Staphylococcus aureus' -profile singularity

Required Parameters
  ### For Procesessing Multiple Samples
  --samples        [string]  A FOFN (via bactopia prepare) with sample names and paths to FASTQ/FASTAs to process

  ### For Processing A Single Sample
  --R1             [string]  First set of compressed (gzip) paired-end FASTQ reads (requires --R2 and --sample)
  --R2             [string]  Second set of compressed (gzip) paired-end FASTQ reads (requires --R1 and --sample)
  --SE             [string]  Compressed (gzip) single-end FASTQ reads (requires --sample)
  --ont            [boolean] Treat `--SE` or `--accession` as long reads for analysis. (requires --sample if using --SE)
  --hybrid         [boolean] Treat `--SE` as long reads for hybrid assembly. (requires --R1, --R2, --SE and --sample)
  --short_polish   [boolean] Treat `--SE` as long reads for long-read assembly and short read polishing. (requires --R1, --R2, --SE and --sample)
  --sample         [string]  Sample name to use for the input sequences

  ### For Downloading from SRA/ENA or NCBI Assembly
  **Note: Downloaded assemblies will have error free Illumina reads simulated for processing.**
  --accessions     [string]  A file containing ENA/SRA Experiment accessions or NCBI Assembly accessions to processed
  --accession      [string]  Sample name to use for the input sequences

  ### For Processing an Assembly
  **Note: Assemblies will have error free Illumina reads simulated for processing.**
  --assembly       [string]  A assembled genome in compressed FASTA format. (requires --sample)
  --check_samples  [boolean] Validate the input FOFN provided by --samples

Dataset Parameters
  --datasets       [string]  The path to datasets that have already been set up
  --species        [string]  Name of species for species-specific dataset to use
  --ask_merlin     [boolean] Ask Merlin to execute species specific Bactopia tools based on Mash distances
  --coverage       [integer] Reduce samples to a given coverage [default: 100]
  --genome_size    [string]  Expected genome size (bp) for all samples, a value of '0' will disable read error correction and read subsampling, otherwise estimate with Mash [default: 0]

Annotate Genome Parameters
  --use_bakta      [boolean] Use Bakta for genome annotation (requires --bakta_db)

Optional Parameters
  --outdir         [string]  Base directory to write results to [default: ./]
  --run_name       [string]  Name of the directory to hold results [default: bactopia]

Helpful Parameters
  --wf             [string]  Specify which workflow or Bactopia Tool to execute [default: bactopia]
  --list_wfs       [boolean] List the available workflows and Bactopia Tools to use with '--wf'
  --help_all       [boolean] An alias for --help --show_hidden_params
  --version        [boolean] Display version text.

!! Hiding 149 params, use --show_hidden_params (or --help_all) to show them !!

--------------------------------------------------------------------
If you use bactopia for your analysis please cite:

* Bactopia
  https://doi.org/10.1128/mSystems.00190-20

* The nf-core framework
  https://doi.org/10.1038/s41587-020-0439-x

* Software dependencies
  https://bactopia.github.io/acknowledgements/
--------------------------------------------------------------------
```
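
The SGE note above says to source the activation script before any bactopia commands. A minimal job-script sketch of that pattern follows. The `#$` directives, the fallback value for `BACTOPIADB`, and the placeholder file names are assumptions for illustration, not CQLS-recommended settings. If `bactopia` is not on the `PATH`, the script only prints the command it would have run.

```shell
#!/usr/bin/env bash
# Hypothetical SGE directives: job name, run from the submission directory, use bash.
#$ -N bactopia_sample
#$ -cwd
#$ -S /bin/bash

# Activate the shared Bactopia environment when the script exists on this host.
ACTIVATE=/local/cluster/bactopia/activate.sh
if [ -f "$ACTIVATE" ]; then
    source "$ACTIVATE"
fi

# Assumption: fall back to the documented CQLS path if BACTOPIADB is still unset.
: "${BACTOPIADB:=/nfs1/CGRB/databases/bactopia/latest}"

# Placeholder inputs (same names as the Quick Start); override via the environment.
R1=${R1:-R1.fastq.gz}
R2=${R2:-R2.fastq.gz}
SAMPLE=${SAMPLE:-SAMPLE_NAME}
OUTDIR=${OUTDIR:-OUTDIR}

# Build the paired-end command as positional parameters so quoting is preserved.
set -- bactopia --R1 "$R1" --R2 "$R2" --sample "$SAMPLE" \
       --datasets "$BACTOPIADB" --outdir "$OUTDIR"

if command -v bactopia >/dev/null 2>&1; then
    # Run the pipeline with the arguments assembled above.
    "$@"
else
    # Dry run: show what would have been executed.
    printf 'bactopia not found; would run: %s\n' "$*"
fi
```

Save the sketch under a name of your choosing and submit it with `qsub`; pass real inputs by exporting `R1`, `R2`, `SAMPLE`, and `OUTDIR` first, or by editing the defaults.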