# GTDB-Tk 2.1.1

{{< admonition tip "Conda" true >}}
See the 'activating the conda environment' section below to access this software.
{{< /admonition >}}

## GTDBtk-2.1.1

[![PyPI](https://img.shields.io/pypi/v/gtdbtk.svg)](https://pypi.python.org/pypi/gtdbtk)
[![PyPI Downloads](https://pepy.tech/badge/gtdbtk)](https://pepy.tech/project/gtdbtk)
[![Bioconda](https://img.shields.io/conda/vn/bioconda/gtdbtk.svg?color=43b02a)](https://anaconda.org/bioconda/gtdbtk)
[![BioConda Downloads](https://img.shields.io/conda/dn/bioconda/gtdbtk.svg?style=flag&label=downloads&color=43b02a)](https://anaconda.org/bioconda/gtdbtk)
[![Docker Image Version (latest by date)](https://img.shields.io/docker/v/ecogenomic/gtdbtk?sort=date&color=299bec&label=docker)](https://hub.docker.com/r/ecogenomic/gtdbtk)
[![Docker Pulls](https://img.shields.io/docker/pulls/ecogenomic/gtdbtk?color=299bec&label=pulls)](https://hub.docker.com/r/ecogenomic/gtdbtk)

*[GTDB-Tk v2.1.0](https://ecogenomics.github.io/GTDBTk/announcements.html) was released on May 11, 2022. Upgrading is recommended.*

*Please note v2.1.0+ is not compatible with GTDB-Tk package [R207_v1](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_data.tar.gz). It is necessary to upgrade to GTDB-Tk package [R207_v2](https://data.gtdb.ecogenomic.org/releases/latest/auxillary_files/gtdbtk_v2_data.tar.gz).*

GTDB-Tk is a software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes based on the Genome Database Taxonomy ([GTDB](https://gtdb.ecogenomic.org/)). It is designed to work with recent advances that allow hundreds or thousands of metagenome-assembled genomes (MAGs) to be obtained directly from environmental samples. It can also be applied to isolate and single-cell genomes. The GTDB-Tk is open source and released under the [GNU General Public License (Version 3)](https://www.gnu.org/licenses/gpl-3.0.en.html).
Notifications about GTDB-Tk releases will be available through the [GTDB Twitter](https://twitter.com/ace_gtdb) account and the [GTDB Announcements Forum](https://forum.gtdb.ecogenomic.org/c/announcements/10).

Please post questions and issues related to GTDB-Tk on the Issues section of the GitHub repository. Questions related to the [GTDB](https://gtdb.ecogenomic.org/) can be posted on the [GTDB Forum](https://forum.gtdb.ecogenomic.org/) or sent to the [GTDB team](https://gtdb.ecogenomic.org/about).

### New Features

GTDB-Tk v2.1.0 includes the following new features:

- GTDB-Tk now uses a **divide-and-conquer** approach where the bacterial reference tree is split into multiple **class**-level subtrees. This reduces the memory requirements of GTDB-Tk from **320 GB** of RAM when using the full GTDB R07-RS207 reference tree to approximately **55 GB**. A manuscript describing this approach is in preparation. If you wish to continue using the full GTDB reference tree, use the `--full_tree` flag. This is the main change from v2.0.0. The split tree approach has been modified from order-level trees to class-level trees to resolve specific classification issues (see [#383](https://github.com/Ecogenomics/GTDBTk/issues/383)).
- Genomes that cannot be assigned to a domain (e.g. genomes with no bacterial or archaeal markers, or genomes with no genes called by Prodigal) are now reported in the `gtdbtk.bac120.summary.tsv` as 'Unclassified'.
- Genomes filtered out during the alignment step are now reported in the `gtdbtk.bac120.summary.tsv` or `gtdbtk.ar53.summary.tsv` as 'Unclassified Bacteria/Archaea'.
- The `--write_single_copy_genes` flag is now available in the `classify_wf` and `de_novo_wf` workflows.

### Documentation

Documentation for GTDB-Tk can be found [here](https://ecogenomics.github.io/GTDBTk/).
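Because unclassified genomes are now written to the summary files listed under New Features, it can be useful to pull them out after a run. The following is a minimal sketch rather than an official GTDB-Tk tool. It assumes the summary file is tab-separated with a header row containing `user_genome` and `classification` columns, and the function name `list_unclassified` is hypothetical.

```shell
#!/usr/bin/env bash
# List genomes whose classification starts with "Unclassified"
# (covers 'Unclassified', 'Unclassified Bacteria', 'Unclassified Archaea').
# Assumption: tab-separated file with a header row; the two columns are
# located by header name instead of by fixed position.
list_unclassified() {
    awk -F'\t' '
        NR == 1 {
            # Find the column indexes from the header row.
            for (i = 1; i <= NF; i++) {
                if ($i == "user_genome")    g = i
                if ($i == "classification") c = i
            }
            next
        }
        # Print genome name and classification for unclassified rows only.
        g && c && $c ~ /^Unclassified/ { print $g "\t" $c }
    ' "$1"
}
```

For example, `list_unclassified gtdbtk_out/gtdbtk.bac120.summary.tsv` prints one line per unclassified genome, where `gtdbtk_out` stands in for whatever output directory was passed to the workflow.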
-------------------------------------------------------------------------------

## Activating the conda environment

Check out a node with `qrsh` and run:

```console
bash
source /local/cluster/gtdbtk/activate.sh
```

To run over SGE, add the source line above to your shell script prior to running your gtdbtk commands.

## Location, version, and DB location

```console
$ which gtdbtk
/local/cluster/gtdbtk/bin/gtdbtk

$ gtdbtk --version
gtdbtk: version 2.1.1 Copyright 2017 Pierre-Alain Chaumeil, Aaron Mussig and Donovan Parks

$ echo $GTDBTK_DATA_PATH
/nfs1/CGRB/databases/GTDB/current
```

## Help message

```console
$ gtdbtk --help

              ...::: GTDB-Tk v2.1.1 :::...

  Workflows:
    classify_wf -> Classify genomes by placement in GTDB reference tree
                     (identify -> align -> classify)
    de_novo_wf  -> Infer de novo tree and decorate with GTDB taxonomy
                     (identify -> align -> infer -> root -> decorate)

  Methods:
    identify -> Identify marker genes in genome
    align    -> Create multiple sequence alignment
    classify -> Determine taxonomic classification of genomes
    infer    -> Infer tree from multiple sequence alignment
    root     -> Root tree using an outgroup
    decorate -> Decorate tree with GTDB taxonomy

  Tools:
    infer_ranks     -> Establish taxonomic ranks of internal nodes using RED
    ani_rep         -> Calculates ANI to GTDB representative genomes
    trim_msa        -> Trim an untrimmed MSA file based on a mask
    export_msa      -> Export the untrimmed archaeal or bacterial MSA file
    remove_labels   -> Remove labels (bootstrap values, node labels) from an Newick tree
    convert_to_itol -> Convert a GTDB-Tk Newick tree to an iTOL tree

  Testing:
    test          -> Validate the classify_wf pipeline with 3 archaeal genomes
    check_install -> Verify third party programs and GTDB reference package

  Use: gtdbtk <command> -h for command specific help
```

software ref:

research ref:

research ref:
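The SGE instruction in the "Activating the conda environment" section can be sketched as a small helper that prints a job script with the `source` line placed before the `gtdbtk` command. This is an illustration under stated assumptions, not a site-approved template. The function name `make_gtdbtk_job`, the job name, and the example arguments are hypothetical, and slot or memory requests (which are queue-specific) are deliberately left out and would need to be added for your cluster.

```shell
#!/usr/bin/env bash
# Print an SGE job script that activates the environment, then runs GTDB-Tk.
# Arguments (hypothetical example values): genome directory, output
# directory, and number of CPUs (defaults to 1).
make_gtdbtk_job() {
    local genome_dir=$1 out_dir=$2 cpus=${3:-1}
    cat <<EOF
#!/bin/bash
#\$ -cwd
#\$ -S /bin/bash
#\$ -N gtdbtk_classify

# Activate the conda environment before any gtdbtk command.
source /local/cluster/gtdbtk/activate.sh

gtdbtk classify_wf --genome_dir "${genome_dir}" --out_dir "${out_dir}" --cpus ${cpus}
EOF
}
```

A typical use would be `make_gtdbtk_job my_genomes gtdbtk_out 8 > gtdbtk_job.sh` followed by `qsub gtdbtk_job.sh`. Generating the script from a function keeps the activation step and the workflow call together, so the `source` line cannot be forgotten when the command changes.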