busco 5.2.2

BUSCOv5 - Benchmarking sets of Universal Single-Copy Orthologs.

For full documentation please consult the user guide: https://busco.ezlab.org/busco_userguide.html

Main changes in v5:

  • Metaeuk is used as the default gene predictor for the eukaryote pipeline. Augustus is still maintained and can optionally be used instead of Metaeuk (--augustus).
  • Introduction of batch mode: the input argument (-i) can be a folder containing multiple input files.
  • The folder structure has changed, so if doing a manual installation, make sure to completely remove any previous versions of BUSCO before installing v5.
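Batch mode is convenient when several assemblies live in different folders. As a minimal sketch (the helper name stage_inputs and all file names are hypothetical, not part of BUSCO), the function below symlinks a set of FASTA files into one directory that can then be passed to -i. It assumes BUSCO reads symlinked files like regular ones; if it does not on your system, copy the files instead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: symlink FASTA files into one directory so that the
# directory can be passed to busco -i (batch mode).
stage_inputs() {
  local dest=$1 f
  shift
  mkdir -p "$dest"
  for f in "$@"; do
    # Absolute link targets keep the links valid from any working directory.
    ln -sf "$(realpath "$f")" "$dest/$(basename "$f")"
  done
}
```

For example, stage_inputs busco_inputs assemblies/*.fasta followed by busco -i busco_inputs -l bacteria_odb10 -o batch_run -m genome (the lineage and names are placeholders).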

To activate:

source /local/cluster/busco/activate.sh

To use BUSCO with SGE_Batch, add the line source /local/cluster/busco/activate.sh to your shell script before any other commands.
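A minimal sketch of such a job script, written out with a heredoc so it can be reviewed before submission. The input file genome.fasta, the lineage bacteria_odb10, the output name genome_busco and the thread count are placeholders; set -c to the number of processors you request from SGE_Batch (see the cluster's SGE_Batch documentation for its options).

```bash
#!/usr/bin/env bash
# Write a minimal BUSCO job script; all values after "busco" are placeholders.
cat > run_busco.sh <<'EOF'
#!/usr/bin/env bash
# Activate BUSCO before any other command.
source /local/cluster/busco/activate.sh

busco -i genome.fasta -l bacteria_odb10 -o genome_busco -m genome -c 8
EOF
chmod +x run_busco.sh
```

Then submit run_busco.sh with SGE_Batch.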

Location and version:

$ which busco
/local/cluster/busco/bin/busco
$ busco --version
BUSCO 5.2.2
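Because v5 changed the folder structure, a pipeline that depends on it may want to confirm the major version first. The following is a hypothetical helper (not part of BUSCO) that assumes the output format shown above: the word BUSCO followed by a dotted version number.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the major version from a string such as
# "BUSCO 5.2.2" (the format printed by busco --version above).
busco_major_version() {
  local ver=${1#"BUSCO "}     # strip the leading "BUSCO " label
  printf '%s\n' "${ver%%.*}"  # keep everything before the first dot
}

# Usage (requires busco on PATH, e.g. after sourcing activate.sh):
#   if [ "$(busco_major_version "$(busco --version)")" -lt 5 ]; then
#     echo "BUSCO v5 or later is required" >&2
#     exit 1
#   fi
```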

help message:

$ busco --help
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]

Welcome to BUSCO 5.2.2: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide. Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO

optional arguments:
  -i SEQUENCE_FILE, --in SEQUENCE_FILE
                        Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set. Also possible to use a path to a directory containing multiple input files.
  -o OUTPUT, --out OUTPUT
                        Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. WARNING: do not provide a path
  -m MODE, --mode MODE  Specify which BUSCO analysis mode to run.
                        There are three valid modes:
                        - geno or genome, for genome assemblies (DNA)
                        - tran or transcriptome, for transcriptome assemblies (DNA)
                        - prot or proteins, for annotated gene sets (protein)
  -l LINEAGE, --lineage_dataset LINEAGE
                        Specify the name of the BUSCO lineage to be used.
  --augustus            Use augustus gene predictor for eukaryote runs
  --augustus_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Augustus. All arguments should be contained within a single pair of quotation marks, separated by commas.
  --augustus_species AUGUSTUS_SPECIES
                        Specify a species for Augustus training.
  --auto-lineage        Run auto-lineage to find optimum lineage path
  --auto-lineage-euk    Run auto-placement just on eukaryote tree to find optimum lineage path
  --auto-lineage-prok   Run auto-lineage just on non-eukaryote trees to find optimum lineage path
  -c N, --cpu N         Specify the number (N=integer) of threads/cores to use.
  --config CONFIG_FILE  Provide a config file
  --datasets_version DATASETS_VERSION
                        Specify the version of BUSCO datasets, e.g. odb10
  --download [dataset [dataset ...]]
                        Download dataset. Possible values are a specific dataset name, "all", "prokaryota", "eukaryota", or "virus". If used together with other command line arguments, make sure to place this last.
  --download_base_url DOWNLOAD_BASE_URL
                        Set the url to the remote BUSCO dataset location
  --download_path DOWNLOAD_PATH
                        Specify local filepath for storing BUSCO dataset downloads
  -e N, --evalue N      E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
  -f, --force           Force rewriting of existing files. Must be used when output files with the provided name already exist.
  -h, --help            Show this help message and exit
  --limit N             How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
  --list-datasets       Print the list of available BUSCO datasets
  --long                Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms
  --metaeuk_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single pair of quotation marks, separated by commas.
  --metaeuk_rerun_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
                        Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single pair of quotation marks, separated by commas.
  --offline             To indicate that BUSCO cannot attempt to download files
  --out_path OUTPUT_PATH
                        Optional location for results folder, excluding results folder name. Default is current working directory.
  -q, --quiet           Disable the info logs, displays only errors
  -r, --restart         Continue a run that had already partially completed.
  --tar                 Compress some subdirectories with many files to save space
  --update-data         Download and replace with last versions all lineages datasets and files necessary to their automated selection
  -v, --version         Show this version and exit
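The options above can be combined into a single command. The sketch below is a dry-run helper with hypothetical function names (normalize_mode, join_params, busco_cmd); it is not part of BUSCO. It checks two rules stated in the help text, namely the accepted mode names and that -o is a name rather than a path, and prints the command instead of running it. The lineage bacteria_odb10 and the file names in the examples are placeholders.

```bash
#!/usr/bin/env bash
# Dry-run helpers (hypothetical, not part of BUSCO) that mirror a few rules
# from busco --help and print a command line instead of executing it.

# Map the accepted mode aliases to their long names; fail on anything else.
normalize_mode() {
  case "$1" in
    geno|genome) echo genome ;;
    tran|transcriptome) echo transcriptome ;;
    prot|proteins) echo proteins ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Join arguments with commas, the format expected by --augustus_parameters,
# --metaeuk_parameters and --metaeuk_rerun_parameters.
join_params() {
  local IFS=,
  printf '%s\n' "$*"
}

# Print a busco command: busco_cmd INPUT LINEAGE OUT_NAME MODE [CPU]
busco_cmd() {
  local input=$1 lineage=$2 out=$3 cpu=${5:-1} mode
  mode=$(normalize_mode "$4") || return 1
  # -o must be a bare name; directories belong in --out_path.
  if [[ $out == */* ]]; then
    echo "error: -o takes a name, not a path; use --out_path for the directory" >&2
    return 1
  fi
  echo busco -i "$input" -l "$lineage" -o "$out" -m "$mode" -c "$cpu"
}
```

For instance, busco_cmd genome.fasta bacteria_odb10 my_run geno 8 prints busco -i genome.fasta -l bacteria_odb10 -o my_run -m genome -c 8. Extra Metaeuk options can be appended as --metaeuk_parameters "$(join_params --PARAM1=VALUE1 --PARAM2=VALUE2)", which expands to the single comma-separated string the help text asks for. On compute nodes without internet access, one approach is to fetch the lineage beforehand with busco --download bacteria_odb10 and then add --offline to the run; if you used --download_path, pass the same path again.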

software ref: https://busco.ezlab.org/
software ref: https://busco.ezlab.org/busco_userguide.html
research ref: https://doi.org/10.1093/molbev/msab199