Conda
See the ‘Configuring the conda environment’ section below to access this
software.
Configuration required
See the relevant section below to configure this software before use.
busco-5.4.3 - Benchmarking sets of Universal Single-Copy Orthologs.
For full documentation please consult the user guide:
https://busco.ezlab.org/busco_userguide.html
Main changes in v5:
- Metaeuk is used as default gene predictor for eukaryote pipeline. Augustus
is maintained and can be used optionally instead of Metaeuk.
- Introduction of batch mode: input argument can be a folder containing input files
- The folder structure has changed, so if doing a manual installation, make
sure to completely remove any previous versions of BUSCO before installing
v5.
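The first two changes correspond to options listed in the help message further down this page: passing a directory to -i triggers batch mode, and --augustus switches the gene predictor. A rough sketch, assuming a hypothetical folder named assemblies, an example lineage name, and an arbitrary thread count:

```shell
#!/bin/bash
# Placeholders for illustration only; substitute your own values.
input_dir="assemblies"       # folder of FASTA files, triggers batch mode
lineage="eukaryota_odb10"    # example lineage name
use_augustus="no"            # "yes" switches the gene predictor to Augustus

# Collect the arguments in the positional parameters so quoting is preserved.
set -- -i "$input_dir" -l "$lineage" -o batch_run -m genome -c 4
if [ "$use_augustus" = "yes" ]; then
    set -- "$@" --augustus
fi

# Show the command, then run it only where busco and the input folder exist.
cmdline="busco $*"
echo "$cmdline"
if command -v busco >/dev/null 2>&1 && [ -d "$input_dir" ]; then
    busco "$@"
fi
```

With use_augustus left at "no", the run uses the default Metaeuk predictor and the Augustus configuration step described below is not needed.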
Configuring the conda environment
To use Augustus with this software, run the
/local/cluster/conda/setup_busco_config.sh
script and provide a path to a directory you can write to; the Augustus
config is copied there.
If you don’t plan on using Augustus for gene prediction, you can skip this
additional step.
After the script finishes, run the command it prints to the screen to
activate the environment, or check out a node with qrsh
and run
|
bash
source ~/activate_busco.sh
|
If you don’t plan on using Augustus, you can activate the environment
this way instead:
|
bash
source /local/cluster/busco/activate.sh
|
To use BUSCO over SGE, include the appropriate source command from above
before your BUSCO commands in a shell script.
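A minimal sketch of such a job script follows. The `#$` lines are ordinary SGE directives, and NSLOTS is the slot count SGE exports to a job; the job name, input file, lineage, and output name are assumptions to replace with your own.

```shell
#!/bin/bash
#$ -cwd              # run the job from the submission directory
#$ -N busco_run      # job name (hypothetical)
#$ -S /bin/bash      # interpret the script with bash

# Activation script from the section above; use ~/activate_busco.sh instead
# if you ran the Augustus setup script.
activate="/local/cluster/busco/activate.sh"

# Hypothetical input and lineage; replace with your own.
genome="genome.fasta"
lineage="bacteria_odb10"

# NSLOTS is set by SGE for parallel jobs; fall back to one thread otherwise.
threads="${NSLOTS:-1}"

if [ -f "$activate" ]; then
    source "$activate"
    busco -i "$genome" -l "$lineage" -o busco_out -m genome -c "$threads"
else
    echo "activation script not found: $activate" >&2
fi
```

Saved under a name of your choosing (for example busco_job.sh), the script would be submitted with qsub busco_job.sh.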
Location and version
|
$ which busco
/local/cluster/busco-5.4.3/bin/busco
(/local/cluster/busco-5.4.3)
# davised:Linux @ chrom1 in /nfs4/core/home/davised/opt [14:35:49]
$ busco --version
BUSCO 5.4.3
|
Help message
|
$ busco --help
usage: busco -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] [OTHER OPTIONS]
Welcome to BUSCO 5.4.3: the Benchmarking Universal Single-Copy Ortholog assessment tool.
For more detailed usage information, please review the README file provided with this distribution and the BUSCO user guide. Visit this page https://gitlab.com/ezlab/busco#how-to-cite-busco to see how to cite BUSCO
optional arguments:
-i SEQUENCE_FILE, --in SEQUENCE_FILE
Input sequence file in FASTA format. Can be an assembled genome or transcriptome (DNA), or protein sequences from an annotated gene set. Also possible to use a path to a directory containing multiple input files.
-o OUTPUT, --out OUTPUT
Give your analysis run a recognisable short name. Output folders and files will be labelled with this name. The path to the outputfolder is set with --out_path.
-m MODE, --mode MODE Specify which BUSCO analysis mode to run.
There are three valid modes:
- geno or genome, for genome assemblies (DNA)
- tran or transcriptome, for transcriptome assemblies (DNA)
- prot or proteins, for annotated gene sets (protein)
-l LINEAGE, --lineage_dataset LINEAGE
Specify the name of the BUSCO lineage to be used.
--augustus Use augustus gene predictor for eukaryote runs
--augustus_parameters --PARAM1=VALUE1,--PARAM2=VALUE2
Pass additional arguments to Augustus. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
--augustus_species AUGUSTUS_SPECIES
Specify a species for Augustus training.
--auto-lineage Run auto-lineage to find optimum lineage path
--auto-lineage-euk Run auto-placement just on eukaryote tree to find optimum lineage path
--auto-lineage-prok Run auto-lineage just on non-eukaryote trees to find optimum lineage path
-c N, --cpu N Specify the number (N=integer) of threads/cores to use.
--config CONFIG_FILE Provide a config file
--contig_break n Number of contiguous Ns to signify a break between contigs. Default is n=10.
--datasets_version DATASETS_VERSION
Specify the version of BUSCO datasets, e.g. odb10
--download [dataset [dataset ...]]
Download dataset. Possible values are a specific dataset name, "all", "prokaryota", "eukaryota", or "virus". If used together withother command line arguments, make sure to place this last.
--download_base_url DOWNLOAD_BASE_URL
Set the url to the remote BUSCO dataset location
--download_path DOWNLOAD_PATH
Specify local filepath for storing BUSCO dataset downloads
-e N, --evalue N E-value cutoff for BLAST searches. Allowed formats, 0.001 or 1e-03 (Default: 1e-03)
-f, --force Force rewriting of existing files. Must be used when output files with the provided name already exist.
-h, --help Show this help message and exit
--limit N How many candidate regions (contig or transcript) to consider per BUSCO (default: 3)
--list-datasets Print the list of available BUSCO datasets
--long Optimization Augustus self-training mode (Default: Off); adds considerably to the run time, but can improve results for some non-model organisms
--metaeuk_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
Pass additional arguments to Metaeuk for the first run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
--metaeuk_rerun_parameters "--PARAM1=VALUE1,--PARAM2=VALUE2"
Pass additional arguments to Metaeuk for the second run. All arguments should be contained within a single string with no white space, with each argument separated by a comma.
--offline To indicate that BUSCO cannot attempt to download files
--out_path OUTPUT_PATH
Optional location for results folder, excluding results folder name. Default is current working directory.
-q, --quiet Disable the info logs, displays only errors
-r, --restart Continue a run that had already partially completed.
--scaffold_composition
Writes ACGTN content per scaffold to a file scaffold_composition.txt
--tar Compress some subdirectories with many files to save space
--update-data Download and replace with last versions all lineages datasets and files necessary to their automated selection
-v, --version Show this version and exit
|
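The --download, --download_path, and --offline options above can be combined to fetch a dataset once and reuse it on nodes without network access. The following is only a sketch under that assumption: the download directory, lineage, input file, and the two helper function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical values for illustration.
db_dir="$HOME/busco_downloads"   # where datasets are stored
lineage="bacteria_odb10"         # example lineage name
genome="genome.fasta"            # example input assembly

# Fetch one dataset; --download goes last, as the help text requests.
fetch_dataset() {
    busco --download_path "$db_dir" --download "$1"
}

# Run against the local copy without attempting any downloads.
run_offline() {
    busco -i "$1" -l "$2" -o "$3" -m genome --offline --download_path "$db_dir"
}

if command -v busco >/dev/null 2>&1; then
    fetch_dataset "$lineage" && run_offline "$genome" "$lineage" offline_run
else
    echo "busco not found; activate the environment first" >&2
fi
```

Running busco --list-datasets first shows which dataset names are available.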
software ref: https://busco.ezlab.org/busco_userguide.html
research ref: https://doi.org/10.1093/molbev/msab199