augustus 3.4.0

2022-01-06

AUGUSTUS

AUGUSTUS is a program to find genes and their structures in one or more genomes.

AUGUSTUS is a gene prediction program written or maintained by Mario Stanke, Oliver Keller, Stefanie König, Lizzy Gerischer, Katharina Hoff, Giovanna Migliorelli, Lars Gabriel, Anica Hoppe, Tonatiuh Peña Centeno, Henry Mehlan, Daniel Honsel and Steffen Herbold. It can be used as an ab initio program, which means it bases its prediction purely on the sequence. AUGUSTUS may also incorporate hints on the gene structure coming from extrinsic sources such as EST, MS/MS, protein alignments and syntenic genomic alignments. Since version 3.0 AUGUSTUS can also predict the genes simultaneously in several aligned genomes.

To activate:

/local/cluster/conda/setup_augustus_config.sh /path/to/augustus/destination
source ~/activate_augustus.sh

The augustus setup script copies the configuration directory to the directory you specify, then prints a command that you need to run to activate augustus with the proper $AUGUSTUS_CONFIG_PATH environment variable.

Include the source ~/activate_augustus.sh line in any script you submit with SGE_Batch so that augustus is available inside SGE jobs.
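A job script could wrap the activation step along these lines. This is a minimal sketch, not the cluster's own tooling: the helper name `activate_augustus` is hypothetical, and the script only assumes that the activation file lives at `~/activate_augustus.sh` as described above.

```shell
#!/usr/bin/env bash
# Sketch of a job script body. The helper name is a placeholder for
# illustration and is not part of the cluster setup.

# Source the activation script written by the setup step.
# Returns non-zero (and leaves the environment untouched) if it is missing.
activate_augustus() {
    local script="${1:-$HOME/activate_augustus.sh}"
    if [ ! -f "$script" ]; then
        echo "activation script not found: $script" >&2
        return 1
    fi
    # shellcheck disable=SC1090
    source "$script"
}

# Only call augustus once activation has succeeded.
if activate_augustus; then
    augustus --version
fi
```

In a real job, the `augustus --version` line would be replaced by the prediction command. Failing early when the activation file is missing avoids a job that runs against the wrong configuration directory.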

Previous installs of augustus, including v3.3.3, are still available outside of the conda environment. Please let us know if this causes problems during future augustus runs.

Location and version:

$ which augustus
/local/cluster/augustus-3.4.0/bin/augustus
$ augustus --version
AUGUSTUS (3.4.0) is a gene prediction tool.
Sources and documentation at https://github.com/Gaius-Augustus/Augustus
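A pipeline that depends on a particular release could check the banner shown above before running. The sketch below assumes the banner keeps the form `AUGUSTUS (X.Y.Z) is a gene prediction tool.`; the function names are hypothetical, and both output streams are captured because this page does not say which one carries the banner.

```shell
#!/usr/bin/env bash
# Extract the version number from a banner line of the form
# "AUGUSTUS (3.4.0) is a gene prediction tool."
# Prints nothing if no such line is found.
parse_augustus_version() {
    sed -n 's/^AUGUSTUS (\([0-9][0-9.]*\)).*/\1/p' <<< "$1" | head -n 1
}

# Succeed only if the augustus on PATH reports the expected version.
check_augustus_version() {
    local expected="$1" banner found
    # Capture stdout and stderr; the stream used for the banner is not assumed.
    banner="$(augustus --version 2>&1)" || true
    found="$(parse_augustus_version "$banner")"
    [ "$found" = "$expected" ]
}
```

For example, `check_augustus_version 3.4.0 || echo "unexpected augustus version" >&2` would warn when an older install is picked up instead of the conda one.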

Help message:

$ augustus --help
usage:
augustus [parameters] --species=SPECIES queryfilename

'queryfilename' is the filename (including relative path) to the file containing the query sequence(s)
in fasta format.

SPECIES is an identifier for the species. Use --species=help to see a list.

parameters:
--strand=both, --strand=forward or --strand=backward
--genemodel=partial, --genemodel=intronless, --genemodel=complete, --genemodel=atleastone or --genemodel=exactlyone
  partial      : allow prediction of incomplete genes at the sequence boundaries (default)
  intronless   : only predict single-exon genes like in prokaryotes and some eukaryotes
  complete     : only predict complete genes
  atleastone   : predict at least one complete gene
  exactlyone   : predict exactly one complete gene
--singlestrand=true
  predict genes independently on each strand, allow overlapping genes on opposite strands
  This option is turned off by default.
--hintsfile=hintsfilename
  When this option is used the prediction considering hints (extrinsic information) is turned on.
  hintsfilename contains the hints in gff format.
--AUGUSTUS_CONFIG_PATH=path
  path to config directory (if not specified as environment variable)
--alternatives-from-evidence=true/false
  report alternative transcripts when they are suggested by hints
--alternatives-from-sampling=true/false
  report alternative transcripts generated through probabilistic sampling
--sample=n
--minexonintronprob=p
--minmeanexonintronprob=p
--maxtracks=n
  For a description of these parameters see section 4 of README.TXT.
--proteinprofile=filename
  When this option is used the prediction will consider the protein profile provided as parameter.
  The protein profile extension is described in section 7 of README.TXT.
--progress=true
  show a progressmeter
--gff3=on/off
  output in gff3 format
--predictionStart=A, --predictionEnd=B
  A and B define the range of the sequence for which predictions should be found.
--UTR=on/off
  predict the untranslated regions in addition to the coding sequence. This currently works only for a subset of species.
--noInFrameStop=true/false
  Do not report transcripts with in-frame stop codons. Otherwise, intron-spanning stop codons could occur. Default: false
--noprediction=true/false
  If true and input is in genbank format, no prediction is made. Useful for getting the annotated protein sequences.
--uniqueGeneId=true/false
  If true, output gene identifyers like this: seqname.gN
--/Testing/testMode=prepare, --/Testing/testMode=run, (disabled by default)
  prepare      : prepare a new minimal data set to test comparative Augustus
  intronless   : run prediction over some given minimal data set
  (*) a minimal data set is one retaining only the information need in prediction, usually very small (order of Mb) compared to full sequence data sets (ordet of Gb)

For a complete list of parameters, type "augustus --paramlist".
An exhaustive description can be found in the file README.TXT.
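The options above can be combined into a single command line. The following is a minimal sketch that only prints the command (a dry run). The function name `build_augustus_cmd` and the file names `genome.fa` and `hints.gff` are hypothetical, and `human` is used as an example species identifier; run `augustus --species=help` to see which identifiers the installation provides.

```shell
#!/usr/bin/env bash
# Assemble an augustus command line from options listed in the help text.
# Arguments: species identifier, FASTA file, optional hints file (GFF).
# Follows the usage order: [parameters] --species=SPECIES queryfilename
build_augustus_cmd() {
    local species="$1" fasta="$2" hints="${3:-}"
    local -a cmd=(augustus --strand=both --genemodel=complete --gff3=on)
    # Passing a hints file turns on prediction with extrinsic information.
    if [ -n "$hints" ]; then
        cmd+=("--hintsfile=${hints}")
    fi
    cmd+=("--species=${species}" "$fasta")
    printf '%s\n' "${cmd[*]}"
}

# Dry run: print the commands instead of executing them.
build_augustus_cmd human genome.fa
build_augustus_cmd human genome.fa hints.gff
```

Printing the command first makes it easy to review the options before pasting the line into a job script.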

software ref: https://github.com/Gaius-Augustus/Augustus
research ref: https://doi.org/10.1093/bioinformatics/btn013