The Extensive de novo TE Annotator (EDTA)
This package was developed for automated whole-genome de novo transposable
element (TE) annotation and for benchmarking the annotation performance of TE
libraries.
The EDTA package was designed to filter out false discoveries in raw TE
candidates and generate a high-quality, non-redundant TE library for whole-genome
TE annotation. The initial search programs were selected by benchmarking their
annotation performance against a manually curated TE library in the rice
genome.
Activate conda env:
bash
source /local/cluster/EDTA/activate.sh
Location and dependency check:
$ which EDTA.pl
/local/cluster/EDTA-1.9.6/bin/EDTA.pl
$ EDTA.pl --check_dependencies
########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.4 ####
##### Shujun Ou (shujun.ou.1@gmail.com) ####
########################################################
Wed Sep 8 21:51:45 PDT 2021 Dependency checking:
All passed!
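The dependency check above can also be scripted, for example at the top of a longer job. The following is a minimal sketch, assuming that a successful check prints the "All passed!" line shown in the output above; the helper name deps_ok is hypothetical, and the exit status of EDTA.pl itself is not relied on.

```shell
#!/bin/sh
# deps_ok (hypothetical helper): run the given command and return 0 only if
# its output contains the success line shown in the transcript above.
deps_ok() {
    output=$("$@" 2>&1) || true
    printf '%s\n' "$output"
    case $output in
        *"All passed!"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Run the real check only when EDTA.pl is available on PATH.
if command -v EDTA.pl >/dev/null 2>&1; then
    if deps_ok EDTA.pl --check_dependencies; then
        echo "EDTA dependencies look fine"
    else
        echo "EDTA dependency check did not report success" >&2
    fi
fi
```

If the environment has not been activated, EDTA.pl is not on PATH and the script does nothing.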
Help message:
$ EDTA.pl --help
########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.4 ####
##### Shujun Ou (shujun.ou.1@gmail.com) ####
########################################################
This is the Extensive de-novo TE Annotator that generates a high-quality
structure-based TE library. Usage:
perl EDTA.pl [options]
--genome [File] The genome FASTA
--species [Rice|Maize|others] Specify the species for identification of TIR
candidates. Default: others
--step [all|filter|final|anno] Specify which steps you want to run EDTA.
all: run the entire pipeline (default)
filter: start from raw TEs to the end.
final: start from filtered TEs to finalizing the run.
anno: perform whole-genome annotation/analysis after
TE library construction.
--overwrite [0|1] If previous raw TE results are found, decide to overwrite
(1, rerun) or not (0, default).
--cds [File] Provide a FASTA file containing the coding sequence (no introns,
UTRs, nor TEs) of this genome or its close relative.
--curatedlib [File] Provided a curated library to keep consistant naming and
classification for known TEs. TEs in this file will be
trusted 100%, so please ONLY provide MANUALLY CURATED ones.
This option is not mandatory. It's totally OK if no file is
provided (default).
--sensitive [0|1] Use RepeatModeler to identify remaining TEs (1) or not (0,
default). This step is slow but MAY help to recover some TEs.
--anno [0|1] Perform (1) or not perform (0, default) whole-genome TE annotation
after TE library construction.
--rmout [File] Provide your own homology-based TE annotation instead of using the
EDTA library for masking. File is in RepeatMasker .out format. This
file will be merged with the structural-based TE annotation. (--anno 1
required). Default: use the EDTA library for annotation.
--evaluate [0|1] Evaluate (1) classification consistency of the TE annotation.
(--anno 1 required). Default: 0. This step is slow and does
not change the annotation result.
--exclude [File] Exclude bed format regions from TE annotation. Default: undef.
(--anno 1 required).
--force [0|1] When no confident TE candidates are found: 0, interrupt and exit
(default); 1, use rice TEs to continue.
--repeatmodeler [path] The directory containing RepeatModeler (default: read from ENV)
--repeatmasker [path] The directory containing RepeatMasker (default: read from ENV)
--check_dependencies Check if dependencies are fullfiled and quit
--threads|-t [int] Number of theads to run this script (default: 4)
--debug [0|1] Retain intermediate files (default: 0)
--help|-h Display this help info
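As an illustration of how the options listed above combine, here is a minimal sketch of a full run followed by whole-genome annotation. The file name genome.fa and the thread count are placeholders, not values from this page; only flags that appear in the help text are used.

```shell
#!/bin/sh
# Placeholders: replace with your own genome FASTA and thread count.
genome="genome.fa"   # hypothetical input file
threads=8            # the help text gives 4 as the default

# Assemble the command from options documented in the help message.
set -- EDTA.pl --genome "$genome" --species others --step all --anno 1 --threads "$threads"
cmd_line="$*"
echo "$cmd_line"

# Run it only when EDTA.pl is on PATH and the genome file exists.
if command -v EDTA.pl >/dev/null 2>&1 && [ -f "$genome" ]; then
    "$@"
fi
```

Here --species others and --step all simply restate the documented defaults; --anno 1 is what turns on the annotation step after library construction.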
To run through SGE, generate a shell script that includes your commands.
Add a line that includes source /local/cluster/EDTA/activate.sh before the
commands you are interested in running.
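A job script along these lines might look like the sketch below, which only writes the file and does not submit it. The file name edta_job.sh, the job name edta_run, and the input genome.fa are hypothetical. The -cwd, -N and -S options are standard qsub directives, while any slot request for multithreaded runs (a -pe line) depends on the local cluster configuration and is left out.

```shell
#!/bin/sh
# Write a minimal SGE job script to the current directory (nothing is submitted).
job_script="edta_job.sh"   # hypothetical file name

cat > "$job_script" <<'EOF'
#!/bin/bash
#$ -cwd
#$ -N edta_run
#$ -S /bin/bash
# Load the EDTA environment before calling any EDTA command.
source /local/cluster/EDTA/activate.sh
# genome.fa is a placeholder for your own genome FASTA.
EDTA.pl --genome genome.fa --threads 4
EOF

echo "Wrote $job_script; submit it with: qsub $job_script"
```

The job is asked to run under bash because the activation step above starts bash before sourcing the script.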
If you want to run this interactively, check out a node with qrsh first.
software ref: https://github.com/oushujun/EDTA
research ref: https://doi.org/10.1186/s13059-019-1905-y