EDTA 1.9.6

2021-09-08

The Extensive de novo TE Annotator (EDTA)

This package was developed for automated whole-genome de novo TE annotation and for benchmarking the annotation performance of TE libraries.

The EDTA package was designed to filter out false discoveries in raw TE candidates and to generate a high-quality, non-redundant TE library for whole-genome TE annotation. The initial search programs were selected based on benchmarks of their annotation performance against a manually curated TE library for the rice genome.

Activate the conda environment:

source /local/cluster/EDTA/activate.sh

Location and dependency check:

$ which EDTA.pl
/local/cluster/EDTA-1.9.6/bin/EDTA.pl
$ EDTA.pl --check_dependencies

########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.4  ####
##### Shujun Ou (shujun.ou.1@gmail.com)             ####
########################################################



Wed Sep  8 21:51:45 PDT 2021	Dependency checking:
				All passed!

Help message:

$ EDTA.pl --help

########################################################
##### Extensive de-novo TE Annotator (EDTA) v1.9.4  ####
##### Shujun Ou (shujun.ou.1@gmail.com)             ####
########################################################




This is the Extensive de-novo TE Annotator that generates a high-quality
structure-based TE library. Usage:

perl EDTA.pl [options]
	--genome	[File]	The genome FASTA
	--species [Rice|Maize|others]	Specify the species for identification of TIR
					candidates. Default: others
	--step	[all|filter|final|anno] Specify which steps you want to run EDTA.
					all: run the entire pipeline (default)
					filter: start from raw TEs to the end.
					final: start from filtered TEs to finalizing the run.
					anno: perform whole-genome annotation/analysis after
						TE library construction.
	--overwrite	[0|1]	If previous raw TE results are found, decide to overwrite
				(1, rerun) or not (0, default).
	--cds	[File]	Provide a FASTA file containing the coding sequence (no introns,
			UTRs, nor TEs) of this genome or its close relative.
	--curatedlib	[File]	Provided a curated library to keep consistant naming and
				classification for known TEs. TEs in this file will be
				trusted 100%, so please ONLY provide MANUALLY CURATED ones.
				This option is not mandatory. It's totally OK if no file is
				provided (default).
	--sensitive	[0|1]	Use RepeatModeler to identify remaining TEs (1) or not (0,
				default). This step is slow but MAY help to recover some TEs.
	--anno	[0|1]	Perform (1) or not perform (0, default) whole-genome TE annotation
			after TE library construction.
	--rmout	[File]	Provide your own homology-based TE annotation instead of using the
			EDTA library for masking. File is in RepeatMasker .out format. This
			file will be merged with the structural-based TE annotation. (--anno 1
			required). Default: use the EDTA library for annotation.
	--evaluate [0|1]	Evaluate (1) classification consistency of the TE annotation.
				(--anno 1 required). Default: 0. This step is slow and does
				not change the annotation result.
	--exclude	[File]	Exclude bed format regions from TE annotation. Default: undef.
				(--anno 1 required).
	--force	[0|1]	When no confident TE candidates are found: 0, interrupt and exit
			(default); 1, use rice TEs to continue.
	--repeatmodeler [path]	The directory containing RepeatModeler (default: read from ENV)
	--repeatmasker [path]	The directory containing RepeatMasker (default: read from ENV)
	--check_dependencies Check if dependencies are fullfiled and quit
	--threads|-t	[int]	Number of theads to run this script (default: 4)
	--debug		[0|1]	Retain intermediate files (default: 0)
	--help|-h	Display this help info
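The options above combine in a few common ways. The sketch below assumes hypothetical input files genome.fa and cds.fa. It assembles two EDTA.pl command lines from the documented options.

The first is a full run that builds the library and then annotates the whole genome. It includes --evaluate 1, which the help text says requires --anno 1. The second reruns only the annotation step.

DRY_RUN is a variable used only by this sketch, not an EDTA option. It defaults to 1, which prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: assemble EDTA.pl command lines from the options in the help text.
# genome.fa and cds.fa are hypothetical file names. DRY_RUN belongs to this
# script only; it is not an EDTA option.
set -eu

GENOME="genome.fa"         # hypothetical genome FASTA
CDS="cds.fa"               # hypothetical CDS FASTA (no introns, UTRs, or TEs)
THREADS="${THREADS:-8}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print commands only, 0 = run them

run() {
    # Always print the command; execute it only when DRY_RUN=0.
    printf '+ %s\n' "$*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# 1) Whole pipeline: construct the TE library, then annotate the genome.
#    --evaluate (like --rmout and --exclude) requires --anno 1.
run EDTA.pl --genome "$GENOME" --species others --step all \
    --cds "$CDS" --anno 1 --evaluate 1 --threads "$THREADS"

# 2) Later rerun of only the whole-genome annotation step, assuming the
#    library from step 1 is already in the working directory.
run EDTA.pl --genome "$GENOME" --step anno --anno 1 --threads "$THREADS"
```

After activating the environment, run the sketch with DRY_RUN=0 to execute the commands. Keeping the thread count in one variable makes it easy to match --threads to the slots requested from the scheduler.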

To run EDTA through SGE, write a shell script that contains your commands and submit it with qsub.

In that script, add the line source /local/cluster/EDTA/activate.sh before the commands you want to run.
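A minimal sketch of such a job script follows. It writes the script with a heredoc so you can inspect it before submitting.

The #$ lines are standard SGE embedded options. Several details are assumptions: the parallel environment name smp, the slot count, and the file names edta_job.sh, genome.fa, edta.out and edta.err. Check which parallel environments your cluster offers with qconf -spl.

Inside a running job, SGE sets NSLOTS to the number of granted slots. The script passes that value to --threads.

```shell
#!/usr/bin/env bash
# Sketch: write an SGE job script that activates EDTA and runs it.
# The PE name "smp", the slot count, and all file names are assumptions.
set -eu

JOB_SCRIPT="edta_job.sh"   # hypothetical job script name
SLOTS=8                    # slots to request; also the fallback thread count

# Unquoted EOF: ${SLOTS} is expanded now, \${NSLOTS...} is left for SGE.
cat > "$JOB_SCRIPT" <<EOF
#!/bin/bash
#$ -S /bin/bash
#$ -N edta
#$ -cwd
#$ -pe smp ${SLOTS}
#$ -o edta.out
#$ -e edta.err

# Load the EDTA environment before any EDTA command.
source /local/cluster/EDTA/activate.sh

# Record the dependency check in the job log before the long run.
EDTA.pl --check_dependencies

EDTA.pl --genome genome.fa --species others --step all \\
    --anno 1 --threads \${NSLOTS:-${SLOTS}}
EOF

echo "Wrote $JOB_SCRIPT; submit it with: qsub $JOB_SCRIPT"
```

Submit the generated script with qsub edta_job.sh and follow progress in edta.out and edta.err.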

To run EDTA interactively instead, first start a session on a compute node with qrsh, then source activate.sh as shown above.

Software: https://github.com/oushujun/EDTA
Publication: https://doi.org/10.1186/s13059-019-1905-y