|
$ agat_convert_sp_gxf2gxf.pl --help
------------------------------------------------------------------------------
| Another GFF Analysis Toolkit (AGAT) - Version: v0.8.0 |
| https://github.com/NBISweden/AGAT |
| National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se |
------------------------------------------------------------------------------
Name:
agat_convert_sp_gxf2gxf.pl
Description:
This script fixes and/or standardizes any GTF/GFF file into a full, sorted
GFF3 file. The output GFF syntax is shaped by BioPerl and chosen among
versions 1, 2, 2.5 (GTF equivalent) and 3. For a correct GTF file, it
is recommended to use agat_convert_sp_gff2gtf.pl.
Without specifying an input GTF/GFF version, the Omniscient parser will
first automatically detect the most appropriate GFF parser to use from
BioPerl (GFF1, GFF2, GFF3) in order to read your file properly. Then the
Omniscient parser removes duplicate features, fixes duplicated IDs, adds
missing ID and/or Parent attributes, deflates factorized attributes
(a feature with several parents is duplicated, each copy with a unique ID),
adds missing features when possible (e.g. adds exon if only CDS is described,
adds UTR if CDS and exon are described), fixes feature locations (e.g. checks
that an exon is embedded in its parent features mRNA and gene), etc. All AGAT
scripts with the _sp_ prefix use the same parser before performing
additional tasks. With this script you can tune the Omniscient parser
behaviour, e.g. you can decide to merge loci that overlap at
their CDS features (only one top feature (gene) is kept, and the mRNA
features become isoforms). This is not activated by default in case you
are working on a prokaryote annotation, which often has overlapping loci.
The Omniscient parser defines relationships between features using 3
levels, e.g. Level1=gene; Level2=mRNA,tRNA; Level3=exon,cds,utr. The
feature type information is stored in the 3rd column of a GTF/GFF
file. The parser needs to know which level a feature type belongs to.
This information is stored by default in a json file shipped with the
tool. We have implemented the most common feature types met in gff/gtf
files. If a feature type is not yet handled by the parser, it will throw a
warning. You can easily tell the parser how to handle it (level1,
level2 or level3) by modifying the appropriate json file. How to access
the json files? Just use the --expose option and the json files
will appear in the working folder. By default, the Omniscient parser uses
the json files from the working directory if any are present.
Omniscient parser philosophy:
Parse by Parent/child relationship
ELSE Parse by a common tag (an attribute value shared by features that must be grouped together).
By default we are using locus_tag and gene_id as locus tag, but you can specify the one of your choice
ELSE Parse sequentially (features are grouped in a bucket, the bucket changes at each level2 feature met, and bucket(s) are linked to the first l1 top feature met)
Usage:
agat_convert_sp_gxf2gxf.pl -g infile.gff [ -o outfile ]
agat_convert_sp_gxf2gxf.pl --help
Options:
-g, --gff or -ref
Input GTF/GFF file.
-c or --ct
When the features do not have Parent/ID relationships, the
parser will try to group features using a common/shared
attribute (i.e. a locus tag). By default locus_tag and gene_id.
You can replace the default common/shared attributes by
providing your own using this option. Use a comma-separated
list when providing several.
--efl or --expose
If you want to see, add or modify the feature relationships
you will have to use this option. It will copy into your
working directory the json files used to define the relation
between feature types and their level organisation. Typical
level organisation: Level1 => gene; Level2 => mRNA; Level3 =>
exon,cds,utrs. If you get a warning from the Omniscient parser that
a feature relationship is not defined, you can provide
information about it within the exposed json files. Indeed, if
the json files exist in your working directory, they will be
used by default.
--ml or --merge_loci
Merge loci parameter, deactivated by default. Turn on this
parameter if you want to merge loci into one locus when they
overlap (at CDS level for mRNA, at exon level for other level2
features; the strand has to be the same). Prokaryotes can have
overlapping loci, so this option should not be used for prokaryote
annotations. In eukaryotes, loci rarely overlap. Overlaps could be
due to errors in the file; mRNAs can be merged under the same
parent gene if you activate the option.
-v Verbose option. To modify verbosity. Default is 1. 0 is quiet, 2
and 3 are increasing verbosity.
--nc or --no_check
To deactivate all checks that can be performed by the parser (e.g.
fixing UTR, exon, coordinates, etc.)
--debug For debugging purposes
-o or --output
Output GFF file. If no output file is specified, the output will
be written to STDOUT.
--gvi or --gff_version_input
If you don't want to use the autodetection of the gff/gtf version
you give as input, you can force the tool to use the parser of
the gff version you decide to use: 1, 2, 2.5 or 3. Reminder: 2.5 is
supposed to be gtf.
--gvo or --gff_version_output
Version of the GFF syntax to use for the output file: 1, 2, 2.5
or 3. Reminder: 2.5 is supposed to be gtf.
-h or --help
Display this helpful text.
Feedback:
Did you find a bug?:
Do not hesitate to report bugs to help us keep track of the bugs and
their resolution. Please use the GitHub issue tracking system available
at this address:
https://github.com/NBISweden/AGAT/issues
Ensure that the bug was not already reported by searching under Issues.
If you're unable to find an (open) issue addressing the problem, open a new one.
Try as much as possible to include in the issue when relevant:
- a clear description,
- as much relevant information as possible,
- the command used,
- a data sample,
- an explanation of the expected behaviour that is not occurring.
Do you want to contribute?:
You are very welcome, visit this address for the Contributing
guidelines:
https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
|
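The three-step fallback described under "Omniscient parser philosophy" can be illustrated with a short Python sketch. This is not AGAT's code (AGAT is written in Perl), only a minimal illustration under stated assumptions: features are plain dicts with a `type` and an `attrs` mapping, the function `group_features` and the `LEVEL2_TYPES` set are hypothetical names, and the final step of linking sequential buckets to the first level1 feature is left out.

```python
from collections import defaultdict

# Assumed subset of level2 feature types; AGAT reads the real list from its json files.
LEVEL2_TYPES = {"mrna", "trna"}
# Default common tags named in the help text.
COMMON_TAGS = ("locus_tag", "gene_id")


def group_features(features, common_tags=COMMON_TAGS):
    """Group features by Parent/ID, else by a common tag, else sequentially."""
    groups = defaultdict(list)
    group_of_id = {}  # feature ID -> group key, used to resolve Parent links
    bucket = 0        # counter for the sequential fallback
    for feat in features:
        attrs = feat["attrs"]
        tag = next((attrs[t] for t in common_tags if t in attrs), None)
        if "Parent" in attrs:
            # 1. Parent/child relationship: join the parent's group.
            parent = attrs["Parent"]
            key = group_of_id.get(parent, ("id", parent))
        elif "ID" in attrs:
            # A feature with an ID and no Parent starts its own record.
            key = ("id", attrs["ID"])
        elif tag is not None:
            # 2. Common tag shared by features that belong together.
            key = ("tag", tag)
        else:
            # 3. Sequential: a new bucket starts at each level2 feature.
            if feat["type"].lower() in LEVEL2_TYPES:
                bucket += 1
            key = ("seq", bucket)
        if "ID" in attrs:
            group_of_id[attrs["ID"]] = key
        groups[key].append(feat)
    return dict(groups)


example = [
    {"type": "gene", "attrs": {"ID": "g1"}},
    {"type": "mRNA", "attrs": {"ID": "m1", "Parent": "g1"}},
    {"type": "exon", "attrs": {"Parent": "m1"}},
    {"type": "CDS", "attrs": {"gene_id": "G2"}},
    {"type": "exon", "attrs": {"gene_id": "G2"}},
    {"type": "mRNA", "attrs": {}},
    {"type": "exon", "attrs": {}},
    {"type": "tRNA", "attrs": {}},
    {"type": "exon", "attrs": {}},
]
grouped = group_features(example)
for key, members in grouped.items():
    print(key, [m["type"] for m in members])
```

In the example, the first three features are linked through ID/Parent, the next two through the shared `gene_id`, and the last four fall into two sequential buckets because each level2 feature (mRNA, tRNA) opens a new one.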