# agat 0.8.0

## AGAT - Another Gtf/Gff Analysis Toolkit

AGAT can check, fix, and pad missing information (features and attributes) in any kind of GTF or GFF file to produce complete, sorted, and standardised GFF3 output. Over the years it has been enriched with many tools that cover most tasks related to GTF/GFF files (sanitizing, conversions, merging, modifying, filtering, FASTA sequence extraction, adding information, etc.). Compared to other methods, AGAT is robust to even the most badly formed GTF/GFF files.

Activate:

```console
bash
source /local/cluster/agat/activate.sh
```

Location and version:

```console
$ which agat_convert_sp_gxf2gxf.pl
/local/cluster/agat/bin/agat_convert_sp_gxf2gxf.pl
```

Help message (see the GitHub page for more information; there are many tools):

```console
$ agat_convert_sp_gxf2gxf.pl --help

 ------------------------------------------------------------------------------
|   Another GFF Analysis Toolkit (AGAT) - Version: v0.8.0                      |
|   https://github.com/NBISweden/AGAT                                          |
|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
 ------------------------------------------------------------------------------

Name:
    agat_convert_sp_gxf2gxf.pl

Description:
    This script fixes and/or standardizes any GTF/GFF file into full sorted
    GFF3 file. The output GFF syntax is shaped by bioperl and choose among
    the versions 1,2,2.5 (GTF equivalent) and 3. For a correct GTF file, it
    is recommended to use agat_convert_sp_gff2gtf.pl

    Without specifying an input GTF/GFF version, the Omniscient parser will
    first detect automtically the most appropriate GFF parser to use from
    bioperl (GFF1,GFF2,GFF3) in order to read you file properly. Then the
    Omniscient parser removes duplicate features, fixes duplicated IDs, adds
    missing ID and/or Parent attributes, deflates factorized attributes
    (attributes with several parents are duplicated with uniq ID), add
    missing features when possible (e.g.
    add exon if only CDS described, add UTR if CDS and exon described), fix
    feature locations (e.g. check exon is embedded in the parent features
    mRNA, gene), etc...

    All AGAT's scripts with the _sp_ prefix use the same parser, before to
    perform supplement tasks. With that script you can tuned the Omniscient
    parser behaviour. I.e. you can decide to merge loci that have an overlap
    at their CDS features (Only one top feature is kept (gene), and the mRNA
    features become isoforms). This is not activated by default in case you
    are working on a prokaryote annotation that often have overlaping loci.

    The Omniscient parser defines relationship between features using 3
    levels. e.g Level1=gene; Level2=mRNA,tRNA; Level3=exon,cds,utr. The
    feature type information is stored within the 3rd column of a GTF/GFF
    file. The parser need to know to which level a feature type is part of.
    This information is stored by default in a json file coming with the
    tool. We have implemented the most common feature types met in gff/gtf
    files. If a feature type is not yet handle by the parser it will throw a
    warning. You can easily inform the parser how to handle it (level1,
    level2 or level3) by modifying the appropriate json file. How to access
    the json files? Easy just use the --expose option and the json files
    will appear in the working folder. By default, the Omniscient parser use
    the json files from the working directory when any.

    Omniscient parser phylosophy:

      Parse by Parent/child relationship
        ELSE Parse by a comomn tag (an attribute value shared by feature
             that must be grouped together. By default we are using
             locus_tag and gene_id as locus tag, but you can specify the
             one of your choice
          ELSE Parse sequentially (features are grouped in a bucket, and
               the bucket change at each level2 feature met, and bucket(s)
               are linked to the first l1 top feature met)

Usage:
    agat_convert_sp_gxf2gxf.pl -g infile.gff [ -o outfile ]
    agat_convert_sp_gxf2gxf.pl --help

Options:
    -g, --gff or -ref
            Input GTF/GFF file.

    -c or --ct
            When the features do not have Parent/ID relationships, the
            parser will try to group features using a common/shared
            attribute (i.e. a locus tag.). By default locus_tag and gene_id.
            You can replace the default common/shared attributes by
            providing your own(s) using this option. Use comma separated
            list when providing several.

    --efl or --expose
            If you want to see, add or modified the feature relationships
            you will have to use this option. It will copy past in you
            working directory the json files used to define the relation
            between feature types and their level organisation. Typical
            level organisation: Level1 => gene; Level2 => mRNA; level3 =>
            exon,cds,utrs If you get warning from the Omniscient parser that
            a feature relationship is not defined, you can provide
            information about it within the exposed json files. Indeed, if
            the json files exists in your working directory, they will be
            used by default.

    --ml or --merge_loci
            Merge loci parameter, default deactivated. You turn on the
            parameter if you want to merge loci into one locus when they
            overlap. (at CDS level for mRNA, at exon level for other level2
            features. Strand has to be the same). Prokaryote can have
            overlaping loci so it should not use it for prokaryote
            annotation. In eukaryote, loci rarely overlap. Overlaps could be
            due to error in the file, mRNA can be merged under the same
            parent gene if you acticate the option.

    -v      Verbose option. To modify verbosity. Default is 1. 0 is quiet, 2
            and 3 are increasing verbosity.

    --nc or --no_check
            To deacticate all check that can be performed by the parser (e.g
            fixing UTR, exon, coordinates etc...)

    --debug For debug purpose

    -o or --output
            Output GFF file. If no output file is specified, the output will
            be written to STDOUT.

    --gvi or --gff_version_input
            If you don't want to use the autodection of the gff/gft version
            you give as input, you can force the tool to use the parser of
            the gff version you decide to use: 1,2,2.5 or 3. Remind: 2.5 is
            suposed to be gtf.

    --gvo or --gff_version_output
            If you don't want to use the autodection of the gff/gft version
            you give as input, you can force the tool to use the parser of
            the gff version you decide to use: 1,2,2.5 or 3. Remind: 2.5 is
            suposed to be gtf.

    -h or --help
            Display this helpful text.

Feedback:
  Did you find a bug?:
    Do not hesitate to report bugs to help us keep track of the bugs and
    their resolution. Please use the GitHub issue tracking system available
    at this address:

                https://github.com/NBISweden/AGAT/issues

     Ensure that the bug was not already reported by searching under Issues.
     If you're unable to find an (open) issue addressing the problem, open a new one.
     Try as much as possible to include in the issue when relevant:
     - a clear description,
     - as much relevant information as possible,
     - the command used,
     - a data sample,
     - an explanation of the expected behaviour that is not occurring.

  Do you want to contribute?:
    You are very welcome, visit this address for the Contributing
    guidelines:
    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
```

software ref:

research ref:
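
The help text above describes a three-level feature model (Level1=gene; Level2=mRNA,tRNA; Level3=exon,cds,utr) and names "Parse by Parent/child relationship" as the parser's first strategy. The Python sketch below illustrates that idea on a tiny GFF3 snippet. It is not AGAT's implementation: `LEVELS`, `parse_attributes`, and `group_by_parent` are hypothetical names, the level table is a small hand-written subset standing in for the JSON files that `--expose` copies, and the fallback strategies (common tag, sequential grouping) are left out.

```python
# Illustrative sketch only: this is NOT AGAT's code.
# LEVELS is a tiny hypothetical subset of the feature-type/level mapping
# that AGAT keeps in its JSON files.
from collections import defaultdict

LEVELS = {
    "gene": 1,
    "mrna": 2, "trna": 2,
    "exon": 3, "cds": 3, "five_prime_utr": 3, "three_prime_utr": 3,
}


def parse_attributes(column9):
    """Turn a GFF3 attribute column such as 'ID=g1;Name=x' into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def group_by_parent(gff3_text):
    """Group features by their Parent attribute.

    Returns a tuple (level1, children, unknown):
      level1   maps each level-1 feature ID to its feature type,
      children maps each parent ID to a list of (type, ID) pairs,
      unknown  is a sorted list of feature types missing from LEVELS.
    """
    level1 = {}
    children = defaultdict(list)
    unknown = set()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        feature_type = cols[2]  # the 3rd column holds the feature type
        attrs = parse_attributes(cols[8])
        level = LEVELS.get(feature_type.lower())
        if level is None:
            unknown.add(feature_type)  # a real parser would warn here
        elif level == 1:
            level1[attrs.get("ID")] = feature_type
        else:
            # A feature may list several parents, separated by commas.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    children[parent].append((feature_type, attrs.get("ID")))
    return level1, dict(children), sorted(unknown)


EXAMPLE = "\n".join([
    "##gff-version 3",
    "chr1\tdemo\tgene\t1\t900\t.\t+\t.\tID=gene1",
    "chr1\tdemo\tmRNA\t1\t900\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tdemo\texon\t1\t900\t.\t+\t.\tID=exon1;Parent=mrna1",
    "chr1\tdemo\tCDS\t100\t800\t.\t+\t0\tID=cds1;Parent=mrna1",
    "chr1\tdemo\tmystery_feature\t5\t50\t.\t+\t.\tID=m1;Parent=mrna1",
])

level1, children, unknown = group_by_parent(EXAMPLE)
print("level 1 features:", level1)
print("children by parent:", children)
print("feature types with no known level:", unknown)
```

The `mystery_feature` line shows the case the help text mentions: a feature type with no assigned level is reported instead of being grouped, which is the situation where editing the exposed JSON files would be needed.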