agat 0.8.0

2021-10-07

AGAT - Another Gtf/Gff Analysis Toolkit

AGAT can check, fix, and fill in missing information (features and attributes) in any kind of GTF or GFF file to create complete, sorted and standardised GFF3. Over the years it has been extended with many tools covering just about any task related to GTF/GFF files (sanitizing, conversion, merging, modification, filtering, FASTA sequence extraction, adding information, etc.). Compared to other methods, AGAT is robust even to the most despicable GTF/GFF files.

Activate (start a bash shell, then source the activation script):

bash
source /local/cluster/agat/activate.sh
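Once AGAT is activated, a typical run uses only the `-g` (input) and `-o` (output) options from the help message below. As a minimal sketch, assuming you are in the bash shell started above, a small wrapper can check its inputs before calling `agat_convert_sp_gxf2gxf.pl` and forward any extra options such as `--merge_loci` or `-c`. The function name `agat_standardise` and all file names are hypothetical and not part of AGAT.

```bash
# Hypothetical helper (not part of AGAT): standardise a GTF/GFF file into a
# sorted GFF3 file with agat_convert_sp_gxf2gxf.pl, forwarding extra options.
agat_standardise() {
    if [ "$#" -lt 2 ]; then
        echo "usage: agat_standardise INPUT OUTPUT [AGAT options...]" >&2
        return 2
    fi
    local input="$1" output="$2"
    shift 2
    if [ ! -r "$input" ]; then
        echo "agat_standardise: cannot read '$input'" >&2
        return 1
    fi
    if ! command -v agat_convert_sp_gxf2gxf.pl >/dev/null 2>&1; then
        echo "agat_standardise: AGAT not found; run 'source /local/cluster/agat/activate.sh'" >&2
        return 127
    fi
    # -g: input GTF/GFF file, -o: output file; "$@" passes any other options through
    agat_convert_sp_gxf2gxf.pl -g "$input" -o "$output" "$@"
}

# Examples (placeholder file names):
#   agat_standardise annotation.gtf annotation.gff3
#   agat_standardise annotation.gff merged.gff3 --merge_loci
#   agat_standardise annotation.gff grouped.gff3 -c locus_tag
```

Checking the input and the PATH first gives a clearer error than letting the Perl script fail, which is handy inside batch job scripts.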

Location and version:

$ which agat_convert_sp_gxf2gxf.pl
/local/cluster/agat/bin/agat_convert_sp_gxf2gxf.pl
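`which` shows the location; the version is printed in the banner at the top of the `--help` output shown below (`Version: v0.8.0`). A minimal sketch for extracting it, assuming the banner keeps that `Version: vX.Y.Z` layout; the helper name `agat_version_from_banner` is hypothetical:

```bash
# Hypothetical helper (not part of AGAT): read an AGAT help banner on stdin
# and print only the version string, e.g. "v0.8.0".
agat_version_from_banner() {
    grep -o 'Version: v[0-9][0-9.]*' | head -n 1 | sed 's/^Version: //'
}

# Example (requires AGAT to be activated):
#   agat_convert_sp_gxf2gxf.pl --help 2>&1 | agat_version_from_banner
```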

Help message (see the GitHub page for more information; AGAT provides many more tools):

$ agat_convert_sp_gxf2gxf.pl --help

 ------------------------------------------------------------------------------
|   Another GFF Analysis Toolkit (AGAT) - Version: v0.8.0                      |
|   https://github.com/NBISweden/AGAT                                          |
|   National Bioinformatics Infrastructure Sweden (NBIS) - www.nbis.se         |
 ------------------------------------------------------------------------------


Name:
    agat_convert_sp_gxf2gxf.pl

Description:
    This script fixes and/or standardizes any GTF/GFF file into a full,
    sorted GFF3 file. The output GFF syntax is shaped by BioPerl and can be
    chosen among versions 1, 2, 2.5 (GTF equivalent) and 3. For a correct
    GTF file, it is recommended to use agat_convert_sp_gff2gtf.pl.

    Without a specified input GTF/GFF version, the Omniscient parser first
    automatically detects the most appropriate BioPerl GFF parser (GFF1,
    GFF2, GFF3) to read your file properly. The Omniscient parser then
    removes duplicate features, fixes duplicated IDs, adds missing ID and/or
    Parent attributes, deflates factorized attributes (features with several
    parents are duplicated with unique IDs), adds missing features when
    possible (e.g. adds exons if only CDS are described, adds UTRs if CDS
    and exons are described), fixes feature locations (e.g. checks that an
    exon is embedded in its parent mRNA and gene features), etc. All AGAT
    scripts with the _sp_ prefix use the same parser before performing
    their own tasks. With this script you can tune the Omniscient parser
    behaviour, e.g. you can decide to merge loci that overlap at their CDS
    features (only one top feature, the gene, is kept, and the mRNA features
    become isoforms). This is not activated by default, in case you are
    working on a prokaryote annotation, which often has overlapping loci.
    The Omniscient parser defines relationships between features using 3
    levels, e.g. Level1=gene; Level2=mRNA,tRNA; Level3=exon,cds,utr. The
    feature type is stored in the 3rd column of a GTF/GFF file, and the
    parser needs to know which level each feature type belongs to. This
    information is stored by default in json files shipped with the tool.
    We have implemented the most common feature types found in GFF/GTF
    files. If a feature type is not yet handled by the parser, it will
    throw a warning. You can easily tell the parser how to handle it
    (level1, level2 or level3) by modifying the appropriate json file. How
    to access the json files? Just use the --expose option and the json
    files will appear in the working folder. By default, the Omniscient
    parser uses the json files from the working directory when present.

    Omniscient parser philosophy:

     Parse by Parent/child relationship
       ELSE Parse by a common tag (an attribute value shared by features that must be grouped together.
            By default we use locus_tag and gene_id as locus tags, but you can specify the one of your choice)
         ELSE Parse sequentially (features are grouped in a bucket, the bucket changes at each level2 feature met, and bucket(s) are linked to the first l1 top feature met)

Usage:
        agat_convert_sp_gxf2gxf.pl -g infile.gff [ -o outfile ]
        agat_convert_sp_gxf2gxf.pl --help

Options:
    -g, --gff or -ref
            Input GTF/GFF file.

    -c or --ct
            When the features do not have Parent/ID relationships, the
            parser will try to group features using a common/shared
            attribute (e.g. a locus tag). By default locus_tag and gene_id.
            You can replace the default common/shared attributes by
            providing your own using this option. Use a comma-separated
            list when providing several.

    --efl or --expose
            If you want to see, add or modify the feature relationships,
            you will have to use this option. It will copy into your
            working directory the json files used to define the relations
            between feature types and their level organisation. Typical
            level organisation: Level1 => gene; Level2 => mRNA; Level3 =>
            exon,cds,utrs. If you get a warning from the Omniscient parser
            that a feature relationship is not defined, you can provide
            information about it within the exposed json files. Indeed, if
            the json files exist in your working directory, they will be
            used by default.

    --ml or --merge_loci
            Merge loci parameter, deactivated by default. Turn on this
            parameter if you want to merge overlapping loci into one locus
            (overlap at CDS level for mRNA, at exon level for other level2
            features; strand has to be the same). Prokaryotes can have
            overlapping loci, so it should not be used for prokaryote
            annotations. In eukaryotes, loci rarely overlap. Overlaps could
            be due to errors in the file; if you activate the option, mRNAs
            can be merged under the same parent gene.

    -v      Verbose option. To modify verbosity. Default is 1. 0 is quiet, 2
            and 3 are increasing verbosity.

    --nc or --no_check
            To deactivate all checks that can be performed by the parser
            (e.g. fixing UTR, exon, coordinates etc.)

    --debug For debugging purposes.

    -o or --output
            Output GFF file. If no output file is specified, the output will
            be written to STDOUT.

    --gvi or --gff_version_input
            If you don't want to use the autodetection of the GFF/GTF
            version you give as input, you can force the tool to use the
            parser of the GFF version you decide to use: 1, 2, 2.5 or 3.
            Note: 2.5 is supposed to be GTF.

    --gvo or --gff_version_output
            Choose the GFF version of the output: 1, 2, 2.5 or 3 (GFF3 by
            default). Note: 2.5 is supposed to be GTF; for a correct GTF
            file, use agat_convert_sp_gff2gtf.pl instead.

    -h or --help
            Display this helpful text.

Feedback:
  Did you find a bug?:
    Do not hesitate to report bugs to help us keep track of the bugs and
    their resolution. Please use the GitHub issue tracking system available
    at this address:

                https://github.com/NBISweden/AGAT/issues

     Ensure that the bug was not already reported by searching under Issues.
     If you're unable to find an (open) issue addressing the problem, open a new one.
     When relevant, try to include as much of the following as possible in the issue:
     - a clear description,
     - as much relevant information as possible,
     - the command used,
     - a data sample,
     - an explanation of the expected behaviour that is not occurring.

  Do you want to contribute?:
    You are very welcome, visit this address for the Contributing
    guidelines:
    https://github.com/NBISweden/AGAT/blob/master/CONTRIBUTING.md
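The Description in the help text explains that the Omniscient parser assigns every feature type (column 3) to Level1, Level2 or Level3 using json files, and warns about types it does not know. The sketch below imitates that lookup so you can see which feature types a file contains before running AGAT. It is illustrative only, not AGAT's implementation: the hard-coded list holds just the example types from the help (gene; mRNA, tRNA; exon, CDS, UTR) plus `five_prime_UTR`/`three_prime_UTR`, and case-insensitive matching is an assumption. The real json files (see `--expose`) cover many more types. Both function names are hypothetical.

```bash
# Illustrative only, NOT AGAT's code: a tiny hard-coded stand-in for the
# json level files. Matching is case-insensitive (an assumption).
feature_level() {
    local type
    type=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$type" in
        gene) echo 1 ;;
        mrna|trna) echo 2 ;;
        exon|cds|utr|five_prime_utr|three_prime_utr) echo 3 ;;
        *)
            # AGAT warns in this situation; the fix there is editing the exposed json files
            echo "warning: no level defined for feature type '$1'" >&2
            echo unknown
            ;;
    esac
}

# List the distinct feature types (column 3) of a GFF file with their level,
# skipping comment/directive lines and lines without 9 tab-separated columns.
gff_feature_levels() {
    awk -F'\t' '!/^#/ && NF >= 9 { print $3 }' "$1" | LC_ALL=C sort -u |
        while IFS= read -r type; do
            printf '%s\t%s\n' "$type" "$(feature_level "$type")"
        done
}

# Example: gff_feature_levels annotation.gff
```

Any type reported as `unknown` here is only unknown to this toy list; whether AGAT knows it depends on its json files.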

software ref: https://github.com/NBISweden/AGAT
research ref: https://github.com/NBISweden/AGAT#how-to-cite