PGA

Installed
This software should be available with no extra configuration.

PGA-20230303

General Introduction to PGA

PGA (Plastid Genome Annotator) is a standalone command-line tool that performs rapid, accurate, and flexible batch annotation of newly generated target plastomes based on well-annotated reference plastomes. In contrast to existing tools, PGA uses the reference plastomes as the query and the unannotated target plastomes as the subject to locate genes, which we refer to as the reverse query-subject BLAST search approach. PGA accurately identifies gene and intron boundaries as well as intron loss. The program outputs GenBank-formatted files and a log file to help users verify the annotations.
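
To make the "reverse" direction concrete, here is a minimal sketch using NCBI BLAST+ `tblastn` (the help message below refers to TBLASTN percent identity). It is an illustration of the query/subject roles only, not PGA's internal code: the file names `ref_pcgs.faa` (protein sequences of the reference protein-coding genes) and `target.fasta` (one unannotated target plastome) are hypothetical, and the tabular output columns are an arbitrary choice. The reference genes are the query and the target plastome is the subject.

```shell
#!/usr/bin/env bash
# Illustration only: NOT PGA's internal code.
# ref_pcgs.faa and target.fasta are hypothetical file names.
query=ref_pcgs.faa      # reference protein-coding genes (the query)
subject=target.fasta    # unannotated target plastome (the subject)

# Tabular output: gene id, target id, percent identity, query coverage, hit coordinates.
cmd=(tblastn -query "$query" -subject "$subject"
     -outfmt "6 qseqid sseqid pident qcovs sstart send")

# Show the command, then run it only if tblastn and both input files are present.
printf '%s ' "${cmd[@]}"
echo
if command -v tblastn >/dev/null 2>&1 && [ -f "$query" ] && [ -f "$subject" ]; then
    "${cmd[@]}"
fi
```

Each hit's `sstart`/`send` give a candidate gene location on the target, which is why the target can stay unannotated.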

We thank Rong Zhang, Ying-Ying Yang and Jian-Jun Jin from the Kunming Institute of Botany, Chinese Academy of Sciences, and Pin Gong from the Institute of Botany, Chinese Academy of Sciences, for improving this tool.

PGA annotates plastomes in the following six steps: (1) preparation of GenBank-formatted reference plastomes; (2) preparation of FASTA-formatted target plastomes; (3) reference database generation; (4) BLAST search; (5) determination of feature boundaries; (6) generation of GenBank and log files.
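
Steps (1) and (2) are up to the user: the reference directory (`-r`, default `reference`) must hold GenBank-formatted files and the target directory (`-t`, default `target`) must hold FASTA-formatted files. The sketch below is a hypothetical helper, `check_inputs`, which is not part of PGA, for catching the most common mix-up before a run. It only inspects the first line of each file (a GenBank flat file begins with a `LOCUS` line, a FASTA file with a `>` header), so it is a quick sanity check, not a format validator.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of PGA: first-line sanity check of input directories.
# Usage: check_inputs [reference_dir] [target_dir]   (defaults match PGA's -r/-t defaults)
check_inputs() {
    local ref_dir=${1:-reference} tgt_dir=${2:-target}
    local status=0 n_ref=0 n_tgt=0 f

    for f in "$ref_dir"/*; do
        [ -f "$f" ] || continue          # skip subdirectories and an unmatched glob
        n_ref=$((n_ref + 1))
        # A GenBank flat file starts with a LOCUS line.
        if ! head -n 1 "$f" | grep -q '^LOCUS'; then
            echo "not GenBank: $f"
            status=1
        fi
    done

    for f in "$tgt_dir"/*; do
        [ -f "$f" ] || continue
        n_tgt=$((n_tgt + 1))
        # A FASTA file starts with a '>' header line.
        if ! head -n 1 "$f" | grep -q '^>'; then
            echo "not FASTA: $f"
            status=1
        fi
    done

    if [ "$n_ref" -eq 0 ] || [ "$n_tgt" -eq 0 ]; then
        echo "missing input: $n_ref reference file(s), $n_tgt target file(s)"
        status=1
    fi
    return "$status"
}
```

For example, `check_inputs reference target && PGA.pl -r reference -t target` starts the annotation only if both directories pass the check.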

PGA flowchart

Location

$ which PGA.pl
/local/cluster/bin/PGA.pl

Help message

$ PGA.pl -h
Usage:
        PGA.pl -r -t [-i -p -q -o -f -l]
        Copyright (C) 2020 Xiao-Jian Qu
        Please contact <quxiaojian@sdnu.edu.cn>, if you have any bugs or questions.

        [-h -help]         help information.
        [-r -reference]    required: (default: reference) input directory name containing GenBank-
                           formatted file(s) that from the same or close families.
        [-t -target]       required: (default: target) input directory name containing FASTA-
                           formatted file(s) that will be annotated.
        [-i -ir]           optional: (default: 1000) minimum allowed inverted-repeat (IR) length.
        [-p -pidentity]    optional: (default: 40) any PCGs with a TBLASTN percent identity less
                           than this value will be listed in the log file and will not be annotated.
        [-q -qcoverage]    optional: (default: 0.5,2) any PCGs with a query coverage per annotated
                           PCG less or greater than each of these two values (<1,>1) will be listed
                           in the log file.
        [-o -out]          optional: (default: gb) output directory name.
        [-f -form]         optional: (default: circular) circular or linear form for FASTA-formatted
                           file.
        [-l -log]          optional: (default: warning) log file name containing warning information
                           for annotated GenBank-formatted file(s).
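
A typical run, assuming the current directory holds the default `reference/` (GenBank) and `target/` (FASTA) input directories, could look like the sketch below. Only `-r` and `-t` are required; every other value shown is the default listed in the help message, spelled out so it is easy to change. The script prints the command and runs it only when `PGA.pl` is on the `PATH`.

```shell
#!/usr/bin/env bash
# Directory names are PGA's documented defaults; adjust them to your data.
ref_dir=reference   # GenBank-formatted reference plastome(s)
tgt_dir=target      # FASTA-formatted target plastome(s) to annotate
out_dir=gb          # output directory for annotated GenBank files

# -r/-t are required; the remaining values equal the defaults shown by `PGA.pl -h`:
# -i minimum IR length, -p minimum TBLASTN percent identity for PCGs,
# -q query-coverage bounds reported in the log, -f circular|linear, -l log file name.
cmd=(PGA.pl -r "$ref_dir" -t "$tgt_dir" -i 1000 -p 40 -q 0.5,2
     -o "$out_dir" -f circular -l warning)

# Show the exact command, then run it only if PGA.pl is on the PATH.
printf '%s ' "${cmd[@]}"
echo
if command -v PGA.pl >/dev/null 2>&1; then
    "${cmd[@]}"
fi
```

For linear (non-circular) target sequences, pass `-f linear` instead of `-f circular`.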

software ref: https://github.com/quxiaojian/PGA
research ref: https://doi.org/10.1186/s13007-019-0435-7