PPanGGOLiN 1.1.136

PPanGGOLiN: Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors

PPanGGOLiN is a software suite for creating and manipulating prokaryotic pangenomes from a set of either genomic DNA sequences or provided genome annotations. It is designed to scale up to tens of thousands of genomes. Its distinguishing feature is that it partitions the pangenome with a statistical approach rather than fixed thresholds. This lets it work with low-quality data such as Metagenome-Assembled Genomes (MAGs) or Single-cell Amplified Genomes (SAGs), so it can take advantage of large-scale environmental studies and lets users study the pangenomes of uncultivable species.

PPanGGOLiN builds pangenomes through a graphical model and a statistical method that partitions gene families into persistent, shell and cloud genomes. It integrates information on both protein-coding genes and their genomic neighborhood to build a graph of gene families, where each node is a gene family and each edge is a relation of genetic contiguity. The partitioning method favors assigning two gene families that are consistent neighbors in the graph to the same partition. The result is a Partitioned Pangenome Graph (PPG) made of persistent, shell and cloud nodes, drawing genomes on rails like a subway map to help biologists navigate the great diversity of microbial life.
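To make the graph idea concrete, here is a toy sketch of how contiguity edges between gene families could be counted. It is an illustration only, not PPanGGOLiN's implementation: the function name `count_contiguity_edges` and the input format (one genome per line, gene family IDs in genomic order, separated by spaces) are assumptions made for this example, and it ignores circular contigs and strand.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PPanGGOLiN): read genomes from stdin or
# from files, one genome per line, as space-separated gene family IDs in
# genomic order. Print one undirected edge per line:
#   familyA <TAB> familyB <TAB> number of times the two were adjacent
count_contiguity_edges() {
    awk '{
        for (i = 1; i < NF; i++) {
            a = $i; b = $(i + 1)
            if (a > b) { t = a; a = b; b = t }   # order the pair so A-B equals B-A
            edges[a "\t" b]++
        }
    }
    END { for (e in edges) print e "\t" edges[e] }' "$@" | LC_ALL=C sort
}

# Example: two toy genomes sharing the f1-f2 adjacency.
printf 'f1 f2 f3\nf1 f2 f4\n' | count_contiguity_edges
```

In this toy input the f1-f2 edge is seen in both genomes, so it gets the highest count, while f2-f3 and f2-f4 each appear once.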

Moreover, the panRGP method (Bazin et al. 2020) included in PPanGGOLiN predicts, for each genome, Regions of Genome Plasticity (RGPs): clusters of genes made of shell and cloud genomes in the pangenome graph. Most of them arise from horizontal gene transfer (HGT) and correspond to Genomic Islands (GIs). RGPs from different genomes are then grouped into spots of insertion based on their conserved flanking persistent genes.
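As a rough intuition for what an RGP looks like, the sketch below reports maximal runs of consecutive non-persistent genes in a genome. This is a deliberately simplified stand-in, not the panRGP algorithm, which is more involved (see Bazin et al. 2020). The function name `find_plastic_runs`, the `family:partition` token format (P, S or C) and the minimum-run-length parameter are all assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (NOT the panRGP method): read one genome per line from
# stdin, genes in genomic order, each written as family:partition where
# partition is P (persistent), S (shell) or C (cloud).
# Print "<genome line number> <TAB> <comma-separated families>" for every
# maximal run of at least $1 consecutive non-persistent genes (default 2).
find_plastic_runs() {
    local min_genes="${1:-2}"
    awk -v min="$min_genes" '
    function flush_run() {
        if (n >= min) print NR "\t" run
        run = ""; n = 0
    }
    {
        run = ""; n = 0
        for (i = 1; i <= NF; i++) {
            split($i, g, ":")
            if (g[2] == "P") flush_run()                    # a persistent gene ends the run
            else { run = (n ? run "," : "") g[1]; n++ }     # extend the current run
        }
        flush_run()                                         # run may reach the end of the genome
    }'
}

# Example: b and c form a run of two; e is alone and is dropped.
printf 'a:P b:S c:C d:P e:C f:P\n' | find_plastic_runs 2
```

The persistent genes on either side of a reported run (a and d in the example) play the role of the flanking genes that the text above says are used to group RGPs into spots.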

To activate:

bash
source /local/cluster/ppanggolin/activate.sh

Location and version:

$ which ppanggolin
/local/cluster/ppanggolin/bin/ppanggolin
$ ppanggolin --version
ppanggolin 1.1.136

Help message:

$ ppanggolin --help
usage: ppanggolin [-h] [-v]  ...

Depicting microbial species diversity via a Partitioned PanGenome Graph Of Linked Neighbors

optional arguments:
  -h, --help     show this help message and exit
  -v, --version  show program's version number and exit

subcommands:

  All of the following subcommands have their own set of options. To see them for a given subcommand, use it with -h or --help, as such:
    ppanggolin <subcommand> -h

    Basic:
      workflow      Easy workflow to run a pangenome analysis in one go
      panrgp        Easy workflow to run a pangenome analysis with genomic islands and spots of insertion detection

    Expert:
      annotate      Annotate genomes
      cluster       Cluster proteins in protein families
      graph         Create the pangenome graph
      partition     Partition the pangenome graph
      rarefaction   Compute the rarefaction curve of the pangenome
      msa           Compute Multiple Sequence Alignments for pangenome gene families

    Output:
      draw          Draw figures representing the pangenome through different aspects
      write         Writes 'flat' files representing the pangenome that can be used with other softwares
      fasta         Writes fasta files for different elements of the pangenome
      info          Prints information about a given pangenome graph file

    Regions of genomic Plasticity:
      align         Aligns a genome or a set of proteins to the pangenome gene families representatives and predict informations from it
      rgp           Predicts Regions of Genomic Plasticity in the genomes of your pangenome
      spot          Predicts spots in your pangenome
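
A minimal sketch of preparing input for the `workflow` subcommand listed above. It assumes the input is a tab-separated list with a genome name and a FASTA path on each line, and that `workflow` accepts `--fasta`, `--output` and `--cpu`; confirm both against `ppanggolin workflow -h` for the installed version. The helper name `make_genome_list` and all paths are placeholders invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a tab-separated genome list (name<TAB>path),
# one line per *.fasta file found in the given directory.
make_genome_list() {
    local fasta_dir="$1" out_file="$2"
    local f name
    : > "$out_file"                      # create or truncate the list file
    for f in "$fasta_dir"/*.fasta; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        name="$(basename "$f" .fasta)"   # genome name = file name without extension
        printf '%s\t%s\n' "$name" "$f" >> "$out_file"
    done
}

# Example usage (placeholder paths; flags are assumptions to verify with -h):
# source /local/cluster/ppanggolin/activate.sh
# make_genome_list ./genomes organisms.fasta.list
# ppanggolin workflow --fasta organisms.fasta.list --output pangenome_out --cpu 4
```

The `ppanggolin` calls are left commented out so the helper can be tried on its own before launching a full run.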

software ref: https://github.com/labgem/PPanGGOLiN
research ref: https://doi.org/10.1371/journal.pcbi.1007732