pandora 0.9.1

Pandora

Pandora is a tool for bacterial genome analysis using a pangenome reference graph (PanRG). It allows gene presence/absence detection and genotyping of SNPs, indels and longer variants in one or more samples. Pandora works with Illumina or Nanopore data. For more details, see our paper.

The PanRG is a collection of ‘floating’ local graphs (PRGs), each representing some orthologous region of interest (e.g. genes, mobile elements or intergenic regions). See https://github.com/leoisl/make_prg for a tool which can construct these PanRGs from a set of aligned sequence files.

Pandora can do the following for a single sample (read dataset):

  • Output the inferred mosaic of reference sequences for loci (e.g. genes) from the PRGs which are present in the PanRG;
  • Output a VCF showing the variation found within these loci, with respect to any reference path in the PRGs;
  • Discover new variation not in the PanRG.
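
As a rough illustration of working with the single-sample VCF, the sketch below counts variant records per locus. It relies only on the standard tab-separated VCF layout and assumes (not stated above) that the CHROM column holds the locus name. The function name and the example records are hypothetical and are not Pandora output.

```python
from collections import Counter
import io


def count_records_per_locus(vcf_lines):
    """Count VCF data records per CHROM value (assumed here to be the locus name)."""
    counts = Counter()
    for line in vcf_lines:
        line = line.rstrip("\n")
        # Skip blank lines and all header lines ("##meta" and "#CHROM").
        if not line or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]
        counts[chrom] += 1
    return counts


# Made-up records for illustration only.
EXAMPLE_VCF = (
    "##fileformat=VCFv4.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1\n"
    "geneA\t12\t.\tA\tG\t.\t.\t.\tGT\t1\n"
    "geneA\t40\t.\tCT\tC\t.\t.\t.\tGT\t0\n"
    "geneB\t7\t.\tG\tT\t.\t.\t.\tGT\t1\n"
)

if __name__ == "__main__":
    for locus, n in sorted(count_records_per_locus(io.StringIO(EXAMPLE_VCF)).items()):
        print(f"{locus}\t{n}")
```

For a real file, pass an open file handle instead of the `io.StringIO` wrapper.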

For a collection of samples, it can:

  • Output a matrix showing inferred presence-absence of each locus in each sample genome;
  • Output a multisample pangenome VCF including genotype calls for each sample in each of the loci. Variation is shown with respect to the most informative recombinant path in the PRGs (see our paper).
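
The presence-absence matrix can be summarised with a few lines of Python. The sketch below assumes a tab-separated layout with one header row of sample names, one row per locus, and `1`/`0` for present/absent. That layout, the function name, and the example data are assumptions for illustration, so check them against the actual output file before relying on this.

```python
import csv
import io


def loci_per_sample(matrix_file):
    """Return {sample: number of loci marked present} from a 0/1 matrix.

    Assumed layout (hypothetical): first column is the locus name,
    remaining columns are samples, values are "1" (present) or "0" (absent).
    """
    reader = csv.reader(matrix_file, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    totals = {sample: 0 for sample in samples}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        for sample, value in zip(samples, row[1:]):
            if value == "1":
                totals[sample] += 1
    return totals


# Made-up matrix for illustration only.
EXAMPLE_MATRIX = (
    "\tsample1\tsample2\n"
    "geneA\t1\t1\n"
    "geneB\t0\t1\n"
    "geneC\t0\t1\n"
)

if __name__ == "__main__":
    for sample, n in loci_per_sample(io.StringIO(EXAMPLE_MATRIX)).items():
        print(f"{sample}\t{n}")
```

When reading from disk, open the file with `newline=""` as recommended for the `csv` module.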

Location and version:

$ which pandora
/local/cluster/bin/pandora
$ pandora --version
pandora version 0.9.1

Help message:

$ pandora --help
Pandora: Pan-genome inference and genotyping with long noisy or short accurate reads.
Usage: pandora [OPTIONS] SUBCOMMAND

Options:
  -h,--help                   Print this help message and exit
  -V,--version

Subcommands:
  index                       Index population reference graph (PRG) sequences.
  map                         Quasi-map reads to an indexed PRG, infer the sequence of present loci in the sample, and optionally genotype variants.
  compare                     Quasi-map reads from multiple samples to an indexed PRG, infer the sequence of present loci in each sample, and call variants between the samples.
  discover                    Quasi-map reads to an indexed PRG, infer the sequence of present loci in the sample and discover novel variants.
  walk                        Outputs a path through the nodes in a PRG corresponding to the either an input sequence (if it exists) or the top/bottom path
  seq2path                    For each sequence, return the path through the PRG
  get_vcf_ref                 Outputs a fasta suitable for use as the VCF reference using input sequences
  random                      Outputs a fasta of random paths through the PRGs
  merge_index                 Allows multiple indices to be merged (no compatibility check)

software ref: https://github.com/rmcolq/pandora
research ref: https://doi.org/10.1186/s13059-021-02473-1