pb-assembly 0.0.8

pb-assembly is the bioconda recipe encompassing all code and dependencies necessary to run:

  • FALCON assembly pipeline
  • FALCON-Unzip to phase the genome and perform phased-polishing with Arrow
  • FALCON-Phase to extend phasing between unzipped haplotig blocks (requires HiC data)

Installed package recipes include:

  • pb-falcon
  • pb-dazzler
  • genomicconsensus
  • etc. (all other dependencies)

FALCON and FALCON-Unzip

FALCON and FALCON-Unzip are de novo genome assemblers for PacBio long reads, also known as Single-Molecule Real-Time (SMRT) sequences. FALCON is a diploid-aware assembler that follows the hierarchical genome assembly process (HGAP). It is optimized for large genome assembly, though microbial genomes can also be assembled. FALCON produces a set of primary contigs (p-contigs) as the primary assembly and a set of associate contigs (a-contigs) which represent divergent allelic variants. Each a-contig is associated with a homologous genomic region on a p-contig.

FALCON-Unzip is a true diploid assembler. It takes the contigs from FALCON and phases the reads based on heterozygous SNPs identified in the initial assembly. It then produces a set of partially-phased primary contigs and fully-phased haplotigs which represent divergent haplotypes.

NOTE: In your fc_run.cfg file, make sure that NPROC * njobs = NTOTAL, where NTOTAL is the number of processors you check out with SGE_Batch and/or SGE_Array. For example, if NTOTAL = 64, use -P 64 in your SGE command.
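The arithmetic in the note can be checked before submitting a job. The following is a minimal sketch, not part of pb-assembly: the helper name total_slots and the example values NPROC=8 and njobs=8 are assumptions chosen for illustration, and the values should be replaced with those in your own fc_run.cfg.

```bash
# Hypothetical helper (not part of pb-assembly): multiply NPROC by njobs
# to get the total number of processors the pipeline may use at once.
total_slots() {
    echo $(( $1 * $2 ))
}

NPROC=8   # assumed example value; use the NPROC from your fc_run.cfg
njobs=8   # assumed example value; use the njobs from your fc_run.cfg

NTOTAL=$(total_slots "$NPROC" "$njobs")
echo "NPROC * njobs = $NTOTAL; check out $NTOTAL processors (-P $NTOTAL)"
```

If the printed total does not match the number of processors you request from SGE, adjust NPROC or njobs in the config, or change the -P value.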

To activate:

bash
source /local/cluster/pb-assembly/activate.sh

Location:

$ which fc_run
/local/cluster/pb-assembly/bin/fc_run
(/local/cluster/pb-assembly)

Help message:

$ fc_run --help
falcon-kit 1.8.1 (pip thinks "falcon-kit 1.8.1")
pypeflow 2.3.0
usage: fc_run [-h] config [logger]

positional arguments:
  config      .cfg/.ini/.json
  logger      (Optional)JSON config for standard Python logging module

optional arguments:
  -h, --help  show this help message and exit
(/local/cluster/pb-assembly)
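
Based on the usage line in the help message, fc_run takes a config file as its first argument. The wrapper below is a minimal sketch under that assumption; the function name run_fc and its return codes are hypothetical and not part of falcon-kit. It only checks that the config file exists and that fc_run is on the PATH (that is, that the environment has been activated) before calling it.

```bash
# Hypothetical wrapper (not part of falcon-kit): validate inputs, then run fc_run.
run_fc() {
    cfg="$1"
    # Refuse to start if the config file is missing.
    if [ ! -f "$cfg" ]; then
        echo "config not found: $cfg" >&2
        return 1
    fi
    # fc_run is only on the PATH after the environment has been activated.
    if ! command -v fc_run >/dev/null 2>&1; then
        echo "fc_run not found on PATH; activate the environment first" >&2
        return 2
    fi
    fc_run "$cfg"
}
```

After activating the environment, a call such as run_fc fc_run.cfg would start the pipeline with a config file in the current directory.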

Configuration files can be found here:

/nfs1/CGRB/databases/software/pb-assembly/cfgs

software ref: https://github.com/PacificBiosciences/pb-assembly
research ref: https://www.ncbi.nlm.nih.gov/pubmed/27749838
research ref: https://www.ncbi.nlm.nih.gov/pubmed/23644548