pb-assembly 0.0.8
pb-assembly
pb-assembly is the bioconda recipe encompassing all code and dependencies necessary to run:
- FALCON assembly pipeline
- FALCON-Unzip to phase the genome and perform phased-polishing with Arrow
- FALCON-Phase to extend phasing between unzipped haplotig blocks (requires HiC data)
Installed package recipes include:
- pb-falcon
- pb-dazzler
- genomicconsensus
- etc. (all other dependencies)
FALCON and FALCON-Unzip
FALCON and FALCON-Unzip are de novo genome assemblers for PacBio long reads, also known as Single-Molecule Real-Time (SMRT) sequences. FALCON is a diploid-aware assembler that follows the hierarchical genome assembly process (HGAP). It is optimized for large genomes, though microbial genomes can also be assembled. FALCON produces a set of primary contigs (p-contigs) as the primary assembly and a set of associate contigs (a-contigs), which represent divergent allelic variants. Each a-contig is associated with a homologous genomic region on a p-contig.
FALCON-Unzip is a true diploid assembler. It takes the contigs from FALCON and phases the reads based on heterozygous SNPs identified in the initial assembly. It then produces a set of partially-phased primary contigs and fully-phased haplotigs which represent divergent haplotypes.
NOTE: Please ensure that the settings in your fc_run.cfg file satisfy NPROC * njobs = NTOTAL, where NTOTAL is the number of processors you check out using SGE_Batch and/or SGE_Array. For example, if NTOTAL = 64, use -P 64 in your SGE command.
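The arithmetic in the note above can be checked before submitting a job. The following is a minimal Python sketch, not part of pb-assembly: the function names njobs_for and is_consistent are hypothetical helpers introduced here for illustration. The sketch works only on the three numbers NPROC, njobs and NTOTAL and does not read fc_run.cfg.

```python
def njobs_for(ntotal: int, nproc: int) -> int:
    """Return the njobs value that satisfies nproc * njobs == ntotal.

    Hypothetical helper: ntotal is the number of processors checked out
    from SGE (the value passed to -P), nproc is the NPROC setting.
    """
    if ntotal <= 0 or nproc <= 0:
        raise ValueError("ntotal and nproc must be positive integers")
    if ntotal % nproc != 0:
        raise ValueError(
            f"NTOTAL={ntotal} is not a multiple of NPROC={nproc}; "
            "no integer njobs satisfies NPROC * njobs = NTOTAL"
        )
    return ntotal // nproc


def is_consistent(nproc: int, njobs: int, ntotal: int) -> bool:
    """Check the rule from the note: NPROC * njobs must equal NTOTAL."""
    return nproc * njobs == ntotal


if __name__ == "__main__":
    ntotal = 64  # processors requested, i.e. -P 64 in the SGE command
    for nproc in (4, 8, 16):
        njobs = njobs_for(ntotal, nproc)
        print(f"NPROC={nproc} njobs={njobs} consistent={is_consistent(nproc, njobs, ntotal)}")
```

With NTOTAL = 64, any NPROC that divides 64 evenly yields a valid njobs. A value such as NPROC = 6 raises an error, because no whole number of jobs would use exactly the processors requested.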
To activate:
|
|
Location:
|
|
Help message:
|
|
Configuration files can be found here:
|
|
software ref: https://github.com/PacificBiosciences/pb-assembly
research ref: https://www.ncbi.nlm.nih.gov/pubmed/27749838
research ref: https://www.ncbi.nlm.nih.gov/pubmed/23644548