pb-assembly
pb-assembly is the Bioconda recipe that bundles all code and dependencies needed to run:
- the FALCON assembly pipeline
- FALCON-Unzip, to phase the genome and perform phased polishing with Arrow
- FALCON-Phase, to extend phasing between unzipped haplotig blocks (requires Hi-C data)
Installed package recipes include:
- pb-falcon
- pb-dazzler
- genomicconsensus
- etc. (all other dependencies)
FALCON and FALCON-Unzip
FALCON and FALCON-Unzip are de novo genome assemblers for PacBio long reads, also known as Single-Molecule Real-Time (SMRT) sequences.
The Julia Language
Scientific computing has traditionally required the highest performance, yet domain experts have largely moved to slower dynamic languages for daily work. We believe there are many good reasons to prefer dynamic languages for these applications, and we do not expect their use to diminish. Fortunately, modern language design and compiler techniques make it possible to mostly eliminate the performance trade-off and provide a single environment productive enough for prototyping and efficient enough for deploying performance-intensive applications.
Swiss-Prot database - manually curated functional annotation
Updated on October 12, 2021
BLAST and DIAMOND databases can be found here:
/nfs1/CGRB/databases/swiss-prot/current
Software ref: https://www.uniprot.org/downloads
Research ref: https://www.uniprot.org/help/uniprotkb_sections
Bracken (Bayesian Reestimation of Abundance with KrakEN)
Bracken is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample. Bracken uses the taxonomy labels assigned by Kraken, a metagenomics classification algorithm, to estimate the number of reads originating from each species present in a sample. Kraken classifies each read to the best-matching location in the taxonomic tree, but it does not estimate species abundances.
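To make the idea of re-estimating abundance concrete, here is a minimal Python sketch of one simplified reassignment step: reads that Kraken left at a genus node are shared among that genus's species in proportion to the reads each species received directly. This proportional rule is an assumption chosen for illustration; Bracken itself derives its reassignment probabilities from the k-mer distribution of the reference database, so these numbers will not match real Bracken output. The function name, the species labels, and the counts are all hypothetical.

```python
def redistribute_genus_reads(genus_reads: int, species_reads: dict[str, int]) -> dict[str, float]:
    """Estimate reads per species after sharing genus-level reads.

    genus_reads:   reads Kraken assigned to the genus node only.
    species_reads: reads Kraken assigned directly to each child species.
    Assumption: genus reads are split in proportion to direct species counts
    (a simplification, not Bracken's k-mer-based probabilities).
    """
    total = sum(species_reads.values())
    if total == 0:
        # No species-level evidence: fall back to an even split (hypothetical choice).
        n = len(species_reads)
        return {s: genus_reads / n for s in species_reads} if n else {}
    return {
        species: count + genus_reads * count / total
        for species, count in species_reads.items()
    }


estimates = redistribute_genus_reads(100, {"species_A": 300, "species_B": 100})
print(estimates)  # {'species_A': 375.0, 'species_B': 125.0}
```

Note that the total read count is preserved (400 species-level reads plus 100 genus-level reads give 500 estimated species reads), which is the property any such reassignment should keep.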
AGAT - Another Gtf/Gff Analysis Toolkit
AGAT can check, fix, and fill in missing information (features/attributes) in any kind of GTF or GFF file to produce complete, sorted, and standardised GFF3. Over the years it has gained many tools covering just about any task involving GTF/GFF files (sanitizing, conversion, merging, modifying, filtering, FASTA sequence extraction, adding information, etc.). Compared with other methods, AGAT is robust to even the most despicable GTF/GFF files.
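One of the differences AGAT has to handle when converting GTF to GFF3 is the attribute syntax in column 9: GTF writes `key "value";` pairs, while GFF3 writes `key=value` pairs separated by semicolons. The Python sketch below shows only that syntactic step and is not AGAT's implementation; a real conversion also builds the ID/Parent feature hierarchy, escapes reserved characters, and handles unquoted or repeated keys, none of which this sketch does. The function name and the example line are hypothetical.

```python
import re


def gtf_attrs_to_gff3(attr_field: str) -> str:
    """Convert a GTF column-9 string (key "value"; ...) to GFF3 key=value;... form.

    Minimal sketch: only quoted values are picked up, and GFF3 escaping of
    reserved characters (; = & ,) is not performed.
    """
    pairs = re.findall(r'(\S+)\s+"([^"]*)"', attr_field)
    return ";".join(f"{key}={value}" for key, value in pairs)


# Hypothetical GTF exon line: 9 tab-separated columns.
line = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
cols = line.split("\t")
cols[8] = gtf_attrs_to_gff3(cols[8])
print("\t".join(cols))  # ...\tgene_id=g1;transcript_id=t1
```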
smalt
SMALT aligns DNA sequencing reads with a reference genome.
Reads from a wide range of sequencing platforms can be processed, for example Illumina, Roche-454, Ion Torrent, PacBio or ABI-Sanger. Paired reads are supported. There is no support for SOLiD reads.
A mode for the detection of split (chimeric) reads is provided. Multi-threaded program execution is supported.
About SMALT
SMALT employs a hash index of short words, up to 20 nucleotides long, sampled at equidistant steps along the reference genome.
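The indexing idea can be illustrated with a small Python sketch: only the words starting at positions 0, step, 2*step, ... of the reference are stored in a hash table, while every word of a read is looked up, so an exactly matching read long enough to span one sampling interval still produces at least one seed. This is a toy model under stated assumptions, not SMALT's actual data structure; the function names and the `k`/`step` parameters are hypothetical, and real SMALT goes on to score and extend seeds into full alignments.

```python
from collections import defaultdict


def build_sampled_index(reference: str, k: int, step: int) -> dict[str, list[int]]:
    """Hash each k-mer starting at positions 0, step, 2*step, ... to its start positions."""
    if not 0 < k <= 20:
        raise ValueError("word length must be between 1 and 20")
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(reference) - k + 1, step):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Look up every k-mer of the read; return (read_offset, ref_pos) seed pairs."""
    hits = []
    for off in range(len(read) - k + 1):
        for ref_pos in index.get(read[off:off + k], []):
            hits.append((off, ref_pos))
    return hits


reference = "ACGTACGTTGCA"
index = build_sampled_index(reference, k=4, step=2)
hits = seed_hits("CGTTGC", index, k=4)
# Each hit implies a candidate read start at ref_pos - read_offset.
print(index)  # {'ACGT': [0, 4], 'GTAC': [2], 'GTTG': [6], 'TGCA': [8]}
print(hits)   # [(1, 6)] -> candidate read start at position 5
```

Sampling the reference rather than the read is the trade-off here: the index shrinks roughly by a factor of `step`, at the cost of looking up every word of each read.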