Conda
See the ‘activating the conda environment’ section below to access this
software.
CONSENT-2.2.2
CONSENT (Scalable long read self-correction and assembly polishing with
multiple sequence alignment) is a self-correction method for long reads. It
first computes overlaps between the long reads in order to define
an alignment pile (i.e. a set of overlapping reads used for correction) for
each read. Each read’s alignment pile is then divided into smaller
windows, which are corrected independently. Within each window, a multiple
alignment strategy is first used to compute a consensus. This consensus is
then polished with a local de Bruijn graph to remove the
remaining errors. In addition to error correction, CONSENT can also perform
assembly polishing.
Running CONSENT
Self-correction
To run CONSENT for long reads self-correction, run the following command:
./CONSENT-correct --in longReads.fast[a|q] --out result.fasta --type readsTechnology
- longReads.fast[a|q]: fasta or fastq file of long reads to correct.
- result.fasta: fasta file where to output the corrected long reads.
- readsTechnology: Indicate whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT)
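As a minimal sketch of the invocation above, the hypothetical helper below (not part of CONSENT) checks that the technology is PB or ONT and prints the resulting command instead of running it, which can be handy when preparing a job script. The function name and the file names are assumptions made for illustration. The sketch also assumes CONSENT-correct is on the PATH, as it is after activating the environment, rather than in the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CONSENT): print the CONSENT-correct
# command for a given input without executing it.
# Usage: consent_correct_cmd <reads.fast[a|q]> <out.fasta> <PB|ONT>
consent_correct_cmd() {
    local reads="$1" out="$2" tech="$3"
    case "$tech" in
        PB|ONT) ;;  # the only two values documented for --type
        *)
            echo "error: --type must be PB or ONT, got '$tech'" >&2
            return 1
            ;;
    esac
    printf 'CONSENT-correct --in %s --out %s --type %s\n' "$reads" "$out" "$tech"
}

# Example with placeholder file names.
consent_correct_cmd longReads.fastq corrected.fasta ONT
```

Any other value for the third argument makes the helper return a non-zero status and print an error on standard error.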
Polishing
To run CONSENT for assembly polishing, run the following command:
./CONSENT-polish --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta
- contigs.fast[a|q]: fasta or fastq file of contigs to polish.
- longReads.fast[a|q]: fasta or fastq file of long reads to use for polishing.
- result.fasta: fasta file where to output the polished contigs.
Options
--windowSize INT, -l INT: Size of the windows to process. (default: 500)
--minSupport INT, -s INT: Minimum support to consider a window for correction. (default: 4)
--maxSupport INT, -S INT: Maximum number of overlaps to include in a pile. (default: 150)
--maxMSA INT, -M INT: Maximum number of sequences to include into the MSA. (default: 150)
--merSize INT, -k INT: k-mer size for chaining and polishing. (default: 9)
--solid INT, -f INT: Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
--anchorSupport INT, -c INT: Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
--minAnchors INT, -a INT: Minimum number of anchors in a window to allow consensus computation. (default: 2)
--windowOverlap INT, -o INT: Overlap size between consecutive windows. (default: 50)
--nproc INT, -j INT: Number of processes to run in parallel (default: number of cores).
--minimapIndex INT, -m INT: Split minimap2 index every INT input bases (default: 500M).
--tmpdir STRING, -t STRING: Path where to store the temporary overlaps file (default: working directory, as Alignments_dateTimeStamp.paf).
--help, -h: Print this help message.
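To make the interaction of --windowSize and --windowOverlap concrete, the sketch below lists window coordinates for a read of a given length. It assumes that consecutive windows simply share --windowOverlap bases, so each window starts windowSize - windowOverlap bases after the previous one. This is an illustration of the two options under that assumption, not CONSENT's actual implementation, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print 0-based, end-exclusive window coordinates.
# Assumption: consecutive windows share exactly <window_overlap> bases.
# Usage: list_windows <read_length> [window_size] [window_overlap]
list_windows() {
    local len="$1" size="${2:-500}" overlap="${3:-50}"
    local step=$((size - overlap))
    local start=0 end
    if [ "$step" -le 0 ]; then
        echo "error: window size must exceed window overlap" >&2
        return 1
    fi
    while [ "$start" -lt "$len" ]; do
        end=$((start + size))
        # Truncate the last window at the end of the read.
        if [ "$end" -gt "$len" ]; then
            end="$len"
        fi
        echo "$start $end"
        # Stop once a window reaches the end of the read.
        if [ "$end" -eq "$len" ]; then
            break
        fi
        start=$((start + step))
    done
}

# A 1200-base read with the default sizes (500 and 50).
list_windows 1200
```

With the defaults, the 1200-base read yields the windows 0-500, 450-950 and 900-1200 under this assumption.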
Activating the conda environment
Check out a node with qrsh and run:
bash
source /local/cluster/consent/activate.sh
To run over SGE, include the source line above in your shell script prior to
the CONSENT commands.
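A job script along those lines might look like the sketch below, which writes a hypothetical consent_correct.sh and syntax-checks it without submitting anything. The script name, job name, input and output file names, and the choice of ONT are placeholders. The #$ lines are standard SGE directives, and NSLOTS is the slot count SGE exports to a running job (the sketch falls back to 1 when it is unset).

```shell
#!/usr/bin/env bash
# Write a hypothetical SGE job script; file names and job name are placeholders.
cat > consent_correct.sh <<'EOF'
#!/bin/bash
#$ -N consent_correct
#$ -cwd
#$ -S /bin/bash
#$ -j y

# Activate the CONSENT environment before calling any CONSENT command.
source /local/cluster/consent/activate.sh

# Use the slots granted by SGE, or a single process when NSLOTS is unset.
CONSENT-correct --in longReads.fastq --out corrected.fasta --type ONT \
    --nproc "${NSLOTS:-1}"
EOF

# Check the generated script for syntax errors without running it.
bash -n consent_correct.sh && echo "consent_correct.sh: syntax OK"
```

The script can then be submitted with qsub consent_correct.sh. Requesting more than one slot requires a parallel environment, whose name is site-specific and is therefore left out here.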
Location and version
$ which CONSENT-correct
/local/cluster/consent-2.2.2/bin/CONSENT-correct
$ CONSENT-correct --version
CONSENT v2.2.2
Help message
$ CONSENT-correct --help
Usage: /local/cluster/consent-2.2.2/bin/CONSENT-correct [options] --in longReads.fast[a|q] --out result.fasta --type readsTechnology
Input:
longReads.fast[a|q]: fasta or fastq file of long reads to correct.
result.fasta: fasta file where to output the corrected long reads.
readsTechnology: Indicate whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT)
Options:
--windowSize INT, -l INT: Size of the windows to process. (default: 500)
--minSupport INT, -s INT: Minimum support to consider a window for correction. (default: 3)
--maxSupport INT, -S INT: Maximum number of overlaps to include in a pile. (default: 150)
--merSize INT, -k INT: k-mer size for chaining and polishing. (default: 9)
--solid INT, -f INT: Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
--anchorSupport INT, -c INT: Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
--minAnchors INT, -a INT: Minimum number of anchors in a window to allow consensus computation. (default: 10)
--windowOverlap INT, -o INT: Overlap size between consecutive windows. (default: 50)
--nproc INT, -j INT: Number of processes to run in parallel (default: number of cores).
--minimapIndex INT, -m INT: Split minimap2 index every INT input bases (default: 500M).
--tmpdir STRING, -t STRING: Path where to store the temporary files (default: working directory).
--help, -h: Print this help message.
--version, -v: Print version information.
$ CONSENT-polish --help
Usage: /local/cluster/consent-2.2.2/bin/CONSENT-polish [options] --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta
Input:
contigs.fast[a|q]: fasta or fastq file of contigs to polish.
longReads.fast[a|q]: fasta or fastq file of long reads to use for polishing.
result.fasta: fasta file where to output the polished contigs.
Options:
--windowSize INT, -l INT: Size of the windows to process. (default: 500)
--minSupport INT, -s INT: Minimum support to consider a window for correction. (default: 1)
--maxSupport INT, -S INT: Maximum number of overlaps to include in a pile. (default: 20,000)
--merSize INT, -k INT: k-mer size for chaining and polishing. (default: 9)
--solid INT, -f INT: Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
--anchorSupport INT, -c INT: Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
--minAnchors INT, -a INT: Minimum number of anchors in a window to allow consensus computation. (default: 10)
--windowOverlap INT, -o INT: Overlap size between consecutive windows. (default: 50)
--nproc INT, -j INT: Number of processes to run in parallel (default: number of cores).
--minimapIndex INT, -m INT: Split minimap2 index every INT input bases (default: 500M).
--tmpdir STRING, -t STRING: Path where to store the temporary files (default: working directory).
--help, -h: Print this help message.
--version, -v: Print version information.
software ref: https://github.com/morispi/CONSENT
research ref: https://doi.org/10.1038/s41598-020-80757-5