CONSENT 2.2.2

Conda
See the ‘activating the conda environment’ section below to access this software.

CONSENT (Scalable long read self-correction and assembly polishing with multiple sequence alignment) is a self-correction method for long reads. It first computes overlaps between the long reads in order to define an alignment pile (i.e. a set of overlapping reads used for correction) for each read. Each read’s alignment pile is then divided into smaller windows that are corrected independently. Within each window, a multiple sequence alignment strategy is first used to compute a consensus. This consensus is then polished with a local de Bruijn graph to remove the remaining errors. In addition to error correction, CONSENT can also perform assembly polishing.
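To make the windowing step more concrete, the sketch below splits one read into overlapping windows using the documented defaults for --windowSize (500) and --windowOverlap (50). The read length is hypothetical, and the layout rule (each window starts windowSize minus windowOverlap bases after the previous one) is an assumption made for illustration; CONSENT's actual implementation may place windows differently.

```shell
# Illustrative only: the window layout rule is an assumption, not CONSENT's code.
read_len=1400     # hypothetical read length in bases
window=500        # --windowSize default
overlap=50        # --windowOverlap default

windows=""
start=0
while [ "$start" -lt "$read_len" ]; do
    end=$((start + window))
    # Clip the last window to the end of the read.
    if [ "$end" -gt "$read_len" ]; then
        end=$read_len
    fi
    windows="$windows$start-$end "
    # Stop once a window reaches the end of the read.
    if [ "$end" -eq "$read_len" ]; then
        break
    fi
    # Consecutive windows share `overlap` bases.
    start=$((start + window - overlap))
done
echo "$windows"
```

Under these assumptions, a 1400-base read yields three windows: 0-500, 450-950 and 900-1400.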

Self-correction

To run CONSENT for long reads self-correction, run the following command:

./CONSENT-correct --in longReads.fast[a|q] --out result.fasta --type readsTechnology

  • longReads.fast[a|q]: fasta or fastq file of long reads to correct.
  • result.fasta: fasta file where to output the corrected long reads.
  • readsTechnology: Indicate whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT).
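As a filled-in illustration of the command above, the following sketch corrects an Oxford Nanopore read set. The file names are hypothetical placeholders, and the sketch assumes the conda environment has been activated so that CONSENT-correct is on the PATH. It prints the command first and only runs it when both the tool and the input file are present.

```shell
# Hypothetical file names; replace with your own data.
reads="reads.fastq"
corrected="reads.corrected.fasta"
tech="ONT"   # use PB for PacBio reads

# Reject anything other than the two documented --type values.
case "$tech" in
    PB|ONT) ;;
    *) echo "unknown read technology: $tech" >&2; exit 1 ;;
esac

cmd="CONSENT-correct --in $reads --out $corrected --type $tech"
echo "$cmd"

# Run only where the tool and the input file are actually present.
if command -v CONSENT-correct >/dev/null 2>&1 && [ -f "$reads" ]; then
    CONSENT-correct --in "$reads" --out "$corrected" --type "$tech"
fi
```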

Polishing

To run CONSENT for assembly polishing, run the following command:

./CONSENT-polish --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta

  • contigs.fast[a|q]: fasta or fastq file of contigs to polish.
  • longReads.fast[a|q]: fasta or fastq file of long reads to use for polishing.
  • result.fasta: fasta file where to output the polished contigs.
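The polishing command can be filled in the same way. In this sketch the contig and read file names are hypothetical, and CONSENT-polish is assumed to be on the PATH after activating the conda environment. The command is printed first and executed only if the tool and both inputs exist.

```shell
# Hypothetical file names; replace with your own assembly and reads.
contigs="assembly.fasta"
reads="reads.fastq"
polished="assembly.polished.fasta"

polish_cmd="CONSENT-polish --contigs $contigs --reads $reads --out $polished"
echo "$polish_cmd"

# Run only where the tool and both input files are actually present.
if command -v CONSENT-polish >/dev/null 2>&1 && [ -f "$contigs" ] && [ -f "$reads" ]; then
    CONSENT-polish --contigs "$contigs" --reads "$reads" --out "$polished"
fi
```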

Options

  --windowSize INT, -l INT:      Size of the windows to process. (default: 500)
  --minSupport INT, -s INT:      Minimum support to consider a window for correction. (default: 4)
  --maxSupport INT, -S INT:      Maximum number of overlaps to include in a pile. (default: 150)
  --maxMSA INT, -M:              Maximum number of sequences to include into the MSA. (default: 150)
  --merSize INT, -k INT:         k-mer size for chaining and polishing. (default: 9)
  --solid INT, -f INT:           Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
  --anchorSupport INT, -c INT:   Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
  --minAnchors INT, -a INT:      Minimum number of anchors in a window to allow consensus computation. (default: 2)
  --windowOverlap INT, -o INT:   Overlap size between consecutive windows. (default: 50)
  --nproc INT, -j INT:           Number of processes to run in parallel (default: number of cores).
  --minimapIndex INT, -m INT:    Split minimap2 index every INT input bases (default: 500M).
  --tmpdir STRING, -t STRING:    Path where to store the temporary overlaps file (default: working directory, as Alignments_dateTimeStamp.paf).
  --help, -h:                    Print this help message.
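The options above are appended to the basic command. The sketch below sets the window parameters explicitly, limits the number of processes, and redirects the temporary overlaps file to a scratch directory. All values and file names are hypothetical and shown only to illustrate the syntax; since --nproc defaults to the number of cores, setting it explicitly may be preferable on a shared node.

```shell
# Hypothetical values; tune them for your data and node.
threads=8
scratch="${TMPDIR:-/tmp}"      # where the temporary overlaps file goes

opts="--windowSize 500 --windowOverlap 50 --nproc $threads --tmpdir $scratch"
echo "CONSENT-correct --in reads.fastq --out reads.corrected.fasta --type PB $opts"

if command -v CONSENT-correct >/dev/null 2>&1 && [ -f reads.fastq ]; then
    # $opts is left unquoted on purpose so it splits into separate arguments
    # (this assumes the scratch path contains no spaces).
    CONSENT-correct --in reads.fastq --out reads.corrected.fasta --type PB $opts
fi
```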

Activating the conda environment

Check out a node with qrsh and run:

bash
source /local/cluster/consent/activate.sh

To run over SGE, include the source line above in your shell script prior to the CONSENT commands.
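A minimal SGE job script along these lines might look like the sketch below, which writes the script to a file so it can be submitted with qsub. The script name, job name and read file names are hypothetical. The #$ lines use standard SGE directives, and NSLOTS is the slot count that SGE exports to the job; requesting more than one slot requires a parallel environment whose name is site-specific and is therefore not shown.

```shell
# Write a minimal SGE job script (file and job names are hypothetical).
cat > consent_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -S /bin/bash
#$ -N consent_correct

# Activate the CONSENT conda environment before calling the tools.
source /local/cluster/consent/activate.sh

# NSLOTS is set by SGE to the number of granted slots; fall back to 1.
CONSENT-correct --in reads.fastq --out reads.corrected.fasta --type ONT \
    --nproc "${NSLOTS:-1}"
EOF

# Submit from a node where qsub is available (uncomment on the cluster):
# qsub consent_job.sh
```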

Location and version

$ which CONSENT-correct
/local/cluster/consent-2.2.2/bin/CONSENT-correct
$ CONSENT-correct --version
CONSENT v2.2.2

Help message

$ CONSENT-correct --help
Usage: /local/cluster/consent-2.2.2/bin/CONSENT-correct [options] --in longReads.fast[a|q] --out result.fasta --type readsTechnology

	Input:
	longReads.fast[a|q]:           fasta or fastq file of long reads to correct.
	result.fasta:                  fasta file where to output the corrected long reads.
	readsTechnology:               Indicate whether the long reads are from PacBio (--type PB) or Oxford Nanopore (--type ONT)

	Options:
	--windowSize INT, -l INT:      Size of the windows to process. (default: 500)
	--minSupport INT, -s INT:      Minimum support to consider a window for correction. (default: 3)
	--maxSupport INT, -S INT:      Maximum number of overlaps to include in a pile. (default: 150)
	--merSize INT, -k INT:         k-mer size for chaining and polishing. (default: 9)
	--solid INT, -f INT:           Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
	--anchorSupport INT, -c INT:   Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
	--minAnchors INT, -a INT:      Minimum number of anchors in a window to allow consensus computation. (default: 10)
	--windowOverlap INT, -o INT:   Overlap size between consecutive windows. (default: 50)
	--nproc INT, -j INT:           Number of processes to run in parallel (default: number of cores).
	--minimapIndex INT, -m INT:    Split minimap2 index every INT input bases (default: 500M).
	--tmpdir STRING, -t STRING:    Path where to store the temporary files (default: working directory).
	--help, -h:                    Print this help message.
	--version, -v: 	              Print version information.
$ CONSENT-polish --help
Usage: /local/cluster/consent-2.2.2/bin/CONSENT-polish [options] --contigs contigs.fast[a|q] --reads longReads.fast[a|q] --out result.fasta

	Input:
	contigs.fast[a|q]:             fasta or fastq file of contigs to polish.
	longReads.fast[a|q]:           fasta or fastq file of long reads to use for polishing.
	result.fasta:                  fasta file where to output the polished contigs.

	Options:
	--windowSize INT, -l INT:      Size of the windows to process. (default: 500)
	--minSupport INT, -s INT:      Minimum support to consider a window for correction. (default: 1)
	--maxSupport INT, -S INT:      Maximum number of overlaps to include in a pile. (default: 20,000)
	--merSize INT, -k INT:         k-mer size for chaining and polishing. (default: 9)
	--solid INT, -f INT:           Minimum number of occurrences to consider a k-mer as solid during polishing. (default: 4)
	--anchorSupport INT, -c INT:   Minimum number of sequences supporting (Ai) - (Ai+1) to keep the two anchors in the chaining. (default: 8)
	--minAnchors INT, -a INT:      Minimum number of anchors in a window to allow consensus computation. (default: 10)
	--windowOverlap INT, -o INT:   Overlap size between consecutive windows. (default: 50)
	--nproc INT, -j INT:           Number of processes to run in parallel (default: number of cores).
	--minimapIndex INT, -m INT:    Split minimap2 index every INT input bases (default: 500M).
	--tmpdir STRING, -t STRING:    Path where to store the temporary files (default: working directory).
	--help, -h:                    Print this help message.
	--version, -v: 	              Print version information.

software ref: https://github.com/morispi/CONSENT
research ref: https://doi.org/10.1038/s41598-020-80757-5