# Ribodetector 0.3.0

{{< admonition success "Installed" true >}}
This software should be available with no extra configuration.
{{< /admonition >}}

## ribodetector-0.3.0

Accurate and rapid ribosomal RNA (rRNA) sequence detector based on deep learning.

`RiboDetector` accurately and rapidly detects and removes rRNA sequences from metagenomic, metatranscriptomic, and ncRNA sequencing data. It is based on LSTMs and optimized for both CPU and GPU use. Compared with the current state-of-the-art software, it is reported to run about **10** times faster on a CPU and about **50** times faster on a consumer GPU. It is also highly accurate, with ~**10** times fewer false classifications, and shows a low level of bias toward any GO (Gene Ontology) functional group.

-------------------------------------------------------------------------------

## Location and version

```console
$ which ribodetector
/local/cluster/bin/ribodetector
$ which ribodetector_cpu
/local/cluster/bin/ribodetector_cpu
$ ribodetector_cpu --version
ribodetector_cpu 0.3.0
```

## Help message

```console
$ ribodetector_cpu --help
usage: ribodetector_cpu [-h] [-c CONFIG] -l LEN -i [INPUT [INPUT ...]] -o
                        [OUTPUT [OUTPUT ...]] [-r [RRNA [RRNA ...]]]
                        [-e {rrna,norrna,both,none}] [-t THREADS]
                        [--chunk_size CHUNK_SIZE] [--log LOG] [-v]

rRNA sequence detector

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path of config file
  -l LEN, --len LEN     Sequencing read length. Note: the accuracy reduces for
                        reads shorter than 40.
  -i [INPUT [INPUT ...]], --input [INPUT [INPUT ...]]
                        Path of input sequence files (fasta and fastq), the
                        second file will be considered as second end if two
                        files given.
  -o [OUTPUT [OUTPUT ...]], --output [OUTPUT [OUTPUT ...]]
                        Path of the output sequence files after rRNAs removal
                        (same number of files as input).
                        (Note: 2 times slower to write gz files)
  -r [RRNA [RRNA ...]], --rrna [RRNA [RRNA ...]]
                        Path of the output sequence file of detected rRNAs
                        (same number of files as input)
  -e {rrna,norrna,both,none}, --ensure {rrna,norrna,both,none}
                        Ensure which classificaion has high confidence for
                        paired end reads. norrna: output only high confident
                        non-rRNAs, the rest are clasified as rRNAs; rrna: vice
                        versa, only high confident rRNAs are classified as
                        rRNA and the rest output as non-rRNAs; both: both
                        non-rRNA and rRNA prediction with high confidence;
                        none: give label based on the mean probability of read
                        pair. (Only applicable for paired end reads, discard
                        the read pair when their predicitons are discordant)
  -t THREADS, --threads THREADS
                        Number of threads to use. (default: 20)
  --chunk_size CHUNK_SIZE
                        chunk_size * 1024 reads to load each time. When
                        chunk_size=1000 and threads=20, consumming ~20G
                        memory, better to be multiples of the number of
                        threads..
  --log LOG             Log file name
  -v, --version         show program's version number and exit

$ ribodetector --help
usage: ribodetector [-h] [-c CONFIG] [-d DEVICEID] -l LEN -i
                    [INPUT [INPUT ...]] -o [OUTPUT [OUTPUT ...]]
                    [-r [RRNA [RRNA ...]]] [-e {rrna,norrna,both,none}]
                    [-t THREADS] [-m MEMORY] [--chunk_size CHUNK_SIZE]
                    [--log LOG] [-v]

rRNA sequence detector

optional arguments:
  -h, --help            show this help message and exit
  -c CONFIG, --config CONFIG
                        Path of config file
  -d DEVICEID, --deviceid DEVICEID
                        Indices of GPUs to enable. Quotated comma-separated
                        device ID numbers. (default: all)
  -l LEN, --len LEN     Sequencing read length. Note: the accuracy reduces for
                        reads shorter than 40.
  -i [INPUT [INPUT ...]], --input [INPUT [INPUT ...]]
                        Path of input sequence files (fasta and fastq), the
                        second file will be considered as second end if two
                        files given.
  -o [OUTPUT [OUTPUT ...]], --output [OUTPUT [OUTPUT ...]]
                        Path of the output sequence files after rRNAs removal
                        (same number of files as input).
                        (Note: 2 times slower to write gz files)
  -r [RRNA [RRNA ...]], --rrna [RRNA [RRNA ...]]
                        Path of the output sequence file of detected rRNAs
                        (same number of files as input)
  -e {rrna,norrna,both,none}, --ensure {rrna,norrna,both,none}
                        Ensure which classificaion has high confidence for
                        paired end reads. norrna: output only high confident
                        non-rRNAs, the rest are clasified as rRNAs; rrna: vice
                        versa, only high confident rRNAs are classified as
                        rRNA and the rest output as non-rRNAs; both: both
                        non-rRNA and rRNA prediction with high confidence;
                        none: give label based on the mean probability of read
                        pair. (Only applicable for paired end reads, discard
                        the read pair when their predicitons are discordant)
  -t THREADS, --threads THREADS
                        Number of threads to use. (default: 10)
  -m MEMORY, --memory MEMORY
                        Amount (GB) of GPU RAM. (default: 12)
  --chunk_size CHUNK_SIZE
                        Use this parameter when having low memory. Parsing the
                        file in chunks. Not needed when free RAM >=5 *
                        your_file_size (uncompressed, sum of paired ends).
                        When chunk_size=256, memory=16 it will load 256 * 16 *
                        1024 reads each chunk (use ~20 GBfor 100bp paired
                        end).
  --log LOG             Log file name
  -v, --version         show program's version number and exit
```

software ref:

research ref:
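The options in the help output combine naturally for paired-end data: both ends go to `-i`, and `-o` and `-r` each take one file per input. The POSIX shell function below is a minimal sketch of such a run for either the CPU or the GPU build. It uses only options listed above; the function name `run_pe`, the `DRY_RUN` variable, the file names, read length, thread count, `--chunk_size` values, GPU device `0`, and `-m 10` are all illustrative assumptions, not site recommendations.

```shell
# Hypothetical helper (not part of RiboDetector): build, and optionally run,
# a paired-end RiboDetector command using only options from --help.
# Usage: run_pe MODE R1 R2 READ_LEN [THREADS]
#   MODE is "cpu" (ribodetector_cpu) or "gpu" (ribodetector).
# Set DRY_RUN=1 to print the command instead of running it.
run_pe() (
    mode=$1; r1=$2; r2=$3; len=$4; threads=${5:-20}
    # Assumes inputs end in .fq or .fq.gz. Outputs are written uncompressed
    # because the help text notes gz output is about 2 times slower to write.
    b1=${r1%.fq*}; b2=${r2%.fq*}
    case $mode in
        # chunk_size=1000 with threads=20 is the help text's ~20G example;
        # keep chunk_size a multiple of the thread count if THREADS changes.
        cpu) set -- ribodetector_cpu --chunk_size 1000 ;;
        # Assumed GPU settings: device 0 only (use e.g. "0,1" for two GPUs)
        # and 10 GB of GPU RAM for -m.
        gpu) set -- ribodetector -d 0 -m 10 --chunk_size 256 ;;
        *) echo "run_pe: MODE must be cpu or gpu" >&2; exit 2 ;;
    esac
    # Two -i files are read as a pair; -o and -r need one file per input.
    # -e rrna: only high-confidence rRNA calls are classified as rRNA.
    set -- "$@" -t "$threads" -l "$len" -e rrna \
        -i "$r1" "$r2" \
        -o "$b1.nonrrna.fq" "$b2.nonrrna.fq" \
        -r "$b1.rrna.fq" "$b2.rrna.fq"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
)
```

For example, `DRY_RUN=1 run_pe cpu sample_R1.fq.gz sample_R2.fq.gz 100` prints the `ribodetector_cpu` command without running it. Per the help text, `-e rrna` errs toward keeping reads in the non-rRNA output, while `-e norrna` keeps only high-confidence non-rRNA reads and is the stricter choice when residual rRNA matters more than losing some non-rRNA reads.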
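The `--chunk_size` descriptions imply simple arithmetic for how many reads each build loads per chunk: `chunk_size * 1024` for `ribodetector_cpu`, and `chunk_size * memory * 1024` for the GPU `ribodetector`, where `memory` is the `-m` value. The GPU help text also says chunking is not needed when free RAM is at least 5 times the total uncompressed input size. The POSIX shell helpers below sketch only that arithmetic. All function names are hypothetical and not part of RiboDetector, and the rounding rule in `round_chunk_to_threads` is an assumption based on the CPU help's advice to use multiples of the thread count.

```shell
# Hypothetical helpers (not part of RiboDetector) reproducing the chunking
# arithmetic described in the --help output. All sizes are whole numbers.

# Reads loaded per chunk by ribodetector_cpu: chunk_size * 1024.
cpu_reads_per_chunk() {
    printf '%s\n' "$(( $1 * 1024 ))"
}

# Reads loaded per chunk by the GPU ribodetector:
# chunk_size * memory (the -m value, in GB) * 1024.
gpu_reads_per_chunk() {
    printf '%s\n' "$(( $1 * $2 * 1024 ))"
}

# The CPU help suggests chunk_size be a multiple of the thread count.
# Assumed rule: round down to the nearest multiple (to stay within the
# memory budget), but never below the thread count itself.
round_chunk_to_threads() (
    chunk=$1
    threads=$2
    rounded=$(( chunk / threads * threads ))
    if [ "$rounded" -lt "$threads" ]; then
        rounded=$threads
    fi
    printf '%s\n' "$rounded"
)

# GPU help rule of thumb: --chunk_size is not needed when
# free RAM >= 5 * total uncompressed input size (both ends summed).
# Arguments are in GB; prints "yes" if chunking is advisable.
needs_chunking() (
    free_gb=$1
    input_gb=$2
    if [ "$free_gb" -ge "$(( 5 * input_gb ))" ]; then
        printf '%s\n' no
    else
        printf '%s\n' yes
    fi
)
```

For the GPU help text's own example, `gpu_reads_per_chunk 256 16` prints `4194304`, which the help text puts at roughly 20 GB for 100 bp paired-end reads. The memory figures are the help text's estimates, not measurements on this cluster.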