infernal 1.1.4

Infernal - INFERence of RNA ALignment

Infernal searches DNA sequence databases for RNA structure and sequence similarities. It implements a special case of profile stochastic context-free grammars called covariance models (CMs). A CM is like a sequence profile, but it scores a combination of sequence consensus and RNA secondary structure consensus, so in many cases it is better at identifying RNA homologs that conserve their secondary structure more than their primary sequence.

Location:

$ which cmsearch
/local/cluster/infernal/bin/cmsearch

Help message:

$ cmsearch -h
# cmsearch :: search CM(s) against a sequence database
# INFERNAL 1.1.4 (Dec 2020)
# Copyright (C) 2020 Howard Hughes Medical Institute.
# Freely distributed under the BSD open source license.
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Usage: cmsearch [options] <cmfile> <seqdb>

Basic options:
  -h        : show brief help on version and usage
  -g        : configure CM for glocal alignment [default: local]
  -Z <x>    : set search space size in *Mb* to <x> for E-value calculations  (x>0)
  --devhelp : show list of otherwise hidden developer/expert options

Options directing output:
  -o <f>       : direct output to file <f>, not stdout
  -A <f>       : save multiple alignment of all significant hits to file <s>
  --tblout <f> : save parseable table of hits to file <s>
  --acc        : prefer accessions over names in output
  --noali      : don't output alignments, so output is smaller
  --notextw    : unlimit ASCII text output line width
  --textw <n>  : set max width of ASCII text output lines  [120]  (n>=120)
  --verbose    : report extra information; mainly useful for debugging

Options controlling reporting thresholds:
  -E <x> : report sequences <= this E-value threshold in output  [10.0]  (x>0)
  -T <x> : report sequences >= this score threshold in output

Options controlling inclusion (significance) thresholds:
  --incE <x> : consider sequences <= this E-value threshold as significant  [0.01]
  --incT <x> : consider sequences >= this score threshold as significant

Options controlling model-specific reporting thresholds:
  --cut_ga : use CM's GA gathering cutoffs as reporting thresholds
  --cut_nc : use CM's NC noise cutoffs as reporting thresholds
  --cut_tc : use CM's TC trusted cutoffs as reporting thresholds

Options controlling acceleration heuristics*:
  --max      : turn all heuristic filters off (slow)
  --nohmm    : skip all HMM filter stages, use only CM (slow)
  --mid      : skip first two HMM filter stages (SSV & Vit)
  --default  : default: run search space size-dependent pipeline  [default]
  --rfam     : set heuristic filters at Rfam-level (fast)
  --hmmonly  : use HMM only, don't use a CM at all
  --FZ <x>   : set filters to defaults used for a search space of size <x> Mb
  --Fmid <x> : with --mid, set P-value threshold for HMM stages to <x>  [0.02]

Other options*:
  --notrunc     : do not allow truncated hits at sequence termini
  --anytrunc    : allow full and truncated hits anywhere within sequences
  --nonull3     : turn off the NULL3 post hoc additional null model
  --mxsize <x>  : set max allowed alnment mx size to <x> Mb [df: autodetermined]
  --smxsize <x> : set max allowed size of search DP matrices to <x> Mb  [128.]
  --cyk         : use scanning CM CYK algorithm, not Inside in final stage
  --acyk        : align hits with CYK, not optimal accuracy
  --wcx <x>     : set W (expected max hit len) as <x> * cm->clen (model len)
  --toponly     : only search the top strand
  --bottomonly  : only search the bottom strand
  --tformat <s> : assert target <seqdb> is in format <s>: no autodetection
  --cpu <n>     : number of parallel CPU workers to use for multithreads

*Use --devhelp to show additional expert options.
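
Example invocation:

The following is a minimal sketch of a typical search, assembled only from options shown in the help message above. The file names (model.cm, sequences.fa, hits.tbl, search.out) are hypothetical placeholders, and the values chosen for --cpu and -E are illustrative assumptions, not recommended settings. The script runs cmsearch only if it is on the PATH and both input files exist; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Hypothetical file names: replace with your own CM file and sequence database.
cmfile="model.cm"
seqdb="sequences.fa"

# Build the command from options listed in the help message:
#   --cpu 4      : four parallel CPU workers
#   -E 0.01      : report only hits with E-value <= 0.01
#   --noali      : omit alignments so the main output is smaller
#   --tblout <f> : save a parseable table of hits
#   -o <f>       : send the main output to a file instead of stdout
cmd="cmsearch --cpu 4 -E 0.01 --noali --tblout hits.tbl -o search.out $cmfile $seqdb"

# Run only when cmsearch and both inputs are available; otherwise show the command.
if command -v cmsearch >/dev/null 2>&1 && [ -f "$cmfile" ] && [ -f "$seqdb" ]; then
  $cmd
else
  echo "would run: $cmd"
fi
```

Writing the hit table with --tblout while suppressing alignments with --noali keeps the output compact for downstream parsing; drop --noali if the alignments themselves are needed.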

software ref: http://eddylab.org/infernal/
research ref: https://doi.org/10.1093/bioinformatics/btt509