SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment
Location and version:
$ which SequenceBouncer.py
/local/cluster/bin/SequenceBouncer.py
$ SequenceBouncer.py --help
SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment
Cory Dunn
University of Helsinki
cory.dunn@helsinki.fi
Version: 1.19
Please cite DOI: 10.1101/2020.11.24.395459
Help message:
$ SequenceBouncer.py --help
SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment
Cory Dunn
University of Helsinki
cory.dunn@helsinki.fi
Version: 1.19
Please cite DOI: 10.1101/2020.11.24.395459
___
usage: SequenceBouncer.py [-h] -i INPUT_FILE [-o OUTPUT_FILE]
                          [-g GAP_PERCENT_CUT] [-k IQR_COEFFICIENT]
                          [-n SUBSAMPLE_SIZE] [-t TRIALS] [-s STRINGENCY]
                          [-r RANDOM_SEED]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Input file in FASTA format.
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output filename [do not include extensions] (default
                        will be 'input_file.ext').
  -g GAP_PERCENT_CUT, --gap_percent_cut GAP_PERCENT_CUT
                        For columns with a greater fraction of gaps than the
                        selected value, expressed in percent, data will be
                        ignored in calculations (default is 2).
  -k IQR_COEFFICIENT, --IQR_coefficient IQR_COEFFICIENT
                        Coefficient multiplied by the interquartile range that
                        helps to define an outlier sequence (default is 1.0).
  -n SUBSAMPLE_SIZE, --subsample_size SUBSAMPLE_SIZE
                        |> Available for large alignments | The size of a
                        single sample taken from the full dataset (default is
                        entire alignment, but try a subsample size of 50 or
                        100 for large alignments).
  -t TRIALS, --trials TRIALS
                        |> Available for large alignments | Number of times
                        each sequence is sampled and tested (default is to
                        examine all sequences in one single trial, but 5 or 10
                        trials may work well when subsamples are taken from
                        large alignments).
  -s STRINGENCY, --stringency STRINGENCY
                        |> Available for large alignments | 1: Minimal
                        stringency 2: Moderate stringency 3: Maximum
                        stringency (default is moderate stringency).
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        Random seed (integer) to be used during a sampling-
                        based approach (default is that the seed is randomly
                        selected). The user can use this seed to obtain
                        reproducible output and should note it in their
                        publications.
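Example invocations:
The flags listed in the help message can be combined in two typical ways: a single pass over the whole alignment, or a sampling-based run for large alignments. The Python sketch below only assembles and prints the corresponding command lines; it does not run SequenceBouncer. The helper name `build_command`, the file names (`alignment.fasta`, `large_alignment.fasta`, and the output names), and the chosen values (subsample size 100, 10 trials, seed 42) are illustrative assumptions, not taken from the tool's documentation beyond the ranges suggested in the help text.

```python
import shlex


def build_command(input_file, output_file=None, gap_percent_cut=None,
                  iqr_coefficient=None, subsample_size=None, trials=None,
                  stringency=None, random_seed=None):
    """Assemble a SequenceBouncer.py argument list (hypothetical helper).

    Only flags shown in the --help output are used; any option left as
    None is omitted so the tool falls back to its own default.
    """
    cmd = ["SequenceBouncer.py", "-i", input_file]  # -i is the only required flag
    optional = [
        ("-o", output_file),      # output filename, without extension
        ("-g", gap_percent_cut),  # gap percent cutoff for columns
        ("-k", iqr_coefficient),  # IQR coefficient for the outlier definition
        ("-n", subsample_size),   # subsample size (large alignments)
        ("-t", trials),           # number of sampling trials (large alignments)
        ("-s", stringency),       # 1, 2 or 3 (large alignments)
        ("-r", random_seed),      # integer seed for reproducible sampling
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Whole-alignment run with default settings (file names are placeholders).
full_run = build_command("alignment.fasta", output_file="alignment_cleaned")

# Sampling-based run for a large alignment, with a fixed seed for reproducibility.
sampled_run = build_command("large_alignment.fasta", output_file="large_cleaned",
                            subsample_size=100, trials=10, stringency=2,
                            random_seed=42)

print(shlex.join(full_run))
print(shlex.join(sampled_run))
```

On a system where `SequenceBouncer.py` is on the `PATH`, either list could be passed to `subprocess.run(cmd, check=True)`. Recording the `-r` value makes a sampling-based run repeatable, as the help text recommends.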
Software ref: https://github.com/corydunnlab/SequenceBouncer
Research ref: https://doi.org/10.1101/2020.11.24.395459