SequenceBouncer 1.19

SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment

Location and version:

$ which SequenceBouncer.py
/local/cluster/bin/SequenceBouncer.py
$ SequenceBouncer.py --help

SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment

Cory Dunn
University of Helsinki
cory.dunn@helsinki.fi
Version: 1.19
Please cite DOI: 10.1101/2020.11.24.395459

Help message:

$ SequenceBouncer.py --help

SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment

Cory Dunn
University of Helsinki
cory.dunn@helsinki.fi
Version: 1.19
Please cite DOI: 10.1101/2020.11.24.395459
___

usage: SequenceBouncer.py [-h] -i INPUT_FILE [-o OUTPUT_FILE]
                          [-g GAP_PERCENT_CUT] [-k IQR_COEFFICIENT]
                          [-n SUBSAMPLE_SIZE] [-t TRIALS] [-s STRINGENCY]
                          [-r RANDOM_SEED]

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input_file INPUT_FILE
                        Input file in FASTA format.
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Output filename [do not include extensions] (default
                        will be 'input_file.ext').
  -g GAP_PERCENT_CUT, --gap_percent_cut GAP_PERCENT_CUT
                        For columns with a greater fraction of gaps than the
                        selected value, expressed in percent, data will be
                        ignored in calculations (default is 2).
  -k IQR_COEFFICIENT, --IQR_coefficient IQR_COEFFICIENT
                        Coefficient multiplied by the interquartile range that
                        helps to define an outlier sequence (default is 1.0).
  -n SUBSAMPLE_SIZE, --subsample_size SUBSAMPLE_SIZE
                        |> Available for large alignments | The size of a
                        single sample taken from the full dataset (default is
                        entire alignment, but try a subsample size of 50 or
                        100 for large alignments).
  -t TRIALS, --trials TRIALS
                        |> Available for large alignments | Number of times
                        each sequence is sampled and tested (default is to
                        examine all sequences in one single trial, but 5 or 10
                        trials may work well when subsamples are taken from
                        large alignments).
  -s STRINGENCY, --stringency STRINGENCY
                        |> Available for large alignments | 1: Minimal
                        stringency 2: Moderate stringency 3: Maximum
                        stringency (default is moderate stringency).
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        Random seed (integer) to be used during a sampling-
                        based approach (default is that the seed is randomly
                        selected). The user can use this seed to obtain
                        reproducible output and should note it in their
                        publications.
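
The -g and -k options above describe two filtering ideas: ignoring gap-rich columns, and flagging sequences that lie beyond a multiple of the interquartile range. The sketch below illustrates both in Python. It is not SequenceBouncer's code: the function names, the per-sequence scores, and the choice of an upper fence at Q3 + k × IQR are assumptions made for illustration, and the cited preprint describes how the tool actually scores sequences.

```python
import statistics


def gap_filtered_columns(alignment, gap_percent_cut=2.0):
    """Return indices of alignment columns kept under a gap cut-off.

    Hypothetical helper: a column is ignored when its percentage of
    gap characters ('-') is greater than gap_percent_cut.
    """
    n_seqs = len(alignment)
    kept = []
    for col in range(len(alignment[0])):
        gaps = sum(1 for seq in alignment if seq[col] == "-")
        if 100.0 * gaps / n_seqs <= gap_percent_cut:
            kept.append(col)
    return kept


def upper_fence_outliers(scores, iqr_coefficient=1.0):
    """Return names whose score exceeds Q3 + iqr_coefficient * IQR.

    Assumption: 'scores' maps a sequence name to some per-sequence
    dissimilarity value; how such a value is computed is not shown here.
    """
    q1, _median, q3 = statistics.quantiles(
        list(scores.values()), n=4, method="inclusive"
    )
    fence = q3 + iqr_coefficient * (q3 - q1)
    return sorted(name for name, value in scores.items() if value > fence)


if __name__ == "__main__":
    # Toy alignment: the last two columns each contain one gap (25 %).
    toy_alignment = ["ACGT-", "ACGTA", "ACG-A", "TCGTA"]
    print(gap_filtered_columns(toy_alignment, gap_percent_cut=2.0))

    # Made-up scores; seqE is much more dissimilar than the others.
    toy_scores = {"seqA": 1.0, "seqB": 1.2, "seqC": 1.1, "seqD": 0.9, "seqE": 5.0}
    print(upper_fence_outliers(toy_scores, iqr_coefficient=1.0))
```

In this sketch a smaller coefficient lowers the fence, so more sequences are flagged, and a larger gap cut-off keeps more columns in the calculation.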

software ref: https://github.com/corydunnlab/SequenceBouncer
research ref: https://doi.org/10.1101/2020.11.24.395459