Installed
This software should be available with no extra configuration.
The PopGen Pipeline Platform (PPP) is a software platform with the goal of reducing the computational
expertise required for conducting population genomic analyses. The PPP was
designed as a collection of scripts that facilitate common population genomic
workflows in a consistent and standardized environment. Functions were
developed to encompass entire workflows, including: input preparation, file
format conversion, various population genomic analyses, output generation, and
visualization. By facilitating entire workflows, the PPP offers several
benefits to prospective end users: it reduces the need for redundant in-house
software and scripts, which would require development time and may be
error-prone or incorrect, depending on the expertise of the investigator. The
platform has also been developed with reproducibility and extensibility of
analyses in mind.
The current documentation may be found here.
A PDF of the documentation is also available for download.
Location and version
$ which vcf_filter.py
/local/cluster/bin/vcf_filter.py
$ vcf_filter.py -h
initLogger - WARNING: PPP, version 0.1.13
help message
There are other scripts associated with this software as well.
Please see the full documentation for more information.
$ vcf_filter.py -h
initLogger - WARNING: PPP, version 0.1.13
usage: vcf_filter.py [-h] --vcf VCF [--model-file MODEL_FILE] [--model MODEL]
                     [--out OUT] [--out-prefix OUT_PREFIX]
                     [--out-format {vcf, vcf.gz, bcf, bed, sites}]
                     [--overwrite] [--force-samples]
                     [--filter-include-indv FILTER_INCLUDE_INDV [FILTER_INCLUDE_INDV ...]
                     | --filter-exclude-indv FILTER_EXCLUDE_INDV
                     [FILTER_EXCLUDE_INDV ...]]
                     [--filter-include-indv-file FILTER_INCLUDE_INDV_FILE | --filter-exclude-indv-file FILTER_EXCLUDE_INDV_FILE]
                     [--filter-only-biallelic]
                     [--filter-min-alleles FILTER_MIN_ALLELES]
                     [--filter-max-alleles FILTER_MAX_ALLELES]
                     [--filter-max-missing FILTER_MAX_MISSING | --filter-max-missing-count FILTER_MAX_MISSING_COUNT]
                     [--filter-include-indels | --filter-exclude-indels]
                     [--filter-include-snps | --filter-exclude-snps]
                     [--filter-include-pos FILTER_INCLUDE_POS [FILTER_INCLUDE_POS ...]]
                     [--filter-exclude-pos FILTER_EXCLUDE_POS [FILTER_EXCLUDE_POS ...]]
                     [--filter-include-pos-file FILTER_INCLUDE_POS_FILE]
                     [--filter-exclude-pos-file FILTER_EXCLUDE_POS_FILE]
                     [--filter-include-bed FILTER_INCLUDE_BED]
                     [--filter-exclude-bed FILTER_EXCLUDE_BED]
                     [--filter-include-passed] [--filter-exclude-passed]
                     [--filter-include-flag FILTER_INCLUDE_FLAG [FILTER_INCLUDE_FLAG ...]]
                     [--filter-exclude-flag FILTER_EXCLUDE_FLAG [FILTER_EXCLUDE_FLAG ...]]
                     [--filter-include-snp FILTER_INCLUDE_SNP [FILTER_INCLUDE_SNP ...]]
                     [--filter-exclude-snp FILTER_EXCLUDE_SNP [FILTER_EXCLUDE_SNP ...]]
                     [--filter-include-snp-file FILTER_INCLUDE_SNP_FILE]
                     [--filter-exclude-snp-file FILTER_EXCLUDE_SNP_FILE]
                     [--filter-maf-min FILTER_MAF_MIN]
                     [--filter-maf-max FILTER_MAF_MAX]
                     [--filter-mac-min FILTER_MAC_MIN]
                     [--filter-mac-max FILTER_MAC_MAX]

optional arguments:
  -h, --help            show this help message and exit
  --vcf VCF             Defines the filename of the VCF (default: None)
  --model-file MODEL_FILE
                        Defines the model file (default: None)
  --model MODEL         Defines the model and the individual(s) to include
                        (default: None)
  --out OUT             Defines the complete output filename, overrides --out-
                        prefix (default: None)
  --out-prefix OUT_PREFIX
                        Defines the output prefix (i.e. filename without file
                        extension) (default: out)
  --out-format {vcf, vcf.gz, bcf, bed, sites}
                        Defines the desired output format (default: vcf.gz)
  --overwrite           Overwrite previous output (default: False)
  --force-samples       Ignore the error rasied when a sample that does not
                        exist (default: False)
  --filter-include-indv FILTER_INCLUDE_INDV [FILTER_INCLUDE_INDV ...]
                        Defines the individual(s) to include. May be used
                        multiple times (default: None)
  --filter-exclude-indv FILTER_EXCLUDE_INDV [FILTER_EXCLUDE_INDV ...]
                        Defines the individual(s) to exclude. May be used
                        multiple times (default: None)
  --filter-include-indv-file FILTER_INCLUDE_INDV_FILE
                        Defines a file of individuals to include (default:
                        None)
  --filter-exclude-indv-file FILTER_EXCLUDE_INDV_FILE
                        Defines a file of individuals to exclude (default:
                        None)
  --filter-only-biallelic
                        Only include variants that are biallelic (default:
                        False)
  --filter-min-alleles FILTER_MIN_ALLELES
                        Include variants with a number of allele >= to the
                        given number (default: None)
  --filter-max-alleles FILTER_MAX_ALLELES
                        Include variants with a number of allele <= to the
                        given number (default: None)
  --filter-max-missing FILTER_MAX_MISSING
                        Max proportion of missing data allowed (0.0: no
                        missing data, 1.0: include all data) (default: None)
  --filter-max-missing-count FILTER_MAX_MISSING_COUNT
                        Max number of sample with missing data allowed
                        (default: None)
  --filter-include-indels
                        Include variants if they contain an insertion or a
                        deletion (default: False)
  --filter-exclude-indels
                        Exclude variants if they contain an insertion or a
                        deletion (default: False)
  --filter-include-snps
                        Include variants if they contain a SNP (default:
                        False)
  --filter-exclude-snps
                        Exclude variants if they contain a SNP (default:
                        False)
  --filter-include-pos FILTER_INCLUDE_POS [FILTER_INCLUDE_POS ...]
                        Defines comma seperated positions (i.e. CHROM:START-
                        END) to include. START and END are optional. May be
                        used multiple times (default: None)
  --filter-exclude-pos FILTER_EXCLUDE_POS [FILTER_EXCLUDE_POS ...]
                        Defines comma seperated positions (i.e. CHROM:START-
                        END) to exclude. START and END are optional. May be
                        used multiple times (default: None)
  --filter-include-pos-file FILTER_INCLUDE_POS_FILE
                        Defines a file of positions to include within a file
                        (default: None)
  --filter-exclude-pos-file FILTER_EXCLUDE_POS_FILE
                        Defines a file of positions to exclude within a file
                        (default: None)
  --filter-include-bed FILTER_INCLUDE_BED
                        Defines a BED file of positions to include (default:
                        None)
  --filter-exclude-bed FILTER_EXCLUDE_BED
                        Defines a BED file of positions to exclude (default:
                        None)
  --filter-include-passed
                        Include variants with the 'PASS' filter flag (default:
                        False)
  --filter-exclude-passed
                        Exclude variants with the 'PASS' filter flag (default:
                        False)
  --filter-include-flag FILTER_INCLUDE_FLAG [FILTER_INCLUDE_FLAG ...]
                        Include variants with the specified filter flag
                        (default: None)
  --filter-exclude-flag FILTER_EXCLUDE_FLAG [FILTER_EXCLUDE_FLAG ...]
                        Exclude variants with the specified filter flag
                        (default: None)
  --filter-include-snp FILTER_INCLUDE_SNP [FILTER_INCLUDE_SNP ...]
                        Include SNP(s) with the matching ID. This argument may
                        be used multiple times (default: None)
  --filter-exclude-snp FILTER_EXCLUDE_SNP [FILTER_EXCLUDE_SNP ...]
                        Exclude SNP(s) with the matching ID. This argument may
                        be used multiple times (default: None)
  --filter-include-snp-file FILTER_INCLUDE_SNP_FILE
                        Defines a file of SNP IDs to include (default: None)
  --filter-exclude-snp-file FILTER_EXCLUDE_SNP_FILE
                        Defines a file of SNP IDs to exclude (default: None)
  --filter-maf-min FILTER_MAF_MIN
                        Include variants with equal or greater MAF values
                        (default: None)
  --filter-maf-max FILTER_MAF_MAX
                        Include variants with equal or lesser MAF values
                        (default: None)
  --filter-mac-min FILTER_MAC_MIN
                        Include variants with equal or greater MAC values
                        (default: None)
  --filter-mac-max FILTER_MAC_MAX
                        Include variants with equal or lesser MAC values
                        (default: None)
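example usage
The help message lists many options, so a short illustration of how a few of them might be combined may be useful. This is a minimal sketch and is not taken from the PPP documentation. The input file name input.vcf.gz, the output prefix filtered, and the MAF cutoff of 0.05 are all hypothetical placeholders; the flags themselves are the ones shown in the help message. The command runs only if vcf_filter.py is on the PATH and the input file exists; otherwise it is printed so it can be inspected.

```shell
# Hypothetical input file; replace with the path to your own VCF.
vcf="input.vcf.gz"

# Assemble the command from options listed in the help message:
# keep biallelic variants with MAF >= 0.05 (cutoff chosen for illustration)
# and request compressed VCF output under the prefix "filtered".
set -- vcf_filter.py \
    --vcf "$vcf" \
    --filter-only-biallelic \
    --filter-maf-min 0.05 \
    --out-prefix filtered \
    --out-format vcf.gz

# Run the filter only when the script and the input are both available;
# otherwise print the command instead of failing.
if command -v "$1" >/dev/null 2>&1 && [ -f "$vcf" ]; then
    "$@"
else
    printf '%s\n' "$*"
fi
```

Add --overwrite if a previous run already wrote output under the same prefix and it should be replaced.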
software ref: https://github.com/jaredgk/PPP/
research ref: https://ppp.readthedocs.io/en/latest/PPP_pages/citations.html