PPP 0.1.13

Installed
This software should be available with no extra configuration.

ppp-0.1.13 - Popgen Pipeline Platform (PPP)

The PPP is a software platform that aims to reduce the computational expertise required to conduct population genomic analyses. It was designed as a collection of scripts that facilitate common population genomic workflows in a consistent, standardized environment. Functions were developed to cover entire workflows, including input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By covering entire workflows, the PPP offers several benefits to prospective end users: in particular, it reduces the need for redundant in-house software and scripts, which take time to develop and may be error-prone or incorrect, depending on the expertise of the investigator. The platform has also been developed with reproducibility and extensibility of analyses in mind.

The current documentation may be found here.

A PDF of the documentation is also available for download.


Location and version

$ which vcf_filter.py
/local/cluster/bin/vcf_filter.py
$ vcf_filter.py -h
initLogger - WARNING: PPP, version 0.1.13

help message

There are other scripts associated with this software as well.

Please see the full documentation for more information.

$ vcf_filter.py -h
initLogger - WARNING: PPP, version 0.1.13
usage: vcf_filter.py [-h] --vcf VCF [--model-file MODEL_FILE] [--model MODEL]
                     [--out OUT] [--out-prefix OUT_PREFIX]
                     [--out-format {vcf, vcf.gz, bcf, bed, sites}]
                     [--overwrite] [--force-samples]
                     [--filter-include-indv FILTER_INCLUDE_INDV [FILTER_INCLUDE_INDV ...]
                     | --filter-exclude-indv FILTER_EXCLUDE_INDV
                     [FILTER_EXCLUDE_INDV ...]]
                     [--filter-include-indv-file FILTER_INCLUDE_INDV_FILE | --filter-exclude-indv-file FILTER_EXCLUDE_INDV_FILE]
                     [--filter-only-biallelic]
                     [--filter-min-alleles FILTER_MIN_ALLELES]
                     [--filter-max-alleles FILTER_MAX_ALLELES]
                     [--filter-max-missing FILTER_MAX_MISSING | --filter-max-missing-count FILTER_MAX_MISSING_COUNT]
                     [--filter-include-indels | --filter-exclude-indels]
                     [--filter-include-snps | --filter-exclude-snps]
                     [--filter-include-pos FILTER_INCLUDE_POS [FILTER_INCLUDE_POS ...]]
                     [--filter-exclude-pos FILTER_EXCLUDE_POS [FILTER_EXCLUDE_POS ...]]
                     [--filter-include-pos-file FILTER_INCLUDE_POS_FILE]
                     [--filter-exclude-pos-file FILTER_EXCLUDE_POS_FILE]
                     [--filter-include-bed FILTER_INCLUDE_BED]
                     [--filter-exclude-bed FILTER_EXCLUDE_BED]
                     [--filter-include-passed] [--filter-exclude-passed]
                     [--filter-include-flag FILTER_INCLUDE_FLAG [FILTER_INCLUDE_FLAG ...]]
                     [--filter-exclude-flag FILTER_EXCLUDE_FLAG [FILTER_EXCLUDE_FLAG ...]]
                     [--filter-include-snp FILTER_INCLUDE_SNP [FILTER_INCLUDE_SNP ...]]
                     [--filter-exclude-snp FILTER_EXCLUDE_SNP [FILTER_EXCLUDE_SNP ...]]
                     [--filter-include-snp-file FILTER_INCLUDE_SNP_FILE]
                     [--filter-exclude-snp-file FILTER_EXCLUDE_SNP_FILE]
                     [--filter-maf-min FILTER_MAF_MIN]
                     [--filter-maf-max FILTER_MAF_MAX]
                     [--filter-mac-min FILTER_MAC_MIN]
                     [--filter-mac-max FILTER_MAC_MAX]

optional arguments:
  -h, --help            show this help message and exit
  --vcf VCF             Defines the filename of the VCF (default: None)
  --model-file MODEL_FILE
                        Defines the model file (default: None)
  --model MODEL         Defines the model and the individual(s) to include
                        (default: None)
  --out OUT             Defines the complete output filename, overrides --out-
                        prefix (default: None)
  --out-prefix OUT_PREFIX
                        Defines the output prefix (i.e. filename without file
                        extension) (default: out)
  --out-format {vcf, vcf.gz, bcf, bed, sites}
                        Defines the desired output format (default: vcf.gz)
  --overwrite           Overwrite previous output (default: False)
  --force-samples       Ignore the error rasied when a sample that does not
                        exist (default: False)
  --filter-include-indv FILTER_INCLUDE_INDV [FILTER_INCLUDE_INDV ...]
                        Defines the individual(s) to include. May be used
                        multiple times (default: None)
  --filter-exclude-indv FILTER_EXCLUDE_INDV [FILTER_EXCLUDE_INDV ...]
                        Defines the individual(s) to exclude. May be used
                        multiple times (default: None)
  --filter-include-indv-file FILTER_INCLUDE_INDV_FILE
                        Defines a file of individuals to include (default:
                        None)
  --filter-exclude-indv-file FILTER_EXCLUDE_INDV_FILE
                        Defines a file of individuals to exclude (default:
                        None)
  --filter-only-biallelic
                        Only include variants that are biallelic (default:
                        False)
  --filter-min-alleles FILTER_MIN_ALLELES
                        Include variants with a number of allele >= to the
                        given number (default: None)
  --filter-max-alleles FILTER_MAX_ALLELES
                        Include variants with a number of allele <= to the
                        given number (default: None)
  --filter-max-missing FILTER_MAX_MISSING
                        Max proportion of missing data allowed (0.0: no
                        missing data, 1.0: include all data) (default: None)
  --filter-max-missing-count FILTER_MAX_MISSING_COUNT
                        Max number of sample with missing data allowed
                        (default: None)
  --filter-include-indels
                        Include variants if they contain an insertion or a
                        deletion (default: False)
  --filter-exclude-indels
                        Exclude variants if they contain an insertion or a
                        deletion (default: False)
  --filter-include-snps
                        Include variants if they contain a SNP (default:
                        False)
  --filter-exclude-snps
                        Exclude variants if they contain a SNP (default:
                        False)
  --filter-include-pos FILTER_INCLUDE_POS [FILTER_INCLUDE_POS ...]
                        Defines comma seperated positions (i.e. CHROM:START-
                        END) to include. START and END are optional. May be
                        used multiple times (default: None)
  --filter-exclude-pos FILTER_EXCLUDE_POS [FILTER_EXCLUDE_POS ...]
                        Defines comma seperated positions (i.e. CHROM:START-
                        END) to exclude. START and END are optional. May be
                        used multiple times (default: None)
  --filter-include-pos-file FILTER_INCLUDE_POS_FILE
                        Defines a file of positions to include within a file
                        (default: None)
  --filter-exclude-pos-file FILTER_EXCLUDE_POS_FILE
                        Defines a file of positions to exclude within a file
                        (default: None)
  --filter-include-bed FILTER_INCLUDE_BED
                        Defines a BED file of positions to include (default:
                        None)
  --filter-exclude-bed FILTER_EXCLUDE_BED
                        Defines a BED file of positions to exclude (default:
                        None)
  --filter-include-passed
                        Include variants with the 'PASS' filter flag (default:
                        False)
  --filter-exclude-passed
                        Exclude variants with the 'PASS' filter flag (default:
                        False)
  --filter-include-flag FILTER_INCLUDE_FLAG [FILTER_INCLUDE_FLAG ...]
                        Include variants with the specified filter flag
                        (default: None)
  --filter-exclude-flag FILTER_EXCLUDE_FLAG [FILTER_EXCLUDE_FLAG ...]
                        Exclude variants with the specified filter flag
                        (default: None)
  --filter-include-snp FILTER_INCLUDE_SNP [FILTER_INCLUDE_SNP ...]
                        Include SNP(s) with the matching ID. This argument may
                        be used multiple times (default: None)
  --filter-exclude-snp FILTER_EXCLUDE_SNP [FILTER_EXCLUDE_SNP ...]
                        Exclude SNP(s) with the matching ID. This argument may
                        be used multiple times (default: None)
  --filter-include-snp-file FILTER_INCLUDE_SNP_FILE
                        Defines a file of SNP IDs to include (default: None)
  --filter-exclude-snp-file FILTER_EXCLUDE_SNP_FILE
                        Defines a file of SNP IDs to exclude (default: None)
  --filter-maf-min FILTER_MAF_MIN
                        Include variants with equal or greater MAF values
                        (default: None)
  --filter-maf-max FILTER_MAF_MAX
                        Include variants with equal or lesser MAF values
                        (default: None)
  --filter-mac-min FILTER_MAC_MIN
                        Include variants with equal or greater MAC values
                        (default: None)
  --filter-mac-max FILTER_MAC_MAX
                        Include variants with equal or lesser MAC values
                        (default: None)
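
The options listed above can be combined into a single filtering call. The following is a minimal sketch, not an official PPP example: it assembles a vcf_filter.py command line from a handful of flags that appear in the help text (--vcf, --out-prefix, --out-format, --filter-only-biallelic, --filter-maf-min, --filter-max-missing) and prints it for review. The helper name build_vcf_filter_command, the file names input.vcf.gz and filtered, and the threshold values are assumptions chosen for illustration.

```python
import shlex
import shutil

# Output formats taken from the --out-format choices in the help text.
OUT_FORMATS = {"vcf", "vcf.gz", "bcf", "bed", "sites"}


def build_vcf_filter_command(vcf, out_prefix, maf_min=None, max_missing=None,
                             biallelic_only=False, out_format="vcf.gz"):
    """Return an argument list for vcf_filter.py (hypothetical helper).

    Only flags that appear in the vcf_filter.py help text are emitted.
    """
    if out_format not in OUT_FORMATS:
        raise ValueError(f"unsupported output format: {out_format}")

    cmd = ["vcf_filter.py", "--vcf", vcf,
           "--out-prefix", out_prefix,
           "--out-format", out_format]

    if biallelic_only:
        cmd.append("--filter-only-biallelic")
    if maf_min is not None:
        cmd += ["--filter-maf-min", str(maf_min)]
    if max_missing is not None:
        # The help text describes this value as a proportion between 0.0 and 1.0.
        if not 0.0 <= max_missing <= 1.0:
            raise ValueError("max_missing must be between 0.0 and 1.0")
        cmd += ["--filter-max-missing", str(max_missing)]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file, output prefix, and thresholds.
    cmd = build_vcf_filter_command("input.vcf.gz", "filtered",
                                   maf_min=0.05, max_missing=0.1,
                                   biallelic_only=True)
    # Print a shell-ready command line so it can be reviewed before running.
    print(shlex.join(cmd))
    if shutil.which(cmd[0]) is None:
        print("vcf_filter.py was not found on PATH; the command was only printed.")
```

The printed line can be pasted into a shell, or the list can be passed to subprocess.run. Building the command as a list rather than a single string avoids quoting problems when file names contain spaces.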

software ref: https://github.com/jaredgk/PPP/
research ref: https://ppp.readthedocs.io/en/latest/PPP_pages/citations.html