$ panX.py --help
usage: ./panX.py -h (help)
panX: Software for computing core and pan-genome from a set of genome
sequences. The results will be exported as json files for visualization in the
browser.
optional arguments:
-h, --help show this help message and exit
-fn , --folder_name the absolute path for project folder
-sl , --species_name
species name as prefix for some temporary folders
(e.g.: P_aeruginosa)
-ngbk, --gbk_present use nucleotide/amino acid sequence files (fna/faa)
when no genBank files given (this option does not
consider annotations)
-st [ ...], --steps [ ...]
run specific steps or run all steps by default
-mo, --metainfo_organism
add organism information in metadata table.
-mr, --metainfo_reconcile
use reconciled metadata (redundancy removed) instead
of original metadata.
-rt , --raxml_max_time
RAxML tree optimization: maximal runing time (minutes,
default:30min)
-t , --threads number of threads
-v, --version show program's version number and exit
-bp , --blast_file_path
the absolute path for blast result (e.g.:
/path/blast.out)
-rp , --roary_file_path
the absolute path for roary result (e.g.:
/path/roary.out)
-op , --orthofinder_file_path
the absolute path for orthofinder result (e.g.:
/path/orthofinder.out)
-otp , --other_tool_fpath
the absolute path for result from other orthology
inference tool (e.g.: /path/other_tool.out)
-mi , --metainfo_fpath
the absolute path for meta_information file (e.g.:
/path/meta.out)
-dmp , --diamond_path
alternative diamond path provided by user
-dme , --diamond_evalue
default: e-value threshold below 0.001
-dmt , --diamond_max_target_seqs
Diamond: maximum number of target sequences per query
Estimation: #strain * #max_duplication (50*10=500)
-dmi , --diamond_identity
Diamond: sequence identity threshold to report an
alignment. Default: no restriction (0)
-dmqc , --diamond_query_cover
Diamond: query sequence coverage threshold to report
an alignment. Default: no restriction (0)
-dmsc , --diamond_subject_cover
Diamond: subject sequence coverage threshold to report
an alignment. Default: no restriction (0)
-dmdc, --diamond_divide_conquer
running diamond alignment in divide-and-conquer(DC)
algorithm for large dataset
-dcs , --subset_size
subset_size (number of strains in a subset) for
divide-and-conquer(DC) algorithm. Default:50
-dmsi , --diamond_identity_subproblem
Diamond divide-and-conquer subproblem: sequence
identity threshold to report an alignment.
-dmsqc , --diamond_query_cover_subproblem
Diamond divide-and-conquer subproblem: query sequence
coverage threshold to report an alignment
-dmssc , --diamond_subject_cover_subproblem
Diamond divide-and-conquer subproblem: subject
sequence coverage threshold to report an alignment
-imcl , --mcl_inflation
MCL: inflation parameter (this parameter affects
granularity)
-bmt , --blastn_RNA_max_target_seqs
Blastn on RNAs: the maximum number of target sequences
per query Estimation: #strain * #max_duplication
-np, --disable_cluster_postprocessing
disable postprocessing (split overclustered genes and
paralogs, and cluster unclustered genes)
-nsl, --disable_long_branch_splitting
disable splitting long branch
-rna, --enable_RNA_clustering
cluster rRNAs
-fcd , --factor_core_diversity
default: factor used to refine raw core genome
diversity, apply
(0.1+2.0*core_diversity)/(1+2.0*core_diversity) to
decide split_long_branch_cutoff
-slb , --split_long_branch_cutoff
split long branch cutoff provided by user (by default:
0.0 as not given):
-pep, --explore_paralog_plot
default: not plot paralog statistics
-pfc , --paralog_frac_cutoff
fraction of strains required for splitting paralogy.
Default: 0.33
-pbc , --paralog_branch_cutoff
branch_length cutoff used in paralogy splitting
-ws , --window_size_smoothed
postprocess_unclustered_genes: window size for
smoothed cluster length distribution
-spr , --strain_proportion
postprocess_unclustered_genes: strain proportion
-ss , --sigma_scale postprocess_unclustered_genes: sigma scale
-cg , --core_genome_threshold
percentage of strains used to decide whether a gene is
core. Default: 1.0 for strictly core gene; < 1.0 for
soft core genes
-csf , --core_gene_strain_fpath
file path for user-provided subset of strains (core
genes should be present in all strains in this list)
-sitr, --simple_tree simple tree: does not use treetime for ancestral
inference
-dgl, --disable_gain_loss
disable enable gene gain and loss inference (not
recommended)
-mglo, --merged_gain_loss_output
not split gene presence/absence and gain/loss pattern
into separate files for each cluster
-iba, --infer_branch_association
infer branch association
-bamin , --min_strain_fraction_branch_association
minimal fraction of the total number of strains for
branch association
-pamin , --min_strain_fraction_presence_association
minimal fraction of the total number of strains for
presence/absence association
-pamax , --max_strain_fraction_presence_association
maximal fraction of the total number of strains for
presence/absence association
-slt, --store_locus_tag
store locus_tags in a separate file instead of saving
locus_tags in gene cluster json for large dataset
-rlt, --raw_locus_tag
use raw locus_tag from GenBank instead of strain_ID +
locus_tag
-otc, --optional_table_column
add customized column in gene cluster json file for
visualization.
-mtf , --meta_data_config
file path for pre-defined metadata structure
(discrete/continuous data type, etc.)
-rxm , --raxml_path absolute path of raxml
-ct, --clean_temporary_files
default: keep temporary files
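A few of the options above carry small pieces of arithmetic or imply a typical invocation, and a short sketch may make them easier to check. The Python below is an illustration only and is not part of panX. The helper names (`build_panx_command`, `estimate_max_target_seqs`, `split_long_branch_cutoff`) are hypothetical, and the project path in the demo is a placeholder. The command builder uses only the `-fn`, `-sl` and `-t` flags as they appear in the help text. The cutoff function reproduces the expression printed under `-fcd`; treating the `2.0` in that expression as the value of `-fcd` is an assumption that the help text does not confirm.

```python
"""Illustrative helpers for the panX.py options listed above (not part of panX)."""


def build_panx_command(folder_name, species_name, threads=1, script="./panX.py"):
    """Assemble an argument list from flags shown in the help text.

    Only -fn (project folder), -sl (species name) and -t (threads) are used.
    """
    return [script, "-fn", folder_name, "-sl", species_name, "-t", str(threads)]


def estimate_max_target_seqs(n_strains, max_duplication):
    """Estimate from the -dmt help line: #strain * #max_duplication."""
    return n_strains * max_duplication


def split_long_branch_cutoff(core_diversity, factor=2.0):
    """Expression printed under -fcd.

    ASSUMPTION: `factor` stands in for the 2.0 in the printed expression;
    the help text does not state how -fcd enters the formula.
    """
    return (0.1 + factor * core_diversity) / (1 + factor * core_diversity)


if __name__ == "__main__":
    # "/path/to/project" is a placeholder, not a real location.
    cmd = build_panx_command("/path/to/project", "P_aeruginosa", threads=4)
    print(" ".join(cmd))
    print(estimate_max_target_seqs(50, 10))
    print(split_long_branch_cutoff(0.01))
```

Returning the command as a list rather than a single string keeps it ready for `subprocess.run` without shell quoting issues, should one choose to launch panX from a script.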