ipa - Improved Phased Assembly
Improved Phased Assembler (IPA) is the official PacBio software for HiFi
genome assembly. IPA was designed to use the accuracy of PacBio HiFi reads
to produce high-quality phased genome assemblies. IPA is an end-to-end
solution that starts with input reads and ends with a polished assembly. IPA
is fast and offers both an easy-to-use local run mode and a distributed
pipeline for clusters.
To activate:
bash
source /local/cluster/ipa/activate.sh
Location and version:
$ which ipa
/local/cluster/ipa/bin/ipa
$ ipa validate
INFO: /local/cluster/ipa/bin/ipa validate
INFO: ipa.py ipa (wrapper) version=1.5.0 ... Checking dependencies ...
INFO: Dependencies
/local/cluster/ipa/bin/python3
/local/cluster/ipa/bin/ipa2-task
/local/cluster/ipa/bin/falconc
/local/cluster/ipa/bin/minimap2
/local/cluster/ipa/bin/nighthawk
/local/cluster/ipa/bin/pancake
/local/cluster/ipa/bin/pblayout
/local/cluster/ipa/bin/racon
/local/cluster/ipa/bin/samtools
/local/cluster/ipa/bin/ipa_purge_dups
/local/cluster/ipa/bin/ipa_purge_dups_split_fa
snakemake version=6.10.0
ipa2-task 1.5.0 (commit c875fce13bdacbafc2f4f750c6438f4453e1354d)
Machine name: 'Linux'
Copyright (C) 2004-2021 Pacific Biosciences of California, Inc.
This program comes with ABSOLUTELY NO WARRANTY; it is intended for
Research Use Only and not for use in diagnostic procedures.
falconc version=1.13.1+git.f9d1b5651e891efe379bd9727a0fa0931b875d7b, nim-version=1.5.1
minimap2 version=2.22-r1101
Nighthawk 0.1.0 (commit SL-release-10.1.0-7-gbe5dfb1*)
pancake 1.3.0 (commit SEQII-release-10.1.0-432-gf2693fd*)
pblayout 1.0.0 (commit SL-release-10.1.0-152-g66936d1*)
racon version=v1.4.20
samtools 1.12
Using htslib 1.12
ipa_purge_dups Version: 1.2.5
Help message:
$ ipa --help
usage: ipa [-h] [--version] {local,dist,validate} ...
Improved Phased Assembly tool for HiFi reads.
optional arguments:
-h, --help show this help message and exit
--version show program's version number and exit
subcommands:
One of these must follow the options listed above and may be followed by sub-command specific options.
{local,dist,validate}
sub-command help
local Run IPA on your local machine.
dist Distribute IPA jobs to your cluster.
validate Check dependencies.
Try "ipa local --help".
Or "ipa validate" to validate dependencies.
https://github.com/PacificBiosciences/pbbioconda/wiki/Improved-Phased-Assember
$ ipa local --help
usage: ipa local [-h] [--input-fn INPUT_FN] [--no-polish] [--no-phase] [--no-purge-dups]
[--genome-size GENOME_SIZE] [--coverage COVERAGE] [--advanced-opt ADVANCED_OPT]
[--nthreads NTHREADS] [--nshards NSHARDS] [--tmp-dir TMP_DIR] [--verbose]
[--njobs NJOBS] [--run-dir RUN_DIR] [--target TARGET] [--unlock] [--dry-run]
[--only-print] [--resume]
This sub-command runs snakemake in local-mode.
optional arguments:
-h, --help show this help message and exit
--input-fn INPUT_FN, -i INPUT_FN
(Required.) Input reads in FASTA, FASTQ, BAM, XML or FOFN formats. Repeat "-i fn1 -i fn2" for multiple inputs, or use a "file-of-filenames", e.g. "-i foo.fofn". (default: [])
Algorithmic options:
--no-polish Skip polishing. (default: False)
--no-phase Skip phasing. (default: False)
--no-purge-dups Skip purge_dups. (default: False)
--genome-size GENOME_SIZE
Genome size, required only for downsampling. (default: 0)
--coverage COVERAGE Downsampled coverage, only if genome_size * coverage > 0. (default: 0)
--advanced-opt ADVANCED_OPT
Advanced options (quoted). (default: )
Workflow options:
--nthreads NTHREADS (Required) Maximum number of threads to use per job. (Applies to both remote and local tasks.) (default: 0)
--nshards NSHARDS Maximum number of parallel tasks to split work into (though the number of simultaneous jobs could be much lower). (default: 40)
--tmp-dir TMP_DIR Temporary directory for some disk based operations like sorting. (default: /tmp)
--verbose Extra logging for each task. (Show full env, e.g.) (default: False)
Snakemake options:
--njobs NJOBS (Required) Maximum number of simultaneous jobs, each running up to nthreads. (default: 0)
--run-dir RUN_DIR Directory in which to run snakemake. (default: ./RUN)
--target TARGET "finish" is implied, but you can use this to short-circuit. (default: )
--unlock Pass "--unlock" to snakemake, in case snakemake crashed earlier. (default: False)
--dry-run, -n Print the snakemake command and do a "dry run" quickly. Very useful! (default: False)
--only-print Do not actually run snakemake. Simply print the snakemake command and exit. (default: False)
--resume Restart snakemake, but after regenerating the config file. In this case, run-dir may already exist. (Without --resume, run-dir must not already exist.) (default: False)
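The help text describes --dry-run as a quick way to see what snakemake would do, so a dry run is a sensible first step before a real assembly. The following is a minimal sketch, not a tested recipe: the input file name and the thread and job counts are hypothetical placeholders, while every flag is copied from the "ipa local --help" output above. The command is only executed if ipa is on the PATH; otherwise it is just printed.

```shell
#!/bin/bash
# Hypothetical placeholders: substitute your own input file and counts.
READS="reads.fastq"
NTHREADS=8
NJOBS=4

# Build the command from flags listed in "ipa local --help".
set -- ipa local -i "$READS" --nthreads "$NTHREADS" --njobs "$NJOBS" --run-dir ./RUN

# Show the dry-run command so it can be checked before anything runs.
DRY_RUN_CMD="$* --dry-run"
echo "$DRY_RUN_CMD"

# Execute the dry run only where ipa is available (that is, after activation).
if command -v ipa >/dev/null 2>&1; then
    "$@" --dry-run
fi
```

Once the dry run looks right, the same command without --dry-run starts the assembly. Per the help text, the run directory must not already exist unless --resume is given.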
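Be sure to set:
- --nthreads # and --njobs # (both are required)
- Check out nthreads * njobs processors with SGE_Batch, because up to njobs jobs run at once and each uses up to nthreads threads
- Set --tmp-dir /data (the default is /tmp)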
To use IPA over SGE, add the line "source /local/cluster/ipa/activate.sh" to the top of your shell script, then run your ipa commands.
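As an illustration of the advice above, a job script might look roughly like the sketch below. The input file name and the thread and job counts are hypothetical, and the script only prints the processor count to request; how that number is passed to SGE_Batch depends on your site's wrapper and is not shown here. The ipa flags are taken from the "ipa local --help" output.

```shell
#!/bin/bash
# Sketch of an SGE job script for IPA. Values marked hypothetical are placeholders.
NTHREADS=8            # hypothetical: threads per job
NJOBS=4               # hypothetical: simultaneous jobs
READS="reads.fastq"   # hypothetical input file
ACTIVATE=/local/cluster/ipa/activate.sh

# Processors to check out at submission time: NJOBS jobs, each using up to NTHREADS threads.
SLOTS=$((NTHREADS * NJOBS))
echo "Processors to request: ${SLOTS}"

# Activate the IPA environment before calling ipa.
if [ -f "$ACTIVATE" ]; then
    source "$ACTIVATE"
fi

# Assemble the command, pointing temporary files at /data instead of the default /tmp.
set -- ipa local -i "$READS" --nthreads "$NTHREADS" --njobs "$NJOBS" --tmp-dir /data
IPA_CMD="$*"
echo "$IPA_CMD"

# Run only where activation actually put ipa on the PATH.
if command -v ipa >/dev/null 2>&1; then
    "$@"
fi
```

With the placeholder values this requests 32 processors. Keeping NTHREADS and NJOBS as variables means the requested processor count and the ipa flags cannot drift apart.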
software ref: https://github.com/PacificBiosciences/pbbioconda/wiki/Improved-Phased-Assembler