ipa 1.5.0

2021-10-25

ipa - Improved Phased Assembly

Improved Phased Assembler (IPA) is the official PacBio software for HiFi genome assembly. IPA was designed to use the accuracy of PacBio HiFi reads to produce high-quality phased genome assemblies. It is an end-to-end solution that starts from input reads and produces a polished assembly. IPA is fast and provides both an easy-to-use local run mode and a distributed pipeline for clusters.

To activate:

source /local/cluster/ipa/activate.sh

Location and version:

$ which ipa
/local/cluster/ipa/bin/ipa
$ ipa validate
INFO: /local/cluster/ipa/bin/ipa validate
INFO: ipa.py ipa (wrapper) version=1.5.0 ... Checking dependencies ...
INFO: Dependencies
/local/cluster/ipa/bin/python3
/local/cluster/ipa/bin/ipa2-task
/local/cluster/ipa/bin/falconc
/local/cluster/ipa/bin/minimap2
/local/cluster/ipa/bin/nighthawk
/local/cluster/ipa/bin/pancake
/local/cluster/ipa/bin/pblayout
/local/cluster/ipa/bin/racon
/local/cluster/ipa/bin/samtools
/local/cluster/ipa/bin/ipa_purge_dups
/local/cluster/ipa/bin/ipa_purge_dups_split_fa
snakemake version=6.10.0
ipa2-task 1.5.0 (commit c875fce13bdacbafc2f4f750c6438f4453e1354d)
 Machine name: 'Linux'
Copyright (C) 2004-2021     Pacific Biosciences of California, Inc.
This program comes with ABSOLUTELY NO WARRANTY; it is intended for
Research Use Only and not for use in diagnostic procedures.

falconc version=1.13.1+git.f9d1b5651e891efe379bd9727a0fa0931b875d7b, nim-version=1.5.1
minimap2 version=2.22-r1101
Nighthawk 0.1.0 (commit SL-release-10.1.0-7-gbe5dfb1*)
pancake 1.3.0 (commit SEQII-release-10.1.0-432-gf2693fd*)
pblayout 1.0.0 (commit SL-release-10.1.0-152-g66936d1*)
racon version=v1.4.20
samtools 1.12
Using htslib 1.12
ipa_purge_dups Version: 1.2.5

Help message:

$ ipa --help
usage: ipa [-h] [--version] {local,dist,validate} ...

Improved Phased Assembly tool for HiFi reads.

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

subcommands:
  One of these must follow the options listed above and may be followed by sub-command specific options.

  {local,dist,validate}
                        sub-command help
    local               Run IPA on your local machine.
    dist                Distribute IPA jobs to your cluster.
    validate            Check dependencies.

Try "ipa local --help".
Or "ipa validate" to validate dependencies.
https://github.com/PacificBiosciences/pbbioconda/wiki/Improved-Phased-Assember
$ ipa local --help
usage: ipa local [-h] [--input-fn INPUT_FN] [--no-polish] [--no-phase] [--no-purge-dups]
                 [--genome-size GENOME_SIZE] [--coverage COVERAGE] [--advanced-opt ADVANCED_OPT]
                 [--nthreads NTHREADS] [--nshards NSHARDS] [--tmp-dir TMP_DIR] [--verbose]
                 [--njobs NJOBS] [--run-dir RUN_DIR] [--target TARGET] [--unlock] [--dry-run]
                 [--only-print] [--resume]

This sub-command runs snakemake in local-mode.

optional arguments:
  -h, --help            show this help message and exit
  --input-fn INPUT_FN, -i INPUT_FN
                        (Required.) Input reads in FASTA, FASTQ, BAM, XML or FOFN formats. Repeat "-i fn1 -i fn2" for multiple inputs, or use a "file-of-filenames", e.g. "-i foo.fofn". (default: [])

Algorithmic options:
  --no-polish           Skip polishing. (default: False)
  --no-phase            Skip phasing. (default: False)
  --no-purge-dups       Skip purge_dups. (default: False)
  --genome-size GENOME_SIZE
                        Genome size, required only for downsampling. (default: 0)
  --coverage COVERAGE   Downsampled coverage, only if genome_size * coverage > 0. (default: 0)
  --advanced-opt ADVANCED_OPT
                        Advanced options (quoted). (default: )

Workflow options:
  --nthreads NTHREADS   (Required) Maximum number of threads to use per job. (Applies to both remote and local tasks.) (default: 0)
  --nshards NSHARDS     Maximum number of parallel tasks to split work into (though the number of simultaneous jobs could be much lower). (default: 40)
  --tmp-dir TMP_DIR     Temporary directory for some disk based operations like sorting. (default: /tmp)
  --verbose             Extra logging for each task. (Show full env, e.g.) (default: False)

Snakemake options:
  --njobs NJOBS         (Required) Maximum number of simultaneous jobs, each running up to nthreads. (default: 0)
  --run-dir RUN_DIR     Directory in which to run snakemake. (default: ./RUN)
  --target TARGET       "finish" is implied, but you can use this to short-circuit. (default: )
  --unlock              Pass "--unlock" to snakemake, in case snakemake crashed earlier. (default: False)
  --dry-run, -n         Print the snakemake command and do a "dry run" quickly. Very useful! (default: False)
  --only-print          Do not actually run snakemake. Simply print the snakemake command and exit. (default: False)
  --resume              Restart snakemake, but after regenerating the config file. In this case, run-dir may already exist. (Without --resume, run-dir must not already exist.) (default: False)

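Be sure to set:

--nthreads # and --njobs # (both are required)
--tmp-dir /data (instead of the default /tmp)

IPA runs up to njobs jobs at once, each using up to nthreads threads, so request nthreads*njobs processors when submitting with SGE_Batch.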
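According to the help text, --njobs caps the number of simultaneous jobs and each job runs up to --nthreads threads, so peak usage is about nthreads*njobs cores. A minimal sketch of that arithmetic, assuming 8 threads and 4 jobs (placeholder values) and a hypothetical input file reads.fastq.gz; it only prints the numbers and the command, it does not run IPA:

```shell
#!/bin/bash
# Placeholder values (assumptions): choose them to fit your allocation.
NTHREADS=8   # --nthreads: maximum threads per snakemake job
NJOBS=4      # --njobs: maximum number of simultaneous snakemake jobs

# Up to NJOBS jobs run at once, each using up to NTHREADS threads.
CORES=$((NTHREADS * NJOBS))

echo "Request ${CORES} processors from SGE_Batch"
# reads.fastq.gz is a hypothetical input file name.
echo "ipa local --input-fn reads.fastq.gz --nthreads ${NTHREADS} --njobs ${NJOBS} --tmp-dir /data"
```

With these values the job needs 32 processors. Adding --dry-run to the printed command first gives snakemake's quick dry run, which the help text recommends as a check before the real run.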

To run IPA through SGE, add the line source /local/cluster/ipa/activate.sh to the top of your shell script, followed by your ipa commands.
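A minimal job-script sketch of that layout: activation at the top, then the ipa command. The script name run_ipa.sh, the input reads.fastq.gz, the run directory RUN_asm, and the thread counts are all placeholders; the fallback messages are an assumption added so that a missing activation file or ipa binary shows up in the job's error log.

```shell
#!/bin/bash
# run_ipa.sh (hypothetical name): job script for running IPA through SGE.

# Activate IPA first so ipa and its dependencies are on PATH.
ACTIVATE=/local/cluster/ipa/activate.sh
if [ -f "$ACTIVATE" ]; then
    source "$ACTIVATE"
else
    echo "warning: $ACTIVATE not found" >&2
fi

# Placeholder values: NTHREADS * NJOBS should match the processors requested.
NTHREADS=8
NJOBS=4

# reads.fastq.gz and RUN_asm are hypothetical names; the run directory
# must not already exist unless --resume is given.
if command -v ipa >/dev/null 2>&1; then
    ipa local --input-fn reads.fastq.gz \
        --nthreads "$NTHREADS" --njobs "$NJOBS" \
        --tmp-dir /data --run-dir RUN_asm
else
    echo "error: ipa not found on PATH" >&2
fi
```

Submit the script with SGE_Batch and request NTHREADS*NJOBS processors (32 here); check SGE_Batch's own help for the exact option name.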

software ref: https://github.com/PacificBiosciences/pbbioconda/wiki/Improved-Phased-Assembler