Guidance on using the CQLS queuing system
The CQLS queuing system
Queuing systems allow multiple users to make use of multiple compute nodes across an infrastructure. At the CQLS, we log in to the `shell.cqls.oregonstate.edu` machine, currently called `vaughan`. From `vaughan`, users can submit batch or array jobs (one or many tasks, respectively) to the queuing system so that they run autonomously and continue running even after users log off.
Reading prerequisite
Please see this previous post for information about the CQLS infrastructure and some of the commands that are part of the SGE queuing system.
How does one use the queuing system?
The CQLS is currently using the Son of Grid Engine (SGE) queuing system. To facilitate its use, the CQLS wrote and maintains two helper scripts for job submission: `SGE_Batch` and `SGE_Array`. `SGE_Batch` is designed for submitting individual jobs (one can submit array jobs using `SGE_Batch`, but we recommend using `SGE_Array`), while `SGE_Array` is designed for submitting array jobs comprising multiple tasks.
These wrapper scripts write a shell script to the filesystem and submit a job to the queuing system for you according to the options you requested. In the background, the scripts call the `qsub` command for you with that shell script.
To see what resources are available to you, use the `SGE_Avail` script. To monitor currently running jobs, use `qstat`, and to kill jobs, use `qdel`. See the post linked above for more information regarding these underlying commands.
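For instance, a submit-and-monitor session might look like the following sketch (the queue name `bact` and the command are illustrative; the `-c`, `-r`, `-q`, and `-P` flags are the ones used throughout this post):

```bash
# Submit a single job: command, run-name, queue, and processor count
SGE_Batch -c 'gzip -9 big_file.fastq' -r sge.gzip_big_file -q bact -P 1

# Monitor your jobs and their states
qstat

# Kill a job by its numeric job ID if needed
qdel JOBID
```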
To check out a node interactively, use the `qrsh` command. You can specify `-q QUEUE`, where `QUEUE` is the name of the queue you'd like to check out a node from. You can check out multiple processors on a machine using the `qrsh -pe thread N` option, where `N` is the number of processors you need.
To land on a specific node on a queue, use the `-q QUEUE@node` syntax, where `QUEUE` is e.g. `bpp` and `node` is e.g. `symbiosis`. This syntax can be used with `qrsh` and with the `SGE_Array` and `SGE_Batch` helper scripts.
Please make sure to exit the node when you are finished so that the queue can reclaim the processors for others to use.
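Putting those pieces together, an interactive session might look like this (queue and node names are the illustrative ones from above):

```bash
# Check out 4 processors anywhere on the bpp queue
qrsh -q bpp -pe thread 4

# ...or land on the specific node symbiosis in the bpp queue
qrsh -q bpp@symbiosis -pe thread 4

# ...do your interactive work, then release the slots for others
exit
```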
Practical application
We've established what the queuing system is and which commands one uses to interact with it. Let's put these scripts to use in some hypothetical scenarios that may be related to your own research.
Common considerations
Naming your jobs (run-name)
You must specify a run-name (`-r` flag) for each job that you submit. I suggest a convention whereby every job submitted to SGE has a run-name prefixed with `sge.`; over 10+ years of using the infrastructure, I have had to find different ways to deal with these run directories in my various workflows, and what I found is that having the directories always prefixed with `sge.` lets me identify them even in a directory full of other input and output files.
Additionally, if one uses sample names as part of the run-names, one might find that there are errors whenever a sample name starts with a number. This is because SGE does not allow run-names to begin with a digit. Prefixing all run-names with `sge.` therefore alleviates this issue.
I also generally suggest including the name of the program being run in the run-name. A standard syntax one might adopt is therefore `sge.ProgramName_SampleName`. This also makes it easier to identify which jobs are running when one runs `qstat`, although the names sometimes get cut off in that output. You can run the `qstat-long` command to get the full names of jobs.
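For example, following the `sge.ProgramName_SampleName` convention for a sample named `03A-77` (the sample and command here are illustrative) keeps the run-name legal despite the leading digit, and easy to spot:

```bash
# sge. prefix + program + sample name; the sample starts with a digit,
# so the prefix is what keeps the run-name valid for SGE
SGE_Batch -c 'gzip -9 03A-77.fastq' -r sge.gzip_03A-77 -q bact -P 1
```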
When you submit a job, it can be useful to run the `watch qstat` command to monitor progress until your job enters the `r` (running) state.
Matching command settings and SGE settings
SGE does not and cannot know what settings you applied to your commands in terms of CPU and/or memory utilization. Therefore, whenever you submit a command to the queue, it is a good idea to set both the number of CPUs or threads for your program (often via a `--threads` or `-t` option; this is program dependent) and the `-P` option of `SGE_Batch` or `SGE_Array`. Setting a higher value of `-P` will not automatically make your program use more threads. Conversely, setting your program's thread count higher without the corresponding change to `-P` will overload the node.
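As a sketch (the program and its flags are illustrative), the thread count passed to the program should match the `-P` value passed to the wrapper:

```bash
# blastn is told to use 8 threads, and SGE reserves 8 processors to match
SGE_Batch -c 'blastn -num_threads 8 -query query.fa -db nt -out hits.txt' \
  -r sge.blastn_query -q bact -P 8
```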
Standard output and standard error
Your standard output and error will end up in the run-name directory that you specify at job creation. STDOUT will be in the files with the `.o$RUNID` suffix and STDERR will be in the files with the `.e$RUNID` suffix.
Generally, it's useful to direct your output to a file of your own, either using STDOUT redirection (i.e. `command > output.txt`) or using a program-specific flag (e.g. `command -o output.txt`), instead of pulling the output from the `.o$RUNID` files.
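For example (the command is illustrative), redirecting inside the quoted command keeps the results in a predictable file rather than in the SGE log:

```bash
# Results land in aln.flagstat.txt; the .o$RUNID file stays nearly empty
SGE_Batch -c 'samtools flagstat aln.bam > aln.flagstat.txt' \
  -r sge.flagstat_aln -q bact -P 1
```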
Use the local /data drives on compute nodes
To speed up computation and reduce network congestion, it can be useful to specify the local `/data` drive for your commands' input or output. If you are running a command from a `/nfs` or `/home` directory, then both the reads and the writes to disk happen over the network. With processes that are especially heavy on input/output (I/O), this can cause slowdowns and disk waits that reduce the efficiency of both your own processing and that of others using the same compute nodes.
One common high-I/O process is genome assembly from short reads, as with the SPAdes genome assembler. The authors of this software suggest using local hard drives, like `/data`, for processing with SPAdes. At minimum, it is useful to place the output directory on the `/data` drive. For the most speed-up, you can also copy the input files to the `/data` drive prior to assembly job submission. NOTE: you will have to remember which node you copied the reads to and wrote the output on, so that you can copy the outputs off afterward.
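A sketch of that workflow might look like the following (the paths, read names, and thread count are illustrative):

```bash
# Stage inputs onto the node-local /data drive
mkdir -p /data/$USER/spades_run
cp reads_R1.fastq.gz reads_R2.fastq.gz /data/$USER/spades_run/

# Assemble with both input and output on local disk
spades.py -1 /data/$USER/spades_run/reads_R1.fastq.gz \
          -2 /data/$USER/spades_run/reads_R2.fastq.gz \
          -o /data/$USER/spades_run/assembly -t 8

# Copy the results back to network storage and clean up the local disk
cp -r /data/$USER/spades_run/assembly $HOME/my_project/
rm -rf /data/$USER/spades_run
```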
Some programs also allow you to specify a temporary directory for intermediate files, and they do not always use the `$TMPDIR` environment variable by default, so it can be useful to point the temp dir at `/data` manually when that option is available.
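For instance, `samtools sort` takes a temp-file prefix via `-T` (the flag name varies by program; this is just one concrete example):

```bash
mkdir -p /data/$USER/tmp
# Temporary sort chunks go to local disk instead of network storage
samtools sort -T /data/$USER/tmp/aln -o aln.sorted.bam aln.bam
```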
SGE job states listed using qstat
The common SGE job states are:
| Category | State | SGE letter code |
|---|---|---|
| Pending | pending, queue wait | `qw` |
| Pending | pending, holding (for another job), queue wait | `hqw` |
| Running | running | `r` |
| Running | transferring | `t` |
| Suspended | suspended | `s` |
| Error | all pending states with error | `Eqw`, `Ehqw` |
| Deleted | all running and suspended states with deletion | `dr`, `dt` |
The most commonly seen states will be `qw` for queue wait and `r` for running.
At times, you might find an `E` state (along with other codes) indicating some issue with job submission. You can run `qstat -j $JOBID` for that job to figure out why it has an error (`grep` for `error` and you will find the appropriate line).
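For example:

```bash
# Print only the error line(s) from the full job report
qstat -j $JOBID | grep -i error
```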
Sometimes you can use the `clear_Eqw_job.sh` script to clear the errors and the jobs will continue as submitted, but often you have to kill and resubmit the jobs.
If you find jobs stuck in the `d` state, the node you are trying to delete the job on is likely waiting on disk I/O and is unresponsive. The node may need to be restarted.
SGE_Avail queue states
| State | SGE letter code | Note |
|---|---|---|
| alarm | `a` | Too many resources being used (load average too high) |
| unreachable | `u` | Machine is offline or off the network |
| error | `E` | Most often a transfer and/or network error |
| normal | normal | All is well |
If a queue is in the error state, you can run `qstat -explain E` to find more information about the error. If you see the `au` or `a` state, you can run `qstat -explain a` (or `qstat -explain aE` for both). The `-explain` flag can also be combined with the `-q` flag to get information about specific queues.
`qstat -explain aE | grep -B 1 -e alarm -e ERROR` shows all `E` and `a` state messages.
Individual job submission
There are several different ways to successfully submit a job through `SGE_Batch`. The two most common methods are:

- Include the bare command in quotes as the `-c` option to `SGE_Batch` (as above)
- Write the commands in a bash script (e.g. `run.sh`) and submit the bash script as the `-c` option (e.g. `SGE_Batch -c 'bash ./run.sh' ...`)
For one-off commands, the bare `-c` option is the obvious choice because it is quick and easy.
For jobs that are part of a larger analysis, it may be useful to put the commands in a bash script, thus saving a record of what you ran and making it easier to re-run the commands in the future. In addition, if you are on a node interactively via `qrsh`, you can run the bash script directly without having to retype the command or scroll through your command history.
An additional benefit of using a shell script to store your commands is that it makes it easy to run commands serially.
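For example, a minimal sketch of such a serial script (the specific commands are illustrative):

```bash
#!/usr/bin/env bash
# Each step starts only after the previous one finishes
bwa index ref.fa
bwa mem -t 8 ref.fa reads_R1.fq reads_R2.fq > aln.sam
samtools sort -@ 8 -o aln.sorted.bam aln.sam
samtools index aln.sorted.bam
```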
This is much easier than submitting several separate `SGE_Batch -c 'command' ...` invocations, each one by hand after the previous job finishes.
Finally, if you need to use a piece of software that is in a conda environment, you can include the commands to activate the environment in the bash script.
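A sketch of what that looks like, assuming an environment named `interproscan-env` (the environment name, conda install path, and program options are illustrative):

```bash
#!/usr/bin/env bash
# Make `conda activate` available inside a non-interactive script;
# the conda install path may differ on your system
source ~/miniconda3/etc/profile.d/conda.sh
conda activate interproscan-env

interproscan.sh -f TSV -o proteins.tsv -i proteins.faa
```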
In this way, you can record what conda environment needs to be activated for the program to work properly.
Multiple job (task) submission
As with many workflows, you may have multiple files to process through the same program. This will be the case for any project where you have multiple samples. Instead of submitting multiple (possibly thousands of) `SGE_Batch` commands, it can be useful to organize your job submissions with `SGE_Array` instead.
`SGE_Array` allows you to submit a single job that includes multiple tasks (in SGE terminology) that will be grouped together. This can be especially useful when doing some sort of large comparative genomics project with thousands of individual tasks: you can group the tasks into a single job and avoid taking over the entire infrastructure by using the concurrency flag (`-b`; 50 by default). As a general rule, if you are using a lot of resources per task, limit your concurrency to a smaller number so others can use the infrastructure. Conversely, if your tasks finish quickly and are unlikely to stay in the queue for a long period (say, > 24h), then you can increase the concurrency to speed up processing.
Let's assume you have some predicted proteomes (NCBI `.faa` format) and you want to characterize their predicted functions using interproscan. Here is one way you could run them all through using `SGE_Array`.
Here is our directory structure:
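It might look something like this (the sample names are illustrative):

```
.
├── proteomes
│   ├── sample_A.faa
│   ├── sample_B.faa
│   └── sample_C.faa
└── run.sh
```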
Here are the contents of `run.sh`:
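A sketch of such a script (the interproscan options are illustrative; note the `-i $faa` at the end, which is discussed below):

```bash
#!/usr/bin/env bash
# Print one interproscan command per proteome rather than running it,
# so the commands can be inspected before submission
for faa in proteomes/*.faa; do
  echo "interproscan.sh -f TSV -o ${faa%.faa}.tsv -i $faa"
done
```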
You'll note that this just runs `echo` to print the commands to the terminal, which is a good thing because it lets you check your commands before you submit them. For example, while writing this tutorial, I initially neglected to include the `-i $faa` at the end of the command, so I was not providing any input files to the command.
Let’s give the script a try:
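With the sketch above, the output would look something like:

```bash
$ bash ./run.sh
interproscan.sh -f TSV -o proteomes/sample_A.tsv -i proteomes/sample_A.faa
interproscan.sh -f TSV -o proteomes/sample_B.tsv -i proteomes/sample_B.faa
interproscan.sh -f TSV -o proteomes/sample_C.tsv -i proteomes/sample_C.faa
```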
So far so good. Now we can submit the jobs using `SGE_Array`, which by default takes its commands from STDIN.
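For example (the queue, processor count, and run-name follow the conventions used elsewhere in this post):

```bash
bash ./run.sh | SGE_Array -q bact -P 8 -r sge.interproscan_proteomes
```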
One nice feature of `SGE_Array` is that you will get an automatic run-name if you do not provide one with the `-r` flag. The downside is that you will have to examine the `command.N` files inside the directory to figure out what you ran.
Let’s take a look at the SGE directory:
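Based on the description below, the listing looks something like this (the job ID here is made up):

```bash
$ ls sge.interproscan_proteomes/
command.1.txt  command.2.txt  command.3.txt  commands.txt
sge.interproscan_proteomes.o1234567.1  sge.interproscan_proteomes.e1234567.1
sge.interproscan_proteomes.o1234567.2  sge.interproscan_proteomes.e1234567.2
sge.interproscan_proteomes.o1234567.3  sge.interproscan_proteomes.e1234567.3
```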
Each of the `command.N.txt` files corresponds to one of the commands taken as input on the command line when you submitted the `SGE_Array` command (all of which are found in the `commands.txt` file).
The `jobname.oJOBID.N` and `jobname.eJOBID.N` files correspond to the stdout and stderr of each task, respectively.
Advanced topics
While you can get by with the above commands and protocols, there are other ways that one can use the queuing system as well.
bash scripting for automation
One method for automating the job submission process takes two distinct steps:

- Writing a shell script that includes the program, its options, any environment variables, and any conda environment activation
- Writing a second shell script that passes all input files through the first script for submission via SGE
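For example, sketches of the two scripts might look like this (the script names, conda environment, and bakta options are illustrative):

```bash
#!/usr/bin/env bash
# run_bakta.sh: annotate a single genome passed as the first argument
source ~/miniconda3/etc/profile.d/conda.sh
conda activate bakta-env

genome=$1
bakta --threads 16 --output ${genome%.fasta}_bakta $genome
```

```bash
#!/usr/bin/env bash
# submit_bakta.sh: print one run_bakta.sh invocation per genome
for genome in genomes/*.fasta; do
  echo "bash ./run_bakta.sh $genome"
done
```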
Then, to submit the commands to SGE:

```bash
bash ./submit_bakta.sh | SGE_Array -P 16 -q ...
```
or, to preserve a record of the commands
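One way to do this (a sketch; the file name is illustrative) is to write the generated commands to a file and then feed that file to `SGE_Array`:

```bash
bash ./submit_bakta.sh > bakta_commands.txt   # the record of what was run
cat bakta_commands.txt | SGE_Array -P 16 -q bact -r sge.bakta
```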
or, if you don’t need to run multiple jobs at once, and can run them serially
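For instance (a sketch), you can pipe the generated commands into `bash` inside a single batch job, so they run one after another:

```bash
SGE_Batch -c 'bash ./submit_bakta.sh | bash' -P 16 -q bact -r sge.bakta_serial
```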
or…
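One more variant in the same spirit (a sketch): run the commands serially on an interactively checked-out node:

```bash
qrsh -q bact -pe thread 16
bash ./submit_bakta.sh | bash
exit
```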
As you can see, this two-step approach is flexible.
snakemake pipelines
Snakemake is a job-running system that can be configured to submit jobs through SGE. The snakemake system is too large a topic to cover here; however, here is some information that might point you in the right direction.
default cluster config:
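A sketch of a `cluster.yaml` with defaults for every rule (the queue name and values are illustrative):

```yaml
# cluster.yaml: the __default__ entry applies to any rule
# without its own entry
__default__:
  queue: bact
  threads: 1
```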
job-specific config:
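A rule-specific entry in the same `cluster.yaml` overrides the defaults (the rule name `assemble` is illustrative):

```yaml
# Rules named here get their own resources
assemble:
  queue: bact
  threads: 16
```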
submission script:
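A sketch of a driver script using snakemake's older `--cluster`/`--cluster-config` interface (flags and values are illustrative; newer snakemake versions handle this through executor plugins instead):

```bash
#!/usr/bin/env bash
# Let snakemake submit up to 50 jobs at a time through qsub,
# filling in each rule's resources from cluster.yaml
snakemake --jobs 50 \
  --cluster-config cluster.yaml \
  --cluster "qsub -cwd -V -q {cluster.queue} -pe thread {cluster.threads}"
```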
Using the just command runner
I have recently taken to using the just command runner for my analysis (and other) jobs. One difficulty I've experienced over years of bioinformatics work is keeping track of the individual commands I have run. For me, turning each and every command into a shell script to keep a record of what I've done has not been possible, or even desirable.
One of the main tasks that we have to deal with is converting files between formats. While there are tools that can assist with this, often we end up joining commands together in a pipeline of `grep`, `awk`, `sed`, `cut`, and `paste`.
While using `just` doesn't completely eliminate the friction of writing commands down in a file, it has the benefit of not cluttering the working directory with shell scripts. All commands can be contained in a file called `justfile` and then invoked with `just $command`, much like selecting a subcommand of any program. Further, arguments can be provided (and defaults set) so that you can re-use your justfiles if desired.
While there is some just-specific syntax and behavior to be learned (mainly, that recipes run from the directory the `justfile` is in rather than from your `pwd`), you can use `just` much like your standard `bash` scripts, except that you get a command-line parser built in for you.
Real world example
Rather than rewriting the documentation that you can find by following the link above, I'll walk you through a scenario from today.
Here is my directory structure (some files removed for readability):
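Something along these lines (the genome names are illustrative):

```
.
├── genomes
│   ├── genome_01.fasta
│   ├── genome_02.fasta
│   └── genome_03.fasta
└── justfile
```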
I needed to run a genome hashing algorithm so I could compare some complete and incomplete genomes and later cluster them based on the hash distance.
Here is the justfile I wrote:
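A sketch of its shape (the hashing tool's flags and the file names are illustrative; check your tool's help for the real options):

```just
# Parameters with defaults; override on the command line,
# e.g. `just k=31 run-dist`
k := "21"
threads := "16"

# Hidden default recipe: list the available recipes
_default:
    just --list

# Compute pairwise hash distances for all genomes
run-dist:
    dashing dist -k {{k}} -p {{threads}} -O dist_matrix.txt genomes/*.fasta

# Reformat the distance matrix for downstream clustering
convert:
    sed 's/\.fasta//g' dist_matrix.txt > dist_matrix.clean.tsv
```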
You'll see some parameters at the top and then three commands: the default, which just lists the available commands, and then `run-dist` and `convert`.
This is what you see when you run `just` on the command line:
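With the sketch above, that would look something like:

```bash
$ just
Available recipes:
    convert  # Reformat the distance matrix for downstream clustering
    run-dist # Compute pairwise hash distances for all genomes
```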
The default command is hidden because it is prefixed with an underscore in the `justfile`.
I was then able to submit the `run-dist` command in this way:

```bash
SGE_Batch -c 'just run-dist' -q bact -P 16 -r sge.run-dashing-dist
```
Since I was unfamiliar with the outputs, I used `qrsh` to check out a node and navigate to the directory containing the outputs. I realized I needed to modify the output files, and instead of running the commands individually, I wrote the `convert` command above. Then I could simply run it with `just convert` from the command line. Since I had a node checked out, I did not have to worry about taking up too many resources (although the convert command could easily run on `vaughan` as well).
Additionally, I was able to re-run the `run-dist` command with several `-k` settings and examine the outputs just by changing the options in the `justfile`.
One could extend this idea using the bash scripting for automation section above, and have a `run-command` recipe that contains the commands necessary for running the program, and a `submit-command` recipe that echoes the command so it's easy to run through SGE; I've done this in the past and it has been helpful. You can call recipes from within other recipes, making this setup even easier.