Wildwood HPC Cluster
The hpcman software was developed, in part, to help ease migration to the new Wildwood HPC infrastructure at the CQLS.
The SGE queuing system is presently in use on the CQLS research and teaching infrastructures. SGE will remain available on the new 'Wildwood' HPC infrastructure, but we are encouraging labs to move their compute nodes over to the Slurm queuing system. SGE will primarily be intended for legacy workflows that are too costly or difficult to migrate to Slurm.
To facilitate this migration, the hpcman queue set of commands was developed to replace the SGE_Batch, SGE_Array, and SGE_Avail commands. Both SGE and Slurm jobs can be submitted and monitored using the hpcman queue commands.
Submitting jobs
By default, jobs are sent to Slurm partitions on the 'Wildwood' HPC cluster when hpcman queue submit (alias: hqsub) is used. The target queuing system can be changed with the --queuetype flag. See the user guide or the command-line help (hqsub -h) for more information.
Under the hood, the hqsub software generates an SGE- or Slurm-compatible job script and then submits that script using qsub or sbatch, respectively.
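As a rough sketch of what this looks like in practice: the exact invocation is best checked with hqsub -h, and the quoted job command and the values accepted by --queuetype below are illustrative assumptions, not the confirmed interface.

    # Illustrative only -- see `hqsub -h` for the actual options and values.
    # Submit a job; by default it is sent to a Slurm partition on Wildwood:
    hqsub "blastn -query seqs.fa -db nt -out results.txt"

    # Toggle the queuing system with --queuetype (accepted values are
    # listed in the help output):
    hqsub --queuetype sge "blastn -query seqs.fa -db nt -out results.txt"

    # Under the hood, hqsub writes a job script and hands it to the native
    # scheduler, roughly equivalent to:
    sbatch generated_job_script.sh   # Slurm
    qsub generated_job_script.sh     # SGE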
Monitoring jobs
Jobs from both SGE and Slurm can be monitored using the hpcman queue stat (alias: hqstat) command.
Jobs are queried using qstat or squeue for SGE and Slurm, respectively. Job attributes are then collated into a table and displayed together, so that jobs from both queuing systems can be monitored at the same time.
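If you prefer to query a single scheduler directly, the same information comes from the underlying commands; the -u filter shown here is a standard option for both tools.

    # Query one scheduler at a time (hqstat collates both):
    qstat -u $USER     # SGE jobs for the current user
    squeue -u $USER    # Slurm jobs for the current user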
Finding available resources
Resources for both types of queuing system can be found using the hpcman queue avail (alias: hqavail) command.
Resources for SGE are gathered using the qstat -f command, and resources for Slurm are gathered using the sinfo -Nl command.
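These commands can also be run directly to see the raw output that hqavail summarizes:

    qstat -f     # SGE: full listing of queue instances, slots, and load
    sinfo -Nl    # Slurm: node-oriented, long-format partition and node status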
Stopping jobs
SGE jobs are killed using the qdel $JOBID command, while Slurm jobs are stopped using the scancel $JOBID command.
These features have not yet been integrated into the hpcman queue software, but they are on the roadmap.
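Until then, jobs can be stopped manually with the native scheduler commands; the job ID below is illustrative only:

    qdel 123456      # stop an SGE job
    scancel 123456   # stop a Slurm job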