Introduction
SLURM (Simple Linux Utility for Resource Management) is a powerful and flexible job scheduler used in High-Performance Computing (HPC) environments. It helps manage and allocate resources efficiently across a cluster of computers. In this blog post, we will cover some of the most commonly used SLURM commands.
The tables below are adapted from the Quick reference sheet for SLURM resource manager.
Job scheduling commands
Commands | Function | Basic Usage | Example |
---|---|---|---|
sbatch | submit a SLURM batch job | sbatch [script] | $ sbatch job.sub |
scancel | cancel a SLURM job | scancel [job_id] | $ scancel 123456 |
scontrol hold | hold a pending SLURM job | scontrol hold [job_id] | $ scontrol hold 123456 |
scontrol release | release a held SLURM job | scontrol release [job_id] | $ scontrol release 123456 |
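The commands in this table are often chained in a script, which requires capturing the job ID that `sbatch` prints. The sketch below assumes `sbatch` reports success with its usual `Submitted batch job <id>` line; `parse_job_id` is a hypothetical helper name, and `job.sub` is the example script from the table.

```shell
#!/bin/bash
# Pull the numeric job ID out of sbatch's "Submitted batch job <id>" message.
# (parse_job_id is a helper name made up for this example.)
parse_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Only talk to the scheduler when the SLURM client tools are installed.
if command -v sbatch >/dev/null 2>&1; then
    job_id=$(sbatch job.sub | parse_job_id)
    if [ -n "$job_id" ]; then
        scontrol hold "$job_id"      # keep the job pending
        scontrol release "$job_id"   # allow it to be scheduled again
        scancel "$job_id"            # remove it from the queue
    fi
fi
```

If you only need the ID, `sbatch --parsable job.sub` prints it without the surrounding text (followed by `;cluster` on multi-cluster setups), so the parsing step can be dropped.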
Job management commands
Commands | Function |
---|---|
sinfo -a | list all partitions (queues), including hidden ones |
squeue | list all jobs currently in the queue |
squeue -u userid | list jobs for userid |
squeue -t R | list running jobs |
smap | show jobs, partitions and nodes in a graphical network topology (older SLURM releases only; the command was removed in newer versions) |
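For a quick overview of your own jobs, the `squeue` output can be reduced to a count per state. This is a minimal sketch: `count_states` is a hypothetical helper, and it assumes one state name per input line, which is what `squeue -h -o "%T"` produces.

```shell
# Summarise job states read from standard input, one state name per line.
# (count_states is a helper name made up for this example.)
count_states() {
    awk 'NF { n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a cluster: -h drops the header, -o "%T" prints only the job state.
# squeue -u "$USER" -h -o "%T" | count_states
```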
Job script basics
A typical job script will look like this:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=netid@gmail.com
#SBATCH --mail-type=BEGIN,END
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out
cd "$SLURM_SUBMIT_DIR"
module load modulename
your_commands_go_here
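Inside the job, SLURM exposes the `--cpus-per-task` request as the environment variable `SLURM_CPUS_PER_TASK`. One common use, sketched here on the assumption that your program is multithreaded with OpenMP, is to match the thread count to the allocation. `threads_for_job` is a hypothetical helper name.

```shell
# Print the CPU count SLURM granted per task, or 1 when the variable is unset
# (for example when the script is run outside a job).
threads_for_job() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Assumption: the program reads OMP_NUM_THREADS, as OpenMP programs do.
export OMP_NUM_THREADS="$(threads_for_job)"
```

Placed before your commands in the job script, this keeps the program from starting more threads than the CPUs it was allocated.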
Important options
Option | Examples | Description |
---|---|---|
--nodes | #SBATCH --nodes=1 | Number of nodes |
--cpus-per-task | #SBATCH --cpus-per-task=16 | Number of CPUs per task |
--time | #SBATCH --time=HH:MM:SS | Total wall-clock time requested for your job |
--output | #SBATCH --output=filename | Write STDOUT to a file |
--error | #SBATCH --error=filename | Write STDERR to a file |
--mail-user | #SBATCH --mail-user=user@domain.edu | Email address to send notifications to |
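The same options can also be passed to `sbatch` on the command line, where they take precedence over the `#SBATCH` lines in the script. The sketch below builds a `--time` value from a number of minutes; `minutes_to_walltime` is a hypothetical helper, and it expects a plain non-negative integer without leading zeros.

```shell
# Convert a whole number of minutes into the HH:MM:SS format used by --time.
# (minutes_to_walltime is a helper name made up for this example.)
minutes_to_walltime() {
    minutes=$1
    printf '%02d:%02d:00\n' "$((minutes / 60))" "$((minutes % 60))"
}

# Usage on a cluster: request 2.5 hours for this submission only.
# sbatch --time="$(minutes_to_walltime 150)" job.sub
```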
Interactive session
To start an interactive session, run the following:
# Request 1 node for 4 hours
srun -N 1 -t 4:00:00 --pty /bin/bash
Getting information on past jobs
You can query the SLURM accounting database to see how much memory your previous jobs used. For example, the following command reports the requested memory (ReqMem) and the peak resident (MaxRSS) and virtual (MaxVMSize) memory used by a job:
sacct -j <JOBID> --format JobID,Partition,Submit,Start,End,NodeList%40,ReqMem,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,ExitCode
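`sacct` usually prints memory values with a unit suffix such as `K`, which is awkward to compare against a `--mem` request given in gigabytes. The helper below is a minimal sketch that converts such a value to whole MiB. `rss_to_mib` is a hypothetical name, and the code assumes 1024-based units and treats a value without a suffix as bytes.

```shell
# Convert a sacct memory value such as "2048K", "512M" or "1.5G" to whole MiB
# (truncated). Assumes 1024-based units; a bare number is treated as bytes.
rss_to_mib() {
    echo "$1" | awk '{
        value = $1 + 0                      # numeric prefix; awk ignores the suffix
        unit = substr($1, length($1), 1)    # last character is the unit, if any
        if (unit == "K") value /= 1024
        else if (unit == "G") value *= 1024
        else if (unit == "T") value *= 1024 * 1024
        else if (unit != "M") value /= 1024 * 1024
        printf "%d\n", value
    }'
}

# Usage on a cluster: -n drops the header, -P prints pipe-separated fields.
# sacct -j <JOBID> -n -P --format=JobID,MaxRSS |
#     while IFS='|' read -r step rss; do
#         echo "$step $(rss_to_mib "$rss") MiB"
#     done
```

Steps that report an empty MaxRSS come out as 0 here, so treat a zero as "no data" rather than as a measurement.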