Commonly Used SLURM Commands in HPC

Introduction

SLURM (Simple Linux Utility for Resource Management) is a powerful and flexible job scheduler used in High-Performance Computing (HPC) environments. It helps manage and allocate resources efficiently across a cluster of computers. In this blog post, we will cover some of the most commonly used SLURM commands.

The following are from Quick reference sheet for SLURM resource manager

Job scheduling commands

Commands	Function	Basic Usage	Example
sbatch	submit a slurm job	sbatch [script]	$ sbatch job.sub
scancel	delete slurm batch job	scancel [job_id]	$ scancel 123456
scontrol hold	hold slurm batch jobs	scontrol hold [job_id]	$ scontrol hold 123456
scontrol release	release hold on slurm batch jobs	scontrol release [job_id]	$ scontrol release 123456

Job management commands

Commands	Function
sinfo -a	list all queues
squeue	list all jobs
squeue -u userid	list jobs for userid
squeue -t R	list running jobs
smap	show jobs, partitions and nodes in a graphical network topology

Job script basics

A typical job script will look like this:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=netid@gmail.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out

cd $SLURM_SUBMIT_DIR

module load modulename

your_commands_goes_here

Important options

Option	Examples	Description
–nodes	#SBATCH –nodes=1	Number of nodes
–cpus-per-task	#SBATCH –cpus-per-task=16	Number of CPUs per node
–time	#SBATCH –time=HH:MM:SS	Total time requested for your job
–output	#SBATCH –output filename	STDOUT to a file
–error	#SBATCH –error filename	STDERR to a file
–mail-user	#SBATCH –mail-user user@domain.edu	Email address to send notifications

Interactive session

To start a interactive session execute the following:

#this command will give 1 Node for a time of 4 hours

srun -N 1 -t 4:00:00 --pty /bin/bash

Getting information on past jobs

You can use slurm database to see how much memory your previous jobs used, e.g. the following command will report requested memory and used residential and virtual memory for job

sacct -j <JOBID> --format JobID,Partition,Submit,Start,End,NodeList%40,ReqMem,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,ExitCode