Commonly Used SLURM Commands in HPC

HPC · SLURM · Computing

Published: Saturday, March 1, 2025

Introduction

SLURM (Simple Linux Utility for Resource Management) is a powerful and flexible job scheduler used in High-Performance Computing (HPC) environments. It helps manage and allocate resources efficiently across a cluster of computers. In this blog post, we will cover some of the most commonly used SLURM commands.

The commands below are adapted from a quick reference sheet for the SLURM resource manager.

Job scheduling commands

Command           Function                          Basic Usage                Example
sbatch            submit a slurm job                sbatch [script]            $ sbatch job.sub
scancel           delete slurm batch job            scancel [job_id]           $ scancel 123456
scontrol hold     hold slurm batch jobs             scontrol hold [job_id]     $ scontrol hold 123456
scontrol release  release hold on slurm batch jobs  scontrol release [job_id]  $ scontrol release 123456
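When sbatch submits a job it prints the new job ID; capturing that ID makes it easy to hold, release, or cancel the job later. A minimal sketch (the sbatch call is simulated with echo so the snippet runs off-cluster; on a real cluster, replace it with the actual sbatch call):

```shell
# sbatch prints "Submitted batch job <id>"; extract the 4th field with awk.
# Simulated here with a literal string; on a cluster use:
#   submit_output=$(sbatch job.sub)
submit_output="Submitted batch job 123456"
job_id=$(echo "$submit_output" | awk '{print $4}')
echo "$job_id"

# The captured ID can then be passed to the commands above, e.g.:
#   scontrol hold "$job_id"
#   scontrol release "$job_id"
#   scancel "$job_id"
```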

Job management commands

Command           Function
sinfo -a          list all queues
squeue            list all jobs
squeue -u userid  list jobs for userid
squeue -t R       list running jobs
smap              show jobs, partitions and nodes in a graphical network topology
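The squeue filters can be combined, e.g. squeue -u userid -t R lists only your running jobs, and counting them is a common one-liner. A sketch using simulated squeue output so it runs off-cluster (on a cluster, pipe squeue -h -u "$USER" -t R into the count instead):

```shell
# Two sample lines in squeue's default column order (simulated output).
sample='123456 compute job1 userid R 0:10 1 node01
123457 compute job2 userid R 0:05 1 node02'
# Count lines whose state column is R (running).
running=$(echo "$sample" | grep -c ' R ')
echo "$running jobs running"
```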

Job script basics

A typical job script will look like this:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --time=02:00:00
#SBATCH --mem=128G
#SBATCH --mail-user=netid@gmail.com
#SBATCH --mail-type=begin
#SBATCH --mail-type=end
#SBATCH --error=JobName.%J.err
#SBATCH --output=JobName.%J.out

cd $SLURM_SUBMIT_DIR

module load modulename

your_commands_go_here
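Inside a running job, SLURM exports environment variables such as SLURM_CPUS_PER_TASK and SLURM_JOB_ID that your commands can read. A small sketch (the :-1 default lets it also run outside SLURM; my_program and its -t flag are hypothetical placeholders):

```shell
# SLURM sets SLURM_CPUS_PER_TASK to match #SBATCH --cpus-per-task;
# fall back to 1 when not running under SLURM.
threads="${SLURM_CPUS_PER_TASK:-1}"
echo "launching with $threads threads"
# e.g.: my_program -t "$threads"   # hypothetical program and flag
```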

Important options

Option           Example                              Description
--nodes          #SBATCH --nodes=1                    Number of nodes
--cpus-per-task  #SBATCH --cpus-per-task=16           Number of CPUs per task
--time           #SBATCH --time=HH:MM:SS              Total wall-clock time requested for your job
--output         #SBATCH --output=filename            Redirect STDOUT to a file
--error          #SBATCH --error=filename             Redirect STDERR to a file
--mail-user      #SBATCH --mail-user=user@domain.edu  Email address for notifications

Interactive session

To start an interactive session, run:

# request 1 node for 4 hours

srun -N 1 -t 4:00:00 --pty /bin/bash
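Most clusters also let you request CPUs, memory, and a partition for the interactive shell. A hedged variant of the command above ("interactive" is a hypothetical partition name, so substitute your site's; check sinfo -a for the list):

```shell
# 1 node, 4 CPUs, 8 GB of memory, 2 hours, interactive bash shell
# ("interactive" is an assumed partition name; adjust for your cluster)
srun -N 1 -c 4 --mem=8G -t 2:00:00 -p interactive --pty /bin/bash
```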

Getting information on past jobs

You can query the SLURM accounting database to see how much memory your previous jobs used. For example, the following command reports the requested memory and the peak resident and virtual memory for a job:

sacct -j <JOBID> --format JobID,Partition,Submit,Start,End,NodeList%40,ReqMem,MaxRSS,MaxRSSNode,MaxRSSTask,MaxVMSize,ExitCode
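MaxRSS is typically reported in KiB with a K suffix (e.g. 1048576K); converting it to GiB makes it easier to compare against the --mem you requested. A sketch with a simulated value (on a cluster, feed the MaxRSS field from the sacct output above instead):

```shell
# Convert a sacct MaxRSS value like "1048576K" (KiB) to GiB.
maxrss="1048576K"                       # simulated sacct output
gib=$(echo "$maxrss" | awk '{sub(/K$/, ""); printf "%.1f GiB", $1/1048576}')
echo "$gib"
```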