
Slurm Usage

[[Category:HPC]] [[Category:UserDocs]]

== Quick Introduction ==

Our more administration-oriented docs are at: [[Slurm]]

A queue in Slurm is called a partition. User commands are prefixed with '''s'''.

=== Useful Commands ===

  • sacct, sbatch, sinfo, sprio, squeue, srun, sshare, sstat, etc.

sbatch                 # submit a job script to the Slurm queue
sinfo                  # general info about Slurm partitions and nodes
squeue                 # inspect the queue
sinfo -lNe             # more detailed report: long format, nodes listed individually
scancel 22             # cancel job 22
scontrol show job 2    # show control info on job 2

Examples:

# find the queue (partition) names:
[user@computer ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
basic*       up   infinite      1   idle 

#  test a job submission (don't run)
sbatch --test-only slurm_test.sh

#  run a job
sbatch slurm_test.sh
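
After submitting, the queue and accounting commands listed above can be used to check on the job. A minimal sketch (job ID 22 is just an illustrative value):

# show this user's jobs still in the queue
squeue -u $USER

# accounting info for a specific job (running or finished)
sacct -j 22

# full control info for a job
scontrol show job 22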


=== Example Slurm job file ===

#!/bin/bash
## SLURM REQUIRED SETTINGS
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

## SLURM reads %x as the job name and %j as the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Output some basic info with job
pwd; hostname; date;

# requires the ED2_HOME environment variable to be set (see the note after this example)
cd "$ED2_HOME/run"

# Job to run
./ed2
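
One way to make sure ED2_HOME is defined inside the job is to pass it at submit time (a sketch; the path and job file name are placeholders):

# export the submitting environment plus an explicit ED2_HOME to the job
sbatch --export=ALL,ED2_HOME=/path/to/ed2 ed2_job.sh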

Another Example:

#!/bin/bash
# 
#SBATCH -p basic # partition name (aka queue)
#SBATCH -c 1 # number of cores
#SBATCH --mem 100 # memory pool for all cores (MB)
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR

# code or script to run 
for i in {1..100000}; do
    echo $RANDOM >> SomeRandomNumbers.txt
done

sort SomeRandomNumbers.txt

==== Python Example ====

The output goes to a file called hello-python-<jobid>.out in the directory you submit the job from, which should contain a message from Python.

#!/bin/bash

## SLURM REQUIRED SETTINGS
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 

## SLURM reads %x as the job name and %j as the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

#SBATCH --job-name=hello-python     # create a short name for your job
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

## Example use of Conda:

# first source bashrc (with conda.sh), then conda can be used
source ~/.bashrc

# make sure conda base is activated
conda activate

# Other conda commands go here


## run python 
python hello.py

hello.py should be something like this:

print('Hello from python!')
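
Assuming the job file above is saved as hello-python.sh (the file name is just an example), it can be submitted and its output checked like this:

sbatch hello-python.sh

# after the job finishes, the output file is named <job-name>-<job-id>.out
cat hello-python-*.out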

=== Computer Facts ===

Find out facts about the computer for the job file:

# physical cores per socket
grep 'cpu cores' /proc/cpuinfo | uniq

# memory
[emery@bellows ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:          1.5Ti       780Gi       721Gi       1.5Gi       8.6Gi       721Gi
Swap:          31Gi          0B        31Gi
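
These numbers feed directly into job file settings. A sketch with placeholder values (pick values that fit within the core count and memory found above):

#SBATCH --cpus-per-task=8     # stays within the core count found above
#SBATCH --mem=16G             # stays within the memory reported by free -h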

=== nodes vs tasks vs cpus vs cores ===

Here's a very good writeup: https://researchcomputing.princeton.edu/support/knowledge-base/scaling-analysis. For most of our use cases, one node and one task is all that is needed; more than this requires special code such as mpi4py (MPI = Message Passing Interface).

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=N

is the correct way to request N cores for a job; just replace N with the number of cores you need.

To get the max value for N for a computer:

scontrol show node | grep CPU

The 'CPUTot' field in the output is the node's total CPU count, i.e. the maximum value for N.

Adapted from: https://login.scg.stanford.edu/faqs/cores/. Also useful: https://stackoverflow.com/questions/65603381/slurm-nodes-tasks-cores-and-cpus
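
As an illustrative sketch (the partition name and program are placeholders), a job file requesting 4 cores and passing that count to a multithreaded program could look like:

#!/bin/bash
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Slurm sets SLURM_CPUS_PER_TASK to the value requested above
./my_threaded_program --threads=$SLURM_CPUS_PER_TASK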

=== See Also ===

https://www.carc.usc.edu/user-information/user-guides/hpc-basics/slurm-templates

https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/

https://csc.cnsi.ucsb.edu/docs/slurm-job-scheduler

Genomics related: https://wiki.itap.purdue.edu/display/CGSB/How-to+Genomics

Python: https://rcpedia.stanford.edu/topicGuides/jobArrayPythonExample.html

=== Finding info for slurm.conf ===

To find the number of CPUs, SocketsPerBoard, and CoresPerSocket on Ubuntu 20, you can use the following commands:

  1. To find the number of CPUs: grep -c ^processor /proc/cpuinfo

  2. To find the number of sockets per board: sudo dmidecode -t 4 | grep "Socket Designation" | awk -F: '{print $2}' | uniq | wc -l

This command will output the number of unique socket designations found in the dmidecode output, which should correspond to the actual number of physical sockets on your motherboard.

  3. To find the number of cores per socket: lscpu | grep "Core(s) per socket" | awk '{print $4}'
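
As a sketch of where these numbers end up (the hostname and values are placeholders, not a real machine), a node definition in slurm.conf combines them like this:

# CPUs = SocketsPerBoard * CoresPerSocket * ThreadsPerCore
NodeName=node01 CPUs=32 Boards=1 SocketsPerBoard=2 CoresPerSocket=16 ThreadsPerCore=1 RealMemory=128000 State=UNKNOWN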