
Slurm Usage

[[Category:HPC]] [[Category:UserDocs]]

== Quick Introduction ==

Our more administration-oriented docs are at: [[Slurm]]

A queue in Slurm is called a partition. User commands are prefixed with '''s'''.

=== Useful Commands ===

sacct, sbatch, sinfo, sprio, squeue, srun, sshare, sstat, etc.

sbatch               # send a job script to the Slurm queue
sinfo                # general info about Slurm
squeue               # inspect the queue
sinfo -lNe           # more detailed report: long format, nodes listed individually
scancel 22           # cancel job 22
scontrol show job 2  # show control info on job 2

Examples:

# find the queue (partition) names:
[user@computer ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
basic*       up   infinite      1   idle 

#  test a job submission (don't run)
sbatch --test-only slurm_test.sh

#  run a job
sbatch slurm_test.sh
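
Once a job is submitted, a few of the commands listed above are handy for keeping an eye on it. A minimal sketch (the job ID 22 is just a placeholder):

# show only your own jobs in the queue
squeue -u $USER

# accounting info for a running or finished job
sacct -j 22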


=== Example Slurm job file ===

#!/bin/bash
## SLURM REQUIRED SETTINGS
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

## Slurm replaces %x with the job name and %j with the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Output some basic info with job
pwd; hostname; date;

# requires ED2_HOME env var to be set
cd $ED2_HOME/run

# Job to run
./ed2
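
A minimal sketch of submitting this script, assuming it is saved as ed2_job.sh (the job ID 12345 is just an example; since no --job-name is set, Slurm uses the script file name for %x):

sbatch ed2_job.sh
squeue -u $USER              # watch the job start and finish
cat ed2_job.sh-12345.out     # stdout captured by the --output setting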

Another Example:

#!/bin/bash
# 
#SBATCH -p basic # partition name (aka queue)
#SBATCH -c 1 # number of cores
#SBATCH --mem 100 # memory pool for all cores (MB)
#SBATCH -t 0-2:00 # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out # STDOUT
#SBATCH -e slurm.%N.%j.err # STDERR

# code or script to run 
for i in {1..100000}; do
    echo $RANDOM >> SomeRandomNumbers.txt
done
sort SomeRandomNumbers.txt
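
After the job finishes, you can check how much of the requested time and memory it actually used; a sketch using sacct (replace 12345 with your job ID):

# wall time, peak memory, and final state of job 12345
sacct -j 12345 --format=JobID,JobName,Elapsed,MaxRSS,State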

==== Python Example ====

The output goes to a file in your home directory called hello-python-*.out, which should contain a message from Python.

#!/bin/bash

## SLURM REQUIRED SETTINGS
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 

## Slurm replaces %x with the job name and %j with the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

#SBATCH --job-name=hello-python     # create a short name for your job
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

## Example use of Conda:

# first source bashrc (with conda.sh), then conda can be used
source ~/.bashrc

# make sure conda base is activated
conda activate

# Other conda commands go here


## run python 
python hello.py

hello.py should be something like this:

print('Hello from python!')
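
A minimal sketch of putting the two files together, assuming the batch script is saved as hello-python.sh in the same directory as hello.py (12345 stands in for the job ID Slurm assigns):

sbatch hello-python.sh
cat hello-python-12345.out   # should contain: Hello from python!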

=== Computer Facts ===

Find out facts about the computer to help fill in values for the job file.

# number of cores per physical CPU
grep 'cpu cores' /proc/cpuinfo | uniq

# memory
[emery@bellows ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:          1.5Ti       780Gi       721Gi       1.5Gi       8.6Gi       721Gi
Swap:          31Gi          0B        31Gi
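
You can also ask Slurm directly what it thinks each node has; a minimal sketch using sinfo's output formatting (%c is CPUs per node, %m is memory per node in MB):

# one line per node: node name, CPU count, memory (MB)
sinfo -N -o "%N %c %m"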

=== nodes vs tasks vs cpus vs cores ===

Here's a very good writeup: https://researchcomputing.princeton.edu/support/knowledge-base/scaling-analysis. For most of our use cases, one node and one task is all that is needed; more than that requires special code such as mpi4py (MPI = Message Passing Interface).

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=N

is the correct way to request N cores for a job. Just replace N in that config with the number of cores you need.

To get the max value for N for a computer:

scontrol show node | grep CPU

The CPUTot field in the output is the total number of CPUs Slurm sees on that node.
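
As a concrete sketch, here is a job header requesting 8 cores on one node and passing that count on to the program; Slurm sets SLURM_CPUS_PER_TASK to match --cpus-per-task, while my_threaded_program and its --threads flag are hypothetical placeholders:

#!/bin/bash
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# tell the program how many cores it was actually given
./my_threaded_program --threads=$SLURM_CPUS_PER_TASK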

This section draws on https://login.scg.stanford.edu/faqs/cores/. Also useful: https://stackoverflow.com/questions/65603381/slurm-nodes-tasks-cores-and-cpus

=== See Also ===

https://www.carc.usc.edu/user-information/user-guides/hpc-basics/slurm-templates

https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/

https://csc.cnsi.ucsb.edu/docs/slurm-job-scheduler

Genomics related: https://wiki.itap.purdue.edu/display/CGSB/How-to+Genomics

Python: https://rcpedia.stanford.edu/topicGuides/jobArrayPythonExample.html