Slurm Usage
[[Category:HPC]]
[[Category:UserDocs]]
== Quick Introduction ==
Our more administration-oriented docs are at [[Slurm]].

A queue in Slurm is called a partition. User commands are prefixed with '''s'''.
=== Useful Commands ===
sacct, sbatch, sinfo, sprio, squeue, srun, sshare, sstat, etc.

<pre>
sbatch               # sends jobs to the slurm queue
sinfo                # general info about slurm
squeue               # inspect queue
sinfo -lNe           # more detailed info reporting with long format and nodes listed individually
scancel 22           # cancel job 22
scontrol show job 2  # show control info on job 2
sacct                # accounting info for jobs
</pre>
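A typical day-to-day pattern is to check on your own jobs and then inspect or cancel them by job ID. A minimal sketch (the job ID 12345 is just a placeholder):
<pre>
# list only your own jobs in the queue
squeue -u $USER

# cancel a job by its ID
scancel 12345

# accounting summary for a job (running or finished)
sacct -j 12345 --format=JobID,JobName,Elapsed,State
</pre>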
'''Examples:'''
<pre>
# find the queue names:
[user@computer ~]$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
basic*       up   infinite      1   idle

# test a job submission (don't run)
sbatch --test-only slurm_test.sh

# run a job
sbatch slurm_test.sh
</pre>
=== Example Slurm job file ===
<pre>
#!/bin/bash

## SLURM REQUIRED SETTINGS
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

## SLURM reads %x as the job name and %j as the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Output some basic info with job
pwd; hostname; date;

# requires ED2_HOME env var to be set
cd $ED2_HOME/run

# Job to run
./ed2
</pre>
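Assuming the job file above is saved as, say, ed2_job.sh (the file name and job ID below are placeholders), submitting it and finding its output might look like this:
<pre>
# submit the job; sbatch prints the assigned job ID
sbatch ed2_job.sh
# Submitted batch job 12345

# with --output=%x-%j.out and no --job-name set, the job name defaults to the
# script name, so stdout lands in <scriptname>-<jobid>.out
cat ed2_job.sh-12345.out
</pre>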
'''Another Example:'''
<pre>
#!/bin/bash
#
#SBATCH -p basic            # partition name (aka queue)
#SBATCH -c 1                # number of cores
#SBATCH --mem 100           # memory pool for all cores
#SBATCH -t 0-2:00           # time (D-HH:MM)
#SBATCH -o slurm.%N.%j.out  # STDOUT
#SBATCH -e slurm.%N.%j.err  # STDERR

# code or script to run
for i in {1..100000}; do
  echo $RANDOM >> SomeRandomNumbers.txt
done
sort SomeRandomNumbers.txt
</pre>
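Because this script sends STDOUT to slurm.%N.%j.out, you can watch the job's progress while it runs by following that file; the node name and job ID below are placeholders:
<pre>
# follow the job's stdout as it is written (%N = node name, %j = job ID)
tail -f slurm.node01.12345.out
</pre>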
=== Python Example ===
The output goes to a file in your home directory called hello-python-*.out, which should contain a message from Python.
<pre>
#!/bin/bash

## SLURM REQUIRED SETTINGS
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1

## SLURM reads %x as the job name and %j as the job ID
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err
#SBATCH --job-name=hello-python  # create a short name for your job
#SBATCH --time=00:01:00          # total run time limit (HH:MM:SS)

## Example use of Conda:
# first source bashrc (with conda.sh), then conda can be used
source ~/.bashrc
# make sure conda base is activated
conda activate
# Other conda commands go here

## run python
python hello.py
</pre>
hello.py should be something like this:
<pre>
print('Hello from Python!')
</pre>
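One possible way to run it, assuming the job file above is saved as hello-python.sh (the file name is a placeholder):
<pre>
# submit the job
sbatch hello-python.sh

# with --job-name=hello-python and --output=%x-%j.out, the message ends up in
# hello-python-<jobid>.out
cat hello-python-*.out
</pre>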
=== Computer Facts ===
Find out facts about the computer for the job file:
<pre>
# number of cores?
grep 'cpu cores' /proc/cpuinfo | uniq

# memory
[emery@bellows ~]$ free -h
               total        used        free      shared  buff/cache   available
Mem:           1.5Ti       780Gi       721Gi       1.5Gi       8.6Gi       721Gi
Swap:           31Gi          0B        31Gi
</pre>
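Slurm itself also knows each node's CPU count and memory, which can be checked without logging into the node; a minimal sketch using sinfo output formatting:
<pre>
# one line per node: node name, CPUs, memory (MB)
sinfo -N -o "%N %c %m"
</pre>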
=== nodes vs tasks vs cpus vs cores ===
Here's a very good writeup: https://researchcomputing.princeton.edu/support/knowledge-base/scaling-analysis. For most of our use cases, one node and one task is all that is needed. (More than this requires special code, such as mpi4py; MPI = Message Passing Interface.)

<pre>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=N
</pre>

is the correct way to request N cores for a job. Just replace N in that config with the number of cores you need.
To get the max value for N for a computer:
<pre>
scontrol show node | grep CPU
</pre>
which produces the 'CPUTot' value.
The above quotes directly from https://login.scg.stanford.edu/faqs/cores/. Also useful: https://stackoverflow.com/questions/65603381/slurm-nodes-tasks-cores-and-cpus
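As an illustration of requesting N cores on a single node, here is a hedged sketch of a job file for a multi-threaded (non-MPI) program; my_threaded_program is a placeholder for your own executable:
<pre>
#!/bin/bash
#SBATCH --partition=basic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4        # N = 4 cores for this job
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# Slurm exports the requested core count; many threaded programs honor OMP_NUM_THREADS
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_threaded_program
</pre>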
=== See Also ===
* https://www.carc.usc.edu/user-information/user-guides/hpc-basics/slurm-templates
* https://docs.rc.fas.harvard.edu/kb/convenient-slurm-commands/
* https://csc.cnsi.ucsb.edu/docs/slurm-job-scheduler
* Python: https://rcpedia.stanford.edu/topicGuides/jobArrayPythonExample.html