CLI Usage
Getting on the cluster
- SSH: `ssh hpc.grit.ucsb.edu`. You'll land on a compute node inside a Slurm-backed interactive session (the login host forwards you automatically).
- File transfers: `scp`, `rsync`, `sftp`, etc. still work as usual to `hpc.grit.ucsb.edu`.
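For example, a minimal rsync sketch (directory names are placeholders; if your cluster username differs from your local one, prefix the host as user@hpc.grit.ucsb.edu):
rsync -avP ./myproject/ hpc.grit.ucsb.edu:~/myproject/        # push a local directory to the cluster
rsync -avP hpc.grit.ucsb.edu:~/myproject/results/ ./results/  # pull results back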
Partitions you can use
- `grit_nodes` (default): general use; includes nodes `hpc-01/02/03`.
- Other partitions exist but are group-restricted.
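To see the partitions and their limits from the command line:
sinfo -s              # one line per partition: availability, time limit, node counts
sinfo -p grit_nodes   # node states within the default partition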
Resource basics (what Slurm expects)
- CPUs: `-c <cores>` per task, or `-n <ntasks>` total tasks.
- Memory:
  - `--mem=<MB|GB>`: per-node memory, or
  - `--mem-per-cpu=<MB|GB>`: per allocated CPU.
- Time: `-t D-HH:MM:SS` (set this realistically; backfill favors shorter jobs).
- Partition: `-p grit_nodes` (default).

On this cluster the default memory per CPU is 4 GB if you don't specify otherwise.
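If you want to confirm that default yourself, it is exposed in the Slurm configuration (Slurm reports the value in MB, so 4 GB shows up as 4096):
scontrol show config | grep -i defmempercpu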
See what’s available / what you’re running
sinfo -p grit_nodes -Nel # nodes, CPUs, memory, state
squeue -u $USER # your jobs
squeue --start # scheduler’s predicted start times
One-off noninteractive command
srun -p grit_nodes -c 2 --mem=8G -t 30:00 myprog --arg foo
Batch jobs (recommended for longer runs)
Create a script `job.sh`:
#!/bin/bash
#SBATCH -p grit_nodes
#SBATCH -c 8
#SBATCH --mem=64G
#SBATCH -t 12:00:00
#SBATCH -J myjob
#SBATCH -o slurm-%j.out
module load mytool # if you use environment modules
python train.py --epochs 10
Submit + check:
sbatch job.sh
squeue -u $USER
tail -f slurm-<jobid>.out
Job arrays (many similar runs)
sbatch --array=0-99 -p grit_nodes -c 2 --mem=8G -t 1:00:00 job.sh
Inside `job.sh`, use `$SLURM_ARRAY_TASK_ID` to index your inputs.
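A minimal sketch of that pattern, assuming an `inputs.txt` with one input per line and a placeholder `process.py` (both names are illustrative):
#!/bin/bash
#SBATCH -p grit_nodes
#SBATCH -c 2
#SBATCH --mem=8G
#SBATCH -t 1:00:00
# map this array task's ID (0-based) to a line of inputs.txt (1-based)
INPUT=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)
python process.py "$INPUT"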
Cancel / modify
scancel <jobid> # cancel one
scancel -u $USER # cancel all yours
scontrol update JobId=<jobid> TimeLimit=02:00:00 # shorten time limit
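A few related `scontrol` operations (standard Slurm, not specific to this cluster) that pair well with the above:
scontrol hold <jobid>       # keep a pending job from being scheduled
scontrol release <jobid>    # allow it to be scheduled again
scontrol show job <jobid>   # inspect the full request and current state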
Accounting & live stats
sacct -j <jobid> --format=JobID,JobName,State,Elapsed,Timelimit,MaxRSS,ReqMem,AllocCPUS
sstat -j <jobid>.batch --format=AveCPU,AveRSS,MaxRSS,MaxVMSize,TaskCPU
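A common use of `sacct` output is right-sizing the next submission: if MaxRSS comes back far below ReqMem, you can safely ask for less memory next time. For example, with values reported in GB:
sacct -j <jobid> --units=G --format=JobID,State,Elapsed,AllocCPUS,ReqMem,MaxRSS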
Common “why is my job pending?” reasons
- (Resources): not enough free CPUs or memory right now. Try a shorter `-t`, fewer CPUs, or less `--mem`.
- (BeginTime): Slurm reserved a future start window for your job. Lower `-t` or resources to start sooner, or run `squeue --start` to see the ETA.
- Constraints or node eligibility: very large per-node requests (CPUs or `--mem`) may only fit on the biggest nodes, which can lengthen wait time.
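To see which reason Slurm is reporting for your own pending jobs, print the reason column:
squeue -u $USER --state=PD --format="%.10i %.12j %.10M %r"   # %r is the pending reason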
Good citizenship / performance tips
- Prefer multiple smaller tasks over one huge single-node grab when you can.
- Keep single-node requests well under a node's total RAM/cores unless you truly need them.
- Set realistic time limits; the backfill scheduler starts shorter jobs sooner.
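As a rough illustration of the first two tips (the numbers are made up; scale them to the actual node sizes), an array of modest jobs is usually easier for the backfill scheduler to place than one node-filling request:
# harder to place: one job asking for most of a node for a day
sbatch -p grit_nodes -c 32 --mem=120G -t 24:00:00 all_in_one.sh
# easier to backfill: 16 independent 2-CPU chunks
sbatch -p grit_nodes -c 2 --mem=8G -t 6:00:00 --array=0-15 one_chunk.sh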
Examples you can paste
# 1) CPU-only batch job with array:
sbatch -p grit_nodes -c 2 --mem=8G -t 2:00:00 --array=1-50 run_sim.sh
# 2) Memory-per-CPU style (8 cpus × 6 GB each = 48 GB/node):
srun -p grit_nodes -c 8 --mem-per-cpu=6G -t 1:00:00 --pty bash -l
# 3) Check predicted start times:
squeue --start
FAQ for this cluster
- Do I need to `salloc` first? No. SSH gives you a Slurm-backed shell. Use `srun` for bigger interactive bursts, or `sbatch` for long runs.
- VS Code / PyCharm remote? Not supported on the login host; use terminal + Slurm (`srun`/`sbatch`) instead.
- Which partition do I use? `grit_nodes`, unless you were explicitly added to a project-specific partition.
If you paste a specific job command you're planning to run, I'll check it against the node sizes here and suggest the best flags (CPUs, `--mem`, `-t`).