
CLI Usage

Getting on the cluster

  • SSH: ssh hpc.grit.ucsb.edu
    You’ll land on a compute node inside a Slurm-backed interactive session (the login host forwards you automatically).

  • File transfers: scp, rsync, sftp, etc. still work as usual against hpc.grit.ucsb.edu.
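
Once connected, you can confirm your shell really is inside a Slurm allocation rather than on a plain login host. This is a quick sanity check, assuming the forwarding described above sets the usual Slurm environment variables:

echo $SLURM_JOB_ID     # set when the shell is running inside a Slurm job
hostname               # should show a compute node (e.g. hpc-01), not the login host
squeue -u $USER        # your interactive session should appear here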

Partitions you can use

  • grit_nodes (default) – general use; includes nodes hpc-01/02/03.

  • Other partitions exist but are group-restricted.

Resource basics (what Slurm expects)

  • CPUs: -c <cores> per task, or -n <ntasks> total tasks.

  • Memory:

    • --mem=<MB|GB> = per-node memory, or

    • --mem-per-cpu=<MB|GB> = per allocated CPU.

  • Time: -t D-HH:MM:SS (set this realistically; backfill favors shorter jobs).

  • Partition: -p grit_nodes (default).

On this cluster the default memory per CPU is 4 GB if you don’t specify otherwise.
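
For example, the two requests below ask for the same total memory on a single node, one per node and one per CPU; myprog --arg foo is just a placeholder command:

srun -p grit_nodes -c 4 --mem=16G -t 1:00:00 myprog --arg foo           # 16 GB for the whole allocation
srun -p grit_nodes -c 4 --mem-per-cpu=4G -t 1:00:00 myprog --arg foo    # 4 GB × 4 CPUs = 16 GB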

See what’s available / what you’re running

sinfo -p grit_nodes -Nel          # nodes, CPUs, memory, state
squeue -u $USER                   # your jobs
squeue --start                    # scheduler’s predicted start times

One-off noninteractive command

srun -p grit_nodes -c 2 --mem=8G -t 30:00 myprog --arg foo

Batch jobs (recommended for longer runs)

Create a script job.sh:

#!/bin/bash
#SBATCH -p grit_nodes        # partition
#SBATCH -c 8                 # CPU cores per task
#SBATCH --mem=64G            # memory for the whole allocation
#SBATCH -t 12:00:00          # time limit (HH:MM:SS)
#SBATCH -J myjob             # job name
#SBATCH -o slurm-%j.out      # output file (%j = job ID)

module load mytool   # if you use environment modules
python train.py --epochs 10

Submit + check:

sbatch job.sh
squeue -u $USER
tail -f slurm-<jobid>.out
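
If you need more detail on a single job than squeue shows (working directory, requested resources, pending reason, assigned nodes), scontrol prints the full job record:

scontrol show job <jobid>           # full job record, including Reason= while pending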

Job arrays (many similar runs)

sbatch --array=0-99 -p grit_nodes -c 2 --mem=8G -t 1:00:00 job.sh

Inside job.sh use $SLURM_ARRAY_TASK_ID to index your inputs.
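
A minimal sketch of such a script, assuming an inputs.txt with one input path per line and a hypothetical process.py as the work to run:

#!/bin/bash
#SBATCH -p grit_nodes
#SBATCH -c 2
#SBATCH --mem=8G
#SBATCH -t 1:00:00

# Array task IDs start at 0 (per the --array=0-99 example above),
# so map task ID N to line N+1 of inputs.txt.
INPUT=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" inputs.txt)
python process.py "$INPUT"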

Cancel / modify

scancel <jobid>                     # cancel one
scancel -u $USER                    # cancel all yours
scontrol update JobId=<jobid> TimeLimit=02:00:00   # shorten time limit

Accounting & live stats

sacct -j <jobid> --format=JobID,JobName,State,Elapsed,Timelimit,MaxRSS,ReqMem,AllocCPUS
sstat -j <jobid>.batch --format=AveCPU,AveRSS,MaxRSS,MaxVMSize,TaskCPU
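
To review how recent jobs actually used their allocations (and right-size future requests), sacct can also list everything you have run since a given date; the one-week window below is just an example and assumes GNU date:

sacct -u $USER -S $(date -d '7 days ago' +%F) \
      --format=JobID,JobName,State,Elapsed,Timelimit,MaxRSS,ReqMem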

Common “why is my job pending?” reasons

  • (Resources): not enough free CPUs or memory right now. Try shorter -t, fewer CPUs, or less --mem.

  • (BeginTime): Slurm reserved a future start window for your job. Lower -t or resources to start sooner, or run squeue --start to see the ETA.

  • Constraints or node eligibility: very large per-node requests (CPUs or --mem) may only fit on the biggest nodes, which can lengthen wait time.
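
To see which of these reasons applies to a specific pending job, ask squeue to print the Reason column (%R); the format string here is just one reasonable layout:

squeue -u $USER -t PD -o "%.10i %.9P %.12j %.8T %.10l %R"   # your pending jobs with their Reason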

Good citizenship / performance tips

  • Prefer multiple smaller tasks over one huge single-node grab when you can.

  • Keep single-node requests well under a node’s total RAM/cores unless you truly need them.

  • Set realistic time limits; the backfill scheduler starts shorter jobs sooner.

Examples you can paste

# 1) CPU-only batch job with array:
sbatch -p grit_nodes -c 2 --mem=8G -t 2:00:00 --array=1-50 run_sim.sh

# 2) Memory-per-CPU style (8 CPUs × 6 GB each = 48 GB total on one node):
srun -p grit_nodes -c 8 --mem-per-cpu=6G -t 1:00:00 --pty bash -l

# 3) Check predicted start times:
squeue --start

FAQ for this cluster

  • Do I need to salloc first? No. SSH gives you a Slurm-backed shell. Use srun for bigger interactive bursts, or sbatch for long runs.

  • VS Code / PyCharm remote? Not supported on the login host; use terminal + Slurm (srun/sbatch) instead.

  • Which partition do I use? grit_nodes unless you were explicitly added to a project-specific partition.

If you paste a specific job command you’re planning to run, I’ll check it against the node sizes here and suggest the best flags (CPUs/--mem/-t).