Maxwell : random samples

sbatch

Create a batch script my-script.sh like the following and submit with sbatch my-script.sh:

#!/bin/bash
#SBATCH --time=0-00:01:00
#SBATCH --nodes=1
#SBATCH --partition=maxcpu
#SBATCH --job-name=slurm-01
unset LD_PRELOAD                     # useful on max-display nodes, harmless on others
source /etc/profile.d/modules.sh     # make the module command available
...                                  # your actual job 

That's the core information you should probably keep. Note: never add an #SBATCH directive after a regular command; it will be ignored like any other comment.
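
For illustration, a minimal sketch of this pitfall (the time and partition values are just placeholders):

#!/bin/bash
#SBATCH --time=0-00:01:00            # parsed by sbatch
unset LD_PRELOAD                     # first regular command
#SBATCH --partition=maxcpu           # too late: treated as an ordinary comment, so the default partition is used
...                                  # your actual job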

A simple example for Mathematica:

#!/bin/bash
#SBATCH --time=0-00:01:00
#SBATCH --nodes=1
#SBATCH --partition=allcpu
#SBATCH --job-name=mathematica
unset LD_PRELOAD                     
source /etc/profile.d/modules.sh     
module purge
module load mathematica
export nprocs=$((`/usr/bin/nproc` / 2))   # we have hyperthreading enabled. nprocs==number of physical cores
math -noprompt -run '<<math-trivial.m' 

# sample math-trivial.m:
tmp = Environment["nprocs"]
nprocs = FromDigits[tmp]
LaunchKernels[nprocs]
Do[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming        >> "math-trivial.out"
ParallelDo[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming  >>> "math-trivial.out"
Quit[]
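
As a quick sanity check of the timings written to math-trivial.out: the sequential Do loop should take roughly nprocs seconds (one second per iteration), while the ParallelDo loop spreads the pauses over the launched kernels and should finish in roughly one second.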

salloc

salloc uses the same syntax as sbatch.

# request one node with a P100 GPU for 8 hours in the allcpu partition:
salloc --nodes=1 --partition=allcpu --constraint=P100 --time=08:00:00

# start an interactive graphical matlab session on the allocated host.
ssh -t -Y $SLURM_JOB_NODELIST matlab_R2018a

# the allocation won't disappear when idle; you have to terminate the session explicitly
exit
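
Instead of ssh you can also run commands on the allocation directly with srun from within the salloc session; a small sketch, not tied to any particular application:

srun hostname                        # run a command on the allocated node(s)
srun --pty bash -i                   # or start an interactive shell there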


scancel

scancel 1234                       # cancel job 1234
scancel -u $USER                   # cancel all my jobs
scancel -u $USER -t PENDING        # cancel all my pending jobs
scancel --name myjob               # cancel a named job
scancel 1234_3                     # cancel an indexed job in a job array


sinfo

sinfo                                                               # basic list of partitions
sinfo -N -p allcpu                                                  # list all nodes and state in allcpu partition 
sinfo -N -p petra4 -o "%10P %.6D %8c %8L %12l %8m %30f %N"          # list all nodes with limits and features in petra4 partition 
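
sinfo can also filter by node state with -t; a small addition to the samples above:

sinfo -p allcpu -t idle              # list idle nodes in the allcpu partition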

squeue

squeue                              # show all jobs
squeue -u $USER                     # show all jobs of user 
squeue -u $USER -p upex -t PENDING  # all pending jobs of user in upex partition
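
The output format can be customized with -o; a sketch using standard format specifiers (%i job id, %P partition, %j name, %T state, %M elapsed time, %D node count, %R reason/nodelist):

squeue -u $USER -o "%.10i %.12P %.20j %.8T %.10M %.6D %R"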

sacct

Provides accounting information. Never use it for time spans exceeding a month!

sacct -j 1628456                                                    # accounting information for jobid 
sacct -u $USER                                                      # today's jobs 

# get detailed information about all my jobs since 2019-01-01 and grep for all that FAILED:
sacct -u $USER --format="partition,jobid,state,start,end,nodeList,CPUTime,MaxRSS" --starttime 2019-01-01 | grep FAILED 
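
To stay within the one-month limit, bound the query explicitly with --starttime and --endtime (the dates below are only placeholders):

sacct -u $USER --starttime 2019-01-01 --endtime 2019-02-01 --format="jobid,jobname,partition,state,elapsed"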

scontrol

Displays information about currently running/pending jobs and the configuration of partitions and nodes. Also allows altering job characteristics of pending jobs.

scontrol show job 12345                              # show information about job 12345. Will show nothing after a job has finished.
scontrol show reservation                            # list current and future reservations
scontrol update jobid=12345 partition=allcpu         # move pending job 12345 to partition allcpu
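
A few more common operations on pending jobs (the job id is a placeholder):

scontrol hold 12345                                  # prevent a pending job from starting
scontrol release 12345                               # release a held job again
scontrol update jobid=12345 timelimit=02:00:00       # change the time limit of a pending job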

slurm

module load maxwell tools
slurm 

#Show or watch job queue:
 slurm [watch] queue     # show own jobs
 slurm [watch] q <user>  # show user's jobs
 slurm [watch] quick     # show quick overview of own jobs
 slurm [watch] shorter   # sort and compact entire queue by job size
 slurm [watch] short     # sort and compact entire queue by priority
 slurm [watch] full      # show everything
 slurm [w] [q|qq|ss|s|f] # shorthands for the above

 slurm qos               # show job service classes
 slurm top [queue|all]   # show summary of active users

#Show detailed information about jobs:
 slurm prio [all|short]  # show priority components
 slurm j|job <jobid>     # show everything else
 slurm steps <jobid>     # show memory usage of running srun job steps

#Show usage and fair-share values from accounting database:
 slurm h|history <time>  # show jobs finished since, e.g. "1day" (default)
 slurm shares            # show fair-share values

#Show nodes and resources in the cluster:
 slurm p|partitions      # all partitions
 slurm n|nodes           # all cluster nodes
 slurm c|cpus            # total cpu cores in use
 slurm cpus <partition>  # cores available to partition, allocated and free
 slurm cpus jobs         # cores/memory reserved by running jobs
 slurm cpus queue        # cores/memory required by pending jobs
 slurm features          # List features and GRES
 slurm brief_features    # List features with node counts
 slurm matrix_features   # List possible combinations of features with node counts

Ensuring minimum memory per core

The Maxwell cluster is not configured for consumable resources such as memory. For an MPI job running on heterogeneous hardware, you therefore have to prepare your batch script to tailor the number of cores used to the memory available on each node. A simple example:

#!/bin/bash
#SBATCH --partition=maxcpu
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load mpi/openmpi-x86_64
 
# set hostfile
HOSTFILE=/tmp/hosts.$SLURM_JOB_ID
rm -f $HOSTFILE
 
# set minimum 40GB per core
mem_per_core=$((40*1024))
 
# generate hostfile
for node in $(srun hostname -s | sort -u) ; do
   mem=$(sinfo -n $node --noheader -o '%m')
   cores=$(sinfo -n $node --noheader -o '%c')
   slots=$(( $mem / $mem_per_core ))
   slots=$(( $cores < $slots ? $cores : $slots ))
   echo $node slots=$slots >> $HOSTFILE
done
 
# run ...
mpirun --hostfile $HOSTFILE
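
The generated hostfile then contains one line per node with the number of usable slots, for example (hostnames made up):

max-wn001 slots=12
max-wn002 slots=6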


For a homogeneous set of nodes, life becomes much easier:

#!/bin/bash
#SBATCH --partition=allcpu,maxcpu
#SBATCH --constraint='[(EPYC&7402)|Gold-6240|Gold-6140]'
#SBATCH --nodes=8
unset LD_PRELOAD
source /etc/profile.d/modules.sh
module purge
module load mpi/openmpi-x86_64

# only use physical cores. Since the nodes are all identical (enforced by the constraint), this fits all nodes
nprocs=$(( $(nproc) / 2 ))

# -N ensures $nprocs processes per node
mpirun -N $nprocs hostname | sort | uniq -c # should show the same count for each node