Maxwell : Useful commands

Overview

Command (environment / man page)        Explained / Example

my-resources                Show major resources and their availability for your account
my-partitions               Show all Maxwell partitions and which ones are accessible for your account
my-licenses                 Show licenses for major commercial products; lists all the licenses
                            you are currently using. Example: my-licenses -p matlab
my-quota                    Show the quota of your Maxwell home directory
xwhich                      Find applications and print the required setup. Example: xwhich Avizo

Job controls

sbatch (man sbatch)         Submit a batch job to the cluster.
                            Example: sbatch -p allcpu --constraint=P100 --time=1-12:00:00
salloc (man salloc)         Submit a request for an interactive job to the cluster.
                            Example: salloc --partition=allcpu --nodes=1
srun (man srun)             Run an interactive job.
                            Example: srun -p allcpu --pty -t 0-06:00 matlab_R2018a
scancel (man scancel)       Signal jobs or job steps. Example: scancel 12345
scontrol (man scontrol)     View and modify Slurm configuration and state.
Job & Cluster information

Please note that these commands create additional load on the management nodes, so do not run them unnecessarily or at high frequency. Querying at most once every 60 seconds is fine.

savail (module load maxwell; savail -h)
                            Show the real availability of nodes. Example: savail -p maxcpu
webavail                    Web-based and much more powerful alternative to sview.
sview (man sview)           GUI for Slurm showing the current status of the cluster.
sinfo (man sinfo)           View information about Slurm nodes and partitions.
squeue (man squeue)         View information about jobs in the scheduling queue.
sacct (man sacct)           Accounting information for jobs invoked with Slurm.
sstat (man sstat)           Status information for running jobs invoked with Slurm.
slurm (module load maxwell) Convenient queue and cluster overview; see the slurm section below.
max-limits                  Show limits of partitions. Example: max-limits -p jhub -a
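
The account-related helpers above take no or few arguments; a quick first orientation on a login node could look like this (a sketch, output omitted):

my-resources               # which resources are available to my account?
my-partitions              # which partitions can I submit to?
my-licenses -p matlab      # licenses for a specific product
my-quota                   # how much of my home directory quota is used?
xwhich Avizo               # where is Avizo and which setup does it need?
max-limits -p jhub -a      # limits of the jhub partition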


sbatch

Create a batch script my-script.sh like the following and submit with sbatch my-script.sh:

#!/bin/bash
#SBATCH --time      0-00:01:00
#SBATCH --nodes     1
#SBATCH --partition maxcpu
#SBATCH --job-name  slurm-01
export LD_PRELOAD=""                 # useful on max-display nodes, harmless on others
source /etc/profile.d/modules.sh     # make the module command available
...                                  # your actual job 
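
After saving the script, submit it and keep an eye on it with the usual Slurm tools (the job ID below is only illustrative):

sbatch my-script.sh          # prints e.g. "Submitted batch job 1628456"
squeue -u $USER              # watch the job while it is pending or running
sacct -j 1628456             # accounting details once it has started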

That's the core information which you should probably keep. Note: never add a #SBATCH directive after a regular command; it will be ignored like any other comment.
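
To illustrate the note: sbatch stops parsing #SBATCH directives at the first regular command, so in the following sketch the second directive is silently treated as a comment:

#!/bin/bash
#SBATCH --partition maxcpu     # parsed: appears before the first command
echo "setting up"
#SBATCH --time 0-01:00:00      # ignored: appears after a regular command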

A simple example for a Mathematica job:

#!/bin/bash
#SBATCH --time      0-00:01:00
#SBATCH --nodes     1
#SBATCH --partition allcpu
#SBATCH --job-name  mathematica
export LD_PRELOAD=""                 # useful on max-display nodes, harmless on others
source /etc/profile.d/modules.sh     # make the module command available
module load mathematica
export nprocs=$((`/usr/bin/nproc` / 2))   # we have hyperthreading enabled. nprocs==number of physical cores
math -noprompt -run '<<math-trivial.m' 

# sample math-trivial.m:
tmp = Environment["nprocs"]
nprocs = FromDigits[tmp]
LaunchKernels[nprocs]
Do[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming        >> "math-trivial.out"
ParallelDo[Pause[1];f[i],{i,nprocs}] // AbsoluteTiming  >>> "math-trivial.out"
Quit[]
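
Saving the batch part as, say, mathematica.sh (the file name is just an illustration) next to math-trivial.m, the job is submitted like any other batch job; math-trivial.m is then picked up from the job's working directory:

sbatch mathematica.sh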

salloc

salloc uses the same syntax as sbatch.

# request one node with a P100 GPU for 8 hours in the allcpu partition:
salloc --nodes=1 --partition=allcpu --constraint=P100 --time=08:00:00

# start an interactive graphical matlab session on the allocated host.
ssh -t -Y $SLURM_JOB_NODELIST matlab_R2018a

# the allocation won't disappear when idle; you have to terminate the session explicitly
exit
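
Inside the shell spawned by salloc, standard Slurm environment variables describe the allocation; a small sketch for checking on it:

echo $SLURM_JOB_ID $SLURM_JOB_NODELIST    # job id and allocated node(s)
squeue -j $SLURM_JOB_ID                   # confirm the allocation is still active
scancel $SLURM_JOB_ID                     # alternative to 'exit': cancel the allocation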


scancel

scancel 1234                       # cancel job 1234
scancel -u $USER                   # cancel all my jobs
scancel -u $USER -t PENDING        # cancel all my pending jobs
scancel --name myjob               # cancel a named job
scancel 1234_3                     # cancel an indexed job in a job array


sinfo

sinfo                                                               # basic list of partitions
sinfo -N -p allcpu                                                  # list all nodes and their state in the allcpu partition
sinfo -N -p petra4 -o "%10P %.6D %8c %8L %12l %8m %30f %N"          # list all nodes with limits and features in the petra4 partition
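
For reference, the format specifiers in the last example select (see man sinfo): %P partition, %D node count, %c CPUs per node, %L default time, %l time limit, %m memory per node in MB, %f features, %N node names.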

squeue

squeue                              # show all jobs
squeue -u $USER                     # show all jobs of user 
squeue -u $USER -p upex -t PENDING  # all pending jobs of user in upex partition
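
The output can be tailored with -o using the format specifiers from man squeue; a sketch, together with the --start option for pending jobs:

squeue -u $USER -o "%.10i %.9P %.20j %.8T %.10M %.6D %R"   # jobid, partition, name, state, runtime, nodes, reason/nodelist
squeue -u $USER --start                                    # expected start times of pending jobs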

sacct

Provides accounting information. Never use it for time spans exceeding a month!

sacct -j 1628456                                                    # accounting information for jobid 
sacct -u $USER                                                      # today's jobs 

# get detailed information about all my jobs since 2019-01-01 and grep for all that FAILED:
sacct -u $USER --format="partition,jobid,state,start,end,nodeList,CPUTime,MaxRSS" --starttime 2019-01-01 | grep FAILED 
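
The state filtering can also be done by sacct itself; a sketch using the --state option instead of grep (--starttime is still needed):

sacct -u $USER --state=FAILED --starttime 2019-01-01 --format="partition,jobid,state,start,end,nodeList,CPUTime,MaxRSS"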

scontrol

Display information about currently running or pending jobs and about the configuration of partitions and nodes. Also allows altering job characteristics of pending jobs.

scontrol show job 12345                              # show information about job 12345. Will show nothing after a job has finished.
scontrol show reservation                            # list current and future reservations
scontrol update jobid=12345 partition=allcpu         # move pending job 12345 to the allcpu partition
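
A few more commonly used scontrol subcommands, with an illustrative job ID:

scontrol show partition allcpu                       # configuration and limits of the allcpu partition
scontrol hold 12345                                  # keep a pending job from starting
scontrol release 12345                               # release the hold again
scontrol update jobid=12345 timelimit=0-02:00:00     # reduce the time limit of a pending job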

slurm

module load maxwell tools
slurm 

#Show or watch job queue:
 slurm [watch] queue     # show own jobs
 slurm [watch] q <user>  # show user's jobs
 slurm [watch] quick     # show quick overview of own jobs
 slurm [watch] shorter   # sort and compact entire queue by job size
 slurm [watch] short     # sort and compact entire queue by priority
 slurm [watch] full      # show everything
 slurm [w] [q|qq|ss|s|f] # shorthands for the above

 slurm qos               # show job service classes
 slurm top [queue|all]   # show summary of active users

#Show detailed information about jobs:
 slurm prio [all|short]  # show priority components
 slurm j|job <jobid>     # show everything else
 slurm steps <jobid>     # show memory usage of running srun job steps

#Show usage and fair-share values from accounting database:
 slurm h|history <time>  # show jobs finished since, e.g. "1day" (default)
 slurm shares

#Show nodes and resources in the cluster:
 slurm p|partitions      # all partitions
 slurm n|nodes           # all cluster nodes
 slurm c|cpus            # total cpu cores in use
 slurm cpus <partition>  # cores available to partition, allocated and free
 slurm cpus jobs         # cores/memory reserved by running jobs
 slurm cpus queue        # cores/memory required by pending jobs
 slurm features          # List features and GRES
 slurm brief_features    # List features with node counts
 slurm matrix_features   # List possible combinations of features with node counts
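
A couple of typical invocations of the wrapper, with the partition name only as an example:

slurm watch queue        # continuously refresh the list of your own jobs
slurm cpus allcpu        # cores allocated and free in the allcpu partition
slurm features           # list features and GRES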