SchedMD offers exhaustive documentation on how to use SLURM: http://slurm.schedmd.com/. We have just collected a few examples below.

Maxwell useful commands provides a short list of commands which might come in handy.

SLURM offers three distinct ways to run jobs:

  • sbatch - submits a script which instructs slurm what to do.
  • srun - runs a command or script in parallel on the cluster. All instructions are specified on the command line.
  • salloc - just allocates resources according to instructions specified on the command line; by default it doesn't execute any command on the allocated nodes.

Using sbatch

sbatch is the primary command to run jobs on the Maxwell cluster. It accepts a script, copies it to the scheduler, and the scheduler executes the script as soon as compute resources become available. It's particularly handy for running large numbers of tasks without having to keep track of the whereabouts of individual jobs. Unlike salloc or srun it returns your shell immediately, so you can continue to work, and the jobs will not be affected by accidentally closing a shell or session, or by a crash of the login node used to submit the jobs. The jobs won't even be affected by accidentally deleting the job script: SLURM keeps a copy, and you can even recreate the job script while jobs are still running or pending. The details are explained on a separate page on running batch jobs.
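As a minimal sketch (the script name test-sbatch.sh, the partition, the time limit and the output file are just illustrative placeholders), a batch script could look like this:

#!/bin/bash
#SBATCH --partition=all
#SBATCH --time=0-00:10:00
#SBATCH --job-name=sbatch.test
#SBATCH --output=sbatch.test-%j.out   # %j is replaced by the job id
# the actual work; hostname just stands in for your application
/usr/bin/hostname

# submit the script:
@max-wgs001:~$ sbatch test-sbatch.sh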

Using srun

For details visit the documentation: https://slurm.schedmd.com/srun.html

srun allows you to run one or several instances of the same command in parallel on compute nodes.

# allocate resources for 40 tasks and run 40 copies of the same command:
@max-wgs001:~$ srun --partition=all --ntasks=40 /usr/bin/hostname | sort  | uniq -c
srun: job 8474306 queued and waiting for resources
srun: job 8474306 has been allocated resources
     40 max-cfel003.desy.de

@max-wgs001:~$ srun --partition=all --ntasks=123 /usr/bin/hostname  | sort  | uniq -c
srun: job 8474446 queued and waiting for resources
srun: job 8474446 has been allocated resources
     43 max-exfl195.desy.de
     40 max-ferrari018.desy.de
     40 max-ferrari019.desy.de

Let's assume you quickly want to convert the images img_001.tif to img_123.tif to PNG in parallel. You could do the following:

#!/bin/bash
# create a simple script named test-srun.sh:
# SLURM_PROCID runs from 0 to ntasks-1, so shift by 1 to cover img_001 .. img_123
IMG=$(printf "img_%03d" $((SLURM_PROCID + 1)))
echo "processing ${IMG}.tif on $(hostname)"
convert "${IMG}.tif" "${IMG}.png"

# make sure the script can be executed:
@max-wgs001:~$ chmod u+x test-srun.sh

# run the script:
@max-wgs001:~$ srun --partition=all --ntasks=123 ./test-srun.sh
[...]
processing img_028.tif on max-exfl259.desy.de
processing img_035.tif on max-exfl259.desy.de
processing img_022.tif on max-exfl259.desy.de

...  but there are better ways of doing it ...
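One such alternative (only a sketch, assuming the images really are numbered img_001.tif to img_123.tif) is an sbatch job array, which submits the 123 conversions as independent array tasks:

# create a script named test-array.sh:
#!/bin/bash
#SBATCH --partition=all
#SBATCH --time=0-00:10:00
#SBATCH --job-name=img-convert
# SLURM_ARRAY_TASK_ID takes the values given with --array, i.e. 1 .. 123
IMG=$(printf "img_%03d" $SLURM_ARRAY_TASK_ID)
echo "processing ${IMG}.tif on $(hostname)"
convert "${IMG}.tif" "${IMG}.png"

# submit it as an array of 123 tasks:
@max-wgs001:~$ sbatch --array=1-123 test-array.sh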

Using salloc

salloc allows you to run "batch jobs" interactively. For example ...

@max-wgs001:~$ salloc -p all --time=0-00:10:00 --job-name=salloc.test 
salloc: Granted job allocation 8562766
salloc: Waiting for resource configuration
salloc: Nodes max-wn050 are ready for job

... allocates a node in the all partition for 10 minutes. With a bit of luck the node becomes available immediately, but if the partition is busy it can take quite a while. While the node is allocated you can log in to the node and work there like on an ordinary workgroup server ...

@max-wgs001:~$ ssh max-wn050
Last login: Thu Apr 22 18:21:39 2021 from max-wgse002.desy.de

... but all your processes will be terminated and all your files in /tmp and /scratch deleted as soon as the node allocation expires ...

@max-wgs001:~$ salloc -p all --time=0-00:10:00 --job-name=salloc.test 
salloc: Granted job allocation 8562776
salloc: Waiting for resource configuration
salloc: Nodes max-wn050 are ready for job
@max-wgs001:~$ ssh max-wn050
Last login: Mon Aug  2 08:34:25 2021 from max-wgse001.desy.de
@max-wn050:~$ sleep 600
@max-wn050:~$ 
salloc: Job 8562776 has exceeded its time limit and its allocation has been revoked.
                                                                                    Killed by signal 1.
@max-wgs001:~$ exit # leave the shell spawned by salloc
@max-wgs001:~$ 

When running salloc, it spawns a new shell by default. The shell is still active even after the allocation has been revoked. To return to the original shell, just "exit". Also do so when you are done with your job:

@max-wgs001:~$ salloc -p all
salloc: Granted job allocation 8562882
salloc: Waiting for resource configuration
salloc: Nodes max-wn081 are ready for job
@max-wgs001:~$ ssh max-wn081
@max-wn081:~$ ... do something
@max-wn081:~$ exit
logout
Connection to max-wn081 closed.
# at this point your allocation is still running!

@max-wgs001:~$ exit # leave the shell spawned by salloc
exit
salloc: Relinquishing job allocation 8562882  # only now the allocation is really done!


These simple examples illustrate the downside of using salloc: it blocks allocated resources even while they are idle. Resource utilization of salloc'ated nodes is on average below 1%. There are use cases where salloc is the only option to run certain jobs, and it's very handy for quick tests, so we have no intention to impose limitations, but it would be helpful if you

  • use salloc only when really unavoidable, not just because it appears more convenient
  • use it preferably only for short-running tests
  • don't forget to finish an allocation (see the check below)
  • preferably invoke the commands directly from salloc, for example:
@max-wgs001:~$ module load mpi/openmpi-x86_64

@max-wgs001:~$ salloc -p all mpirun /usr/bin/hostname
salloc: Granted job allocation 8562874
salloc: Waiting for resource configuration
salloc: Nodes max-wn081 are ready for job
max-wn081.desy.de
[...]
max-wn081.desy.de
salloc: Relinquishing job allocation 8562874

# the allocation gets automatically terminated once the command (mpirun in this case) finishes.
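
If in doubt whether a forgotten allocation is still blocking a node, you can list your own jobs and cancel leftovers (the job id below is just the one from the example above):

@max-wgs001:~$ squeue -u $USER
@max-wgs001:~$ scancel 8562882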