Maxwell : SLURM Reservations

SLURM does not have a role model that would allow certain tasks to be delegated to individual users; you are either a SLURM admin or you are not. Creating and managing reservations is one of the tasks that would benefit greatly from delegation: it would make it possible, for example, to reserve a compute node for a beamtime. We have therefore implemented a web-service which does exactly that (but please note that it is a work in progress!).

The web-service is accessible (from within the DESY network) at https://max-portal.desy.de/reservation/. It requires DESY credentials to log in, but you won't be able to do anything unless you have been added to the list of authorized accounts. If you would like to use the web-service, please get in touch with maxwell.service@desy.de.

The reservation tool has a number of nice features:

  • it allows authorization to be set per partition
  • it allows the consumable resources to be limited per partition; for example, a limit can ensure that a partition never has more than N nodes reserved at a time
  • it supports constraints; it guides you through the set of constraints and makes it impossible to create an invalid combination of constraints
  • it nicely handles groups and users
  • it comes with a REST API

The REST API has been used to create a couple of Python scriptlets, which make it possible to perform most of the tasks of the web-service directly from the command line.

SLURM RESERVATION CLI

The Python modules to handle SLURM reservations can be found on Maxwell under /software/tools/lib/python3.6/slurmres. The modules are not bound to Maxwell and should work on any machine (e.g. they would allow a reservation to be created from a beamline PC).

As with the web-service, none of the modules will work without account authorization. Assuming that you are authorized to manage reservations for the partition allcpu, the CLI works as follows.

TOKEN

To work conveniently with the CLI you'll need a token:

Create Token
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmrestoken.py -h
usage: slurmrestoken.py [-h] [-t TOKEN_PATH | -r]

Creates a new token

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN_PATH, --token TOKEN_PATH
                        local path to token file
  -r, --revoke          revoke token on server

@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmrestoken.py
Username: user
Password: 
Token generated at /home/user/slurm_res/slurm_res_token.dat


LIST

List reservations
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmreslist.py -h
usage: slurmreslist.py [-h] [-t TOKEN] [-p PARTITION]

List all reservations or all reservations of a partition.

optional arguments:
  -h, --help            show this help message and exit
  -t TOKEN, --token TOKEN
                        the path to the token
  -p PARTITION, --partition PARTITION
                        show reservations only for a specific partition

@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmreslist.py -p allcpu

{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 20,
 'end_time': '2021-07-19T14:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_001',
 'node_cnt': 1,
 'node_list': 'max-cfel023',
 'partition': 'allcpu',
 'start_time': '2021-07-19T12:00:00',
 'tres_str': ['cpu=40'],
 'users': ['user1', 'user2']}

1 reservation found
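
The listing above is printed as Python dict literals separated by blank lines, followed by a summary line. If you want to post-process it in a script, a minimal sketch could look like the one below. It assumes that the output format stays exactly as in the sample run; parse_reservations is a hypothetical helper and not part of the slurmres modules.

```python
import ast
import re

# Output copied from the sample run above.
SAMPLE = """
{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 20,
 'end_time': '2021-07-19T14:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_001',
 'node_cnt': 1,
 'node_list': 'max-cfel023',
 'partition': 'allcpu',
 'start_time': '2021-07-19T12:00:00',
 'tres_str': ['cpu=40'],
 'users': ['user1', 'user2']}

1 reservation found
"""


def parse_reservations(text):
    """Turn the printed dict blocks into a list of Python dicts.

    Hypothetical helper: it relies on the blocks being valid Python
    literals separated by blank lines, as in the sample output.
    """
    reservations = []
    for block in re.split(r"\n\s*\n", text):
        block = block.strip()
        # Skip anything that is not a dict block, e.g. "1 reservation found".
        if block.startswith("{") and block.endswith("}"):
            reservations.append(ast.literal_eval(block))
    return reservations


if __name__ == "__main__":
    for res in parse_reservations(SAMPLE):
        print(res["name"], res["node_list"], res["start_time"], res["end_time"])
```

ast.literal_eval only accepts literals, so it is a safer choice than eval for text that comes from another program.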


CREATE

Create reservation
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresnew.py -h
usage: slurmresnew.py -n NAME -p PARTITION -c COUNT -u user [user ...] -s
                      START -e END [-h] [-f feature [feature ...] | -N node
                      [node ...]] [-i | -P | -k] [-t TOKEN_PATH]

Create a reservation in a partition.

required arguments:
  -n NAME, --name NAME  the name of the reservation
  -p PARTITION, --partition PARTITION
                        name of partition the reservation is to be created in
  -c COUNT, --count COUNT
                        the amount of nodes
  -u user [user ...], --users user [user ...]
                        a list of users
  -s START, --start START
                        the start date of the reservation [Y-M-DTH:M]
  -e END, --end END     the end date of the reservation [Y-M-DTH:M]

optional arguments:
  -h, --help            show this help message and exit
  -f feature [feature ...], --features feature [feature ...]
                        optional features
  -N node [node ...], --nodes node [node ...]
                        optional specified nodes
  -i, --ignore_jobs     ignore currently running jobs
  -P, --preempt_jobs    kill currently running jobs if preemptable
  -k, --kill_jobs       kill currently running jobs
  -t TOKEN_PATH, --token TOKEN_PATH
                        local path to token file


@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresnew.py -p allcpu -n res_test_002 -c 1 -s 2021-07-19T12:00 -e 2021-07-19T14:00 -u user1,user2
{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 20,
 'end_time': '2021-07-19T14:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_001',
 'node_cnt': 1,
 'node_list': 'max-cfel023',
 'partition': 'allcpu',
 'start_time': '2021-07-19T12:00:00',
 'tres_str': ['cpu=40'],
 'users': ['user1', 'user2']}

{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 20,
 'end_time': '2021-07-19T14:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_002',
 'node_cnt': 1,
 'node_list': 'max-cfel024',
 'partition': 'allcpu',
 'start_time': '2021-07-19T12:00:00',
 'tres_str': ['cpu=40'],
 'users': ['user1', 'user2']}

Reservation successfully created
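
When reservations are created from a script, the start and end times have to be formatted as Y-M-DTH:M. The sketch below builds the argument list for slurmresnew.py from Python datetime objects. It is an illustration only: create_command is a hypothetical helper, and the users are joined with commas because that is what the sample invocation above does.

```python
from datetime import datetime

# Path as documented above; adjust it if the modules are installed elsewhere.
SLURMRESNEW = "/software/tools/lib/python3.6/slurmres/slurmresnew.py"

# Time format matching the [Y-M-DTH:M] hint in the help text.
TIME_FORMAT = "%Y-%m-%dT%H:%M"


def create_command(name, partition, count, users, start, end):
    """Return the argv list for creating a reservation (hypothetical helper)."""
    if end <= start:
        raise ValueError("end must be later than start")
    return [
        "python3", SLURMRESNEW,
        "-n", name,
        "-p", partition,
        "-c", str(count),
        "-s", start.strftime(TIME_FORMAT),
        "-e", end.strftime(TIME_FORMAT),
        # Comma-separated, as in the sample invocation above.
        "-u", ",".join(users),
    ]


if __name__ == "__main__":
    cmd = create_command(
        "res_test_002", "allcpu", 1, ["user1", "user2"],
        datetime(2021, 7, 19, 12, 0), datetime(2021, 7, 19, 14, 0),
    )
    print(" ".join(cmd))
    # To actually create the reservation (requires authorization and a token):
    # import subprocess; subprocess.run(cmd)
```

Building the command as a list rather than a single string avoids shell quoting problems when it is passed to subprocess.run.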




EDIT

Edit reservation
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresedit.py -h
usage: slurmresedit.py -n NAME -p PARTITION [-h] [-c COUNT]
                       [-u user [user ...]] [-s [START]] [-e [END]]
                       [-N node [node ...]] [-i | -P | -k] [-t TOKEN_PATH]

Edit a reservation in a partition.

required arguments:
  -n NAME, --name NAME  the name of the reservation
  -p PARTITION, --partition PARTITION
                        name of the reservations partition

optional arguments:
  -h, --help            show this help message and exit
  -c COUNT, --count COUNT
                        the amount of nodes
  -u user [user ...], --users user [user ...]
                        a list of users
  -s [START], --start [START]
                        the start date of the reservation [Y-M-DTH:M]
  -e [END], --end [END]
                        the end date of the reservation [Y-M-DTH:M]
  -N node [node ...], --nodes node [node ...]
                        optional specified nodes
  -i, --ignore_jobs     ignore currently running jobs
  -P, --preempt_jobs    kill currently running jobs if preemptable
  -k, --kill_jobs       kill currently running jobs
  -t TOKEN_PATH, --token TOKEN_PATH
                        local path to token file


# change nodecount and list of users:
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresedit.py -n res_test_002 -c 2 -u user1,user2,user3 -p allcpu
[...]
{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 40,
 'end_time': '2021-07-19T14:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_002',
 'node_cnt': 2,
 'node_list': 'max-cfel[024-025]',
 'partition': 'allcpu',
 'start_time': '2021-07-19T12:00:00',
 'tres_str': ['cpu=80'],
 'users': ['user1', 'user2', 'user3']}

Reservation successfully edited
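
After the edit, node_list is reported in SLURM's compressed notation (max-cfel[024-025]). On a machine with SLURM installed, scontrol show hostnames expands such an expression. On a machine without SLURM, for example a beamline PC, a simplified sketch like the following could be used instead. expand_node_list is a hypothetical helper that only handles a single prefix with one bracket group, not the full hostlist syntax.

```python
import re


def expand_node_list(node_list):
    """Expand e.g. 'max-cfel[024-025]' into individual hostnames.

    Simplified sketch: supports one prefix followed by one bracket group
    containing comma-separated numbers or ranges. Anything else is
    returned unchanged as a single entry.
    """
    match = re.fullmatch(r"([^\[\],]+)\[([^\]]+)\]", node_list)
    if match is None:
        return [node_list]  # plain hostname such as 'max-cfel023'
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        if "-" in part:
            first, last = part.split("-")
            width = len(first)  # keep the zero padding, e.g. 024
            for number in range(int(first), int(last) + 1):
                hosts.append(f"{prefix}{number:0{width}d}")
        else:
            hosts.append(prefix + part)
    return hosts


if __name__ == "__main__":
    print(expand_node_list("max-cfel[024-025]"))
```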


DELETE

Delete reservation
@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresdelete.py -h
usage: slurmresdelete.py [-h] -n NAME -p PARTITION [-t TOKEN]

Delete a reservation in a partition.

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  name of reservation
  -p PARTITION, --partition PARTITION
                        name of partition the reservation is in
  -t TOKEN, --token TOKEN
                        the path to the token

@max-wgse001:~$ python3 /software/tools/lib/python3.6/slurmres/slurmresdelete.py -n res_test_002 -p allcpu
Reservation successfully deleted


WRAPPER

Most people will presumably make use of the Python code. For convenience there is also a wrapper which invokes the corresponding Python module; the syntax is identical:

Convenience wrapper
@max-wgse001:~$ /software/tools/sbin/slurmreservation
usage: slurmreservation token|list|create|edit|delete

@max-wgse001:~$ /software/tools/sbin/slurmreservation create -n res_test_004 -c 1 -u user1,user2 -p allcpu -s 2021-07-19T14:50 -e 2021-07-19T16:00
{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 48,
 'end_time': '2021-07-19T16:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_004',
 'node_cnt': 1,
 'node_list': 'max-wn096',
 'partition': 'allcpu',
 'start_time': '2021-07-19T14:50:00',
 'tres_str': ['cpu=96'],
 'users': ['user1', 'user2']}

Reservation successfully created

@max-wgse001:~$ /software/tools/sbin/slurmreservation list

{'accounts': [],
 'burst_buffer': [],
 'core_cnt': 48,
 'end_time': '2021-07-19T16:00:00',
 'features': [],
 'flags': '',
 'licenses': {},
 'name': 'res_test_004',
 'node_cnt': 1,
 'node_list': 'max-wn096',
 'partition': 'allcpu',
 'start_time': '2021-07-19T14:50:00',
 'tres_str': ['cpu=96'],
 'users': ['user1', 'user2']}

1 reservation found

@max-wgse001:~$ /software/tools/sbin/slurmreservation delete -n res_test_004 -p allcpu
Reservation successfully deleted