Maxwell : SLURM REST API

Current versions of SLURM provide a REST API daemon that allows submitting and managing jobs through REST calls, for example via curl. For users there is hardly any benefit in using the REST API; SLURM commands like sbatch, squeue, etc. are much handier. It does, however, make it possible to launch and manage batch jobs from a (web) service and, under certain circumstances, to handle batch jobs on behalf of other users.

BE AWARE: whoever knows your token has access to all your files on maxwell! We therefore disabled the ability to generate unlimited tokens using scontrol. Only the mechanism described below will work.

The SLURM REST daemon is currently accessible at https://max-portal.desy.de/sapi. The service has only just been started; consider it a pilot installation.

Documentation

General information about SLURM's REST API: https://slurm.schedmd.com/rest.html
SLURM REST API reference: https://slurm.schedmd.com/rest_api.html
Information about JSON web tokens: https://slurm.schedmd.com/jwt.html
Slides for the SLUG 2020 talk: https://slurm.schedmd.com/SLUG20/REST_API.pdf
Slides for the SLUG 2019 talk: https://slurm.schedmd.com/SLUG19/REST_API.pdf

JSON web token (JWT)

slurmrestd is configured to accept only JWTs for authentication. To talk to slurmrestd you first need to generate such a token and set the environment variable SLURM_TOKEN:

# in order to generate a slurm JWT, first generate a maxwell portal token:
portal_token
> Username:
> Password:
> Portal token generated at /home/username/.maxwell/portal.token
# the portal token never expires - if desired, use -r flag to revoke or create new portal token to overwrite old one (-h flag for help)

# generate a slurm JWT with a default lifespan of 1800 seconds:
slurm_token
> SLURM_TOKEN=long.token

# generate a token with a lifespan of 1 day (max lifespan):
slurm_token -l $((3600*24))

# generate a token specifying a username
slurm_token -l $((3600*24)) -u $USER  
# only a privileged account can specify a username - drop us a mail at maxwell.service@desy.de if you need to create JWTs for other accounts

# generate a token and export $SLURM_TOKEN so it can be used in curl:
export $(slurm_token -l $((3600*24)))

Note: we decided to name the environment variable SLURM_TOKEN, so it will not have any effect on standard slurm commands like sbatch or squeue, even if the token expired. To use the token for standard slurm commands (which should hardly ever be necessary) just set SLURM_JWT=$SLURM_TOKEN.
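
Since a token is only valid for a limited time, a service may want to check the remaining lifespan before issuing REST calls. The following is a minimal sketch, not part of the Maxwell tooling: it assumes the token is a standard three-part JWT carrying the exp claim (as in the generator script further down), and the function names are purely illustrative. It does not verify the signature; only slurmrestd can do that.

```python
import base64
import json
import time


def parse_token_line(line):
    """Extract the token from a 'SLURM_TOKEN=<token>' line as printed by slurm_token."""
    name, sep, token = line.strip().partition("=")
    if name != "SLURM_TOKEN" or not sep or not token:
        raise ValueError("expected a line of the form SLURM_TOKEN=<token>")
    return token


def decode_claims(token):
    """Return the payload claims of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT consists of three dot-separated parts")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_left(claims, now=None):
    """Remaining lifetime in seconds according to 'exp' (negative once expired)."""
    now = time.time() if now is None else now
    return claims["exp"] - now
```

With the variable exported as shown above, seconds_left(decode_claims(os.environ["SLURM_TOKEN"])) (after import os) tells a script whether it should request a fresh token first.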

Job submission

To submit a job, the job script has to be embedded into a JSON string. A very simple example of such a payload and its submission:

cat job.json
> {"job":{"partition": "short","tasks":1,"name":"test","nodes":1,"current_working_directory":"/home/user","environment":{"PATH":"/bin:/usr/bin/:/usr/local/bin/","LD_LIBRARY_PATH":"/lib/:/lib64/:/usr/local/lib"}},"script":"#!/bin/bash\nsrun hostname\necho \"hello world\"\nsleep 300"}

curl -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN \
     -X POST https://max-portal.desy.de/sapi/slurm/v0.0.38/job/submit -d@job.json
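
For a (web) service the same request is usually issued from code rather than from curl. As a rough sketch, assuming only the URL, API version and header names used in the curl call above, the request could be assembled with Python's standard library as follows. The function name build_submit_request is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request

BASE_URL = "https://max-portal.desy.de/sapi"
API_VERSION = "v0.0.38"  # version used throughout this page


def build_submit_request(payload, user, token):
    """Build (but do not send) the POST request for job submission."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/slurm/{API_VERSION}/job/submit",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-SLURM-USER-NAME": user,
            "X-SLURM-USER-TOKEN": token,
        },
    )

# To actually submit, pass the request to urllib.request.urlopen(), e.g.:
#   with urllib.request.urlopen(build_submit_request(payload, user, token)) as r:
#       print(json.load(r))
```

Note that urllib normalises the capitalisation of header names; HTTP header names are case-insensitive, so this should not matter to the server.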


For complex batch scripts, writing the JSON by hand quickly becomes infeasible. Simpler scripts can be converted into a JSON-escaped string, for example:

# sample job script
cat job.script
> #!/bin/bash
> echo "hello script"
> srun hostname
> echo $SLURM_JOB_ID
> sleep 100

# convert into string
scr=$(cat job.script | sed 's|"|\\"|g' | sed ':a;N;$!ba;s|\n|\\n|g' )
echo $scr
> #!/bin/bash\necho \"hello script\"\nsrun hostname\necho $SLURM_JOB_ID\nsleep 100

# embed into payload
payload=$(cat <<eof
{"job":{"partition": "short","tasks":1,"name":"test","nodes":1,"current_working_directory":"/home/$USER","environment":{"PATH":"/bin:/usr/bin/:/usr/local/bin/","LD_LIBRARY_PATH":"/lib/:/lib64/:/usr/local/lib"}},"script":"$scr"}
eof
)
echo $payload
> {"job":{"partition": "short","tasks":1,"name":"test","nodes":1,"current_working_directory":"/home/user","environment":{"PATH":"/bin:/usr/bin/:/usr/local/bin/","LD_LIBRARY_PATH":"/lib/:/lib64/:/usr/local/lib"}},"script":"#!/bin/bash\necho \"hello script\"\nsrun hostname\necho $SLURM_JOB_ID\nsleep 100"}

# submit job
curl -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN -X POST 'https://max-portal.desy.de/sapi/slurm/v0.0.38/job/submit' -d "$payload"
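
The sed-based conversion above escapes double quotes and newlines only; a script containing backslashes or tabs would need further rules. A JSON library takes care of all escaping. The following sketch assumes nothing beyond the payload layout shown above; build_payload is an illustrative name.

```python
import json


def build_payload(script_text, job_options):
    """Return the JSON payload with the job script embedded as a string.

    json.dumps escapes quotes, newlines, backslashes and control characters,
    so the script can be passed in unmodified.
    """
    return json.dumps({"job": job_options, "script": script_text})
```

A call such as build_payload(open("job.script").read(), {"partition": "short", "tasks": 1, "name": "test", "nodes": 1}) yields a single-line string that can be handed to curl via -d or sent directly from Python.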

Be aware: curl will NOT transport your current environment to the batch job. You have to define everything either in the environment section of the payload or inside the batch script itself. Batch jobs also won't read ~/.bashrc unless the script uses a login shell ('#!/bin/bash -l').
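
When the payload is generated by a program, the environment section can be filled explicitly from selected variables. This is a small sketch under the assumption that only a handful of variables need to be forwarded; the helper name job_environment and its parameters are made up for illustration.

```python
import os


def job_environment(names, defaults=None, source=None):
    """Collect the variables to pass in the payload's "environment" section.

    names:    variables to copy from the source environment if they are set
    defaults: fallback values used when a variable is not set in the source
    source:   mapping to read from (defaults to os.environ)
    """
    source = os.environ if source is None else source
    env = dict(defaults or {})
    for name in names:
        if name in source:
            env[name] = source[name]
    return env
```

For example, job_environment(["PATH", "LD_LIBRARY_PATH"], defaults={"PATH": "/bin:/usr/bin/:/usr/local/bin/"}) forwards the caller's settings where present and falls back to the values from the example payload otherwise.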

Job information

# all jobs
curl -s -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN -X GET https://max-portal.desy.de/sapi/slurm/v0.0.38/jobs
# to extract information about individual jobs, use a JSON parser such as jq:
curl -s -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN -X GET https://max-portal.desy.de/sapi/slurm/v0.0.38/jobs > jobs.json
cat jobs.json | jq '.jobs[]| select(.job_id == 7752003)'
cat jobs.json | jq '.jobs[]| select(.user_name == "username")'  # replace username by a real username

# specific running or pending job
curl -s -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN -X GET https://max-portal.desy.de/sapi/slurm/v0.0.38/job/7750418
# a job already removed from the queue will report an error
>       "error": "_handle_job_get: unknown job 7751560",

# retrieve information about finished jobs:
curl -H "Content-Type: application/json" -H X-SLURM-USER-NAME:$(whoami) -H X-SLURM-USER-TOKEN:$SLURM_TOKEN -X GET https://max-portal.desy.de/sapi/slurmdb/v0.0.38/job/7740309
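
The jq filters above translate directly into a few lines of Python, which may be more convenient inside a service. The sketch below assumes only what the jq examples rely on: a top-level "jobs" list whose entries carry job_id and user_name. The function names are illustrative.

```python
import json


def jobs_by_user(document, user_name):
    """Return all job records belonging to user_name."""
    return [job for job in document.get("jobs", []) if job.get("user_name") == user_name]


def job_by_id(document, job_id):
    """Return the record for job_id, or None if it is not in the list."""
    for job in document.get("jobs", []):
        if job.get("job_id") == job_id:
            return job
    return None
```

With the response saved as jobs.json as shown above, document = json.load(open("jobs.json")) provides the input for both helpers.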

Generating JWT for service providers

As a privileged user it is possible to create tokens for arbitrary users on maxwell. For services it might be handier to generate tokens on the service host itself, without requiring a privileged account. SchedMD provides a simple Python script to generate tokens (shown here with a very minor modification):

#!/usr/bin/env python3
# requires the "jwt" package (pip3 install jwt), not PyJWT
import sys
import os
import time

from jwt import JWT
from jwt.jwk import jwk_from_dict
from jwt.utils import b64encode

if len(sys.argv) != 3:
    sys.exit("generate_jwt.py [user name] [expiration time (seconds)]")

jwt_key = os.environ.get('JWT_KEY', '/etc/slurm/jwt_hs256.key')

with open(jwt_key, "rb") as f:
    priv_key = f.read()

signing_key = jwk_from_dict({
    'kty': 'oct',
    'k': b64encode(priv_key)
})

message = {
    "exp": int(time.time() + int(sys.argv[2])),
    "iat": int(time.time()),
    "sun": sys.argv[1]
}

a = JWT()
compact_jws = a.encode(message, signing_key, alg='HS256')
print("SLURM_TOKEN={}".format(compact_jws))

Running python3 generate_jwt.py user 3600 would generate the token, provided you have a copy of the "secret key". Naturally you won't. Please get in touch with maxwell.service@desy.de if you are considering such a mechanism.
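
To illustrate what such a token consists of, here is a sketch that builds an HS256-signed JWT with the same three claims (exp, iat, sun) using only the Python standard library. It is an illustration of the token structure, not the script used on Maxwell; the function names are hypothetical and the key in any experiment would be a throwaway value, never the real one.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url(data):
    """Base64url-encode bytes without padding, as used in JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(key, user, lifespan, now=None):
    """Build an HS256 JWT with the claims exp, iat and sun (the user name)."""
    now = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"exp": now + int(lifespan), "iat": now, "sun": user}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def signature_is_valid(key, token):
    """Recompute the HMAC over header.payload and compare it with the signature."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(expected), signature)
```

Because the signature is a keyed hash over header and payload, anyone holding the key can mint tokens for any user name, which is why the key is not handed out.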