The information on this page is deprecated and will probably not work as expected. It may serve as a starting point for your own solution.
Batch job
Prepare job script
job.sh
#!/bin/bash
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --time=00:01:00    # Maximum time request
#SBATCH --partition=all

# start-up docker cluster, we use -u to pull new image from repository if necessary
dockercluster -u centos_mpi

# run an mpi command in docker cluster
dockerexec mpirun -np 64 hostname
Run sbatch command
$ sbatch job.sh
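The job script above hard-codes the rank count (-np 64) so that it matches --ntasks=64. As a minimal sketch, the count could instead be read from SLURM_NTASKS, which Slurm sets inside a job when --ntasks is given. The helper names mpi_ranks and mpi_command are hypothetical, and the fallback value of 1 outside a job is an assumption. The sketch only prints the command line and does not call dockerexec itself.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the dockercluster tooling.

# mpi_ranks: print the number of MPI ranks to start.
# SLURM_NTASKS is set by Slurm inside a job when --ntasks is given;
# falling back to 1 outside a job is an assumption of this sketch.
mpi_ranks() {
    echo "${SLURM_NTASKS:-1}"
}

# mpi_command PROGRAM [ARGS...]: print the command line that would be
# run inside the Docker cluster, with the rank count filled in.
mpi_command() {
    echo "dockerexec mpirun -np $(mpi_ranks) $*"
}

# Example (inside job.sh, after dockercluster has started the cluster):
#   dockerexec mpirun -np "$(mpi_ranks)" hostname
```

With this, changing --ntasks in the header is enough; the mpirun line does not need to be edited separately.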
Batch job - multiple Docker images
Prepare job script
job.sh
#!/bin/bash
#SBATCH --ntasks=64
#SBATCH --cpus-per-task=1
#SBATCH --time=00:01:00    # Maximum time request
#SBATCH --partition=all

# start-up first docker cluster
dockercluster -n cluster1 centos_mpi

# start-up second docker cluster (we need to use another port here)
dockercluster -n cluster2 -p 2023 centos_mpi_benchmarks

# run an mpi command in the first cluster (the name must match the one given with -n)
DOCKER_CONT_NAME=cluster1 dockerexec mpirun -np 64 hostname

# run an mpi command in the second cluster
DOCKER_CONT_NAME=cluster2 dockerexec mpirun -np 2 mpi_bandwidth

# stop clusters (not really necessary unless you want to reuse a cluster name in the same script)
dockercluster -n cluster1 -s
dockercluster -n cluster2 -s
Run sbatch command
$ sbatch job.sh
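Each additional Docker cluster in the same job needs its own name and its own port. The following sketch generates the start-up commands for any number of images under that rule. The helper name plan_clusters, the cluster1, cluster2, ... naming scheme, and the assumption that consecutive port numbers are free are all illustrative; only the -n and -p options are taken from the script above. The function prints the commands and does not run them.

```shell
#!/bin/bash
# plan_clusters FIRST_EXTRA_PORT IMAGE...
# Hypothetical helper: prints one dockercluster command per image.
# The first cluster keeps the default port; every further cluster gets
# its own port, counting up from FIRST_EXTRA_PORT (assumed to be free).
# The body runs in a subshell so its variables do not leak to the caller.
plan_clusters() (
    port=$1
    shift
    i=1
    for image in "$@"; do
        if [ "$i" -eq 1 ]; then
            echo "dockercluster -n cluster$i $image"
        else
            echo "dockercluster -n cluster$i -p $port $image"
            port=$((port + 1))
        fi
        i=$((i + 1))
    done
)

# Example:
#   plan_clusters 2023 centos_mpi centos_mpi_benchmarks
```

For the two images used above, the example call prints the same two start-up lines as the job script, which can then be reviewed and pasted into job.sh.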
Interactive job
Running MPI jobs interactively is not recommended, but it can be done with some effort:
Allocate resources
$ salloc -n 64 -p all
salloc: Granted job allocation 3708
salloc: Waiting for resource configuration
salloc: Nodes max-wna[004-005] are ready for job
Start the Docker cluster. Either give it a name or remember the SLURM job number
$ dockercluster -u centos_mpi
or
$ dockercluster -u -n test centos_mpi
Log in to one of the allocated nodes
$ ssh max-wna004
Run your application. Set SLURM_JOB_ID, or DOCKER_CONT_NAME if you started the cluster with the -n parameter
$ SLURM_JOB_ID=3708 dockerexec mpirun hostname
or
$ DOCKER_CONT_NAME=test dockerexec mpirun hostname
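The choice between the two variables in the last step can be wrapped in a small helper. This is only a sketch: the function name cluster_env is hypothetical, and it assumes, as in the commands above, that a cluster started with -n is addressed through DOCKER_CONT_NAME and an unnamed one through SLURM_JOB_ID.

```shell
#!/bin/bash
# cluster_env [NAME]
# Hypothetical helper: print the variable assignment that selects a
# Docker cluster. With NAME, the cluster is assumed to have been started
# with "dockercluster -n NAME"; without it, SLURM_JOB_ID must be set.
cluster_env() {
    if [ -n "${1:-}" ]; then
        echo "DOCKER_CONT_NAME=$1"
    elif [ -n "${SLURM_JOB_ID:-}" ]; then
        echo "SLURM_JOB_ID=$SLURM_JOB_ID"
    else
        echo "cluster_env: no cluster name given and SLURM_JOB_ID is unset" >&2
        return 1
    fi
}

# Example (on an allocated node):
#   env "$(cluster_env test)" dockerexec mpirun hostname
```

The helper fails with a message when neither a name nor a job number is available, rather than passing an empty value on to dockerexec.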
With some scripting you can avoid remembering the job number and logging in to a node (skipping steps 3-4), but wouldn't it be better to switch to a batch job at this point?
$ ssh `scontrol show hostname $SLURM_NODELIST | head -1` \
    SLURM_JOB_ID=$SLURM_JOB_ID dockerexec mpirun hostname
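The one-liner above relies on scontrol to expand the compressed node list (for example max-wna[004-005]) and then takes the first host. The sketch below makes that step explicit. The function names first_node and first_node_fallback are hypothetical. The pure-shell fallback is a simplification that is only assumed to cover plain names and simple bracket ranges; it is included mainly to show what the expansion does, and scontrol remains the authoritative way to expand a node list.

```shell
#!/bin/bash
# first_node_fallback NODELIST
# Hypothetical pure-shell approximation: print the first host of a node
# list such as "max-wna[004-005]". Handles plain names, comma-separated
# names and a leading bracket group; nothing more elaborate.
# The body runs in a subshell so its variables do not leak to the caller.
first_node_fallback() (
    list=$1
    lead=${list%%\[*}    # text before the first '[' (the whole list if none)
    case "$lead" in
        *,*)
            # a plain host name comes first: keep everything before the comma
            echo "${lead%%,*}"
            ;;
        "$list")
            # no brackets and no commas: a single host
            echo "$list"
            ;;
        *)
            # prefix[first...]: keep the first number inside the brackets
            rest=${list#*\[}
            echo "${lead}${rest%%[],-]*}"
            ;;
    esac
)

# first_node NODELIST: use scontrol when it is available, else the fallback.
first_node() {
    if command -v scontrol >/dev/null 2>&1; then
        scontrol show hostnames "$1" | head -n 1
    else
        first_node_fallback "$1"
    fi
}

# Example (inside the salloc shell):
#   ssh "$(first_node "$SLURM_NODELIST")" \
#       SLURM_JOB_ID="$SLURM_JOB_ID" dockerexec mpirun hostname
```

For the allocation shown earlier, the fallback yields max-wna004, the same node that was used for the manual login.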