
  • Batch job

  1. Prepare job script

    job.sh
    #!/bin/bash
    #SBATCH --ntasks=64
    #SBATCH --cpus-per-task=1
    #SBATCH --time=00:01:00                  # Maximum time request
    #SBATCH --partition=all
    
    
    # start the Docker cluster; -u pulls a new image from the repository if necessary
    dockercluster -u centos_mpi
    
    
    # run an mpi command in docker cluster
    dockerexec mpirun -np 64 hostname
    
    
    
  2. Run sbatch command

    $ sbatch job.sh
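
Once the job is submitted, it is often handy to keep its job ID so the job can be monitored. The following is a minimal sketch, assuming the standard SLURM option `sbatch --parsable`, which prints either `jobid` or `jobid;clustername`. The helper name `job_id_from_parsable` is hypothetical and not part of SLURM or of the Docker tooling described here.

```shell
#!/bin/bash
# Hypothetical helper: extract the numeric job ID from `sbatch --parsable`
# output, which is either "jobid" or "jobid;clustername".
job_id_from_parsable() {
    local out="$1"
    # Drop everything from the first ';' onwards, if there is one.
    printf '%s\n' "${out%%;*}"
}

# Usage (needs a SLURM installation, so it is left commented out here):
#   jobid=$(job_id_from_parsable "$(sbatch --parsable job.sh)")
#   squeue -j "$jobid"          # show the job's state in the queue
#   cat "slurm-${jobid}.out"    # default output file of a non-array job
```

The parsing is kept in its own function so that it can be checked without access to a cluster.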
    
    
  • Batch job - multiple Docker images

  1. Prepare job script

    job.sh
    #!/bin/bash
    #SBATCH --ntasks=64
    #SBATCH --cpus-per-task=1
    #SBATCH --time=00:01:00                  # Maximum time request
    #SBATCH --partition=all
    
    
    # start-up first docker cluster
    dockercluster -n cluster1 centos_mpi
    
    # start-up second docker cluster (we need to use another port here)
    dockercluster -n cluster2 -p 2023 centos_mpi_benchmarks
    
    # run an mpi command in the first cluster (DOCKER_CONT_NAME must match the name given with -n)
    DOCKER_CONT_NAME=cluster1 dockerexec mpirun -np 64 hostname
    
    # run an mpi command in the second cluster
    DOCKER_CONT_NAME=cluster2 dockerexec mpirun -np 2 mpi_bandwidth
    
    
    # stop the clusters (only necessary if you want to reuse a cluster name in the same script)
    dockercluster -n cluster1 -s
    dockercluster -n cluster2 -s
    
    
    
  2. Run sbatch command

    $ sbatch job.sh
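
The job script above stops its clusters explicitly on its last lines. If the script exits early (for example when it runs under `set -e`), those lines are never reached, and one option is to register the stop commands with a bash `trap`. This is a minimal sketch that assumes only the `dockercluster -n NAME -s` form shown in the job script; `stop_clusters` is a hypothetical helper name, and as the script comment notes, stopping is usually optional.

```shell
#!/bin/bash
# Sketch: stop every named Docker cluster, using the `dockercluster -n NAME -s`
# form from the job script. `stop_clusters` is a hypothetical helper name.
stop_clusters() {
    local name
    for name in "$@"; do
        # Keep going even if one cluster fails to stop.
        dockercluster -n "$name" -s || echo "warning: could not stop $name" >&2
    done
}

# In job.sh, register the cleanup once, before starting the clusters:
#   trap 'stop_clusters cluster1 cluster2' EXIT
```

With the trap in place, the two explicit `dockercluster ... -s` lines at the end of the script become redundant.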

 

  • Interactive job

Running interactive MPI jobs is not recommended, but it is possible with some extra effort:

  1. Allocate resources

    $ salloc -n 64 -p all
    salloc: Granted job allocation 3708
    salloc: Waiting for resource configuration
    salloc: Nodes max-wna[004-005] are ready for job
  2. Start the Docker cluster. Either give it a name with -n or note the SLURM job number (3708 in the salloc output above)

    $ dockercluster -u centos_mpi
    or
    $ dockercluster -u -n test centos_mpi



  3. Login to one of the allocated nodes

    $ ssh max-wna004
  4. Run your application. Set SLURM_JOB_ID, or DOCKER_CONT_NAME if you started the cluster with the -n parameter

    $ SLURM_JOB_ID=3708 dockerexec mpirun hostname
    or
    $ DOCKER_CONT_NAME=test dockerexec mpirun hostname
    
    

With some scripting you can avoid remembering the job number and logging in to a node (skipping steps 3-4), but wouldn't it be better to switch to a batch job at that point?

$ ssh $(scontrol show hostname $SLURM_NODELIST | head -n 1) \
SLURM_JOB_ID=$SLURM_JOB_ID dockerexec mpirun hostname
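
If the one-liner above is used repeatedly, it could be wrapped in a small shell function. The following is a sketch only: `run_in_cluster` is a hypothetical name, and it assumes the cluster was started without -n, so that `dockerexec` finds it through SLURM_JOB_ID as in step 4. The guard only checks that the two variables the one-liner relies on are set.

```shell
#!/bin/bash
# Hypothetical wrapper around the one-liner above: run a command through
# dockerexec on the first node of the current allocation.
run_in_cluster() {
    # Both variables are expected in the shell started by salloc.
    if [ -z "${SLURM_JOB_ID:-}" ] || [ -z "${SLURM_NODELIST:-}" ]; then
        echo "run_in_cluster: not inside a SLURM allocation" >&2
        return 1
    fi
    local first_node
    # Expand the compact node list and take its first entry.
    first_node=$(scontrol show hostnames "$SLURM_NODELIST" | head -n 1) || return 1
    # ssh joins the remaining words into one remote command line.
    ssh "$first_node" "SLURM_JOB_ID=$SLURM_JOB_ID" dockerexec "$@"
}

# Usage, from the shell opened by salloc:
#   run_in_cluster mpirun hostname
```

Because ssh re-parses the command on the remote side, arguments containing spaces would need extra quoting.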