A separate slurm instance has been created to support single- or few-core jobs. The slurm commands are almost identical to those described for standard full-node jobs, except that you need to specify the slurm instance:
max-wgse002:~$ sinfo -M solaris        # or sinfo --cluster=solaris
CLUSTER: solaris
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
solcpu*      up 7-00:00:00      5   idle max-wn[008-012]
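Submitting a job works the same way: the cluster flag just needs to be given on the command line or inside the script. A minimal sketch, assuming a job script named job.sh (a placeholder name):

# submit to the solaris instance; the flag can also be set in the script via #SBATCH --cluster=solaris
sbatch -M solaris job.sh       # or sbatch --cluster=solaris job.sh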
The slurm instance, named solaris, contains a single partition, named solcpu, with a handful of old nodes:
max-wgse002:~$ sinfo --cluster=solaris -o '%n %f'
CLUSTER: solaris
HOSTNAMES AVAIL_FEATURES
max-wn008 INTEL,V4,E5-2640,256G
max-wn009 INTEL,V4,E5-2640,256G
max-wn010 INTEL,V4,E5-2640,256G
max-wn011 INTEL,V4,E5-2640,256G
max-wn012 INTEL,V4,E5-2640,256G
Job configuration
The solaris instance supports allocating a specific number of cores and a specific amount of memory. This means that you have to set sensible limits: otherwise the node will either be poorly utilized, or your jobs will be terminated once they exceed the limits.
The default memory allocated to a job is 4GB.
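If 4GB is not enough, the memory request can be raised explicitly with --mem (total per job) or --mem-per-cpu (per allocated core); the two options are mutually exclusive. A minimal sketch with placeholder values:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G                  # total memory for the job (placeholder value)
##SBATCH --mem-per-cpu=4G         # alternative: scale memory with the core count
#SBATCH --time=0-01:00:00
unset LD_PRELOAD
echo "Cores available: $(nproc)"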
Example 1:
Allocate 4 cores:
#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=0-00:10:00
unset LD_PRELOAD
np=$(nproc)
echo "Cores available: $np"
srun -n $np hostname

# Output:
Cores available: 4
max-wn008.desy.de
max-wn008.desy.de
max-wn008.desy.de
max-wn008.desy.de
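A multi-threaded application would typically be told about the allocated core count explicitly. A minimal sketch, assuming a hypothetical OpenMP-style binary my_app:

#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=0-00:10:00
unset LD_PRELOAD
# take the core count from slurm instead of hard-coding it
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_app        # hypothetical multi-threaded application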
Example 2:
Allocate 4 cores and try to use 6 cores:
#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=4G
#SBATCH --time=0-00:10:00
unset LD_PRELOAD
np=$(nproc)
echo "Cores available: $np"
srun -n 6 hostname

# Output:
Cores available: 4
srun: error: Unable to create step for job 51: More processors requested than permitted
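To stay within the allocation, it is safer to derive the task count from the environment instead of hard-coding it. A minimal sketch:

# launch exactly as many tasks as cores were granted
srun -n ${SLURM_CPUS_PER_TASK:-1} hostname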
Example 3:
Allocate 4GB of memory and try to use 5GB:
#!/bin/bash
#SBATCH --cluster=solaris
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=0-00:10:00
unset LD_PRELOAD
np=$(nproc)
echo "Cores available: $np"
# try to allocate 5G of memory:
timeout 10 cat /dev/zero | head -c 5G | tail

# Output:
/var/spool/slurmd/job00050/slurm_script: line 17: 24886 Broken pipe             timeout 10 cat /dev/zero
     24887                            | head -c 5G
     24888 Killed                     | tail
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=50.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

# Note: the job state will in this case be OUT_OF_MEMORY
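Whether a finished job stayed within its memory request can be checked afterwards with sacct; a sketch using the job id from the example output above:

sacct -M solaris -j 50 -o JobID,State,ReqMem,MaxRSS,Elapsed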
Job information
The squeue, sinfo, sacct, ... commands all work as usual; you just need to add --cluster=solaris. So to see your jobs:
# squeue
squeue -u $USER -M solaris     # or squeue --user=$USER --cluster=solaris

# sacct
sacct -M solaris               # or sacct --cluster=solaris
# or
sacct -L                       # for both slurm instances (maxwell,solaris)
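Other client commands take the same flag; for example (the job id 51 is just a placeholder):

# cancel a job in the solaris instance
scancel -M solaris 51
# inspect a job in detail
scontrol -M solaris show job 51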