Summary
Source: https://github.com/ochubar/SRW
License: Open Source https://github.com/ochubar/SRW/blob/master/COPYRIGHT.txt
Path: /software/oasys as part of the oasys installation
Documentation: https://wpg.readthedocs.io/en/latest/tutorials/2/Tutorial_case_2.html
Related: oasys
SRW (Synchrotron Radiation Workshop) is a physical optics computer code for calculating detailed characteristics of Synchrotron Radiation (SR) generated by relativistic electrons in magnetic fields of arbitrary configuration, and for simulating the propagation of the radiation wavefront through the optical systems of beamlines.
Using srw
srw is available as part of the oasys installation and is initialized with a custom oasys-mpi module:
```
[max]% module load maxwell oasys-mpi
[max]% mpiexec -np 16 --mca pml ucx python SRWLIB_Example12.py -m 100
```
To simplify batch jobs, a minimal batch script is available under /software/oasys/bin/srw.sh. Simple examples of running the script:
```
# This ensures that srw.sh is in your PATH.
# Only helpful for salloc; for sbatch one needs to specify the full path - or use $(which srw.sh)
[max]% module load maxwell oasys-mpi

# Run srw on the "local" Maxwell machine.
# On max-display and max-wgs the script sets the default number of cores to 4, but it can be overridden:
[max]% NP=2 srw.sh "SRWLIB_Example12.py -m 100"

# salloc can be used to run srw on batch nodes while still being kind of interactive, for example:
[max]% NP=48 salloc --partition=all --time=0-04:00:00 --constraint='EPYC&7402' --nodes=2 srw.sh "SRWLIB_Example12.py -m 100"
# NP=48: use 48 cores per node. EPYC 7402 nodes are equipped with 48 physical cores.
# --partition=all ...: options specified on the command line override the ones set in srw.sh (--partition=maxcpu --time=0-04:00)
# "SRWLIB_Example12.py -m 100": all parameters for srw have to be specified in a single string; the quotes are required.

# With sbatch all nodes are exclusive, so there is no need to specify the number of cores:
[max]% sbatch --partition=short --nodes=4 /software/oasys/bin/srw.sh "SRWLIB_Example12.py -m 10000"

# For partitions with mixed hardware it is helpful to ensure that all nodes used are identical, for example:
[max]% sbatch --partition=maxcpu,allcpu --nodes=4 --constraint="[Gold-6240|7402|E5-2640]" $(which srw.sh) "SRWLIB_Example12.py -m 10000"
```
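The wrapper itself is not reproduced here; the following is only a minimal sketch of what an srw.sh-style batch script could look like, assuming it is essentially an sbatch script that loads the oasys-mpi module and launches the mpiexec command shown above. The #SBATCH defaults, the NP fallback and the exact launch line are assumptions for illustration, not the actual content of /software/oasys/bin/srw.sh.

```bash
#!/bin/bash
# Illustrative sketch only -- the real script is /software/oasys/bin/srw.sh.
#SBATCH --partition=maxcpu
#SBATCH --time=0-04:00
#SBATCH --nodes=1

module load maxwell oasys-mpi

# NP = number of cores per node; fall back to all physical cores if unset
# (illustrative default, not the real srw.sh logic).
if [ -z "$NP" ]; then
    NP=$(lscpu -p=CORE | grep -v '^#' | sort -u | wc -l)
fi
NODES=${SLURM_JOB_NUM_NODES:-1}

# The SRW script and its arguments arrive as one quoted string, e.g. "SRWLIB_Example12.py -m 100";
# intentional word splitting of $1 separates the script name from its arguments.
mpiexec -np $((NP * NODES)) --mca pml ucx python $1
```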
Note: NP defines the number of cores per node used by srw.sh. The defaults are (see benchmarks below):
- For 1-4 nodes use all cores (physical+logical)
- For >4 nodes use only physical cores
- For interactive jobs use only 4 cores
The defaults can be overridden by setting NP=<your-favorite-number-of-cores>
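For example, to fix the per-node core count for a batch job (the value 32 is an arbitrary choice for illustration; this relies on sbatch's default behaviour of exporting the submission environment, so the NP variable reaches srw.sh):

```bash
# Override the per-node core count for a 8-node batch job (illustrative values).
[max]% NP=32 sbatch --partition=maxcpu --nodes=8 /software/oasys/bin/srw.sh "SRWLIB_Example12.py -m 10000"
```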
Benchmarking
```
# Submit jobs for 1, 2, 4, 8, 16 nodes.
# Give each output file a unique name.
# --dependency=singleton ensures that only one job is running at a time.
[max]% for nodes in 1 2 4 8 16 ; do
    sbatch --dependency=singleton --partition=all --output=srw.nodes-$nodes.%j.out \
           --nodes=$nodes --constraint=7402 /software/oasys/bin/srw.sh "SRWLIB_Example12.py -m 10000"
done

# Check the status of the jobs:
[max]% squeue -u $USER -n srw.sh

# Once all jobs are done, collect some information:
[max]% for out in srw*.out ; do
    x=($(echo $out | tr '\-.' ' '))
    echo "nodes: ${x[2]} time: $(sacct --noheader -X -j ${x[3]} --format=elapsed) jobid: ${x[3]}"
done
# sacct options:
#   --noheader        don't display a header
#   -X                don't show job steps, only the entire job
#   --format=elapsed  time used by the job

# Output:
nodes: 1 time: 00:26:44 jobid: 8499227
nodes: 2 time: 00:12:56 jobid: 8499228
nodes: 4 time: 00:06:51 jobid: 8499229
nodes: 8 time: 00:04:44 jobid: 8499230
```
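To judge the scaling behaviour, the elapsed times can be converted into speedup and parallel efficiency relative to the single-node run. A minimal sketch, using the EPYC 7402 times listed above (assumes bash 4+ and bc are available):

```bash
# Convert sacct elapsed times (HH:MM:SS) into speedup and efficiency relative to 1 node.
# The times are the EPYC 7402 results from above; adjust as needed.
declare -A elapsed=( [1]=00:26:44 [2]=00:12:56 [4]=00:06:51 [8]=00:04:44 )

# Convert HH:MM:SS to seconds (10# avoids octal interpretation of leading zeros).
to_sec() { IFS=: read -r h m s <<< "$1"; echo $((10#$h*3600 + 10#$m*60 + 10#$s)); }

t1=$(to_sec ${elapsed[1]})
for n in 1 2 4 8; do
    t=$(to_sec ${elapsed[$n]})
    printf "nodes: %2d  speedup: %5.2f  efficiency: %5.1f%%\n" \
        $n $(bc -l <<< "$t1/$t") $(bc -l <<< "100*$t1/($n*$t)")
done
```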
Benchmarking results
Elapsed times (MM:SS) as a function of the number of nodes:

| CPU type | Cores per node (used / available / physical) | 1 node | 2 nodes | 4 nodes | 8 nodes | 16 nodes |
|---|---|---|---|---|---|---|
| AMD EPYC 7402 | 48 / 96 / 48 | 26:44 | 12:56 | 06:51 | 04:44 | 02:34 |
| ... using hyperthreaded cores | 96 / 96 / 48 | 18:53 | 10:01 | 06:52 | 03:40 | 03:31 |
| AMD EPYC 7542 | 64 / 128 / 64 | 19:59 | 10:31 | 05:46 | 03:40 | 03:31 |
| ... using hyperthreaded cores | 128 / 128 / 64 | 14:30 | 08:05 | 05:11 | 04:33 | - |
| Intel Gold-6240 | 36 / 72 / 36 | 43:58 | 19:53 | 10:40 | 06:31 | 04:02 |
| ... using hyperthreaded cores | 72 / 72 / 36 | 33:46 | 17:14 | 10:56 | 06:50 | 05:10 |
| Intel E5-2640 | 20 / 40 / 20 | 83:24 | 41:14 | 20:59 | 10:04 | 05:38 |
| ... using hyperthreaded cores | 40 / 40 / 20 | 68:31 | 34:35 | 16:35 | * | 08:08 |
When running on more than 4 nodes, using hyperthreaded cores was almost always slower than running on physical cores only. The configuration marked with * tended to crash.