This is a simple test run for gromacs. All files can be found on Maxwell under /beegfs/desy/group/it/Benchmarks/gromacs:
- submit.sh: sample script to submit single-node jobs for different CPUs
- benchmark.gromacs.sh: gromacs sample script
Running gromacs on a single node
Comparison of fastest execution (shorter is better) | Scaling behavior |
---|---|
Relative speed compared to AMD EPYC at 96 cores |
Note: the runtime of a little more than 3 seconds is too short to provide reliable estimates. However, Intel Xeon Gold 6240 is clearly the fastest, followed by AMD EPYC 7402. In general, gromacs performs best using only physical cores (i.e. half of the available cores).
CPU | cores / node | cores used at best performance |
---|---|---|
AMD EPYC 7402 | 96 | 96 |
Intel Xeon E5-2640 V3 | 32 | 16 |
Intel Xeon E5-2640 V4 | 40 | 40 |
Intel Xeon E5-2698 V3 | 64 | na |
Intel Xeon E5-2698 V4 | 80 | 40 |
Intel Xeon Gold 6140 | 72 | 36 |
Intel Xeon Gold 6226 | 48 | na |
Intel Xeon Gold 6240 | 72 | 36 |
Intel Xeon Silver 4114 | 40 | 20 |
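The statement about physical cores can be checked against the table with a few lines of Python. This is only a sketch: the dictionary simply restates the table values, and the names `single_node` and `best_core_fraction` are made up for illustration.

```python
# Values copied from the single-node table above.
# Tuple layout: (cores / node, cores used at best performance); "na" is stored as None.
single_node = {
    "AMD EPYC 7402": (96, 96),
    "Intel Xeon E5-2640 V3": (32, 16),
    "Intel Xeon E5-2640 V4": (40, 40),
    "Intel Xeon E5-2698 V3": (64, None),
    "Intel Xeon E5-2698 V4": (80, 40),
    "Intel Xeon Gold 6140": (72, 36),
    "Intel Xeon Gold 6226": (48, None),
    "Intel Xeon Gold 6240": (72, 36),
    "Intel Xeon Silver 4114": (40, 20),
}


def best_core_fraction(table):
    """Return {cpu: cores used / cores per node} for rows that have a measurement."""
    return {
        cpu: used / total
        for cpu, (total, used) in table.items()
        if used is not None
    }


fractions = best_core_fraction(single_node)
for cpu, frac in fractions.items():
    print(f"{cpu:24s} {frac:.2f}")

# CPUs whose best run used exactly half of the available cores.
half = [cpu for cpu, frac in fractions.items() if frac == 0.5]
print(f"{len(half)} of {len(fractions)} measured CPUs peak at half of the available cores")
```

With the values above, five of the seven measured CPUs reach their best time at half of the available cores. AMD EPYC 7402 and Intel Xeon E5-2640 V4 are the exceptions and use all cores.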
Running gromacs on 2 nodes
Due to openmpi's problem with ConnectX-6 IB HCAs, gromacs had to be recompiled against openmpi 4.0.3. This also involved enabling AVX instructions, which is particularly beneficial for Intel CPUs. The numbers are hence not exactly comparable to the single-node stats.
Comparison of fastest execution (shorter is better) | Scaling behavior |
---|---|
Relative speed compared to AMD EPYC at 96 cores (log scale) |
CPU | cores / node | cores used at best performance | cores used / node |
---|---|---|---|
AMD EPYC 7402 | 96 | 96 | 48 |
Intel Xeon E5-2640 V3 | 32 | 32 | 16 |
Intel Xeon E5-2640 V4 | 40 | 72 | 36 |
Intel Xeon E5-2698 V3 | 64 | 64 | 32 |
Intel Xeon E5-2698 V4 | 80 | 144 | 72 |
Intel Xeon Gold 6140 | 72 | 72 | 36 |
Intel Xeon Gold 6226 | 48 | 40 | 20 |
Intel Xeon Gold 6240 | 72 | 72 | 36 |
Intel Xeon Silver 4114 | 40 | 20 | 40 |
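The last column appears to be the total number of cores used divided by the two nodes. That reading is an assumption, not something stated on this page. Under that assumption, a short Python sketch can check the rows for consistency (the names `two_node` and `inconsistent_rows` are hypothetical):

```python
NODES = 2  # this section reports 2-node runs

# Values copied from the 2-node table above.
# Tuple layout: (cores / node, cores used at best performance, cores used / node)
two_node = {
    "AMD EPYC 7402": (96, 96, 48),
    "Intel Xeon E5-2640 V3": (32, 32, 16),
    "Intel Xeon E5-2640 V4": (40, 72, 36),
    "Intel Xeon E5-2698 V3": (64, 64, 32),
    "Intel Xeon E5-2698 V4": (80, 144, 72),
    "Intel Xeon Gold 6140": (72, 72, 36),
    "Intel Xeon Gold 6226": (48, 40, 20),
    "Intel Xeon Gold 6240": (72, 72, 36),
    "Intel Xeon Silver 4114": (40, 20, 40),
}


def inconsistent_rows(table, nodes=NODES):
    """Return CPUs where total cores used != cores used per node * number of nodes."""
    return [
        cpu
        for cpu, (_, used_total, used_per_node) in table.items()
        if used_total != used_per_node * nodes
    ]


print(inconsistent_rows(two_node))
```

If the assumption holds, only the Intel Xeon Silver 4114 row does not fit, which suggests that one of its two values was entered incorrectly. Which of the two is wrong cannot be determined from the table alone.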
Tuning exercise
AMD performance on 2 nodes was originally a bit poor. It turned out that the setup of the AMD nodes was not optimal: they were using the conservative CPU frequency governor with the C2 state enabled:
- Elapsed time with governor conservative, C2-state enabled: 59.11s
- Elapsed time with governor performance, C2-state enabled: 47.63s
- Elapsed time with governor performance, C2-state disabled: 44.11s
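To put the three timings in relation to each other, the following Python sketch computes the speedup and the fraction of time saved relative to the original configuration. The elapsed times are taken from the list above; the helper name `relative_gain` is made up for illustration.

```python
# Elapsed times in seconds, copied from the list above.
elapsed = {
    ("conservative", "C2 enabled"): 59.11,
    ("performance", "C2 enabled"): 47.63,
    ("performance", "C2 disabled"): 44.11,
}


def relative_gain(baseline, tuned):
    """Return (speedup factor, fractional reduction in elapsed time)."""
    return baseline / tuned, (baseline - tuned) / baseline


# The original, untuned setup serves as the baseline.
baseline = elapsed[("conservative", "C2 enabled")]
gains = {cfg: relative_gain(baseline, t) for cfg, t in elapsed.items()}

for (governor, cstate), (speedup, saved) in gains.items():
    print(f"{governor:12s} {cstate:11s} speedup {speedup:.2f}x, time saved {saved:.1%}")
```

Switching the governor to performance accounts for most of the gain, roughly 19% less elapsed time. Disabling the C2 state on top of that brings the total reduction to roughly 25%.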