Maxwell : Gromacs benchmark

This is a simple test run for gromacs. All files can be found on Maxwell under /beegfs/desy/group/it/Benchmarks/gromacs:

  • submit.sh: sample to submit single node jobs for different CPUs
  • benchmark.gromacs.sh: gromacs sample script

Running gromacs on a single node

Figure: Comparison of fastest execution (shorter is better)
Figure: Scaling behavior
Figure: Relative speed compared to AMD EPYC at 96 cores


Note: with a runtime of little more than 3 seconds, the benchmark is too short to provide reliable estimates. Nevertheless, the Intel Xeon Gold 6240 is clearly the fastest, followed by the AMD EPYC 7402. In general, gromacs performs best when using only the physical cores (i.e. half of the available cores).


CPU                        cores / node   cores used at best performance
AMD EPYC 7402              96             96
Intel Xeon E5-2640 V3      32             16
Intel Xeon E5-2640 V4      40             40
Intel Xeon E5-2698 V3      64             na
Intel Xeon E5-2698 V4      80             40
Intel Xeon Gold 6140       72             36
Intel Xeon Gold 6226       48             na
Intel Xeon Gold 6240       72             36
Intel Xeon Silver 4114     40             20
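
The observation that the best performance is often reached on half of the available cores can be checked directly against the single-node table. The following is a minimal sketch, not part of the benchmark scripts: the names SINGLE_NODE and best_fraction are made up for illustration, and the numbers are copied from the table above, with None standing in for the "na" entries.

```python
# (cores per node, cores used at best performance), taken from the
# single-node table; None marks CPUs listed as "na".
SINGLE_NODE = {
    "AMD EPYC 7402": (96, 96),
    "Intel Xeon E5-2640 V3": (32, 16),
    "Intel Xeon E5-2640 V4": (40, 40),
    "Intel Xeon E5-2698 V3": (64, None),
    "Intel Xeon E5-2698 V4": (80, 40),
    "Intel Xeon Gold 6140": (72, 36),
    "Intel Xeon Gold 6226": (48, None),
    "Intel Xeon Gold 6240": (72, 36),
    "Intel Xeon Silver 4114": (40, 20),
}


def best_fraction(table):
    """Return cores_used / cores_per_node for every CPU with a measurement."""
    return {
        cpu: used / total
        for cpu, (total, used) in table.items()
        if used is not None
    }


if __name__ == "__main__":
    for cpu, frac in best_fraction(SINGLE_NODE).items():
        print(f"{cpu:24s} {frac:.2f}")
```

With these numbers, five of the seven measured CPU types come out at 0.5, while the AMD EPYC 7402 and the Intel Xeon E5-2640 V4 come out at 1.0.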

Running gromacs on 2 nodes

Due to a problem in openmpi with ConnectX-6 IB HCAs, gromacs had to be recompiled against openmpi 4.0.3. The new build also enables AVX instructions, which is particularly beneficial for Intel CPUs. The numbers are hence not exactly comparable to the single-node results.

Figure: Comparison of fastest execution (shorter is better)
Figure: Scaling behavior
Figure: Relative speed compared to AMD EPYC at 96 cores (log scale)



CPU                        cores / node   cores used at best performance   cores used / node
AMD EPYC 7402              96             96                               48
Intel Xeon E5-2640 V3      32             32                               16
Intel Xeon E5-2640 V4      40             72                               36
Intel Xeon E5-2698 V3      64             64                               32
Intel Xeon E5-2698 V4      80             144                              72
Intel Xeon Gold 6140       72             72                               36
Intel Xeon Gold 6226       48             40                               20
Intel Xeon Gold 6240       72             72                               36
Intel Xeon Silver 4114     40             20                               40

Tuning exercise

AMD performance on 2 nodes was originally rather poor. It turned out that the AMD nodes were not set up optimally: they used the conservative CPU frequency governor and had C2 states enabled:

  • Elapsed time with governor conservative, C2-state enabled: 59.11s
  • Elapsed time with governor performance, C2-state enabled: 47.63s
  • Elapsed time with governor performance, C2-state disabled: 44.11s
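
To put the three timings in relation, the sketch below computes the speedup of each configuration against the original setup. It also includes a small helper that reports which cpufreq governors are currently set on a node. This is an illustration only: the function names are hypothetical, the timings are copied from the list above, and the helper assumes a Linux node that exposes the standard cpufreq sysfs files under /sys/devices/system/cpu.

```python
from collections import Counter
from pathlib import Path

# Elapsed times in seconds, copied from the list above (AMD nodes, 2-node run).
ELAPSED = {
    "conservative, C2 enabled": 59.11,
    "performance, C2 enabled": 47.63,
    "performance, C2 disabled": 44.11,
}


def speedups(elapsed, baseline):
    """Return baseline_time / time for every configuration."""
    ref = elapsed[baseline]
    return {name: ref / t for name, t in elapsed.items()}


def governor_counts(sysfs_root="/sys/devices/system/cpu"):
    """Count the cpufreq governors currently set on this node's CPUs.

    Returns an empty Counter if the node exposes no cpufreq files.
    """
    files = Path(sysfs_root).glob("cpu[0-9]*/cpufreq/scaling_governor")
    return Counter(f.read_text().strip() for f in files)


if __name__ == "__main__":
    for name, s in speedups(ELAPSED, "conservative, C2 enabled").items():
        print(f"{name:26s} {ELAPSED[name]:6.2f} s  speedup {s:.2f}x")
    print("governors on this node:", dict(governor_counts()))
```

Relative to the original setup, switching to the performance governor gives a speedup of about 1.24x, and additionally disabling C2 states about 1.34x.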