Maxwell : Astra benchmarks

Scaling over nodes for ompi 4.0.4 vs compat-openmpi16

CPU        Cores        ompi   runtime (s) for #nodes    ratio (nodes=1)  ZSTOP  #runs
                               1       2       4         4.0.4/1.6
EPYC 7402  48/96/48     4.0.4  444     312     258       1.32             1.0    3
                        1.6    588     475     403
EPYC 7642  96/192/96    4.0.4  392     339     -         1.53             1.0    3
                        1.6    599     -       -
EPYC 7542  64/128/64    4.0.4  375     286     273       1.49             1.0    3
                        1.6    560     -       -
EPYC 7F52  32/64/32     4.0.4  454     -       -         1.24             1.0    3
                        1.6    564     -       -
EPYC 7H12  128/256/128  4.0.4  430     -       -         1.32             1.0    3
                        1.6    568     -       -
Gold 6140  36/72/36     4.0.4  681     460     348       1.24             1.0    3
                        1.6    845     -       -
Gold 6240  36/72/36     4.0.4  632     412     302       1.25             1.0    3
                        1.6    788     -       -

Remarks:

  • Cores: physical / physical+logical / used
  • ompi 4.0.4: mpirun -N <cores> --mca pml ucx pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in
  • ompi 1.6: mpirun -npernode <cores> --mca btl_openib_device_param_files mca-btl-openib-device-params.ini ASTRA_Benchmark.in
  • ratio: runtime with ompi 1.6 divided by runtime with ompi 4.0.4 for the single-node run; a timed comparison of the two commands is sketched below
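
A minimal sketch of such a single-node comparison run; only the mpirun options are taken from the remarks above, while the SECONDS-based timing and the name of the ompi-1.6 binary (pastra_ompi16) are assumptions:

    #!/bin/bash
    CORES=48                                  # physical cores, e.g. EPYC 7402

    t0=$SECONDS                               # ompi 4.0.4 run over UCX
    mpirun -N $CORES --mca pml ucx \
           pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in
    t_404=$((SECONDS - t0))

    t0=$SECONDS                               # compat-openmpi16 run over openib
    mpirun -npernode $CORES \
           --mca btl_openib_device_param_files mca-btl-openib-device-params.ini \
           pastra_ompi16 ASTRA_Benchmark.in   # hypothetical ompi-1.6 build
    t_16=$((SECONDS - t0))

    # "ratio" column of the table: ompi 1.6 runtime / ompi 4.0.4 runtime
    awk "BEGIN{printf \"ratio (nodes=1): %.2f\n\", $t_16/$t_404}"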

Scaling over cores for ompi 4.0.4 vs compat-openmpi16

CPU        ompi   Cores    Nodes  runtime (s) for number of cores used (ZSTOP=1.0)
                                  8     12    16    24    32    36    48    64    96    128   192   256
EPYC 7F52  4.0.4  32/64    1      1569  1019  794   564   454*  -     575   483   -     -     -     -
           1.6                    -     1388  1027  746   550*  -     638   576   -     -     -     -
EPYC 7402  4.0.4  48/96    1      1989  1388  1032  683   564   -     444*  591   585   -     -     -
           1.6                    -     1617  1322  1322  744   -     540*  673   658   -     -     -
EPYC 7542  4.0.4  64/128   1      1749  1219  919   667   525   -     416   380*  528   579   -     -
           1.6                    -     1637  1257  895   729   -     559   564*  556   668   -     -
EPYC 7642  4.0.4  96/192   1      2179  1478  1122  752   589   -     433   396   393*  -     723   -
           1.6                    -     -     -     944   827   -     612   -     586*  -     -     -
EPYC 7H12  4.0.4  128/256  1      2258  1495  1146  803   621   -     452   393   374   429*  -     1018
           1.6                    -     1747  1386  930   741   -     511   505   499   594*  -     -
Gold 6140  4.0.4  36/72    1      2107  1482  1147  827   707   685*  972   884   -     -     -     -
Gold 6240  4.0.4  36/72    1      1947  1364  1050  776   677   655*  888   825   -     -     -     -



Remarks:

  • Cores: physical / physical+logical
  • Marked with *: number of cores used equals the number of physical cores
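
The scan over core counts behind this table can be scripted as below (a minimal sketch; the ompi 4.0.4 invocation follows the remarks of the first table, and GNU time is assumed to be available as /usr/bin/time):

    # Loop the benchmark over the core counts of the table; counts beyond
    # the physical cores may additionally need --use-hwthread-cpus.
    for n in 8 12 16 24 32 36 48 64 96 128 192 256; do
        /usr/bin/time -f "$n cores: %e s" \
            mpirun -N $n --mca pml ucx \
            pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in
    done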

Concurrent processes

CPU        ompi   Cores    runtime (s) per job for #concurrent*#threads jobs
                           (cores per job = (#physical_cores/#concurrent)*#threads)
                           2*1   2*2   4*1   4*2   8*1   8*2   12*1  12*2  16*1  16*2
EPYC 7402  4.0.4  48/96    392   380   375   304   368   275   363   265   431   262
EPYC 7642  4.0.4  96/192   289   410   231   241   217   184   210   169   162   206
EPYC 7542  4.0.4  64/128   319   327   283   239   274   205   371   402   373   192
EPYC 7F52  4.0.4  32/64    401   368   404   325   465   322   -     460   -     -
EPYC 7H12  4.0.4  128/256  257   421   186   238   159   158   159   169   153   125
Gold 6140  4.0.4  36/72    630   644   714   543   579   710   740   479   -     507
Gold 6240  4.0.4  36/72    552   581   648   488   519   638   665   434   -     460

Remarks:

  • Cores: physical / physical+logical cores
  • With more than 8 concurrent MPI jobs, runs occasionally get stuck; memory might play a role for #concurrent > 8.
  • The MPI jobs don't always finish at the same time: there can be a delay of up to 60 s between the first and the last job finishing (which is, however, only about 2% of the runtime).
  • Runtime: for example, the EPYC 7542 has 64 physical cores.
    • For 2*1 there are 2 concurrent processes, each using 64/2 = 32 cores, so 64 cores in total, which is the number of physical cores.
    • For 2*2 there are 2 concurrent processes, each using (64/2)*2 = 64 cores, so 128 cores in total, which is the total number of cores.
    • Runtime is the runtime per job, i.e. the total time until all MPI processes have finished, divided by the number of concurrent processes; a launch sketch is given below.
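
A minimal sketch of such a concurrent run, here 4*2 on an EPYC 7542 (64 physical cores); the per-job work directories and the relative binary path are assumptions, and the per-job runtime follows the definition above:

    NCONC=4; THREADS=2; PHYS=64
    CORES=$(( PHYS / NCONC * THREADS ))       # (64/4)*2 = 32 cores per job

    t0=$SECONDS
    for i in $(seq $NCONC); do                # launch the jobs in the background
        ( mkdir -p job$i && cd job$i &&
          mpirun -N $CORES --mca pml ucx \
                 ../pastra-gfortran9_3-openmpi4_0_3 ../ASTRA_Benchmark.in ) &
    done
    wait                                      # total time: all jobs finished
    echo "runtime per job: $(( (SECONDS - t0) / NCONC )) s"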