Scaling over nodes for ompi 4.0.4 vs compat-openmpi16
CPU | Cores | ompi | runtime (s), 1 node | runtime (s), 2 nodes | runtime (s), 4 nodes | ratio 1.6/4.0.4 (1 node) | ZSTOP | #runs
---|---|---|---|---|---|---|---|---
EPYC 7402 | 48/96/48 | 4.0.4 | 444 | 312 | 258 | 1.32 | 1.0 | 3
EPYC 7402 | 48/96/48 | 1.6 | 588 | 475 | 403 | | |
EPYC 7642 | 96/192/96 | 4.0.4 | 392 | 339 | - | 1.53 | 1.0 | 3
EPYC 7642 | 96/192/96 | 1.6 | 599 | - | - | | |
EPYC 7542 | 64/128/64 | 4.0.4 | 375 | 286 | 273 | 1.49 | 1.0 | 3
EPYC 7542 | 64/128/64 | 1.6 | 560 | - | - | | |
EPYC 7F52 | 32/64/32 | 4.0.4 | 454 | - | - | 1.24 | 1.0 | 3
EPYC 7F52 | 32/64/32 | 1.6 | 564 | - | - | | |
EPYC 7H12 | 128/256/128 | 4.0.4 | 430 | - | - | 1.32 | 1.0 | 3
EPYC 7H12 | 128/256/128 | 1.6 | 568 | - | - | | |
Gold 6140 | 36/72/36 | 4.0.4 | 681 | 460 | 348 | 1.24 | 1.0 | 3
Gold 6140 | 36/72/36 | 1.6 | 845 | - | - | | |
Gold 6240 | 36/72/36 | 4.0.4 | 632 | 412 | 302 | 1.25 | 1.0 | 3
Gold 6240 | 36/72/36 | 1.6 | 788 | - | - | | |
Remarks:
- Cores: physical / physical+logical / cores used
- ompi 4.0.4: mpirun -N <cores> --mca pml ucx pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in
- ompi 1.6: mpirun -npernode <cores> --mca btl_openib_device_param_files mca-btl-openib-device-params.ini ASTRA_Benchmark.in (both commands are combined in the sketch below)
- ratio: runtime with ompi 1.6 divided by runtime with ompi 4.0.4 for the single-node run
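
For concreteness, a minimal sketch of how the single-node comparison could be scripted from the two commands above. The timing via $SECONDS, the ratio calculation, and the name of the 1.6 binary (EXE16) are assumptions for illustration, not part of the documented setup.

```bash
#!/bin/bash
# Sketch of a single-node comparison run (not the original script).
# EXE16 is a hypothetical name for the binary built against compat-openmpi 1.6;
# the 4.0.4 binary and the input file are taken from the remarks above.
CORES=48                                    # physical cores of the node (e.g. EPYC 7402)
EXE40=pastra-gfortran9_3-openmpi4_0_3
EXE16=pastra-gfortran9_3-compat-openmpi16   # hypothetical binary name
INPUT=ASTRA_Benchmark.in

# ompi 4.0.4: UCX point-to-point messaging layer
t0=$SECONDS
mpirun -N "$CORES" --mca pml ucx "$EXE40" "$INPUT"
t40=$(( SECONDS - t0 ))

# compat-openmpi 1.6: openib BTL with device parameter file
t0=$SECONDS
mpirun -npernode "$CORES" \
    --mca btl_openib_device_param_files mca-btl-openib-device-params.ini \
    "$EXE16" "$INPUT"
t16=$(( SECONDS - t0 ))

# ratio as reported in the table: runtime(1.6) / runtime(4.0.4)
awk -v a="$t16" -v b="$t40" 'BEGIN { printf "ratio 1.6/4.0.4 = %.2f\n", a/b }'
```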
Scaling over cores for ompi 4.0.4 vs compat-openmpi16
Runtime (s) for the given number of cores used (columns 8 to 256), ZSTOP=1.0:

CPU | ompi | Cores | Nodes | 8 | 12 | 16 | 24 | 32 | 36 | 48 | 64 | 96 | 128 | 192 | 256
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
EPYC 7F52 | 4.0.4 | 32/64 | 1 | 1569 | 1019 | 794 | 564 | **454** | - | 575 | 483 | - | - | - | -
EPYC 7F52 | 1.6 | 32/64 | 1 | - | 1388 | 1027 | 746 | **550** | - | 638 | 576 | - | - | - | -
EPYC 7402 | 4.0.4 | 48/96 | 1 | 1989 | 1388 | 1032 | 683 | 564 | - | **444** | 591 | 585 | - | - | -
EPYC 7402 | 1.6 | 48/96 | 1 | - | 1617 | 1322 | 1322 | 744 | - | **540** | 673 | 658 | - | - | -
EPYC 7542 | 4.0.4 | 64/128 | 1 | 1749 | 1219 | 919 | 667 | 525 | - | 416 | **380** | 528 | 579 | - | -
EPYC 7542 | 1.6 | 64/128 | 1 | - | 1637 | 1257 | 895 | 729 | - | 559 | **564** | 556 | 668 | - | -
EPYC 7642 | 4.0.4 | 96/192 | 1 | 2179 | 1478 | 1122 | 752 | 589 | - | 433 | 396 | **393** | - | 723 | -
EPYC 7642 | 1.6 | 96/192 | 1 | - | - | - | 944 | 827 | - | 612 | - | **586** | - | - | -
EPYC 7H12 | 4.0.4 | 128/256 | 1 | 2258 | 1495 | 1146 | 803 | 621 | - | 452 | 393 | 374 | **429** | - | 1018
EPYC 7H12 | 1.6 | 128/256 | 1 | - | 1747 | 1386 | 930 | 741 | - | 511 | 505 | 499 | **594** | - | -
Gold 6140 | 4.0.4 | 36/72 | 1 | 2107 | 1482 | 1147 | 827 | 707 | **685** | 972 | 884 | - | - | - | -
Gold 6240 | 4.0.4 | 36/72 | 1 | 1947 | 1364 | 1050 | 776 | 677 | **655** | 888 | 825 | - | - | - | -
Remarks:
- Cores: physical / physical+logical (the number of cores used is given by the column headers)
- Bold: number of cores used == number of physical cores
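
The core sweep behind the table above could be driven by a loop along the following lines. This is only a guess at the procedure: the executable, input file, and core counts are taken from the document, while the loop, GNU time for the wall-clock measurement, and --oversubscribe for core counts above the detected slot count are assumptions.

```bash
#!/bin/bash
# Sketch of the single-node core sweep for ompi 4.0.4 (assumed procedure,
# not the original script).
for NCORES in 8 12 16 24 32 36 48 64 96 128 192 256; do
    /usr/bin/time -f "cores=$NCORES  runtime=%e s" \
        mpirun -N "$NCORES" --oversubscribe --mca pml ucx \
        pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in
done
```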
Concurrent processes
Runtime (s) per job for concurrent MPI jobs; column headers give #concurrent * #threads, and each job uses #physical_cores / #concurrent * #threads cores (see remarks):

CPU | ompi | Cores | 2*1 | 2*2 | 4*1 | 4*2 | 8*1 | 8*2 | 12*1 | 12*2 | 16*1 | 16*2
---|---|---|---|---|---|---|---|---|---|---|---|---
EPYC 7402 | 4.0.4 | 48/96 | 392 | 380 | 375 | 304 | 368 | 275 | 363 | 265 | 431 | 262
EPYC 7642 | 4.0.4 | 96/192 | 289 | 410 | 231 | 241 | 217 | 184 | 210 | 169 | 162 | 206
EPYC 7542 | 4.0.4 | 64/128 | 319 | 327 | 283 | 239 | 274 | 205 | 371 | 402 | 373 | 192
EPYC 7F52 | 4.0.4 | 32/64 | 401 | 368 | 404 | 325 | 465 | 322 | - | 460 | - | -
EPYC 7H12 | 4.0.4 | 128/256 | 257 | 421 | 186 | 238 | 159 | 158 | 159 | 169 | 153 | 125
Gold 6140 | 4.0.4 | 36/72 | 630 | 644 | 714 | 543 | 579 | 710 | 740 | 479 | - | 507
Gold 6240 | 4.0.4 | 36/72 | 552 | 581 | 648 | 488 | 519 | 638 | 665 | 434 | - | 460
Remarks:
- Cores: physical/physical+logical cores
- For more than 8 concurrent MPI jobs, runs occasionally get stuck; memory might play a role for #concurrent > 8.
- MPI jobs don't always finish at the same time; there can be a delay of up to 60 s between the first and the last job finishing (which is, however, only about 2% of the runtime).
- Runtime: for example, the EPYC 7542 has 64 physical cores (see also the launch sketch after this list).
  - For 2*1 there are 2 concurrent processes, each using 64/2 = 32 cores, so 64 cores in total, which is the number of physical cores.
  - For 2*2 there are 2 concurrent processes, each using 64/2*2 = 64 cores, so 128 cores in total, which is the total number of cores (physical + logical).
  - Runtime is the runtime per job, i.e. the total time until all MPI processes have finished divided by the number of concurrent processes.
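
To make the core accounting concrete, here is a sketch of how one configuration (4*2 on the 64-core EPYC 7542) might be launched. Only the core arithmetic follows the remarks above; the run directories, background jobs, --bind-to none, and the per-job runtime printout are assumptions for illustration.

```bash
#!/bin/bash
# Sketch of one concurrent-job configuration, here 4*2 on an EPYC 7542
# (assumed launch procedure, not the original script; the benchmark binary
# is assumed to be in PATH).
PHYS=64        # physical cores of the node
CONC=4         # number of concurrent MPI jobs  (#concurrent)
THREADS=2      # per-job multiplier              (#threads)
CORES_PER_JOB=$(( PHYS / CONC * THREADS ))   # 64/4*2 = 32 cores per job

start=$SECONDS
for i in $(seq 1 "$CONC"); do
    mkdir -p "run_$i" && cp ASTRA_Benchmark.in "run_$i/"
    # --bind-to none lets the OS spread the jobs instead of all of them
    # pinning to the same cores (assumed; the original binding is not documented)
    ( cd "run_$i" && \
      mpirun -N "$CORES_PER_JOB" --mca pml ucx --bind-to none \
          pastra-gfortran9_3-openmpi4_0_3 ASTRA_Benchmark.in ) &
done
wait    # jobs may finish up to ~60 s apart (see remark above)
total=$(( SECONDS - start ))

# runtime per job as reported in the table: total wall time / #concurrent
echo "runtime per job: $(( total / CONC )) s"
```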