Maxwell : Infiniband tests

Disclaimer: this is a very naive approach and can certainly be done much better (for example with real-life tests using asapo). It might nevertheless give a very rough idea.

Summary

  • IPOIB bandwidth is rather susceptible to load on the receiver; the aggregated loss is about 30%
  • with all physical cores busy the bandwidth is only about 40 Gb/s, and only when using multiple streams (8-16)
    • 5 Gib/s is the most to expect under such conditions for a server/receiver with a 100 Gb/s HDR connection
    • a larger number of concurrent streams doesn't help (at best it maintains the bandwidth)
  • IPOIB bandwidth varies a lot and is also quite susceptible to "unrelated" cross-traffic
  • RDMA speed seems largely unaffected by CPU load or other factors

Configuration 1

3 hosts on 2 different switches and with different capabilities


                  | sender                     | receiver 1                 | receiver 2
hostname          | max-wn113                  | max-wn112                  | max-wn064/65
CPU (1)           | dual AMD EPYC 75F3 32-core | dual AMD EPYC 75F3 32-core | dual AMD EPYC 7402 24-core
Cores/Threads (2) | 64 (128 HT)                | 64 (128 HT)                | 48 (96 HT)
IB devices (3)    | ConnectX-6                 | ConnectX-6                 | ConnectX-6
IB speed (4)      | 100 Gb/sec                 | 100 Gb/sec                 | 100 Gb/sec
PCI-E (5)         | PCI-E Gen4                 | PCI-E Gen4                 | PCI-E Gen3
IPOIB (6)         | max-wn113-ib               | max-wn112-ib               | max-wn064-ib
IPOIB MTU (6)     | 2044                       | 2044                       | 2044
IB switch         | max-ib-l308                | max-ib-l308                | max-ib-l303

1: lshw -C cpu
2: lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'
3: lspci | grep -i mell
4: ibstatus
5: as root: /usr/sbin/dmidecode | grep PCI and /usr/sbin/lspci -vv | grep -E 'PCI bridge|LnkCap'
6: ifconfig ib
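
For convenience, the footnote commands can be combined into a small helper script run on each host. This is just a sketch; the IPoIB interface name ib0 and root privileges for the PCI-E checks are assumptions.

    #!/usr/bin/env bash
    # Collect the hardware details listed in the table above for the local host
    # (footnote commands 1-6; run as root so that dmidecode / lspci -vv work).

    echo "== CPU model (1) =="
    lshw -C cpu

    echo "== CPU topology (2) =="
    lscpu | egrep 'Model name|Socket|Thread|NUMA|CPU\(s\)'

    echo "== Mellanox devices (3) =="
    lspci | grep -i mell

    echo "== IB speed / state (4) =="
    ibstatus

    echo "== PCI-E link capabilities (5) =="
    /usr/sbin/dmidecode | grep PCI
    /usr/sbin/lspci -vv | grep -E 'PCI bridge|LnkCap'

    echo "== IPoIB interface (6) =="
    ifconfig ib0        # interface name may differ (ib0, ib1, ...)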

Configuration 2

3 hosts on 2 different switches and with identical capabilities


              | sender                | receiver 1            | receiver 2
hostname      | max-wng056            | max-wng058            | max-wng060
CPU           | AMD EPYC 7543 32-Core | AMD EPYC 7543 32-Core | AMD EPYC 7543 32-Core
Cores/Threads | 64 (128 HT)           | 64 (128 HT)           | 96 (192 HT)
IB devices    | ConnectX-6            | ConnectX-6            | ConnectX-6
IB speed      | 100 Gb/sec            | 100 Gb/sec            | 100 Gb/sec
PCI-E         | PCI-E Gen4            | PCI-E Gen4            | PCI-E Gen4
IPOIB         | max-wng056-ib         | max-wng058-ib         | max-wn060-ib
IPOIB MTU     | 2044                  | 2044                  | 2044
IB switch     | max-ib-l308           | max-ib-l308           | max-ib-l306

Testing TCP bandwidth with IPOIB

The test simply uses iperf (a sketch for scripting the stream-count sweep follows the list):

  • start iperf on the server: iperf -s
  • start iperf on the client: iperf -c max-wn112-ib -t 60 -i 5 -f g -P <number of threads>
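
The sweep over stream counts used for the tables below could be scripted roughly like this (a sketch; the receiver hostname and run time are taken from the command above):

    #!/usr/bin/env bash
    # Sweep the number of parallel iperf streams against a single receiver.
    # Assumes 'iperf -s' is already running on the receiver.
    RECEIVER=max-wn112-ib

    for P in 1 2 4 8 16 32; do
        echo "### $P parallel stream(s)"
        iperf -c "$RECEIVER" -t 60 -i 5 -f g -P "$P"
    done
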
host / number of threads | P=1  | 2        | 4        | 8        | 16   | 32     (Gbit/s)
max-wn112                | 24.1 | 23.3 (§) | 42.2     | 90.5 (%) | 83.7 | 66.1
max-wn064                | 19.1 | 25.3 (§) | 15.9 (§) | 21.3 (§) | 23.8 | 26.3

(§) varies a lot; the bandwidth is sometimes not even 30% of the value listed

(%) connect failed: Operation now in progress

Remarks: CPU load on the sender is low; CPU load on the receiver is about 1.0 × number of threads.

The same test with max-wn065 as sender and max-wn064 as receiver. These two sit on the same switch, like max-wn112/3, but have PCI-E Gen3 and less powerful CPUs.

host / number of threads | P=1  | 2    | 4    | 8    | 16     (Gbit/s)
max-wn064                | 18.6 | 15.9 | 16.3 | 18.2 | 16.2

Numbers are very volatile and can easily vary by a factor of two between different runs of exactly identical commands.

The same test with max-wng056 as sender and max-wng058/60 as receivers.

This is configuration 2: different hosts than above (chosen due to availability), but with identical capabilities.

host / number of threads | P=1  | 2    | 4    | 8    | 16   | 32     (Gbit/s)
max-wng058               | 21.4 | 26.2 | 33.6 | 71.0 | 81.4 | 77.0
max-wng060               | 14.0 | 26.1 | 41.5 | 80.2 | 83.9 | 77.4

Variability of IPOIB bandwidth over 50 consecutive runs with 8 and 1 concurrent threads, respectively.
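
Such a series can be produced by repeating the client run 50 times and keeping only iperf's final summary line (a sketch; the receiver hostname is an assumption and the output parsing is deliberately crude):

    #!/usr/bin/env bash
    # Run the iperf client 50 times in a row and print the last summary line
    # of each run; with -P > 1 this is the aggregated [SUM] line.
    RECEIVER=max-wn112-ib
    THREADS=8            # repeat with THREADS=1 for the single-stream series

    for run in $(seq 1 50); do
        printf "run %02d: " "$run"
        iperf -c "$RECEIVER" -t 60 -f g -P "$THREADS" | tail -n 1
    done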

CPU load on receiver

Test the behavior when the receiver is under CPU stress. For the fast receiver (max-wn112) the loss is about 20% under moderate stress and 46% under very high stress. A sketch of the load sweep follows the list below.

  • set number of iperf threads to 8
  • create load with stress-ng --cpu <ncores>
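
The load sweep on the receiver can look roughly like this (a sketch; 'iperf -s' keeps running in another shell, the 8-stream iperf client is started on the sender for each step, and the 90 s timeout is an assumption chosen to cover the 60 s iperf run):

    #!/usr/bin/env bash
    # Receiver side: step through the load levels used in the table below.
    # The last step uses all logical cores of the host.
    for LOAD in 8 16 32 64 "$(nproc)"; do
        echo "### stress-ng --cpu $LOAD"
        stress-ng --cpu "$LOAD" --timeout 90s
    done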

host / stress-ng load | load=0 | load=8   | load=16 | load=32  | load=64 | load=# cores     (Gbit/s)
max-wn112             | 90.5   | 78.8 (%) | 70.0    | 73.0 (%) | 72.8    | 48.4
max-wn064             | 21.3   | 27.9     | 19.3    | 25.7 (%) | 25.2    | 18.7

(%) connect failed: Operation now in progress

Concurrent sends from 4 different hosts

Test run with 4 senders with 2 concurrent threads each and 1 receiver listening on 4 different ports, so it is roughly equivalent to 8 concurrent threads. The behavior is much smoother; load on the receiver reduces the bandwidth by about 30% (a sketch of the multi-port setup follows the results).

load 0: aggregated average bandwidth: 64.4 Gbit/s

load 64: aggregated average bandwidth: 43.6 Gbit/s

load 128: aggregated average bandwidth: 47.3 Gbit/s
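
A minimal sketch of the multi-port setup (the port numbers are an assumption; any free ports work):

    # on the receiver: one iperf server per sender, each on its own port
    for PORT in 5001 5002 5003 5004; do
        iperf -s -p "$PORT" &
    done

    # on each of the four senders: target "its" port with 2 parallel streams
    #   sender 1:  iperf -c <receiver>-ib -p 5001 -t 60 -f g -P 2
    #   sender 2:  iperf -c <receiver>-ib -p 5002 -t 60 -f g -P 2
    #   ... and so on for senders 3 and 4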

Testing bandwidth with RDMA

The test simply uses ib_write_bw to stream data (a sketch for repeating the measurement under CPU load follows the list):

  • start receiver: ib_write_bw -F --report_gbits -a -d mlx5_0
  • start sender: ib_write_bw -F --report_gbits -a <receiver>
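
The load sweep for the table below can be driven on the receiver roughly like this. This is only a sketch: the stress-ng timeout and load levels are assumptions, and it relies on the ib_write_bw server exiting after each measurement so it is restarted per step.

    #!/usr/bin/env bash
    # Receiver side: repeat the ib_write_bw measurement under increasing CPU load.
    # For each step the sender runs: ib_write_bw -F --report_gbits -a <receiver>
    for LOAD in 0 8 16 32 64 "$(nproc)"; do
        echo "### load = $LOAD"
        if [ "$LOAD" -gt 0 ]; then
            stress-ng --cpu "$LOAD" --timeout 120s &
        fi
        ib_write_bw -F --report_gbits -a -d mlx5_0   # waits for one sender run
        wait                                          # let stress-ng finish
    done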

host / stress-ng load | load=0 | load=8 | load=16 | load=32 | load=64 | load=# cores     (Gbit/s)
max-wn112             | 98.74  | 98.75  | 98.72   | 98.74   | 98.53   | 98.60
max-wn064             | 98.74  | 98.76  | 98.65   | 98.75   | 98.74   | 98.75

The bandwidth is always close to the link limit and does not seem to be affected by CPU capabilities, load on the receiver, or location in the IB network.
