Hardware

| Host | max-wng005 | max-p3ag028 |
|---|---|---|
| CPU | E5-2640 v4 @ 2.40GHz | Silver 4114 CPU @ 2.20GHz |
| cpu MHz | 1202.343 | 800.000 |
| cache size | 25600 KB | 14080 KB |
| Memory | 256GB | 768GB |
| GPU | P100-PCIE-16GB | P100-PCIE-16GB |
| Bus & NUMA node | bus: 0x02, NUMA node: 2 | bus: 0x86, NUMA node: 1 |
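The cpu MHz and cache size fields match the format of Linux /proc/cpuinfo, where cpu MHz typically reflects the core frequency at the moment of reading and can sit well below the nominal clock when frequency scaling is active. A device's NUMA node can be read from /sys/bus/pci/devices/<domain:bus:device.function>/numa_node, which holds -1 when the platform exposes no NUMA information. The sketch below collects both on a Linux host. It assumes each GPU sits at PCI domain 0, device 0, function 0 (the page records only the bus number), and the helper names are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cpuinfo(text: str) -> Dict[str, str]:
    """Return the 'key : value' fields of the first processor block of /proc/cpuinfo text."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            break  # a blank line ends the first processor block
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pci_numa_node(bus: int, domain: int = 0, device: int = 0, function: int = 0,
                  sysfs: Path = Path("/sys/bus/pci/devices")) -> Optional[int]:
    """Read a PCI device's NUMA node from sysfs; None if the device directory is absent."""
    # Assumption: device 0, function 0 in domain 0; only the bus number is known here.
    bdf = f"{domain:04x}:{bus:02x}:{device:02x}.{function:x}"
    try:
        return int((sysfs / bdf / "numa_node").read_text().strip())
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        info = parse_cpuinfo(cpuinfo.read_text())
        print(info.get("model name"), info.get("cpu MHz"), info.get("cache size"))
    # Bus numbers taken from the hardware table above.
    print(pci_numa_node(0x02), pci_numa_node(0x86))
```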


CUDA Tests
Bandwidth (MB/s)

| Direction | max-wng005 | max-p3ag028 |
|---|---|---|
| Host→Device | 11709 | 12049 |
| Device→Host | 12849 | 12863 |
| Device→Device | 500636 | 500300 |
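For context on the host/device rows, the theoretical PCIe 3.0 x16 ceiling follows from the per-lane rate (8 GT/s) and 128b/130b encoding. This Python sketch assumes both P100 cards run on a PCIe 3.0 x16 link and that "MB" in these rows means 10^6 bytes (if it means 2^20 bytes, the utilization figures rise by about 5%). The ceiling ignores packet and protocol overhead, so real transfers always stay below it.

```python
# Sketch: measured host<->device bandwidth vs. the theoretical PCIe 3.0 x16 ceiling.
# Assumptions (not stated on this page): PCIe 3.0 x16 links, and MB = 10**6 bytes.
PCIE3_GT_PER_S = 8.0          # giga-transfers per second per lane
PCIE3_ENCODING = 128 / 130    # 128b/130b line encoding efficiency


def pcie3_peak_mb_s(lanes: int = 16) -> float:
    """One-direction PCIe 3.0 payload ceiling in MB/s, ignoring packet/protocol overhead."""
    bytes_per_s_per_lane = PCIE3_GT_PER_S * 1e9 * PCIE3_ENCODING / 8  # one bit per transfer
    return bytes_per_s_per_lane * lanes / 1e6


def link_utilization(measured_mb_s: float, lanes: int = 16) -> float:
    """Fraction of the theoretical ceiling reached by a measured transfer rate."""
    return measured_mb_s / pcie3_peak_mb_s(lanes)


# Values from the Bandwidth (MB/s) rows.
measured = {
    "max-wng005": {"Host->Device": 11709, "Device->Host": 12849},
    "max-p3ag028": {"Host->Device": 12049, "Device->Host": 12863},
}

for host, rows in measured.items():
    for direction, mb_s in rows.items():
        print(f"{host} {direction}: {link_utilization(mb_s):.0%} of {pcie3_peak_mb_s():.0f} MB/s")
```

Under these assumptions the ceiling is about 15754 MB/s, so Host→Device on max-wng005 reaches roughly 74% of it.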


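p2pBandwidthLatencyTest (GB/s)

| Mode | max-wng005 | max-p3ag028 |
|---|---|---|
| Unidirectional, P2P disabled | 346 | 504 |
| Unidirectional, P2P enabled | 347 | 504 |
| Bidirectional, P2P disabled | 358 | 512 |
| Bidirectional, P2P enabled | 357 | 513 |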


convolutionFFT2D (Mpix/s)

| Variant | max-wng005 | max-p3ag028 |
|---|---|---|
| built-in R2C / C2R | 6088 | 6088 |
| custom R2C / C2R | 6144 | 6107 |
| updated custom R2C / C2R | 6069 | 7812 |


simpleMultiCopy (GB/s)

| Measurement | max-wng005 | max-p3ag028 |
|---|---|---|
| Host→Device | 11.9 | 12.2 |
| Device→Host | 12.7 | 12.7 |
| Kernel | 417.1 | 1248.0 |
| Serialized exec | 10.6 | 12.0 |
| 4 streams | 19.4 | 20.6 |
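The simpleMultiCopy rows can be condensed into one number per host: the speedup from overlapping transfers with kernel execution, i.e. the 4-stream throughput divided by the serialized throughput. A minimal Python sketch using the values above; the function name is illustrative.

```python
# Sketch: speedup from overlapping copies and kernels across streams,
# computed from the simpleMultiCopy throughput rows (GB/s).
def overlap_speedup(serialized_gb_s: float, overlapped_gb_s: float) -> float:
    """Ratio of overlapped (multi-stream) throughput to serialized throughput."""
    if serialized_gb_s <= 0:
        raise ValueError("serialized throughput must be positive")
    return overlapped_gb_s / serialized_gb_s


# (Serialized exec, 4 streams) per host.
results = {"max-wng005": (10.6, 19.4), "max-p3ag028": (12.0, 20.6)}

for host, (serialized, overlapped) in results.items():
    print(f"{host}: {overlap_speedup(serialized, overlapped):.2f}x with 4 streams")
```

This gives roughly 1.83x on max-wng005 and 1.72x on max-p3ag028.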


| Test | max-wng005 | max-p3ag028 |
|---|---|---|
| matrixMul (GFlop/s) | 420.01766 | |
| matrixMulCUBLAS (GFlop/s) | 2321.7 | 5579 |
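The GFlop/s figures for matrixMul and matrixMulCUBLAS rest on the usual GEMM operation count: multiplying an m×k matrix by a k×n matrix takes m·n·k multiply-adds, i.e. 2·m·n·k floating-point operations, which are then divided by the elapsed time. A Python sketch of that conversion and its inverse; the 4096 size in the usage loop is hypothetical and is not the size the samples used on these hosts.

```python
# Sketch: the GEMM operation count behind GFlop/s figures.
# C (m x n) = A (m x k) @ B (k x n) needs m*n*k multiply-adds = 2*m*n*k FLOPs.
def gemm_flops(m: int, n: int, k: int) -> float:
    """Floating-point operations for one m x k by k x n matrix product."""
    return 2.0 * m * n * k


def gemm_gflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved GFLOP/s given the elapsed time of one product."""
    return gemm_flops(m, n, k) / seconds / 1e9


def gemm_seconds(m: int, n: int, k: int, gflops: float) -> float:
    """Inverse: time one product would take at a given GFLOP/s rate."""
    return gemm_flops(m, n, k) / (gflops * 1e9)


# Hypothetical square size, chosen only for illustration.
size = 4096
for host, rate in {"max-wng005": 2321.7, "max-p3ag028": 5579.0}.items():
    ms = gemm_seconds(size, size, size, rate) * 1e3
    print(f"{host}: a {size}^3 product at {rate} GFlop/s takes about {ms:.1f} ms")
```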


Benchmarks
shmembench (GB/sec)

| Operations | max-wng005 | max-p3ag028 |
|---|---|---|
| 32bit | 1777.72 | 7614.55 |
| 64bit | 1815.44 | 7890.97 |
| 128bit | 1816.02 | 7662.91 |


constbench (GB/sec)

| Operations | max-wng005 | max-p3ag028 |
|---|---|---|
| 32bit | 650.68 | 2778.73 |
| 64bit | 2247.59 | 9522.36 |
| 128bit | 2402.28 | 10218.13 |


cachebench (GB/sec)

| Access | max-wng005 | max-p3ag028 |
|---|---|---|
| Read-only, int1 | 1000.92 | 2121.66 |
| Read-only, int2 | 2106.63 | 2310.82 |
| Read-only, int4 | 2379.41 | 2379.64 |
| Read-only, max | 2379.41 | 2379.64 |
| Read-write, int1 | 2229.74 | 2230.04 |
| Read-write, int2 | 2210.64 | 2211.12 |
| Read-write, int4 | 2127.50 | 2126.41 |
| Read-write, max | 2229.74 | 2230.04 |


| Test | max-wng005 | max-p3ag028 |
|---|---|---|
| gpu-burn (Gflop/s) | 7990 | 7990 |
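One way to read the gpu-burn and matrixMulCUBLAS numbers is as a fraction of the card's theoretical single-precision peak, computed as cores × clock × 2 FLOPs per fused multiply-add. The sketch below assumes the published Tesla P100 PCIe figures (3584 FP32 cores, 1303 MHz boost clock) and assumes gpu-burn ran in its default single-precision mode. Sustained clocks under load may fall below boost, so treat the percentages as approximate.

```python
# Sketch: measured throughput as a fraction of theoretical FP32 peak.
# Assumed Tesla P100 PCIe figures (published specs, not measured on this page):
P100_FP32_CORES = 3584        # 56 SMs x 64 FP32 cores
P100_BOOST_CLOCK_GHZ = 1.303  # boost clock; the sustained clock under load may be lower
FLOPS_PER_CORE_CYCLE = 2      # one fused multiply-add counts as 2 FLOPs


def peak_fp32_gflops(cores: int = P100_FP32_CORES,
                     clock_ghz: float = P100_BOOST_CLOCK_GHZ) -> float:
    """Theoretical single-precision peak in GFLOP/s."""
    return cores * clock_ghz * FLOPS_PER_CORE_CYCLE


# Values from the tables on this page; gpu-burn assumed to be single precision.
measured = {
    "gpu-burn (both hosts)": 7990.0,
    "matrixMulCUBLAS, max-wng005": 2321.7,
    "matrixMulCUBLAS, max-p3ag028": 5579.0,
}

peak = peak_fp32_gflops()
for name, gflops in measured.items():
    print(f"{name}: {gflops / peak:.0%} of {peak:.0f} GFLOP/s")
```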


Others
| Test | max-wng005 | max-p3ag028 |
|---|---|---|
| TensorFlow ResNet-50 (images/sec) | 214.9 | 210.41 |
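Both hosts carry the same GPU model, so a per-row ratio makes it easy to see which results match and which diverge. This Python sketch covers a handful of rows transcribed from this page (the selection is arbitrary) and prints max-p3ag028 divided by max-wng005.

```python
# Sketch: per-row ratio between the two hosts.
# Values transcribed from this page as (max-wng005, max-p3ag028).
rows = {
    "Host->Device bandwidth (MB/s)": (11709, 12049),
    "simpleMultiCopy kernel (GB/s)": (417.1, 1248.0),
    "matrixMulCUBLAS (GFlop/s)": (2321.7, 5579),
    "shmembench 32bit (GB/sec)": (1777.72, 7614.55),
    "gpu-burn (Gflop/s)": (7990, 7990),
    "TensorFlow ResNet-50 (images/sec)": (214.9, 210.41),
}


def ratio(pair):
    """Second host's value divided by the first host's value."""
    first, second = pair
    return second / first


for name, pair in rows.items():
    print(f"{name}: max-p3ag028 / max-wng005 = {ratio(pair):.2f}")
```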



