mllab-amd01.desy.de is equipped with an AMD EPYC 7351 CPU (32 cores), 64 GB of RAM, and a Radeon Instinct MI25 GPU (Vega 10). This page briefly describes the setup under Ubuntu 18.04. An earlier installation under CentOS 7 was unsuccessful: the GPU threw errors.
System installation
The installation follows the instructions at https://rocm.github.io/ROCmInstall.html and https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md:
    # add the ROCm apt repository
    wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
    echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list

    # the remaining commands need root (run as root or prefix with sudo)
    apt update
    apt install rocm-dkms
    reboot

    # check the GPU is properly recognized
    dmesg | grep -i kfd
    /opt/rocm/opencl/bin/x86_64/clinfo
    /opt/rocm/bin/rocminfo

    # add packages
    apt install rocm-dkms rocm-dev rocm-libs rocm-device-libs hsa-ext-rocr-dev hsakmt-roct-dev hsa-rocr-dev rocm-opencl \
        rocm-opencl-dev rocm-utils rocm-profiler cxlactivitylogger miopen-hip miopengemm libnuma-dev
    apt install build-essential clang clang-format clang-tidy cmake cmake-qt-gui g++-multilib libunwind-dev libfftw3-dev \
        libelf-dev libncurses5-dev libpthread-stubs0-dev gfortran libboost-program-options-dev libssl-dev libboost-dev \
        libboost-system-dev libboost-filesystem-dev rpm apt-utils pkg-config
    apt install python-numpy python-dev python-wheel python-mock python-future python-pip python-yaml python-setuptools
    apt install python3-numpy python3-dev python3-wheel python3-mock python3-future python3-pip python3-yaml python3-setuptools

    # users apparently need to be in group video to access the GPU
    usermod -a -G video ...
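A group added with usermod only takes effect at the next login, so it can help to confirm the membership before testing the GPU. The following is a minimal sketch and not part of the original setup: the helper name has_group is hypothetical, and the script only inspects the groups of the current login session as reported by id -nG.

```shell
#!/bin/sh
# has_group LIST GROUP: succeed if GROUP appears in the space-separated LIST.
# (has_group is a hypothetical helper name used for illustration only.)
has_group() {
    for g in $1; do
        [ "$g" = "$2" ] && return 0
    done
    return 1
}

# Check the groups of the current session against the "video" group.
if has_group "$(id -nG)" video; then
    echo "video group: ok"
else
    echo "video group: missing (log out and back in after usermod)"
fi
```

If the script reports the group as missing right after usermod was run, logging in again is usually enough for the new membership to show up.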
Samples in user space
https://gpuopen.com/rocm-tensorflow-1-8-release/ has some basic TensorFlow examples for the AMD GPU. The system installation instructions on that page are outdated.
    # check gpu access
    /opt/rocm/bin/rocminfo

    # check opencl
    /opt/rocm/opencl/bin/x86_64/clinfo
    wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cpp
    wget https://raw.githubusercontent.com/bgaster/opencl-book-samples/master/src/Chapter_2/HelloWorld/HelloWorld.cl
    g++ -I /opt/rocm/opencl/include/ ./HelloWorld.cpp -o HelloWorld -L/opt/rocm/opencl/lib/x86_64 -lOpenCL
    ./HelloWorld

    # tensorflow (clone into the home directory so the path below resolves)
    pip3 install --user tensorflow-rocm
    git clone https://github.com/tensorflow/models.git ~/models
    cd ~/models/tutorials/image/imagenet
    python3 classify_image.py
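The checks above can be bundled into a small smoke test that prints one line per step. This is only a sketch under stated assumptions: run_check is a hypothetical helper, the tool paths are the ones used on this page, and a PASS only means the command exited with status 0, not that the GPU was actually used.

```shell
#!/bin/sh
# run_check NAME CMD [ARGS...]: run CMD quietly and print PASS or FAIL with NAME.
# (run_check is a hypothetical helper name used for illustration only.)
run_check() {
    name=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS $name"
    else
        echo "FAIL $name"
    fi
}

# Tool paths as used in the steps above.
run_check rocminfo /opt/rocm/bin/rocminfo
run_check clinfo /opt/rocm/opencl/bin/x86_64/clinfo
# Only verifies that the tensorflow module can be imported by python3.
run_check tensorflow python3 -c 'import tensorflow'
```

A FAIL line for rocminfo or clinfo for a regular user, while the same command works as root, would point to the video group membership mentioned in the system installation.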
Links
- ROCm Platform Installation Guide for Linux
- Hardware to Play ROCm
- Installation troubleshooting
- https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/blob/develop-upstream/rocm_docs/tensorflow-install-basic.md
- AMD ROCm GPU support for TensorFlow
- Preparing a machine to run with ROCm and docker
- ROCM Tensorflow Docker Container
- Tensorflow on AMD GPU: some examples to test