Installing TensorFlow 2.9.1 from source on an HPC machine: Pascal supercomputer

Prerequisites

  • Spack installation
  • CUDA installation
  • cuDNN installation (from the NVIDIA website)
  • Go installation

Environment Load

# load the environment
module load gcc/8.3.1
module load cuda/11.6.1
module load python/3.8.2
spack load cudnn@8.4.0.27-11.6-x86_64
spack load go
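
Before going further it can help to confirm that the loaded environment actually exposes the tools used in the rest of this guide. The following is a minimal sketch; `check_tools` is a hypothetical helper, and the tool list is an assumption based on the modules above.

```shell
# Report every required tool that is missing from PATH.
# Returns 0 if all tools are found, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool -> $(command -v "$tool")"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools this guide relies on; adjust the list for your site.
check_tools gcc nvcc python3 go || echo "Load the missing modules before continuing."
```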

 Install bazel

# Install bazel as bazelisk
spack load go
go install github.com/bazelbuild/bazelisk@latest
export PATH=$PATH:$(go env GOPATH)/bin

 Clone repository

git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
git checkout tags/v2.9.1
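
Bazelisk picks the Bazel release to download from the `USE_BAZEL_VERSION` environment variable or from the `.bazelversion` file in the workspace root. The sketch below shows which version a checkout asks for; `wanted_bazel_version` is a hypothetical helper and it deliberately ignores other Bazelisk settings such as `.bazeliskrc`.

```shell
# Print the Bazel version a workspace requests (simplified Bazelisk lookup).
wanted_bazel_version() {
    ws=${1:-.}
    if [ -n "${USE_BAZEL_VERSION:-}" ]; then
        echo "$USE_BAZEL_VERSION"           # environment override wins
    elif [ -f "$ws/.bazelversion" ]; then
        head -n 1 "$ws/.bazelversion"       # version pinned by the repository
    else
        echo "latest"                       # nothing pinned
    fi
}

# Run from the root of the tensorflow checkout.
echo "Bazelisk should select Bazel: $(wanted_bazel_version .)"
```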

 Setup Python environment

python3 -m venv ./venv
source venv/bin/activate
pip install -U pip numpy wheel packaging
pip install keras_preprocessing --no-deps

 Configure TensorFlow

Run ./configure (last line of this section) and answer its prompts with the following values:

  • Paths for CUDA and cuDNN headers and libraries (comma-separated; include every header and library path):
    • /usr/tce/packages/cuda/cuda-11.6.1/nvidia/include,/usr/tce/packages/cuda/cuda-11.6.1/nvidia/lib64,/usr/tce/packages/cuda/cuda-11.6.1/nvidia/,/usr/WS2/iopp/software/cudnn-linux-x86_64-8.4.0.27_cuda11.6/include,/usr/WS2/iopp/software/cudnn-linux-x86_64-8.4.0.27_cuda11.6/lib,/usr/WS2/iopp/software/cudnn-linux-x86_64-8.4.0.27_cuda11.6
  • GCC path: /usr/tce/packages/gcc/gcc-8.3.1/bin/gcc
  • Compute capabilities: 3.5,5.0,6.0,7.0,8.0

./configure
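
The same answers can be supplied through environment variables so that ./configure asks fewer questions. This is a sketch, assuming the variable names read by TensorFlow's configure script (worth checking against configure.py in the checked-out tag); any question without a matching variable is still asked interactively. `CUDA_ROOT` and `CUDNN_ROOT` are local shorthand introduced here.

```shell
# Shorthand for the site paths listed above (not read by configure itself).
CUDA_ROOT=/usr/tce/packages/cuda/cuda-11.6.1/nvidia
CUDNN_ROOT=/usr/WS2/iopp/software/cudnn-linux-x86_64-8.4.0.27_cuda11.6

# Answers for the configure prompts.
export PYTHON_BIN_PATH="$(command -v python3)"   # interpreter from the venv
export USE_DEFAULT_PYTHON_LIB_PATH=1
export TF_NEED_CUDA=1
export TF_NEED_ROCM=0
export TF_NEED_TENSORRT=0
export TF_CUDA_CLANG=0
export TF_CUDA_PATHS="$CUDA_ROOT/include,$CUDA_ROOT/lib64,$CUDA_ROOT/,$CUDNN_ROOT/include,$CUDNN_ROOT/lib,$CUDNN_ROOT"
export TF_CUDA_COMPUTE_CAPABILITIES="3.5,5.0,6.0,7.0,8.0"
export GCC_HOST_COMPILER_PATH=/usr/tce/packages/gcc/gcc-8.3.1/bin/gcc

# Only run configure when we are in the root of the source tree.
if [ -x ./configure ]; then
    ./configure
else
    echo "Run this from the root of the tensorflow checkout."
fi
```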

 Configure Certificates

export TMP=/tmp/$USER
STORE=~/security/cacerts
mkdir -p ~/security

# fetch the certificate chain of each required website with
keytool -printcert -sslserver <controller-host>:<controller-port> -rfc > cert_rfc.out

# split the certificate chain into one file per certificate and import each with
keytool -import -file root_ca.pem -alias root_ca -keystore ${STORE}
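
The split-and-import step can be scripted. The following is a minimal sketch under stated assumptions: `split_pem_bundle` and `import_pem_files` are hypothetical helpers, the `cert_NN.pem` naming is arbitrary, and `changeit` is a placeholder keystore password that should be replaced with your own.

```shell
# Split a PEM bundle into one file per certificate:
# <prefix>01.pem, <prefix>02.pem, ... Text outside the PEM blocks is dropped.
split_pem_bundle() {
    bundle=$1
    prefix=$2
    awk -v prefix="$prefix" '
        /-----BEGIN CERTIFICATE-----/ { n++; inside = 1; out = sprintf("%s%02d.pem", prefix, n) }
        inside { print > out }
        /-----END CERTIFICATE-----/ { inside = 0; close(out) }
    ' "$bundle"
}

# Import each certificate file into the keystore, using the file name
# (without .pem) as its alias. The password is a placeholder.
import_pem_files() {
    store=$1
    shift
    for pem in "$@"; do
        alias_name=$(basename "$pem" .pem)
        keytool -importcert -noprompt -file "$pem" -alias "$alias_name" \
            -keystore "$store" -storepass changeit
    done
}

# Only act when the keytool output from the previous step is present.
if [ -f cert_rfc.out ]; then
    split_pem_bundle cert_rfc.out cert_
    import_pem_files "${STORE:-$HOME/security/cacerts}" cert_*.pem
fi
```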

 Compile TensorFlow

bazelisk --output_base=$iopp/bazel \
    --server_javabase=$JAVA_HOME \
    --host_jvm_args=-Djavax.net.ssl.trustStore=$STORE \
    build --config=opt --config=cuda \
    --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" \
    //tensorflow/tools/pip_package:build_pip_package
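
On a shared node it may be worth capping Bazel's parallelism and memory use. The sketch below only computes and prints extra flags; using half of the visible cores and half of the host RAM is an arbitrary assumption, not part of the procedure above, and `half_cores` is a hypothetical helper. `--jobs` and `--local_ram_resources` are build options, so they belong after the `build` keyword.

```shell
# Print half of the online CPU cores, but never less than 1.
half_cores() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
    jobs=$((cores / 2))
    [ "$jobs" -ge 1 ] || jobs=1
    echo "$jobs"
}

JOBS=$(half_cores)
# Append these after "build" in the bazelisk command line.
echo "Extra build flags: --jobs=$JOBS --local_ram_resources=HOST_RAM*.5"
```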

 Create a Pip Package

./bazel-bin/tensorflow/tools/pip_package/build_pip_package ./pip/tf

 Install Pip Package

pip install ./pip/tf/tensorflow-*.whl
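
A quick check that the wheel works and sees the GPUs could look like the sketch below. It changes to a temporary directory first, because importing tensorflow from the root of the source tree picks up the source directory instead of the installed package. `verify_tf` is a hypothetical helper name.

```shell
# Import the installed wheel from outside the source tree and print
# its version, CUDA support, and the GPUs it can see.
verify_tf() {
    (
        cd "${TMPDIR:-/tmp}" && python3 -c '
import tensorflow as tf
print("TensorFlow", tf.__version__)
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("GPUs:", tf.config.list_physical_devices("GPU"))
'
    )
}

verify_tf || echo "TensorFlow check failed; is the venv active and the wheel installed?"
```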
