Using InfiniBand

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How does InfiniBand hardware differ from Ethernet?

Objectives
  • Provide tools, diagnostics, and references for setting up InfiniBand communication.

InfiniBand hardware was created to support high-performance computing applications by “short-cutting” around the TCP/IP network stack. In traditional messaging, small packets are sent between two endpoints that perform a complicated connection handshake, which adds buffering and queueing delays that slow things down. InfiniBand instead provides a remote direct memory access (RDMA) mechanism that lets hosts communicate with far fewer requirements around acknowledgments and message sizes.
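
A quick way to see whether a node has RDMA-capable hardware at all is to look at what the kernel has registered (the device name shown is an example; yours may differ):

ls /sys/class/infiniband    # lists RDMA devices, e.g. mlx4_0; empty or missing means no IB hardware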

This comes at a cost. InfiniBand devices have their own network addressing mechanism, distinct from (but similar to) Ethernet MAC addresses, and they require special drivers, network configuration utilities, and API calls.
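
For example, if an IPoIB interface is configured (commonly named ib0, an assumption here), its link-layer address is a 20-byte InfiniBand address rather than a 6-byte MAC:

ip link show ib0    # look for "link/infiniband" followed by a 20-byte address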

The API calls are the simplest part, since they are usually handled for you by an MPI library. Most MPI libraries use either the ibverbs interface directly or a layer such as UCX built on top of it; you will run into both keywords as you install MPI.
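
Once UCX and an MPI are installed, you can check what each layer can see; the commands below are the usual diagnostics, and the exact component names vary by version:

ucx_info -d | grep Transport    # transports UCX detected (look for rc/ud/dc verbs entries)
ompi_info | grep btl            # Open MPI transport components, if you use Open MPI (openib, vader, ...)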

Installing InfiniBand

The other two pieces, the drivers and the network configuration, are more involved. The notes below were put together by following the RedHat Infiniband Guide and rdmamojo.

# allow users to pin unlimited memory
sudo sh -c "cat >/etc/security/limits.d/rdma.conf" <<.
@users    soft    memlock     unlimited
@users    hard    memlock     unlimited
.
# add self to the users group
sudo usermod -a -G users $USER

# setup the IB manager
sudo yum install -y opensm libibverbs-utils infiniband-diags
sudo systemctl enable --now opensm

# Testing
ibstat # locate Port GUID-s
sudo ibswitches # list switches
sudo sminfo # list sm communication info

ibv_devices          # list the RDMA devices on this host (e.g. mlx4_0)
ibv_devinfo mlx4_0   # show GUIDs, port state, and link details for one device
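
After logging out and back in (so the new memlock limit and group membership apply), two quick checks confirm the fabric is ready; the expected values assume opensm has brought the port up:

ulimit -l               # should report "unlimited"
ibstat | grep -i state  # expect "State: Active" and "Physical state: LinkUp"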

Next, test communication between two hosts, for example 10.52.3.83 and 10.52.3.178.

On one host (here 10.52.3.178), start the server side:

ibv_rc_pingpong

On the other, connect to it:

ibv_rc_pingpong 10.52.3.178
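
Both sides should report matching byte counts and a bandwidth figure. If a host has several devices or ports, they can be selected explicitly (the device and port below are examples):

ibv_rc_pingpong -d mlx4_0 -i 1               # server, explicit device and port
ibv_rc_pingpong -d mlx4_0 -i 1 10.52.3.178   # client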

Installing MPI

Next, installing an MPI that uses InfiniBand is straightforward with Spack:

git clone --depth=1 https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh   # make the spack command available in this shell
spack install -v mpich netmod=ucx
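
To confirm the concretized spec really pulled in UCX, inspect the installed package (variant names follow Spack's mpich package):

spack find -vd mpich    # show variants and dependencies; look for netmod=ucx and a ucx dependency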

Create a hostfile listing the nodes, one per line:

# mpi hostfile
10.52.3.83
10.52.3.178
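
mpirun launches processes on these nodes over ssh, so password-less ssh between them should work first (addresses taken from the example hostfile above):

ssh 10.52.3.83 hostname    # should print the remote hostname without asking for a password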

Then run a test application, enabling the UCX fabric:

BIN=$(spack location -i mpich)   # install prefix of the mpich built above
$BIN/bin/mpirun -n 48 -env UCX_NET_DEVICES mlx4_0:1 -f hostfile hostname
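
To be sure the traffic goes over InfiniBand rather than falling back to TCP, UCX can be restricted to RDMA transports (the UCX_TLS list is an example; ucx_info -d shows what is available):

$BIN/bin/mpirun -n 48 -env UCX_TLS rc,self,sm -env UCX_NET_DEVICES mlx4_0:1 -f hostfile hostname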

Next, run HPL and check speed:

spack install -v hpl ^mpich netmod=ucx # doesn't seem to activate infiniband
spack install -v hpl ^openmpi fabrics=verbs

Remember, for each run you should create a separately named input file and run script:

MPIRUN=/home/cc/spack/opt/spack/linux-centos8-haswell/gcc-8.3.1/openmpi-4.1.1-6ykqelh2aqoxk6bbmdp7o63xi33puppg/bin/mpirun
HPL=/home/cc/spack/opt/spack/linux-centos8-haswell/gcc-8.3.1/hpl-2.3-cce3avecwhxksw6eysaklvuipvb2cdo4/bin/xhpl

export OMPI_MCA_btl_openib_allow_ib=true   # OpenMPI MCA parameters need the OMPI_MCA_ prefix when set via the environment
$MPIRUN -n 48 --mca btl openib,self,vader --mca btl_openib_allow_ib true --hostfile ../hostfile $HPL | tee HPL.out
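
HPL prints one result line per problem solved; the grep below pulls those out, and perfquery (from the infiniband-diags package installed earlier) shows port counters that should grow during a run that really uses the fabric:

grep '^W' HPL.out   # result lines: N, NB, P, Q, solve time, and Gflops
sudo perfquery      # PortXmitData / PortRcvData should increase across an IB run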

Key Points

  • InfiniBand networks do not carry standard Ethernet traffic, so they require special drivers, addressing, and configuration tools.