Using GPUs

Overview

Teaching: 2 min
Exercises: 25 min
Questions
  • What modifications to the benchmarks are required to utilize GPUs?

Objectives
  • Demonstrate methods for using GPUs with HPL and HPCG.

Using GPUs is more difficult than CPU-only codes because separate compilers are needed, and because computations run on the GPU are “offloaded” from the CPU. An offloaded computation is a function that is packed up and sent to the GPU, executed there (usually while the CPU does something else), and then eventually waited on by the CPU.

Importantly for this exercise, the standard HPL code distributed by netlib does not include support for running computations on GPUs. Instead, hardware vendors have supplied these themselves. NVIDIA offers custom programs that run HPL through its (free) developer program. More recently, they have also created a container running HPL and HPCG.

Rather than work with these, I suggest trying the free alternative HPL-GPU or (older) HPL-CUDA. The second was easier to compile. In addition to following the instructions, you should also set the TOPdir variable to the top directory.

To make the most performant binary, it is important to enable (or at least check) what hardware-optimizations exist. For CPU-s it’s well-known that you can check the processor options from /dev/cpuinfo. For GPUs, the most important property is the compute capability. A quick reference chart is here.

Key Points

  • Using GPUs requires modifications to the standard HPL and HPCG programs.