HPL and HPCG: Glossary

Key Points

Introduction
  • Linear systems in both dense and sparse form are a universal theme in scientific computing.

  • Dense and sparse matrices have different optimal algorithms.
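
The contrast above can be sketched in a few lines. This is a minimal illustration (not from the lesson; all names are illustrative): a dense product must touch every entry, while a sparse product over stored non-zeros skips the zeros entirely, which is why the two forms favor different algorithms.

```python
def dense_matvec(A, x):
    """Dense matrix-vector product: touches every entry, O(rows*cols) work."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def sparse_matvec(A_coo, shape, x):
    """Sparse product over (row, col, value) triples: work scales with
    the number of non-zeros only."""
    n_rows, _ = shape
    y = [0.0] * n_rows
    for i, j, v in A_coo:
        y[i] += v * x[j]
    return y

# The same tridiagonal matrix stored both ways:
A_dense = [[ 2.0, -1.0,  0.0],
           [-1.0,  2.0, -1.0],
           [ 0.0, -1.0,  2.0]]
A_sparse = [(0, 0, 2.0), (0, 1, -1.0),
            (1, 0, -1.0), (1, 1, 2.0), (1, 2, -1.0),
            (2, 1, -1.0), (2, 2, 2.0)]
x = [1.0, 1.0, 1.0]

print(dense_matvec(A_dense, x))           # same result either way,
print(sparse_matvec(A_sparse, (3, 3), x)) # but very different work
```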

Computational Kernels
  • Most applications are memory-bound, which complicates parallelization.

  • Compute-bound applications depend on peak theoretical flops.

  • Getting good performance from parallel solvers is hard.
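
The memory-bound/compute-bound distinction can be made concrete with a back-of-the-envelope roofline estimate. This sketch (illustrative machine numbers, not from the lesson) compares each kernel's arithmetic intensity, in flops per byte moved, to the machine balance, peak flops divided by peak memory bandwidth:

```python
def gemm_intensity(n):
    """n x n double-precision matrix multiply: ~2*n**3 flops over
    ~3 matrices of n*n doubles (8 bytes each)."""
    return (2 * n**3) / (3 * n * n * 8)

def spmv_intensity(nnz):
    """Sparse matrix-vector product: 2 flops per non-zero, roughly
    12 bytes per non-zero (value + column index) in a CSR-like format."""
    return (2 * nnz) / (12 * nnz)

# Hypothetical machine: 1 TF/s peak, 100 GB/s memory bandwidth.
machine_balance = 1e12 / 100e9  # flops per byte needed to keep the cores busy

for name, ai in [("GEMM n=4096", gemm_intensity(4096)),
                 ("SpMV", spmv_intensity(10**6))]:
    bound = "compute-bound" if ai > machine_balance else "memory-bound"
    print(f"{name}: {ai:.2f} flop/byte -> {bound}")
```

Dense GEMM (the heart of HPL) lands far above the balance point and is limited by peak flops; sparse matrix-vector products (the heart of HPCG) land far below it and are limited by memory bandwidth.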

BLAS
  • Linking with vendor-optimized libraries can be cumbersome and error-prone.

  • The reference BLAS/LAPACK implementations do not use co-processors such as GPUs.

HPL
  • HPL requires tuning matrix and tile sizes to achieve peak performance.

  • Weak scaling requires very large problem sizes for large supercomputers.
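
A common rule of thumb (a sketch, not a prescription from the lesson) for sizing an HPL run is to fill roughly 80% of total memory with the N x N double-precision matrix, then round N down to a multiple of the block size NB:

```python
import math

def hpl_problem_size(nodes, mem_per_node_gib, nb=192, fill=0.80):
    """Estimate the HPL matrix dimension N for a given cluster size.

    nb and fill are illustrative defaults; both typically need tuning.
    """
    total_bytes = nodes * mem_per_node_gib * 2**30
    n = int(math.sqrt(fill * total_bytes / 8))  # 8 bytes per double
    return (n // nb) * nb                       # align N to the block size

# e.g. 4 nodes with 128 GiB of RAM each:
print(hpl_problem_size(4, 128))
```

The quadratic relationship between memory and N is why weak scaling on a large machine quickly pushes N into the hundreds of thousands.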

HPCG
  • HPCG requires tuning job launch configurations (like NUMA) to achieve peak communication bandwidth.

  • Few-node performance should be an indicator of full-scale performance.

Summary
  • Cluster configuration poses reproducibility challenges.

  • Spend extra time to plan well-defined performance tests.

Using GPUs
  • Using GPUs requires modifications to the standard HPL and HPCG programs.

Using InfiniBand
  • InfiniBand networks do not carry standard Ethernet traffic, so they require special configuration.

Scripting OpenStack

Glossary

References