HPC coding

Modified

November 14, 2024

This section outlines useful best practices to consider when coding and writing new software on an HPC.

Coding practices

Code coverage, testing, and continuous integration

Testing is a critical aspect of coding that should be performed regularly throughout a project. Here are some primary types of tests to consider:

  • Regression Test: Given a specific input, the code is tested to reproduce the expected output.
  • Unit Test: Tests the smallest units of the software (e.g., individual functions) to identify bugs, particularly under extreme input and output conditions.
  • Continuous Integration: A suite of tests runs automatically every time the software code is updated, helping to catch bugs before anyone uses the code.

Additional aspects to test include the performance and scalability of the code, usability, and response to all intended types of input data.

While unit and regression tests are valuable, they may become unfeasible as the codebase grows in size and complexity. Therefore, it’s advisable to use continuous integration and implement simple yet representative tests that cover the entire codebase, enabling the early detection of bugs before end-users encounter them. Code coverage tools are available for various programming languages and can also be used for testing code deployed on GitHub version control.

Link Description
pyTest Package to test python code
Cmake Tool to test both C, C++ and Fortran code
Travis CI Tool for continuous integration in most of the used programming languages. Works on Git version control.
covr Package to test coverage reports for R

Code styling

An essential aspect of code is its readability by others. To achieve this, a clean and consistent coding style should be employed throughout the project. Some languages have established preferred coding styles, which can often be enforced in certain IDEs (e.g. Visual Studio Code). While you can adopt your own coding style, it should prioritize readability and be consistently applied across the entire project. Here are some general code styling tools:

Tool & Link Description
styleguide Google guide for coding styles of the major programming languages
awesome guidelines A guide to coding styles covering also documentations, tools and development environments

Click on the callout below if you want to learn about language-specific tools for code formatting.

Language Formatted tools
Python Black, yapf, read intro Pythonic rules
R formatR, read post R style
Snakemake Snakefmt
Bash/Shell ShellIndent
C/C++ GNUIndent, GreatCode
Perl PerlTidy
Javascript beautifier
MATLAB/Octove MISS_HIT
Java Google Java format, JIndent
CSS CSSTidy
HTML Tidy
Tip

Quick Tip: If you use VS Code as your main text editor, you can enable automatic code formatting in your browser. Go to your preferences page in JSON mode and add:

json
"editor.formatOnSave": true

Packaging a coding project

When developing software that includes multiple newly implemented functions, organizing these functions into a package can be beneficial for reuse and easy sharing. This approach is particularly straightforward for coding projects in both Python and R, allowing developers to streamline their workflow and enhance collaboration.

Link Description
pyPA python packaging user guide
R package development Develop an R package using Rstudio

Code documentation

When developing software, it’s essential to create documentation that clearly explains the usage of each code element. For software packages, there are tools available that can automatically generate documentation by utilizing function declarations and any accompanying text included as strings within the code.

Tool & Link Description
MkDocs A generator for static webpages, with design and themes targeted to documentation pages, but also other type of websites. This website is itself made with MkDocs.
Python - mkdocstrings Python handler to automatically generate documentation with MkDocs
pdoc3 A package who creates automatically the documentation for your coding projects. It is semi automatic (infers your dependencies, classes, … but adds a description based on your docstrings)
pdoc3 101 How to run pdoc to create an html documentation
R-Roxygen2 A package to generate R documentation - it can be used also with Rcpp
Sphinx Another tool to write documentation - it produces also printable outputs. Sphinx was first created to write the python language documentation. Even though it is a tool especially thought for python code, it can be used to generate static webpages for other projects.

Parallel programming

An HPC system can be used for both sequential and parallel programming. Sequential programming involves writing computer programs that execute one instruction at a time, following a logical sequence of steps. In contrast, parallel programming allows multiple instructions to run simultaneously.

While there is typically only one approach to writing sequential code, there are various methods for creating parallelized code. ## Parallel Programming

OpenMP (Multithreading)

One popular approach to parallel programming is through OpenMP, where developers write sequential code and identify specific sections that can be parallelized into threads using a fork-join mechanism. Each thread operates independently and has its own allocated memory, as illustrated in the figure below from ADMIN magazine.

OpenMP Diagram

If the execution times of threads vary, some may have to wait for others to complete when they need to be joined (e.g. to collect data), leading to inefficient use of execution time. It is the programmer’s responsibility to balance the distribution of threads to optimize performance.

Modern CPUs inherently support OpenMP, particularly multicore CPUs, which can run threads independently. OpenMP is available as an extension for programming languages such as C and Fortran, and is commonly used to parallelize for loops that create performance bottlenecks in software execution.

Link Description
Video Course A video course hosted by ARCHER UK, linked to the first lesson with access to all subsequent lessons.
OpenMP Starter Guide A beginner’s guide to OpenMP.
Wikitolearn OpenMP Course An OpenMP course available on Wikitolearn.
MIT OpenMP Course A comprehensive course from MIT that also covers MPI usage.

MPI (Message Passing Interface)

MPI facilitates the distribution of data among different processes that cannot otherwise access it. This is illustrated in the image below from LLNL.

MPI Diagram

Although MPI is often considered difficult to learn, this reputation stems from the explicit programming required for message passing.

Link Description
Video Course A video course hosted by ARCHER UK, linked to the first lesson with access to all subsequent lessons.
MPI Starter Guide A beginner’s guide to MPI.
PRACE Course A PRACE course on the MOCC platform, FutureLearn.

GPU Programming

GPUs (Graphics Processing Unit) serve as computing accelerators, significantly enhancing the performance of heavy linear algebra applications, such as deep learning. A GPU typically comprises numerous specialized processing units, enabling extreme parallelization of computer code, as shown in the figure below from astrocomputing.

GPU Computing

AMD and Nvidia are the two primary GPU manufacturers, with Nvidia maintaining a dominant position in the market for many years. Danish HPCs Type 1 and 2 feature various Nvidia graphic card models, while Type 5 (LUMI) includes the latest AMD Instinct cards. The distinction between AMD and Nvidia primarily lies in their programming dialects, necessitating specific coding for multithreading tailored to each GPU brand.

Nvidia CUDA

CUDA is a dialect of C++ that also offers various libraries for popular languages and frameworks (e.g., Python, PyTorch, MATLAB, etc.).

Link Description
Nvidia Developer Training Training resources for CUDA programming provided by Nvidia.
CUDA Book Archive An archive of books focused on CUDA programming.
Advanced CUDA Books A collection of advanced books for CUDA programming.
pyCUDA Resources for coding in CUDA using Python.

AMD HIP

HIP is a newly introduced dialect for AMD GPUs that can be compiled for both AMD and Nvidia hardware. The advantage of HIP is that it allows CUDA code to be converted to HIP code with minimal adjustments by the programmer.

The LUMI HPC consortium has organized courses focused on HIP programming and CUDA-to-HIP conversion. Check their page for upcoming courses.

Link Description
Video Introduction 1 An introductory video on HIP programming.
Video Introduction 2 A second introductory video on HIP programming.
AMD Programming Guide The official programming guide for HIP from AMD.