HPC coding
This section outlines best practices to consider when writing new software on an HPC system.
Coding practices
Code coverage, testing, and continuous integration
Testing is a critical aspect of coding that should be performed regularly throughout a project. Here are some primary types of tests to consider:
- Regression Test: Given a specific input, the code is tested to reproduce the expected output.
- Unit Test: Tests the smallest units of the software (e.g., individual functions) to identify bugs, particularly under extreme input and output conditions.
- Continuous Integration: A suite of tests runs automatically every time the software code is updated, helping to catch bugs before anyone uses the code.
Additional aspects to test include the performance and scalability of the code, usability, and response to all intended types of input data.
While unit and regression tests are valuable, maintaining them for every function can become infeasible as the codebase grows in size and complexity. It is therefore advisable to use continuous integration with simple yet representative tests that cover the entire codebase, so that bugs are detected before end users encounter them. Code coverage tools are available for many programming languages and can also be run on code hosted on GitHub.
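As a minimal sketch of the unit and regression tests described above, the following uses plain `assert` statements in functions named `test_*`, which is the convention pytest looks for. The function `normalize` and its expected values are hypothetical and serve only as something to test.

```python
import math


def normalize(values):
    """Scale a list of numbers so that they sum to 1 (hypothetical example)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize_unit():
    # Unit test: check the smallest piece of logic, including an edge case.
    assert normalize([2, 2]) == [0.5, 0.5]
    try:
        normalize([0, 0])
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an all-zero input")


def test_normalize_regression():
    # Regression test: a fixed input must keep reproducing a stored output.
    expected = [0.1, 0.2, 0.3, 0.4]
    result = normalize([1, 2, 3, 4])
    assert len(result) == len(expected)
    assert all(math.isclose(r, e) for r, e in zip(result, expected))
```

If this were saved in a file such as `test_normalize.py`, running `pytest` in the same directory should collect and run both test functions. A continuous integration service would typically run that same command on every update.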
Link | Description |
---|---|
pytest | Package to test Python code |
CMake | Build tool whose CTest component can run tests for C, C++ and Fortran code |
Travis CI | Tool for continuous integration that supports most widely used programming languages. Works with Git version control. |
covr | Package to generate test coverage reports for R |
Code styling
An essential aspect of code is its readability by others. To achieve this, a clean and consistent coding style should be employed throughout the project. Some languages have established preferred coding styles, which can often be enforced in certain IDEs (e.g. Visual Studio Code). While you can adopt your own coding style, it should prioritize readability and be consistently applied across the entire project. Here are some general code styling tools:
Tool & Link | Description |
---|---|
styleguide | Google guide for coding styles of the major programming languages |
awesome guidelines | A guide to coding styles that also covers documentation, tools and development environments |
The table below lists language-specific tools for code formatting.
Language | Formatting tools |
---|---|
Python | Black, yapf; see also the intro to Pythonic rules |
R | formatR; see also the post on R style |
Snakemake | Snakefmt |
Bash/Shell | ShellIndent |
C/C++ | GNUIndent, GreatCode |
Perl | PerlTidy |
JavaScript | beautifier |
MATLAB/Octave | MISS_HIT |
Java | Google Java format, JIndent |
CSS | CSSTidy |
HTML | Tidy |
Quick Tip: If you use VS Code as your main text editor, you can enable automatic code formatting every time you save a file. Open your user settings in JSON mode and add:
```json
"editor.formatOnSave": true
```
Packaging a coding project
When developing software that includes multiple newly implemented functions, organizing these functions into a package can be beneficial for reuse and easy sharing. This approach is particularly straightforward for coding projects in both Python and R, allowing developers to streamline their workflow and enhance collaboration.
Link | Description |
---|---|
PyPA | Python Packaging User Guide |
R package development | Develop an R package using RStudio |
Code documentation
When developing software, it’s essential to create documentation that clearly explains the usage of each code element. For software packages, there are tools available that can automatically generate documentation by utilizing function declarations and any accompanying text included as strings within the code.
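The sketch below illustrates the two ingredients that docstring-based documentation tools work from: a function declaration and the string attached to it. The function `kinetic_energy` is a hypothetical example, and the `Args:`/`Returns:` layout is one common docstring convention, not a requirement of any particular tool.

```python
import inspect


def kinetic_energy(mass, velocity):
    """Return the kinetic energy of a moving body.

    Args:
        mass: Mass in kilograms.
        velocity: Speed in metres per second.

    Returns:
        The kinetic energy in joules.
    """
    return 0.5 * mass * velocity ** 2


# The signature and the docstring can both be read from the function object,
# which is the raw material a documentation generator turns into a page.
signature = inspect.signature(kinetic_energy)
summary = inspect.getdoc(kinetic_energy).splitlines()[0]
print(f"kinetic_energy{signature}: {summary}")
```

Because the description lives next to the code, it is more likely to be updated when the function changes.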
Tool & Link | Description |
---|---|
MkDocs | A static site generator with designs and themes aimed at documentation pages, though it can also build other types of websites. This website is itself made with MkDocs. |
Python - mkdocstrings | Python handler to automatically generate documentation with MkDocs |
pdoc3 | A package that automatically creates documentation for your coding projects. It is semi-automatic: it infers your dependencies, classes, … but takes the descriptions from your docstrings |
pdoc3 101 | How to run pdoc to create HTML documentation |
R-Roxygen2 | A package to generate R documentation; it can also be used with Rcpp |
Sphinx | Another tool for writing documentation, which can also produce printable output. Sphinx was first created to write the Python language documentation. Although it is designed primarily for Python code, it can also generate static webpages for other projects. |
Parallel programming
An HPC system can be used for both sequential and parallel programming. Sequential programming involves writing computer programs that execute one instruction at a time, following a logical sequence of steps. In contrast, parallel programming allows multiple instructions to run simultaneously.
While there is typically only one approach to writing sequential code, there are various methods for creating parallelized code.
OpenMP (Multithreading)
One popular approach to parallel programming is OpenMP, where developers write sequential code and mark specific sections to be parallelized into threads using a fork-join mechanism. Each thread runs independently and has its own private variables, while all threads share the memory of the parent process, as illustrated in the figure below from ADMIN magazine.
If the execution times of threads vary, some threads have to wait for others to finish at the point where they are joined (e.g. to collect data), which wastes execution time. It is the programmer's responsibility to balance the workload across threads to optimize performance.
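The effect of unbalanced threads can be sketched with a small calculation. The two functions below are simplified models, not OpenMP itself. One splits the loop into equal contiguous chunks, which resembles OpenMP's `schedule(static)`. The other hands each iteration to whichever thread is free first, which resembles `schedule(dynamic)`. The per-iteration costs are made-up numbers.

```python
def makespan_static(costs, n_threads):
    """Split iterations into contiguous equal-sized chunks, one per thread."""
    chunk = max(1, -(-len(costs) // n_threads))  # ceiling division
    loads = [sum(costs[i:i + chunk]) for i in range(0, len(costs), chunk)]
    return max(loads, default=0)


def makespan_dynamic(costs, n_threads):
    """Hand each iteration, in order, to the thread that becomes free first."""
    loads = [0] * n_threads
    for cost in costs:
        idx = loads.index(min(loads))  # least-loaded thread is free first
        loads[idx] += cost
    return max(loads)


costs = [8, 7, 6, 5, 1, 1, 1, 1]  # hypothetical per-iteration run times
print(makespan_static(costs, 2))   # 26: one thread gets all expensive iterations
print(makespan_dynamic(costs, 2))  # 15: the work is spread evenly
```

The join can only happen when the slowest thread is done, so the larger of the two loads determines the total run time.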
OpenMP maps naturally onto modern multicore CPUs, whose cores can run threads independently, and it is supported by most common compilers. It is available as an extension for programming languages such as C, C++ and Fortran, and is commonly used to parallelize for loops that create performance bottlenecks in software execution.
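The fork-join structure of a parallelized for loop can be sketched in Python with a thread pool. This is an illustration of the pattern only. It is not OpenMP, and because of Python's global interpreter lock it should not be expected to speed up CPU-bound work. In C, the comparable construct would be a `#pragma omp parallel for` directive with a `reduction` clause. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Work done by one thread: sum of squares over its own index range."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, n_threads=4):
    # Fork: split the loop range [0, n) into one contiguous chunk per thread.
    step = max(1, -(-n // n_threads))  # ceiling division, at least 1
    chunks = [(lo, min(lo + step, n)) for lo in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Join: all threads have finished; combine their partial results.
    return sum(partials)


print(parallel_sum_of_squares(1000))  # 332833500
```

Each thread keeps its own loop variable and partial result, and only the final combination step touches data from all threads.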
Link | Description |
---|---|
Video Course | A video course hosted by ARCHER UK, linked to the first lesson with access to all subsequent lessons. |
OpenMP Starter Guide | A beginner’s guide to OpenMP. |
Wikitolearn OpenMP Course | An OpenMP course available on Wikitolearn. |
MIT OpenMP Course | A comprehensive course from MIT that also covers MPI usage. |
MPI (Message Passing Interface)
MPI facilitates the distribution of data among different processes that cannot otherwise access it. This is illustrated in the image below from LLNL.
MPI is often considered difficult to learn. This reputation stems mainly from the explicit programming required for message passing: every exchange of data between processes has to be written out by the programmer.
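To give a feel for what explicit message passing involves, the sketch below emulates it with Python threads and queues. This is an analogy, not MPI: real MPI processes have separate memory, and real code would use calls such as `MPI_Send`/`MPI_Recv` in C or `comm.send`/`comm.recv` in mpi4py. The names `worker` and `scatter_and_reduce` are hypothetical.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """An emulated process that owns no data until it receives a message."""
    chunk = inbox.get()             # blocking receive
    outbox.put((rank, sum(chunk)))  # explicit send of the local result


def scatter_and_reduce(data, n_workers=3):
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, inboxes[rank], results))
        for rank in range(n_workers)
    ]
    for t in threads:
        t.start()
    # The root explicitly sends each worker its share of the data.
    for rank, inbox in enumerate(inboxes):
        inbox.put(data[rank::n_workers])
    # The root explicitly receives one partial result per worker.
    partials = dict(results.get() for _ in range(n_workers))
    for t in threads:
        t.join()
    return sum(partials.values())


print(scatter_and_reduce(list(range(10))))  # 45
```

Every transfer appears as a matching send and receive, which is the bookkeeping that makes message-passing code more verbose than shared-memory code.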
Link | Description |
---|---|
Video Course | A video course hosted by ARCHER UK, linked to the first lesson with access to all subsequent lessons. |
MPI Starter Guide | A beginner’s guide to MPI. |
PRACE Course | A PRACE course on the MOOC platform FutureLearn. |
GPU Programming
GPUs (Graphics Processing Units) serve as computing accelerators, significantly improving the performance of applications that rely on heavy linear algebra, such as deep learning. A GPU typically comprises numerous specialized processing units, enabling extreme parallelization of computer code, as shown in the figure below from astrocomputing.
AMD and Nvidia are the two primary GPU manufacturers, with Nvidia maintaining a dominant position in the market for many years. Danish HPCs Type 1 and 2 feature various Nvidia graphics card models, while Type 5 (LUMI) includes the latest AMD Instinct cards. The distinction between AMD and Nvidia lies primarily in their programming dialects, so multithreaded code must be written specifically for each GPU brand.
Nvidia CUDA
CUDA is Nvidia's dialect of C++, with libraries and bindings available for popular languages and frameworks (e.g., Python, PyTorch, MATLAB).
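The core idea of a CUDA kernel is that many threads run the same function, each on a different array element selected by its block and thread index. The sketch below emulates that indexing sequentially in plain Python. It does not run on a GPU, and the names `saxpy_kernel` and `launch` are hypothetical. In CUDA C++ the index would be computed as `blockIdx.x * blockDim.x + threadIdx.x`.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out):
    """Body executed by one emulated GPU thread."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(x):                          # guard: the grid may overshoot
        out[i] = a * x[i] + y[i]


def launch(a, x, y, block_dim=4):
    """Emulate a kernel launch by visiting every (block, thread) pair."""
    out = [0.0] * len(x)
    grid_dim = -(-len(x) // block_dim)  # enough blocks to cover all elements
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            saxpy_kernel(block_idx, block_dim, thread_idx, a, x, y, out)
    return out


print(launch(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5))
# [12.0, 14.0, 16.0, 18.0, 20.0]
```

On a real GPU the two loops in `launch` disappear: the hardware runs the kernel body for all index combinations concurrently.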
Link | Description |
---|---|
Nvidia Developer Training | Training resources for CUDA programming provided by Nvidia. |
CUDA Book Archive | An archive of books focused on CUDA programming. |
Advanced CUDA Books | A collection of advanced books for CUDA programming. |
pyCUDA | Resources for coding in CUDA using Python. |
AMD HIP
HIP is a newly introduced dialect for AMD GPUs that can be compiled for both AMD and Nvidia hardware. The advantage of HIP is that it allows CUDA code to be converted to HIP code with minimal adjustments by the programmer.
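Much of a CUDA-to-HIP conversion is mechanical because many HIP runtime functions mirror their CUDA counterparts under a `hip` prefix. The toy function below renames a handful of such identifiers in a source string. It is only an illustration of the idea. Actual ports should use AMD's conversion tools (such as `hipify-perl` or `hipify-clang`), which cover far more of the API. The name `toy_hipify` and the short mapping table are assumptions made for this example.

```python
import re

# A few CUDA runtime names and their HIP counterparts (deliberately incomplete).
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpy": "hipMemcpy",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Longest names first, so "cudaMemcpyHostToDevice" is not cut short by "cudaMemcpy".
_PATTERN = re.compile(
    "|".join(re.escape(name) for name in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source):
    """Rename known CUDA identifiers in a source string (toy illustration)."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


cuda_line = "cudaMemcpy(d_x, h_x, n_bytes, cudaMemcpyHostToDevice);"
print(toy_hipify(cuda_line))
# hipMemcpy(d_x, h_x, n_bytes, hipMemcpyHostToDevice);
```

Code that relies on Nvidia-specific libraries or hardware features usually needs manual adjustment beyond such renaming.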
The LUMI HPC consortium has organized courses focused on HIP programming and CUDA-to-HIP conversion. Check their page for upcoming courses.
Link | Description |
---|---|
Video Introduction 1 | An introductory video on HIP programming. |
Video Introduction 2 | A second introductory video on HIP programming. |
AMD Programming Guide | The official programming guide for HIP from AMD. |