HPC intro

Modified

August 19, 2025

Course Overview

⏰ Total Time Estimation: 3 hours
📁 Supporting Materials:
👨‍💻 Target Audience: Ph.D., MSc, anyone interested in HPC systems.
👩‍🎓 Level: Advanced.
🔒 License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.
💰 Funding: This project was funded by the Novo Nordisk Fonden (NNF20OC0063268).

Module Topics

HPC systems and HPC architecture
National HPC platforms
Setting up HPC environments (software requirements, job scheduling, resource management)
Data management and optimization
Programming and parallel computing
Jobs on HPC clusters (submissions, benchmarking, debugging and monitoring)
Advanced topics

Basics of High-Performance Computing (HPCs)

High-Performance Computing (HPC) involves connecting a large number of computing hardware components to execute many operations simultaneously. A supercomputer consists of various hardware types, typically organized in this hierarchy:

CPU: The unit that executes a sequence of instructions. A CPU may contain multiple cores, allowing independent execution of several instruction chains.
Node: A single computer within an HPC system.
Cluster: A group of interconnected nodes that communicate and can work together on a single task.

HPC systems also have a dedicated storage component connected to one or more types of storage hardware, typically referred to as a parallel file system or distributed storage system.

A node can consist of one or multiple CPUs and RAM memory. The RAM (Random Access Memory) serves as temporary storage that helps manage the data required for running tasks quickly but does not perform computations or persist data after the system shuts down. There are other types of nodes containing different hardware combinations. The most common hardware that can be found in a node beyond RAM and CPUs is:

GPU: A graphics processing unit, originally designed for gaming and graphic software, but now used for its computational power. It is particularly efficient in executing repetitive linear algebra operations across multiple parallel processes. Nvidia and AMD are the primary GPU manufacturers.
FPGA: A programmable hardware component capable of accelerating specific operations far faster than a CPU. It is often used to optimize processes traditionally handled by CPUs.

Schematic of components of an HPC [IMAGE]

Nodes

There are two types of nodes on a cluster:

login nodes (also known as head or submit nodes).
compute nodes (also known as worker nodes).

Job scheduler

Note

Several job scheduler programs are available, and SLURM is among the most widely used. In the next section, we’ll explore SLURM in greater detail, along with general best practices for running jobs.

Filesystem

The filesystem consists of all the directories and files accessible to a given process.

Scratch
Users working space

Exercise

I have an omics pipeline that produces a large number of files, resulting in a couple of terabytes of data after processing and analysis. The project will continue for a few more years, and I’ve decided to store the data in the scratch folder. Do you agree with this decision, and why? What factors should be considered when deciding which data to retain and where to store it?

Hint

Typically, scratch storage is not backed up, so it’s not advisable to rely on it for important data. At a minimum, ensure you back up the raw data and the scripts used for processing. This way, if some processed files are lost, you can replicate the analyses.

When deciding which data to keep on the HPC, back up, or delete, consider the following:

Processing Time: Evaluate how long each step of the analysis takes to run. There may be significant computational costs associated with re-running heavy data processing steps.
Storage Management: Use tools like Snakemake to manage intermediate files. You can configure Snakemake to automatically delete intermediate files once the final results are produced, helping you manage storage more efficiently.

Kernel

The kernel is essential for managing multiple programs on your machine, each of which runs as a process. Even if you write code assuming full control over the CPU and memory, the kernel ensures that multiple processes can run simultaneously without interfering with each other. It does this by scheduling time for each process and translating virtual memory addresses into physical ones, ensuring security and preventing conflicts.

The kernel also ensures that processes can’t access each other’s memory or directly modify the hard drive, maintaining system security and stability. For example, when a process needs to write to a file, it asks the kernel to do so through a system call, rather than writing directly.

In conclusion, it plays a crucial role in managing the CPU, memory, disk, and software environment. By mediating access to these resources, it maintains process isolation, security, and the smooth operation of your system.

Kernel primary roles:

Interfaces with hardware to facilitate program operations
Manages and schedules the execution of processes
Regulates and allocates system resources among processes

Before start using an HPC

High-Performance Computing (HPC) systems might be organized differently, but there is typically an HPC administration team you can contact to understand how your specific HPC is structured. Key information you should seek from them includes:

The types of compute nodes available.
The storage options you can access and the amount allocated per user.
Whether a job scheduler software is in use, and if so, which one. You can also request a sample submission script to help you get started.
The policy on who bears the cost of using the HPC resources.
Whether you can install your own software and create custom environments.

Be nice

If your HPC system doesn’t have a job scheduler in place, we recommend using the nice command. This command allows you to adjust and manage the scheduling priority of your processes, giving you the ability to run tasks with lower priority when sharing resources with others. By using nice, you can ensure that your processes do not dominate the CPU, allowing other users’ tasks to run smoothly. This is particularly useful in environments where multiple users are working on the same system without a job scheduler to automatically manage resource allocation.

Exercise 1: General HPC

Describe how a typical HPC is organised: nodes, job scheduler and filesystem.
What are the roles of a login node and a compute node? how do they differ?
Describe the role of a job scheduler
What are the differences between scratch and home storage and when each should be used?
What is a kernel?

Key areas of HPC use

Quantum mechanics
Complex physical simulations (CFD)
Weather forecasting
Molecular modeling
Machine learning with big data
Bioinformatics

Academic applications

HPC systems offer immense computational power, but they are not limited to large-scale projects. You can request anything from a single CPU and some RAM to entire nodes. Danish HPC systems are available for various academic purposes, including:

Research projects: access to computing power
Collaboration: Easier data and settings sharing for collaborative projects.
Student exercises in classroom teaching: Pre-installed software or package management, saving time on configuration, especially in teaching environments where students may face issues with different operating systems or software versions.
Student projects: students are not authorized to request resources independently. It is the responsibility of the lecturer or professor to obtain resources through the front office or facilitator. Once resources are allocated, students can be invited to access the project.

The Danish HPC ecosystem emphasizes teaching and training new users, so applications for resources related to courses and student projects are strongly encouraged. As a reminder, students cannot request resources themselves. Professors or lecturers are responsible for obtaining resources through the front office, and students can be granted access to the allocated project.

Here is an overview of the different Danish HPC systems

Feature	Computerome	GenomeDK	UCloud	Gefion	LUMI
CPU Nodes	692 thin / 55 fat (50k cores)	52 thin / 60 fat (15k cores)	6592 vCPUs	382 / 112 core each	2048 / 128 core each
GPU Nodes	40 NVIDIA V100s	2 NVIDIA L40S	16 NVIDIA H100s	191 NVIDIA DGX / 8 H100s	2978 / 4 AMD M250x
Storage	50 PB	23 PB	3 PB	? Provided by DDN	~120 PB
Security	ISAE 3000 + ISO 27001	ISAE 3000 + ISO 27001	ISO 27001	NA	ISAE 3000 + ISO 27001
Sandbox	No	No	Yes (1000 core-hours)	Application based	Application based
Sensitive Data	Yes, private clouds	Yes, ‘closed zones’	Yes	Not yet	Not yet
Env Management	Conda (& Docker?)	Singularity/Apptainer	Conda	NA	Singularity/Apptainer
OS	CentOS 7	AlmaLinux 8	UCloud GUI	? NVIDIA Enterprise	SUSE LES 15 / CRAY

HPC access

HPC systems allow multiple users to log in simultaneously and utilize a portion of the system’s resources, usually allocated by an administrator. Exceeding these assigned resources will terminate the running software. In Denmark, there are two ways to access an HPC:

Interactive Interface: You can log in using a user-friendly interface (UCloud Supercomputer and Documentation).
Command Line Interface: Requires knowledge of the UNIX shell language (here a good introduction the command-line).

Typically, users first access a login node, which has limited computational resources and is used for tasks like file management and small-scale code testing. Users may be assigned:

A certain number of CPUs (and possibly GPUs or FPGAs)
A specific amount of RAM
Storage capacity
An amount of total time for using these resources

We recommend to directly contact the local front office to discuss resource availability.

What can I run from a login node

A straightforward rule: do not run anything on the login node to prevent potential problems. If the login node crashes, the entire system may need to be rebooted, affecting everyone. Remember, you’re not the only one using the HPC—so be considerate of others. For easy, quick tasks, request an interactive access to one of the compute nodes.

In principle, the only activities you should perform on the login node include:

Your active login session.
[OPTIONAL] A terminal multiplexer, such as TMUX, SCREEN, or similar.
Submitting jobs to the queueing system, whether regular or interactive.

Sources

Useful links

Acknowledgements

Copyright

CC-BY-SA 4.0 license