An introduction to the GDK system and basic commands tinyurl.com/GDKslides
Bioinformatics Research Centre, Nat
GenomeDK, Health
Biomedicine, Health
2026-01-22
These slides are both a presentation and a small reference manual
We will try out some commands during the workshop
Official reference documentation: genome.au.dk
Practical help:
Samuele (BiRC, MBG) - samuele@birc.au.dk
Drop in hours:
General mail for assistance: support@genome.au.dk
10:00-11:00: What is an HPC, GenomeDK, how it works, file system, virtual environments
11:00-12:00: desktop interface, new environment, transfer data, interactive job
12:45-14:00: Hands-on
Webpage: https://hds-sandbox.github.io/GDKworkshops/
Slides will always be up to date in this webpage
Learn your way around the basics of the GenomeDK cluster.
GenomeDK is a High performance computing (HPC) cluster, i.e. a set of interconnected computers (nodes). GenomeDK has:
HPC cluster from the backside
One node being mounted
A single node with CPU and RAM
A CPU (Central Processing Unit) executes instructions from programs.
A CPU has multiple cores. Each core can execute instructions independently, at the cost of reduced bandwidth per core.
PowerShell (Windows) or Terminal (macOS, Linux) are powerful tools to interact with an HPC cluster.
Terminal for macOS and Linux. The equivalent in Windows is PowerShell.
Interactive desktop
GenomeDK also has an interactive desktop interface at desktop.genome.au.dk which can be used to access the cluster through a web browser. More on that later.
A terminal looks old, but it gives you enormous flexibility through efficient commands and scripting possibilities.
An old “Dumb Terminal” from the 1970s
Creating an account happens through this form at genome.au.dk

Log into GenomeDK with the command ssh USERNAME@login.genome.au.dk, replacing USERNAME with your own username.
The first time you log in, set up two-factor authentication by
showing a QR code with the command
scanning it with your phone’s authenticator app.
It is convenient to avoid typing the password at every login. If you are on the cluster, exit from it to go back to your local computer.
Now we set up public-key authentication. We generate a key pair (public and private):
Press Enter at every prompt and do not enter any password when asked.
Then create a folder called .ssh on the cluster to contain the public key
and finally send the public key to the cluster, into the file authorized_keys
After this, your local private key is checked against the public key stored on GenomeDK every time you log in.
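A minimal sketch of the three key-setup steps, printed rather than executed so you can check them before typing them by hand on your local computer. The key type (ed25519), the key file name, and the exact form of each command are assumptions of this sketch and may differ from what was shown in the workshop; GDK_USER is a placeholder for your own username.

```shell
GDK_USER="USERNAME"              # placeholder: replace with your GenomeDK username
HOST="login.genome.au.dk"        # login address used elsewhere in these slides
KEY="$HOME/.ssh/id_ed25519"      # assumption: illustrative key type and file name

# Step 1 (local computer): generate the public/private key pair
step1="ssh-keygen -t ed25519 -f $KEY"
# Step 2: create the .ssh folder in your home on the cluster
step2="ssh $GDK_USER@$HOST 'mkdir -p ~/.ssh'"
# Step 3: append your public key to authorized_keys on the cluster
step3="cat $KEY.pub | ssh $GDK_USER@$HOST 'cat >> ~/.ssh/authorized_keys'"

# Print the commands instead of running them
printf '%s\n' "$step1" "$step2" "$step3"
```

Only the file ending in .pub is ever sent to the cluster; the private key stays on your computer.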
Folders and files follow a hierarchy
/ is the root folder of the filesystem - nothing is above that
home and faststorage are two of the folders in the root
projects are in /faststorage/project and linked to your home
Log in: ssh USERNAME@login.genome.au.dk
Note
Run a command = Type a command + Enter
Run pwd. You should see your home folder: /home/USERNAME
/home/USERNAME is an example of a path.
pwd shows your current folder (WD, Working Directory)
Run ls . to show the content of your WD (the dot .)
Run mkdir -p GDKintro to create a GDKintro folder
Run echo "hello" > ./GDKintro/file.txt to write hello in a file
Use ls -lh ./GDKintro to see if the text file is there with some info.
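The commands above can be tried in one go with the sketch below. It works inside a throw-away folder created with mktemp (an assumption of this sketch, so that nothing in your real home folder is touched) and lists the same folder once through a path starting from the WD and once through a path starting from the root.

```shell
# Throw-away folder, so your real home is left untouched
scratch=$(mktemp -d)
cd "$scratch"

pwd                                  # print the working directory (WD)
mkdir -p GDKintro                    # -p: no error if the folder already exists
echo "hello" > ./GDKintro/file.txt   # write the word hello into a text file
ls -lh ./GDKintro                    # path starting from the WD (the dot)
ls -lh "$scratch/GDKintro"           # the same folder, path starting from the root /

content=$(cat ./GDKintro/file.txt)   # read the file back
echo "$content"
```

Both ls commands print the same listing, because the two paths point to the same folder.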
Relative and absolute paths
/home/USERNAME starts from the root /. It is an absolute path.
./GDKintro starts from the WD. It is a relative path.
Look at the File system tree and answer the following questions:

After logging in, you will find yourself in your private home folder, denoted by ~ or equivalently /home/username. Your prompt will look like this:
which follows the format [username@node current_folder].
Warning
We now set the WD to GDKintro and remove all text files in it. Then we download a zipped fastq file, unzip it, and print a preview!
rm *.txt removes all files ending with .txt. The symbol * is a wildcard that matches any file name.
Forever away
There is no trash bin - removed files are lost forever - with no exception
head prints the first lines of a text file
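The sequence of steps described above can be sketched as follows. The download itself is left out; as a stand-in, the sketch creates a tiny made-up FASTQ record and compresses it with gzip. The file name data.fastq.gz, the gzip format, and the scratch folder are assumptions of this sketch.

```shell
# Throw-away copy of the GDKintro folder
scratch=$(mktemp -d)
mkdir -p "$scratch/GDKintro"
cd "$scratch/GDKintro"               # set the WD
echo "hello" > file.txt              # the text file created earlier

# Stand-in for the downloaded file: one made-up record, gzip-compressed
printf '@read1\nACGTNNACGT\n+\nIIIIIIIIII\n' | gzip > data.fastq.gz

rm *.txt                             # remove every file ending in .txt (no undo!)
gunzip data.fastq.gz                 # creates data.fastq and removes the .gz file
head -n 4 data.fastq                 # preview: the first 4 lines are one record

first_line=$(head -n 1 data.fastq)
txt_left=$(ls | grep -c '\.txt$')    # number of .txt files still present
```

A FASTQ record always spans four lines (name, sequence, separator, qualities), which is why head -n 4 shows exactly one read.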
Useful utility 1: the less file reader. less is perfect for exploring (big) text files: you can scroll with the arrow keys and quit by pressing q. Try less data.fastq.
The very first sequence you see should be
@HISEQ_HU01:89:H7YRLADXX:1:1101:1116:2123 1:N:0:ATCACG
TCTGTGTAAATTACCCAGCCTCACGTATTCCTTTAGAGCAATGCAAAACAGACTAGACAAAAGGCTTTTAAAAGTCTAATCTGAGATTCCTGACCAAATGT
+
CCCFFFFFHHHHHJJJJJJJJJJJJHIJJJJJJJJJIJJJJJJJJJJJJJJJJJJJHIJGHJIJJIJJJJJHHHHHHHFFFFFFFEDDEEEEDDDDDDDDD
Challenge yourself
Look up (online, with man less, or with less --help) how to search for a specific word in a file with less. Then open the data with less and check whether there is any run of ten adjacent Ns (that is, ten missing nucleotides). Then answer the question below.
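Once you have tried the search in less, you can cross-check your answer without opening the file. The sketch below counts the lines containing at least ten adjacent Ns with grep; the toy file and its two reads are made up for illustration and are not the workshop data.

```shell
# Made-up two-read FASTQ file; only the first read has a run of ten Ns
scratch=$(mktemp -d)
printf '@r1\nACGTNNNNNNNNNNACGT\n+\nIIIIIIIIIIIIIIIIII\n' >  "$scratch/toy.fastq"
printf '@r2\nACGTACGT\n+\nIIIIIIII\n'                     >> "$scratch/toy.fastq"

# -E enables extended regular expressions, N{10} means ten Ns in a row,
# -c prints the number of matching lines instead of the lines themselves
hits=$(grep -c -E 'N{10}' "$scratch/toy.fastq")
echo "$hits"
```

On the real file you would replace the toy path with data.fastq. Note that grep checks every line, not only the sequence lines.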
Useful utility 2: the nano text editor. It opens, edits and saves text files. Very useful for changes on the fly.
Try nano data.fastq. Change a base in the first sequence,
then press Ctrl+O to save (give it a new file name newData.fastq and press Enter)
press Ctrl+X to exit. If you use ls you can see the new saved file.
What is a project
Projects are contained in /faststorage/project/ and linked in your home, and are simple folders with some perks:
Common-sense in project creation
Avoid: separate projects bulkRNA_mouse, bulkRNA_human, bulkRNA_apes with the same invited users
Prefer: one project bulkRNA_studies with subfolders bulkRNA_mouse, bulkRNA_human, bulkRNA_apes.
Request a project (after logging in to GDK) with the command
After GDK approval, a project folder with the desired name appears in ~ and /faststorage/project. You should be able to set the WD into that folder:
or
Only the creator (owner) can see the project folder. You (and only you) can add a user
or remove them
Users can also be promoted to have administrative rights in the project
or demoted from those rights
You can see the total resources used each month by all your projects with
Example output:
More detailed usage: by users on a selected project
You can see how many resources your projects are using with
Example output:
project        user     period  billing hours  storage (TB)  backup (TB)  storage files  backup files
ngssummer2024  sarasj   2024-7  77.98          0.02          0.00         528            0
ngssummer2024  sarasj   2024-8  0.00           0.02          0.00         528            0
ngssummer2024  savvasc  2024-7  223.21         0.02          0.00         564            0
ngssummer2024  savvasc  2024-8  0.00           0.02          0.00         564            0
ngssummer2024  simonnn  2024-7  173.29         0.01          0.00         579            0
ngssummer2024  simonnn  2024-8  0.00           0.01          0.00         579            0
Accounting Tips
grep to isolate specific users and/or months:
Example:
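A minimal sketch of the filtering idea. Instead of calling the accounting command, it works on a saved copy of the example table shown above, so the same grep filters can be tried anywhere; on the cluster you would pipe the real command output into grep instead. The awk line that sums billing hours is an addition of this sketch.

```shell
# Saved copy of the example accounting table (header omitted)
usage=$(cat <<'EOF'
ngssummer2024 sarasj 2024-7 77.98 0.02 0.00 528 0
ngssummer2024 sarasj 2024-8 0.00 0.02 0.00 528 0
ngssummer2024 savvasc 2024-7 223.21 0.02 0.00 564 0
ngssummer2024 savvasc 2024-8 0.00 0.02 0.00 564 0
ngssummer2024 simonnn 2024-7 173.29 0.01 0.00 579 0
ngssummer2024 simonnn 2024-8 0.00 0.01 0.00 579 0
EOF
)

# One user, one month: chain two grep filters
one_row=$(printf '%s\n' "$usage" | grep "savvasc" | grep "2024-7")
echo "$one_row"

# Total billing hours (4th column) for period 2024-7 (3rd column)
july_hours=$(printf '%s\n' "$usage" | awk '$3 == "2024-7" { total += $4 } END { printf "%.2f\n", total }')
echo "$july_hours"
```

Chaining grep commands narrows the table one condition at a time, which is usually enough to answer "who used what, and when".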
Private files or folders
Have a coherent folder structure - your future self sends many thanks.
Example of structure, which backs up raw data and analysis
If your project has many users, a good structure can be
MUST-KNOWs for a GDK project
Backup cost >>> Storage cost >> Computation cost
No preinstalled software on GenomeDK
You install and manage your software and its dependencies inside virtual environments
Each project needs specific software versions that depend on each other. Keeping them fixed makes the analysis reproducible, and keeping them separate avoids interfering with other projects.
Definition
A virtual environment keeps project-specific software and its dependencies separated
A package manager is software that can retrieve, download, install, and upgrade packages easily and reliably
How virtual envs work: packages at different versions are kept separated into folders, together with all system files needed to make them work.
Conda is both a virtual environment manager and a package manager.
Pixi is a newer virtual environment and package manager.
A package manager puts together the dependency trees of the requested packages to find compatible versions of all dependencies.
Figure: A package’s dependency tree with required versions on the edges
To install a specific package in your environment, search for it on anaconda.org:
Figure: search DESeq2 for R
Channels
Packages are archived in channels. conda-forge and bioconda include most of the packages for bioinformatics and data science.
conda-forge packages are often the most up-to-date.
First of all, we open the desktop interface to GenomeDK at desktop.genome.au.dk. Choose the open frontend for the login.
The desktop session will be operative even if you close and reopen your browser afterwards!
The terminal will work as if you logged into the frontend (The desktop is logged into the front-end node already). You can also use the browser!
Open the terminal and run the command below to install pixi:
After that, make the system recognize pixi
Change your WD to the folder we created earlier, which contains the file data.fastq
Initialize a new pixi environment in the folder:
This will also use conda-forge and bioconda as channels for package installation.
What are channels?
Channels are repositories where packages are stored.
conda-forge and bioconda are two of the most used channels for bioinformatics and data science and contain virtually any package you may need.
Note that we specified the channels in order of priority: conda-forge has higher priority than bioconda, so pixi always searches for a package first in conda-forge and then in bioconda.
Use the file browser and open the GDKintro folder
You can see some new files. pixi.toml contains info pixi will use to create your environment.
Open pixi.toml with the text editor, and make sure it lists the two channels conda-forge and bioconda, as you requested with the pixi init command. You can always add more channels later if you need them for specific packages.
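For orientation, the sketch below writes a hand-made stand-in for pixi.toml into a throw-away folder and reads back which channel comes first. The file content is an assumption modelled on what recent pixi releases generate (older releases name the first table [project] instead of [workspace]); your own file will contain a few more fields.

```shell
# Hand-written stand-in for the pixi.toml created by a command such as
#   pixi init -c conda-forge -c bioconda   (assumed form of the init command)
scratch=$(mktemp -d)
cat > "$scratch/pixi.toml" <<'EOF'
[workspace]
name = "GDKintro"
channels = ["conda-forge", "bioconda"]
platforms = ["linux-64"]

[dependencies]
EOF

# The first channel in the list has the highest priority
channels_line=$(grep '^channels' "$scratch/pixi.toml")
first_channel=$(printf '%s\n' "$channels_line" | cut -d'"' -f2)
echo "$channels_line"
echo "highest priority: $first_channel"
```

To add a channel by hand, you would append its name to the channels list; its position in the list sets its priority.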
Now get back to the terminal and install some packages. Your working directory MUST be the same where pixi.toml is located!
The terminal will look like this at the end
Now open the pixi.toml file. You should see all the installed packages with related information.
Exercise Cont’d
Be sure your WD is in the folder GDKintro. Then run
Open the file environment.yml. It looks very similar to pixi.toml and is compatible with conda to recreate your environment.
Let’s zip those files into one:
Data can be downloaded/uploaded in two ways:
from the command line of a local computer
using an interactive interface (Filezilla)
How to download the environment files to our computer? Open a terminal on your computer and run this command:
scp needs your login and the absolute path to the file. We also give the download destination, here the WD on the local computer (.)
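The anatomy of such a download command can be sketched as below. The command is assembled from its parts and printed rather than executed, so you can check it before running it on your own computer. GDK_USER is a placeholder, and the remote path assumes that environment.zip sits in the GDKintro folder in your home.

```shell
GDK_USER="USERNAME"                                   # placeholder: your GenomeDK username
HOST="login.genome.au.dk"                             # login address of the cluster
REMOTE_FILE="/home/$GDK_USER/GDKintro/environment.zip" # absolute path on the cluster (assumed location)
DESTINATION="."                                       # the WD on your local computer

# scp SOURCE DESTINATION, where a remote source is written as login:path
scp_command="scp $GDK_USER@$HOST:$REMOTE_FILE $DESTINATION"
echo "$scp_command"
```

Swapping source and destination turns the same command into an upload.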
You can also transfer data with interactive software such as Filezilla, which has an easy interface. Download Filezilla.
When done, open Filezilla and use the following information on the login bar:
Host: login.genome.au.dk
Username and password: your GenomeDK username and password
Port: 22
Press Quick Connect. This establishes a secure connection to GenomeDK. In the left-side browser you can see your local folders and files. In the right-side browser you can see the folders and files on GenomeDK, starting from your home.
Download the environment.zip file. You need to right-click on it and choose Download
You can do exactly the same to upload files from your local computer!
Running programs on a computing cluster happens through jobs.
Learn how to get hold of computing resources to run your programs.
A job is a computational task executed on requested HPC resources (computing nodes), which are handled by the queueing system (SLURM).
The command gnodes will tell you if there is heavy usage across the computing nodes
Usage of computing nodes. Each node has a name (e.g. cn-1001). The symbols for each node mean running a program (0), assigned to a user (_) and available (.)
If you want to venture more into checking the queueing status, Moi has done a great interactive script in R Shiny for that.
Front-end nodes are limited in memory and power, and should only be used for basic operations such as
starting a new project
small folders and files management
small software installations
data transfer
and in general you should not use them to run computations. This might slow down all other users on the front-end.
Useful to run a non-repetitive task interactively
Examples:
splitting by chromosome that one bam file you just got
open Rstudio and Jupyterlab
compress/decompress multiple files, maybe in parallel
Once you exit from the job, anything running in it will stop.
You can also run an interactive job on GenomeDK desktop. Go back to it and use the terminal to go into the GDKintro folder:
Now run an interactive job. Request 8g of RAM, 2 cores, and a time limit of 01:00:00 (one hour). For the account, use the name of one of your projects, or leave the account option out if you do not have any projects.
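A sketch of how such a request can be written with SLURM's srun, using standard srun options. The command is assembled and printed rather than executed; the exact form used in the workshop may differ, and PROJECT_NAME is a placeholder for one of your own projects.

```shell
MEMORY="8g"              # RAM for the job
CORES=2                  # number of cores
WALLTIME="01:00:00"      # time limit, hours:minutes:seconds
ACCOUNT="PROJECT_NAME"   # placeholder; set to "" if you have no project

job_command="srun --mem=$MEMORY --cpus-per-task=$CORES --time=$WALLTIME"
# Add the account option only when a project name is given
if [ -n "$ACCOUNT" ]; then
    job_command="$job_command --account=$ACCOUNT"
fi
# --pty /bin/bash opens an interactive shell on the assigned node
job_command="$job_command --pty /bin/bash"
echo "$job_command"
```

Requesting only what you need usually shortens the time spent waiting in the queue.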
You will have to wait in queue. When you get the resources, the node in use is shown in the prompt. Below, for example, the node is s21n32.
[USERNAME@s21n32 ~]$
Now, run rstudio or jupyterlab (your choice!) from the pixi environment:
The packages available in Rstudio and Jupyterlab are the ones installed in your environment. More on this will be in our Advanced GenomeDK workshop.
Please fill out this form :)
A lot of things we could not cover
use the official documentation!
ask for help, use drop in hours
try out stuff and google yourself out of small problems
Slides updated over time, use as a reference
Future workshops about advanced usage and pipelines
