
Welcome to the HPC-Pipes workshop

Course requirements

PLEASE READ BEFORE COURSE!

Whether you need to install any software depends on how you choose to follow the course and run the exercises. We recommend creating an account on one of the Danish academic HPC platforms (UCloud or GenomeDK), but you can also run all exercises locally.

  • UCloud: Recommended for users who need (or plan) to use this HPC platform, for those with limited experience with HPCs, and for anyone curious to explore how this platform works.
  • GenomeDK: Recommended only for users who already have experience working with HPC systems, clusters, or supercomputers. Familiarity with the Unix command line and queueing systems is expected.

If you have chosen UCloud, we provide an environment with all necessary software pre-installed, but you will need to follow these instructions to create an account and join our workspace Sandbox_workshop. You will need your institutional email to proceed. Unfortunately, this will not work for those without a university email.

  1. Create an account on UCloud with your institution’s credentials.
  2. Use the link below to join our workspace, where you will find a setup environment (see footnote 1).
  3. Install Docker to run the container exercises:
    • Docker - click on Download Docker Desktop (recommended).
  4. You’re all set! You will receive instructions on how to navigate through UCloud during the course.

If you have chosen GenomeDK,

  1. Create an account via this request form at https://console.genome.au.dk/user-requests/create and select the Open zone. In the Reason field, write something like: I am currently enrolled in a course called “HPC-pipes: Workflow Languages and Portable Environments for HPC”, organised by the National Project Health Data Science Sandbox.
  2. Add your GenomeDK username to this Excel sheet.
  3. Optional (if you would like to run Docker commands locally; otherwise, you will work with Apptainer on GenomeDK):
    • Docker - click on Download Docker Desktop.
  4. You’re all set! You will receive instructions on how to navigate through GenomeDK during the course.

If you prefer to run the exercises on your personal laptop or a different server, please ensure you have the following software installed:

One of the package managers:

  • conda - miniconda or miniforge recommended. Here is the link to install miniforge: https://github.com/conda-forge/miniforge/
  • OR pixi

One container software:

  • Docker - click on Download Docker Desktop.
  • OR Apptainer. Apptainer runs natively only on Linux, so on Windows or macOS you will also need a Linux virtual machine.

Both workflow management systems:

  • snakemake (use conda to install it)
  • nextflow

Important: Windows users

Note for Windows users: Installing Docker Desktop requires administrative privileges on your computer. You will be prompted to enter your KU username and password during the installation process. Additionally, Docker Desktop depends on the Windows Subsystem for Linux (WSL), which will also need to be installed.
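
If you are setting up locally, a quick way to confirm the requirements above is to check which tools are visible on your PATH. The following is a minimal sketch, not an official course script. It assumes the executables are named `conda`, `pixi`, `docker`, `apptainer`, `snakemake` and `nextflow`. The `REQUIREMENTS` table and the `check_requirements` helper are hypothetical names introduced here for illustration.

```python
import shutil

# Requirement groups mirroring the software list above.
# Assumption: each tool is installed under its usual executable name.
REQUIREMENTS = {
    "package manager (one of)": (["conda", "pixi"], any),
    "container software (one of)": (["docker", "apptainer"], any),
    "workflow management systems (both)": (["snakemake", "nextflow"], all),
}


def check_requirements(which=shutil.which):
    """Return {group: (satisfied, {tool: found})} for each requirement group.

    `which` is injectable so the check can be exercised without touching
    the real PATH; by default it is shutil.which.
    """
    report = {}
    for group, (tools, rule) in REQUIREMENTS.items():
        # A tool counts as found if an executable with that name is on PATH.
        found = {tool: which(tool) is not None for tool in tools}
        report[group] = (rule(found.values()), found)
    return report


if __name__ == "__main__":
    for group, (ok, found) in check_requirements().items():
        status = "OK" if ok else "MISSING"
        details = ", ".join(
            f"{tool}: {'found' if present else 'not found'}"
            for tool, present in found.items()
        )
        print(f"[{status}] {group}: {details}")
```

Note that a tool installed inside a conda environment is only reported as found while that environment is active, so activate the environment that holds snakemake before running the check.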

Reading material (optional)

  • The Turing Way. It offers comprehensive guidance on reproducible research practices, including setting up computational environments and managing reproducible workflows.
  • Mölder, Felix, et al. “Sustainable data analysis with Snakemake.” F1000Research 10 (2021). Link to article. Best practices using Snakemake to develop your pipelines.
  • Check our content on HPC pipes.

Agenda

Day 1

Time    Activity
8:45    Morning coffee (optional)
9:00    Intro to HPC & onboarding
9:45    Software management I - intro
10:30   Coffee break
10:45   Software management I - conda envs
12:00   Lunch break
13:00   Software management II - containers
13:20   Software management II - Docker, Apptainer
14:15   Coffee break
14:30   Computations management I - Snakemake structure
15:15   Computations management I - exercises

Exercises: Conda, Pixi, Apptainer, Docker, Snakemake I

Day 2

Time    Activity
8:45    Morning coffee (optional)
9:00    Computations management II - Snakemake dynamic config
9:45    Computations management II - exercise: Snakemake integration
10:15   Coffee break
10:30   Computations management II - exercise: Snakemake implementation
12:00   Lunch break
13:00   Computations management III - Nextflow
14:15   Coffee break
14:30   Computations management III - Nextflow exercises
15:00   Build your own pipeline
15:30   Wrap up

Exercises: Snakemake advanced, Snakemake + envs, Nextflow

Course material

Discussion and feedback

We hope you enjoyed the workshop. As data scientists, we would also really appreciate some quantifiable feedback: we want to build things that the Danish health data science community is excited to use. Please fill out the feedback form before you head out for the day (see footnote 2).

Nice meeting you and we hope to see you again!

About the National Sandbox project

The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our GitHub page and can be used on a computing cluster! We work with UCloud, GenomeDK, and Computerome, the major Danish academic supercomputers.

Footnotes

  1. Link activated a week before the workshop.

  2. Link activated on the day of the workshop.

Copyright

CC-BY-SA 4.0 license