

Welcome to the HPC-Pipes workshop

Please read this page before the course!

Course requirements

Required preparation

You are expected to complete the required setup, including tool installation (Docker) and account creation (UCloud).

  • Docker - click on Download Docker Desktop.

Note for Windows users: Installing Docker Desktop requires administrative privileges on your computer. You will be prompted to enter your KU username and password during the installation process. Additionally, Docker Desktop depends on the Windows Subsystem for Linux (WSL), which will also need to be installed.

As for other software, we will provide access to a Danish HPC platform, UCloud, with all necessary software pre-installed. Please read Using UCloud for exercises carefully.

If you prefer to run the exercises on your personal laptop or a different server, please ensure you have the following software installed:

  • conda - miniconda or miniforge is recommended.
  • snakemake - install it with conda!
  • nextflow
  • Apptainer (formerly known as Singularity).
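If you are setting up your own machine, the tools above can be captured in a single conda environment file. A minimal sketch, assuming the standard conda-forge and bioconda channels (the environment name hpc-pipes is our own placeholder, and you may want to pin versions):

```yaml
# environment.yml - hypothetical environment for the workshop exercises
name: hpc-pipes
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake-minimal  # Snakemake workflow engine (core, without optional extras)
  - nextflow           # Nextflow workflow engine
```

Create and activate it with `conda env create -f environment.yml` followed by `conda activate hpc-pipes`. Note that Apptainer is usually installed system-wide rather than through conda, so follow its own installation instructions separately.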

Using UCloud for exercises

Warning

Follow the instructions below if you have an account at a Danish university. You will need your institutional email to proceed. Unfortunately, this will not work for those without a university email.

  1. Create an account on UCloud with your institution’s credentials.
  2. Use the link below to join our workspace, where you will find a setup environment1.
  3. You’re all set! You will receive instructions on how to navigate UCloud during the course.

Reading material (optional)

  • The Turing Way. It offers comprehensive guidance on reproducible research practices, including setting up computational environments and managing reproducible workflows.
  • Mölder, Felix, et al. “Sustainable data analysis with Snakemake.” F1000Research 10 (2021). Link to article. Best practices using Snakemake to develop your pipelines.
  • Check our content on HPC pipes.

Agenda

Day 1

Time   Activity                    Time   Activity
8:45   Morning coffee (optional)
9:00   Intro to HPC & onboarding   13:00  Software mgmt III (Docker)
9:45   Software mgmt I (conda)     13:45  Computations mgmt I (smk)
10:30  Coffee break                14:15  Coffee break
10:45  Software mgmt II (conda)    14:30  Computations mgmt I (smk)
12:00  Lunch break                 15:30  Wrap-up

  • Environments exercises
  • Snakemake I exercises

Day 2

Time   Activity                       Time   Activity
8:45   Morning coffee (optional)
9:00   Computations mgmt II (smk)     12:00  Lunch break
9:45   Exercise - smk integration     13:00  Exercise nf + wrap-up
10:15  Coffee break                   14:15  Coffee break
10:30  Exercise - smk implementation  14:30  Build your own pipeline
11:15  Computations mgmt III (nf)     15:00  Wrap-up

  • Snakemake II exercises
  • Snakemake III exercises
  • Nextflow I exercises

Download the slides

Discussion and feedback

We hope you enjoyed the workshop. As data scientists, we would also really appreciate some quantifiable feedback - we want to build things that the Danish health data science community is excited to use. Please fill out the feedback form before you head out at the end of day 2.


Nice meeting you and we hope to see you again!

About the National Sandbox project

The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our GitHub page and can be used on a computing cluster! We work with UCloud, GenomeDK, and Computerome, the major Danish academic supercomputers.

Footnotes

  1. Link activated a week before the workshop.↩︎

  2. Link activated on the day of the workshop.↩︎

Copyright

CC-BY-SA 4.0 license