Welcome to the HPC-Launch workshop

Agenda

Time   Activity
8:45   Morning coffee (optional)
9:00   Introduction to the Sandbox project
9:15   Introduction to HPC: the basics
10:15  Coffee break
10:30  DK HPC resources, access, and intro to UCloud
11:15  Intro to RDM for health data science
12:00  Lunch break
13:00  Step-by-step: solutions I
14:15  Coffee break
14:30  Step-by-step: solutions II
16:00  Discussions & Wrap-up

Course requirements

Required preparation

You are expected to complete the required setup, including tool installation and account creation.

  • Git for version control of your projects

  • A Zenodo account for archiving and sharing your research outputs

  • Python

  • pip for managing Python packages

  • Cookiecutter for creating folder structure templates (pip install cookiecutter)

  • md5sum

    Terminal
    # ---- cookiecutter -----
    pip install cookiecutter
    
    # ---- md5sum from coreutils package-----
    # On Ubuntu/Debian
    sudo apt-get install coreutils
    # On macOS
    brew install coreutils
  • Highly recommended: a GitHub account for hosting and collaborating on projects
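To confirm the setup before the workshop, you can check that each tool is available on your PATH. The snippet below is a minimal sketch, not an official part of the course material: check_tools is a hypothetical helper name, and the command names (python3, pip) are assumptions that may differ on your system (for example python or pip3).

```shell
# Report which of the given commands are available on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
# check_tools is a hypothetical helper written for this illustration.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok:      $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Tools from the list above; adjust the names if your system uses
# python instead of python3, or pip3 instead of pip.
check_tools git python3 pip cookiecutter md5sum \
    || echo "Install the missing tools before the workshop."
```

If a tool is reported as missing even though you installed it, open a new terminal first so that changes to PATH take effect.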

Note: If you encounter any issues, we will grant access to a Danish HPC platform where all the necessary software is pre-installed. Please read the next section carefully.

Using UCloud for exercises

  1. Create an account on UCloud
  2. Use the link below to join our workspace, where you will find an environment that is already set up [1]

 

Invite link to UCloud workspace

 

Discussion and feedback

We hope you enjoyed the workshop. As data scientists, we would also really appreciate some quantifiable feedback: we want to build things that the Danish health data science community is excited to use. Please fill out the feedback form [LINK] before you head out for the day [2].

 

You can download our RDM roadmap here.

Nice meeting you and we hope to see you again!

About the National Sandbox project

The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our GitHub page and can be used on a computing cluster. We work with UCloud, GenomeDK, and Computerome, the major Danish academic supercomputers.

Footnotes

  1. Link activated a week before the workshop.

  2. Link activated on the day of the workshop.