Welcome to the HPC-Launch workshop
PLEASE READ BEFORE THE COURSE!
Course requirements
You are expected to complete the required setup, including tool installation and account creation.
- Git for version control of your projects
- pip for managing Python packages
- Cookiecutter for creating folder structure templates (pip install cookiecutter)
- md5sum for computing file checksums (see below for how to install it)
```bash
# ---- cookiecutter -----
pip install cookiecutter

# ---- md5sum (from the coreutils package) -----
# On Ubuntu/Debian (installing system packages requires root)
sudo apt-get install coreutils
# On macOS
brew install coreutils
```
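Before the course, you may want to confirm that everything is installed. The snippet below is a minimal sketch, not official course tooling, and assumes a Bash-compatible shell. It checks that each tool listed above can be found on your PATH and then runs md5sum once on a short string. On some systems pip is only available as pip3 (or via python3 -m pip), so a "MISSING: pip" line does not necessarily mean Python packaging is unavailable.

```shell
#!/usr/bin/env bash
# Sketch of a pre-course check (assumption: Bash-compatible shell).
# Report which of the required command-line tools are on your PATH.
for tool in git pip cookiecutter md5sum; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "found:   $tool"
  else
    echo "MISSING: $tool"
  fi
done

# md5sum smoke test: hash the string "hello" plus a newline, read from stdin.
# A working install prints b1946ac92492d2347c6235b4d2611184 followed by "  -".
printf 'hello\n' | md5sum
```

If any line reads "MISSING", revisit the installation steps above, or use the UCloud environment described in the next section.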
Highly recommended
- GitHub account for hosting and collaborating on projects
- Zenodo account for archiving and sharing your research outputs
- DeiC DMP account for writing data management plans
If you run into any issues installing the software, don’t worry! We will provide access to a Danish HPC platform, UCloud, with all the necessary software pre-installed. Please read the next section carefully.
Using UCloud for exercises
Follow the instructions below if you have an account at a Danish university. You will need your institutional email to proceed. Unfortunately, this will not work for those without a university email.
- Create an account on UCloud with your institution’s credentials
- Use the link below to join our workspace, where you will find a preconfigured environment
Invite link to UCloud workspace
- You’re all set! You will receive instructions on how to navigate through UCloud during the course.
Reading material
About Research Data Management (RDM):
About High-Performance Computing (HPC):
Agenda
| Time | Morning activity | Time | Afternoon activity |
|---|---|---|---|
| 8:45 | Morning coffee (optional) | | |
| 9:00 | Introduction to the Sandbox project | 12:00 | Lunch break |
| 9:15 | Introduction to HPC: the basics | 13:00 | Step-by-step: solutions I |
| 10:15 | Coffee break | 14:15 | Coffee break |
| 10:30 | DK HPC resources, access, and intro to UCloud | 14:30 | Step-by-step: solutions II |
| 11:15 | Intro to RDM for health data science | 16:00 | Discussions & Wrap-up |
Discussion and feedback
We hope you enjoyed the workshop. As data scientists, we would really appreciate some quantifiable information and feedback: we want to build things that the Danish health data science community is excited to use. Please fill out the feedback form before you head out for the day.
You can download our RDM roadmap here.
About the National Sandbox project
The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and anyone curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our GitHub page and can be used on a computing cluster! We work with UCloud, GenomeDK, and Computerome, the major Danish academic supercomputers.