Sandbox Workshop

Sandbox Workshop dates: 18-19 April 2024

Welcome to the homepage for in-person workshops introducing the Health Data Science Sandbox to new users. Thanks for joining us!

Agenda

Day 1

Time Activity
9:00 Morning coffee (optional)
9:30 Introduction to the Sandbox project
10:00 Introduction to HPC: the basics
10:30 Coffee break
10:45 DK HPC resources, access, and intro to UCloud
11:15 UCloud demo: using apps and running jobs
12:00 Lunch break
13:00 Proteomics App
14:15 Coffee break
14:30 Proteomics App
16:00 End of day!

Day 2

Time Activity
9:00 Morning coffee (optional)
9:30 RDM intro for health data science
10:30 Coffee break
10:45 Step-by-step guide: simple solutions
12:00 Lunch break
13:00 Transcriptomics App
14:30 Coffee break
14:45 Transcriptomics App
15:45 Wrap-up and feedback
16:00 End of day and goodbye!

Workshop

The Health Data Science Sandbox aims to be a training resource for bioinformaticians, data scientists, and those generally curious about how to investigate large biomedical datasets. We are an active and developing project seeking interested users (both trainees and educators). All of our open-source materials are available on our GitHub page and can be used on a computing cluster! We work with UCloud, GenomeDK, and Computerome, the major Danish academic supercomputers. See our HPC Access page for more info on each setup.

Access Sandbox resources

Our first choice is to provide all the training materials, tutorials, and tools as interactive apps on UCloud, the supercomputer located at the University of Southern Denmark. Anyone using these resources needs the following:

  1. A Danish university ID, so you can sign on to UCloud via WAYF [1].

For UCloud access, click here

  2. Basic ability to navigate in Linux/RStudio/Jupyter. You don’t need to be an expert, but teaching you to code from scratch and to run analyses at the same time is beyond our ambitions (and our course material). We recommend taking a basic R or Python course before diving in.

  3. For workshop participants: use our invite link to the correct UCloud workspace, which will be shared on the day of the workshop. This way, we can provide you with compute resources for the active sessions of the workshop [2]. Click the link below after your first UCloud access and accept the invite that appears.

Invite link to UCloud workspace

   

Note

Our apps can also run on other clusters, simply by pulling a Docker container. You only need to have either Docker or Singularity installed on the cluster. GenomeDK supports Singularity and can therefore run our learning material as well. Ask us if you want help running the apps outside UCloud. Instructions will soon be available within our HPC access instructions.
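As a rough illustration of the note above, the sketch below picks whichever container runtime is installed and builds the matching pull command. It is only a sketch under stated assumptions: the image name is a hypothetical placeholder (the real image names are not given here), and the function name `pull_command` is ours, not part of any Sandbox tooling.

```python
import shutil

# Hypothetical placeholder: the actual Sandbox image name is not given on this page.
IMAGE = "example-org/sandbox-app:latest"


def pull_command(image, which=shutil.which):
    """Build the pull command for the first container runtime found on PATH."""
    if which("docker"):
        return ["docker", "pull", image]
    if which("singularity"):
        # Singularity can fetch and convert Docker images via the docker:// prefix.
        return ["singularity", "pull", f"docker://{image}"]
    raise RuntimeError("Neither docker nor singularity was found on PATH")


if __name__ == "__main__":
    try:
        # Only print the command; run it yourself once you know the real image name.
        print(" ".join(pull_command(IMAGE)))
    except RuntimeError as err:
        print(err)
```

The script only prints the command rather than running it, so you can check it against the official HPC access instructions once they are published.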

Our OMICS apps

The agenda starts with an introduction to High Performance Computing (HPC) and UCloud. You will try two apps during the workshop. We have already deployed three apps and are developing others.

 

Proteomics Sandbox: Our sandbox module with a suite of proteomics analysis tools, used for example in clinical proteomics. This app is not alone: our data scientist Jacob has also made the ColabFold app on UCloud, with methods for protein structure prediction.

 

Transcriptomics Sandbox: Our sandbox for bulk or single-cell RNA sequencing analysis and visualization. It includes, among other things, material from two regular workshops and provides stand-alone visualization tools. In the next update, we will introduce advanced tutorials for more complex single-cell RNA sequencing analysis from some of our supported courses.

 

Genomics Sandbox: Our sandbox for NGS data analysis, with applications ranging from genome assembly to variant calling to metagenomics. It currently includes a semester-long population genomics course and an NGS course with many applications (alignment, VCF analysis, bulk RNA data, single-cell RNA sequencing).

Discussion and feedback

We hope you enjoyed the live demo. If you have broader questions, suggestions, or concerns, now is the time to raise them! If you are totally toast for the day, remember that you can check out longer versions of our tutorials as well as other topics and tools in each of the Sandbox modules or join us for a multi-day in-person course (follow our news here).

As data scientists, we would also be really happy to get some quantifiable info and feedback: we want to build things that the Danish health data science community is excited to use. Please answer these 5 questions for us before you head out for the day [3].

 

Nice meeting you and we hope to see you again!

Footnotes

  1. Other institutions (e.g. hospitals, libraries, …) can log on through WAYF. See all institutions here.

  2. To use Sandbox materials outside of the workshop: remember that each new user has hundreds of hours of free computing credit and around 50 GB of free storage, which can be used to run any UCloud software. If you run out of credit (which takes a long time), you’ll need to check with the local DeiC office at your university about how to request compute hours on UCloud. Contact us at the Sandbox if you need help or want more information.

  3. The link is activated on day one of the workshop.