Computational Research Data Management

Authors

Alba Refoyo Martinez

Jose Alejandro Romero Herrera

Published

November 30, 2023

Modified

May 3, 2024

Practical RDM workshop

We offer workshops on practical RDM for biodata. Keep an eye on the upcoming events on the Sandbox website.

Research Data Management for biological data

The course “Research Data Management (RDM) for biological data” is designed to provide participants with foundational knowledge and practical skills in handling the extensive data generated by modern studies, with a focus on Next Generation Sequencing (NGS) data. It emphasizes the importance of Open Science and FAIR principles in managing data effectively. The course covers essential principles and best-practice guidelines for data organization, metadata annotation, version control, and data preservation. These principles are explored from a computational perspective, so that participants gain hands-on experience in applying them to real-world scenarios in their research labs. The course also delves into FAIR principles and Open Science, promoting collaboration and reproducibility in research. By the course’s conclusion, attendees will possess essential tools and techniques to address the data challenges prevalent in today’s NGS research landscape, as well as in other fields related to health and bioinformatics.

Course Overview
  • 📖 Syllabus:
  1. Data Lifecycle Management
  2. Data Management Plans (DMPs)
  3. Data Organization and storage
  4. Documentation standards for biodata
  5. Version Control and Collaboration
  6. Processing and analyzing biodata
  7. Storing and sharing biodata
Course Requirements
  • Basic understanding of Next Generation Sequencing data and formats.
  • Command Line experience
  • Basic programming experience
  • Familiarity with Quarto or MkDocs

This course offers participants an in-depth introduction to effectively managing the vast amounts of data generated in modern studies. Throughout the program, emphasis is placed on a practical understanding of RDM principles and on the efficient handling of large datasets. In this context, participants will learn why adopting Open Science and FAIR principles is necessary for enhancing data accessibility and reusability.

Participants will acquire practical skills for organizing data, including the creation of folder and file structures, and the implementation of metadata to facilitate data discoverability and interpretation. Special attention is given to the development of Data Management Plans (DMPs) with examples tailored to omics data, ensuring compliance with institutional and funding agency requirements while maintaining data integrity. Attendees will also gain insights into the establishment of simple databases and the use of version control systems to track changes in data analysis, thereby promoting collaboration and reproducibility.
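As a rough illustration of what a folder structure with accompanying metadata can look like in practice, here is a minimal Python sketch. The subfolder names, the date-prefixed naming scheme, the `metadata.json` file, and the `create_project` helper are all hypothetical choices made for this example; they are not necessarily the conventions taught in the course.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

# Hypothetical layout, chosen only for illustration.
SUBFOLDERS = ["raw", "processed", "scripts", "results"]


def create_project(root: Path, name: str, owner: str, created: date) -> Path:
    """Create a date-prefixed project folder with subfolders and a metadata file."""
    # A YYYYMMDD prefix keeps project folders sorted chronologically.
    project = root / f"{created:%Y%m%d}_{name}"
    for sub in SUBFOLDERS:
        (project / sub).mkdir(parents=True, exist_ok=True)

    # Minimal descriptive metadata stored next to the data (assumed fields).
    metadata = {
        "project": name,
        "owner": owner,
        "created": created.isoformat(),
        "data_type": "NGS",
    }
    (project / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return project


if __name__ == "__main__":
    # Use a temporary directory so the example leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        project = create_project(Path(tmp), "rnaseq_pilot", "jdoe", date(2024, 5, 3))
        print(project.name)
        print(sorted(p.name for p in project.iterdir()))
```

Keeping the metadata in a machine-readable file inside the project folder is one simple way to make a dataset easier to find and interpret later; a real project would likely record more fields (organism, assay, sample identifiers) according to a community standard.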

The course concludes with a focus on archiving and data repositories, enabling participants to learn strategies for preserving and sharing data for long-term scientific usage. By the end of the course, attendees will be equipped with essential tools and techniques to effectively navigate the challenges prevalent in today’s research landscape. This will not only foster successful data management practices but also enhance collaboration within the scientific community.

Course Goals

By the end of this workshop, you should be able to apply the following concepts in the context of Next Generation Sequencing data:

  • Understand the Importance of Research Data Management (RDM)
  • Familiarize Yourself with FAIR and Open Science Principles
  • Draft a Data Management Plan for your own Data
  • Establish File and Folder Naming Conventions
  • Enhance Data with Descriptive Metadata
  • Implement Version Control for Data Analysis
  • Select an Appropriate Repository for Data Archiving
  • Make your data analysis and workflows reproducible and FAIR
Warning

This is a computational workshop that focuses primarily on the digital aspect of our data. While wet lab Research Data Management (RDM) involving protocols, instruments, reagents, ELN or LIMS is integral to the entire RDM process, it won’t be covered in this course.

As part of effective data management, it’s crucial to prioritize strategies that ensure security and privacy. Although these aspects are important, they won’t be covered in our course. We highly recommend enrolling in the GDPR course offered by the Center for Health Data Science, especially if you’re working with sensitive data. That course focuses specifically on GDPR compliance and will provide you with valuable insights and skills in managing data privacy and security.

Acknowledgements

  • RDMkit, ELIXIR (2021) Research Data Management Kit. A deliverable from the EU-funded ELIXIR-CONVERGE project (grant agreement 871075).
  • University of Copenhagen Research Data Management Team.
  • Martin Proks and Sarah Lundregan, Brickman Lab, NNF Center for Stem Cell Biology (reNEW), University of Copenhagen.
  • Richard Dennis, Data Steward, NNF Center for Stem Cell Biology (reNEW), University of Copenhagen.
  • NBISweden.