
Research Data Management for NGS data

Updated: November 30, 2023

Research Data Management (RDM) for Next Generation Sequencing (NGS) data is a foundational course aimed at providing participants with fundamental knowledge and practical skills in handling the extensive data generated through modern NGS studies in the context of Open Science and FAIR principles. This course covers essential principles of RDM practices, such as data organization, metadata annotation, version control and archiving, enabling researchers to manage NGS data with confidence. Participants will also gain insights into FAIR principles and Open Science, fostering collaboration and reproducibility in NGS research. By the end of the course, attendees will be equipped with essential tools and techniques to navigate the data challenges prevalent in today's NGS research landscape.


Authors

J.A. Romero Herrera


Data Scientist


Overview

📖 Syllabus:

  1. What is Research Data Management and why is it important
  2. What is NGS data
  3. Data Life Cycle
  4. Open Science and FAIR principles
  5. Data Management plans
  6. Folder and file structures applied to NGS data
  7. Metadata applied to NGS data
  8. Create a database of your data and projects
  9. Version control of your data analysis
  10. Archiving and repositories

🕰 Total Time Estimation: X hours

πŸ“ Supporting Materials:

πŸ‘¨β€πŸ’» Target Audience: PhD, MsC, anyone interested in RDM for NGS data.

πŸ‘©β€πŸŽ“ Level: Beginner.

🔒 License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

🪙 Funding: This project was funded by the Novo Nordisk Fonden (NNF20OC0063268).

Course Requirements

  • Basic understanding of Next Generation Sequencing data and formats.
  • Command-line experience.
  • Basic programming experience.
  • Familiarity with MkDocs and Material for MkDocs.

This course provides participants with an introduction to effectively managing the large volumes of data generated in modern NGS studies. Participants will gain a practical understanding of RDM principles and of why handling NGS data efficiently matters. The course covers the distinctive characteristics of NGS data, its life cycle, and the importance of adopting Open Science and FAIR principles to make data accessible and reusable.

Throughout the course, participants will learn practical skills for organizing NGS data, including designing folder and file structures and adding metadata that makes data easier to find and interpret. Participants will also explore data management plans (DMPs) tailored to NGS data, which help ensure data integrity and compliance with institutional and funding agency requirements. Finally, attendees will learn how to set up simple databases of their data and projects and how to use version control systems to track changes in their data analysis, promoting collaboration and reproducibility.

The course concludes with a focus on archiving and data repositories, enabling participants to preserve and share NGS data for long-term scientific reuse. By the end of the course, attendees will have the tools and techniques needed to navigate the data challenges of today's NGS research, supporting sound data management practices and collaboration across the scientific community.

Goals

By the end of this workshop, you should be able to apply the following concepts in the context of Next Generation Sequencing data:

  • Understand what RDM is and why it is important
  • Understand FAIR and Open Science principles
  • Write a Data Management Plan for your NGS data
  • Structure and establish naming conventions for your files and folders
  • Add relevant metadata to your data
  • Version control your data analysis
  • Select a repository to archive your data
  • Make your data analysis and workflows reproducible

Acknowledgements

  • University of Copenhagen Research Data Management Team.
  • Martin Proks and Sarah Lundregan, Brickman Lab, NNF Center for Stem Cell Biology (reNEW), University of Copenhagen.
  • Richard Dennis, Data Steward, NNF Center for Stem Cell Biology (reNEW), University of Copenhagen.
  • NBISweden.
  • RDMkit, Elixir Research Data Management Platform.

Feedback form

We would greatly appreciate your thoughts on this workshop. Please follow this link to answer 9 quick questions!


Last update: November 30, 2023
Created: November 30, 2023