Nextflow
Nextflow is a workflow management system that offers scalable and portable NGS data analysis pipelines, facilitating data processing across diverse computing environments. It streamlines and automates the various data analysis steps, enabling parallel processing and seamless integration with existing tools.
Basics
Read more about the basics here. Let’s talk about the main elements:
- processes: the individual tasks of a workflow. They are executed independently, are isolated from each other, and can be written in any scripting language.
- channels: the queues that connect processes, e.g. a process's inputs and outputs. Each process can define one or more.
- modules: a module is a script that contains functions, processes, and workflows, which can be imported into another workflow:
include { process } from './process_module'
The interaction between the processes, and ultimately the pipeline execution flow itself, is implicitly defined by these input and output declarations.
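As a minimal sketch of these ideas (process and channel names are illustrative, not from any real pipeline), a DSL2 workflow wires a process to a channel, and the declared inputs and outputs determine the execution flow:

```nextflow
// main.nf — a minimal DSL2 sketch; names are hypothetical
nextflow.enable.dsl = 2

// A process: an isolated task with declared input and output channels
process SAY_HELLO {
    input:
    val name

    output:
    stdout

    script:
    """
    echo "Hello, ${name}!"
    """
}

workflow {
    names_ch = Channel.of('alpha', 'beta')  // a channel feeding the process
    SAY_HELLO(names_ch) | view              // execution order follows this wiring
}
```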
Job execution
nextflow run <pipeline_name> --cpus <n> --mem <n>GB
If a job fails, the pipeline will stop. However, there are some process directives that can help you handle errors.
errorStrategy: a process directive that controls how a process reacts to errors, e.g. recording failures without stopping the whole pipeline.
process ignoreAnyError {
    errorStrategy 'ignore'

    script:
    """
    <your command string here>
    """
}

process retryIfFail {
    errorStrategy 'retry'
    maxRetries 2
    memory { task.attempt * 10.GB }

    script:
    """
    <your command string here>
    """
}
Useful command line interface
# Dry run: preview the execution without running any process
nextflow run main.nf -preview
# List previous runs
nextflow log
# Using a configuration file
nextflow run main.nf -c my.config
# Trace execution (writes a trace report)
nextflow run main.nf -with-trace
# Resume previous run (interrupted)
nextflow run main.nf -resume
Cluster execution
Whether you run the pipeline locally or on an HPC, you can find the Nextflow executor compatible with your environment. Executors manage how and where tasks are executed.
nextflow run <pipeline_name> -profile slurm
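A profile such as `slurm` above is not built in; it has to be defined in the pipeline's configuration. A minimal sketch of a `nextflow.config` defining such profiles (the queue name is hypothetical):

```nextflow
// nextflow.config — profile definitions; queue name is illustrative
profiles {
    standard {
        process.executor = 'local'   // run tasks on the local machine
    }
    slurm {
        process.executor = 'slurm'   // submit each task as a SLURM job
        process.queue    = 'short'   // hypothetical partition name
    }
}
```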
Config files
Configuration files are used to specify settings, parameters, and other configurations for the pipeline. Find the Nextflow documentation here.
Nextflow allows you to define parameters directly within the `main.nf` file, enabling their use in the workflow logic. Additionally, Nextflow supports defining parameters in external configuration files, such as `nextflow.config`. These parameters can then be accessed and used within the `.nf` file, offering flexibility in managing workflow behavior and ensuring consistency across different runs.
The hierarchy of how parameters will be used is as follows:
- parameters defined on the command line using `--paramname`
- parameters defined in the user config file(s) supplied via `-c my.config` (in the order that they are provided)
- parameters defined in the default config file `nextflow.config`
- parameters defined within the `.nf` file
Note that if the user specifies `-C my.config` (capital C), then only that config file will be read and the `nextflow.config` file will be ignored.
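As a sketch of this precedence (the parameter name is illustrative): a default set in the script is overridden by `nextflow.config`, which is in turn overridden by a `-c` config file and finally by the command line:

```nextflow
// main.nf
params.greeting = 'hello from the script'   // lowest priority

workflow {
    println params.greeting
}
```

```nextflow
// nextflow.config
params.greeting = 'hello from nextflow.config'   // overrides the script default
```

Running `nextflow run main.nf --greeting hi` would print the command-line value, since `--greeting` sits at the top of the hierarchy.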
Defining resources
process {
    withName: my_task {
        cpus   = 4
        memory = '8 GB'
        time   = '2h'
    }
}
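Besides `withName`, resources can also be assigned to groups of processes via labels; a sketch (the label name is arbitrary):

```nextflow
// In the process definition:
process align_reads {
    label 'big_mem'

    script:
    """
    <your command string here>
    """
}

// In nextflow.config: every process carrying the label gets these resources
process {
    withLabel: big_mem {
        cpus   = 8
        memory = '32 GB'
    }
}
```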
Best practices
- Document your pipeline: an overview of what the workflow does, a description of the outputs (results), and a description of the inputs and other required files.
- Metadata: author, DOI, name, version.
- Attach a test dataset so that others can easily run it.
- Create `--help` documentation for all your Nextflow scripts so others can easily use and understand them.
- Make your workflow easy to read and understand: use whitespace, comments, and named output channels.
- Make your workflow modular to avoid duplicated code.
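A common pattern for the `--help` recommendation above (the usage text and parameter names are illustrative) is a `params.help` flag checked at the top of the script:

```nextflow
// main.nf — minimal --help pattern; usage text is hypothetical
params.help = false

if (params.help) {
    log.info """
    Usage: nextflow run main.nf --input <file> --outdir <dir>
      --input   Path to the input samplesheet
      --outdir  Directory for results
    """
    exit 0
}
```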
nf-core
nf-core is a collaborative platform that provides high-quality, standardized, and peer-reviewed bioinformatics pipelines built using Nextflow. These pipelines are designed to be portable, reproducible, and scalable across various computing environments, from local setups to cloud-based platforms and high-performance computing (HPC) clusters. nf-core also ensures best practices by offering documentation and continuous integration testing for all pipelines, promoting consistency in bioinformatics workflows.
If you want to contribute, start by building your pipeline using an nf-core template.