Workflows & environments

Modified

November 14, 2024

Integration between workflows and software environments

Snakemake and Nextflow pipelines are essentially scripts, and like any script they need an appropriate computational environment to run properly. Let’s explore the challenges of managing computational environments for workflows.

You can use a single common environment for all tasks in a workflow, which is generally recommended unless there are conflicting dependencies (for example, if one task requires a different version of a library than another). Alternatively, you might use separate environments if you’re reusing a task from another workflow and don’t want to alter its existing environment, or if a rarely run task has a large environment. In such cases, creating a dedicated environment for that task can help reduce the overall resource usage of the workflow.
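To make the "conflicting dependencies" criterion concrete, here is a minimal sketch that checks whether several tasks pin the same package to different versions. The task names, packages, and version numbers are hypothetical, and the check only compares exact pins. A real solver such as Conda's also handles version ranges and transitive dependencies.

```python
from collections import defaultdict


def find_conflicts(task_requirements):
    """Return packages that are pinned to more than one version.

    The result maps each conflicting package to {version: [task names]}.
    """
    versions = defaultdict(lambda: defaultdict(list))
    for task, pins in task_requirements.items():
        for package, version in pins.items():
            versions[package][version].append(task)
    return {
        package: dict(by_version)
        for package, by_version in versions.items()
        if len(by_version) > 1
    }


# Hypothetical tasks and exact version pins, for illustration only.
requirements = {
    "align": {"samtools": "1.17", "bwa": "0.7.17"},
    "call_variants": {"samtools": "1.17", "bcftools": "1.17"},
    "legacy_report": {"samtools": "0.1.19"},
}

conflicts = find_conflicts(requirements)
if conflicts:
    for package, by_version in conflicts.items():
        print(f"{package} is pinned differently: {by_version}")
else:
    print("No conflicting pins: one shared environment is feasible.")
```

In this example only legacy_report disagrees on samtools, so it would be the natural candidate for a dedicated environment while the other tasks share one.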

Snakemake

Snakemake has built-in support for per-task environments:

  • Conda
  • Environment modules
  • Singularity

For example, a rule can declare a Conda environment file:

rule ...:
  conda: "path/to/env.yml"
  shell:
    "somecommand {output}"

Nested environments with Docker for reproducibility

Two-level environment:

  • Outer container
  • Inner container

Nextflow

Enable Conda directives in the pipeline configuration file (e.g. nextflow.config):

conda.enabled = true

Alternatively, Conda can be enabled by setting the variable NXF_CONDA_ENABLED=true in your environment or by using the -with-conda command-line option.

process foo {
  // Use one of the two forms below, not both.
  conda 'bwa samtools multiqc'      // Conda package names
  // conda '/path/to/my-env.yaml'   // Conda environment file

  '''
  your_command --here
  '''
}

Environment manager      Link
Docker                   Nextflow Containers
Singularity/Apptainer    Nextflow Containers
Conda                    Nextflow Conda Integration

Where possible, define environments in separate configuration profiles. The environment manager can then be chosen on the command line, which makes the workflow more portable:

profiles {
  conda {
    process.conda = 'samtools'
    conda.enabled = true
  }

  docker {
    process.container = 'biocontainers/samtools'
    docker.enabled = true
  }
}

The workflow can then be run with either Conda or Docker by passing -profile conda or -profile docker on the command line.
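If the workflow is launched from a script rather than typed by hand, profile selection can be wrapped in a small helper. The sketch below is one possible way to do this, not part of Nextflow itself. The script name main.nf and the helper names are assumptions for illustration.

```python
import shutil
import subprocess


def nextflow_command(script, profile):
    """Build the argument list for running a workflow with one profile."""
    return ["nextflow", "run", script, "-profile", profile]


def run_with_profile(script, profile):
    """Run the workflow, failing early if Nextflow is not installed."""
    if shutil.which("nextflow") is None:
        raise RuntimeError("nextflow not found on PATH")
    return subprocess.run(nextflow_command(script, profile), check=True)


# "main.nf" is a placeholder script name; only the commands are printed here.
for profile in ("conda", "docker"):
    print(" ".join(nextflow_command("main.nf", profile)))
```

Calling run_with_profile("main.nf", "docker") would then execute the same workflow inside the container defined by the docker profile.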