Queueing systems

Warning: New Terminal job configuration

The Slurm workload manager is installed and configured in the Terminal app, so we will use that app for this session. Configure the job as follows:

  1. Enter a job name (descriptive of the task, e.g.: slurm myname)
  2. Select the time (in hours): 1h.
  3. Number of nodes: 2
  4. Machine type: select a standard node with 2 CPUs and 6 GB of memory.
  5. Select folders to use:
    • <Member Files: Username/hpcLaunch>
    • /shared/HPCLab_workshop
  6. Additional Parameters.
    • Enable tmux > true.
    • Slurm cluster > true (alternatively, run the command init_slurm_cluster in any terminal)
    • Initialization: /shared/HPCLab_workshop/setup.sh

Job submission with SLURM

Here is a list of common SLURM commands:

sbatch → submit a shell script to the queue
squeue → see all the jobs in the queue
squeue -u USERNAME → see your jobs only
scancel JOBID → cancel the job with the specified ID
scancel -u USERNAME → cancel all your jobs
sacct --user=$USER → get accounting information (state, elapsed time, exit code) about your jobs
srun → launch tasks within a job allocation. It can be used in batch scripts and on the command line
srun --pty bash -i → run an interactive job
sinfo → info about SLURM nodes and partitions

If you are unsure how to use a command, run man <SLURM_command> in the terminal.
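Before reading the manual pages, it can help to confirm that the commands listed above are actually on your PATH. The following is a minimal sketch; the function name check_commands is made up for illustration and is not part of SLURM.

```bash
#!/bin/bash
# Report which commands are available on PATH.
# check_commands is a hypothetical helper name, not a SLURM tool.
check_commands() {
    local cmd
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: missing"
        fi
    done
}

# The SLURM commands from the list above
check_commands sbatch squeue scancel sacct srun sinfo
```

If any command is reported as missing, the Slurm cluster option was probably not enabled when the Terminal job was started.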

Exercise 1: Submitting a Batch Job with SLURM

In this exercise, you will prepare sequencing data, create a software environment, write a SLURM batch script, and submit an alignment job to the cluster queueing system.

  1. Create a new subdirectory called batchUCloud inside your hpcLaunch/day2 folder and move (cd) into it:
mkdir -p /work/hpcLaunch/day2/batchUCloud
cd /work/hpcLaunch/day2/batchUCloud
  2. Download the input data for the exercise:
# FASTQ files 
wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L001_R1_001.fastq.gz \
     -O ./data.fastq.gz

wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L001_R2_001.fastq.gz \
     -O ./data2.fastq.gz

wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L002_R1_001.fastq.gz \
     -O ./data3.fastq.gz

wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L002_R2_001.fastq.gz \
     -O ./data4.fastq.gz

# Reference genome
wget http://genomedata.org/rnaseq-tutorial/fasta/GRCh38/chr22_with_ERCC92.fa \
     -O ref.fasta
  3. Uncompress the FASTQ files:
gunzip data*.fastq.gz
  4. Create a Conda environment containing bwa-mem2 and samtools:
conda create -c conda-forge -c bioconda --prefix /work/hpcLaunch/day2/envs/alignment bwa-mem2 samtools
  5. Create a batch script called align.sh using a text editor (e.g. nano) and add the following content to the file:
align.sh
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=4g
#SBATCH --time=00:30:00

# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh

# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment

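# Wait one minute before starting (gives you time to see the job in squeue)
sleep 1m

# Index reference
bwa-mem2 index ref.fasta

# Align reads, then sort the alignments by read name (-n) and write a BAM file
# -@ sets additional sort threads; keep it small because the job requests only 2 CPUs
bwa-mem2 mem -t 2 ref.fasta \
    data.fastq \
    | samtools sort \
        -@ 1 \
        -n \
        -O BAM \
    > data.bam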

exit 0
  6. Submit the batch script to SLURM:
sbatch align.sh
  7. Monitor the job and the cluster nodes.

Check the available compute nodes:

sinfo -N -l

You should see output similar to:

NODELIST   NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
node1          1    CLOUD*        idle 2       1:2:1   6000        0      1   (null) none  

Check your own currently submitted jobs:

squeue --me

Find the JOBID of your running or pending job(s) and note the state. Is the job running?

  8. What if you needed to cancel the job?

Type the command you would use to cancel the job you submitted, assuming its JOBID is 1.

Did the batch job run successfully?
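When you note the job state in the monitoring step, squeue shows it as a short code in the ST column. The sketch below translates the most common codes into words; describe_state is a hypothetical helper written for this page, and the full list of states is in man squeue.

```bash
#!/bin/bash
# Translate the short state codes from the ST column of squeue.
# describe_state is a made-up helper name; only common codes are covered.
describe_state() {
    case "$1" in
        PD) echo "pending" ;;
        R)  echo "running" ;;
        CG) echo "completing" ;;
        CD) echo "completed" ;;
        F)  echo "failed" ;;
        CA) echo "cancelled" ;;
        TO) echo "timeout" ;;
        *)  echo "other state, see man squeue" ;;
    esac
}

describe_state PD
describe_state R
```

A job that is still in the pending state has not started yet, for example because no node is free or because it is waiting on a dependency.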

Exercise 2: Submitting a batch job array

In this exercise, you will run the same alignment operation on multiple FASTQ files in parallel using a SLURM job array.

  1. Create directories for output files and log files, then generate a list containing all FASTQ files:
mkdir -p results logs

#  List of FASTQ files
ls *.fastq > fastq_list.txt
  2. Open a text editor and create a new batch script called align_array.sh:
align_array.sh
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=5g
#SBATCH --time=00:30:00
#SBATCH --array=1-4%2
#SBATCH --job-name=alignArray
#SBATCH --output=logs/align_%A_%a.out
#SBATCH --error=logs/align_%A_%a.err

set -euo pipefail

# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh
# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment

mapfile -t fastqs < fastq_list.txt
fq="${fastqs[$((SLURM_ARRAY_TASK_ID-1))]}"
sample=$(basename "$fq" .fastq)

bwa-mem2 mem -t "$SLURM_CPUS_PER_TASK" ref.fasta "$fq" \
  | samtools sort -@ 1 -O BAM \
  > "results/${sample}.bam"

exit 0
  3. Submit the array job:
sbatch align_array.sh
  4. Monitor the job with squeue --me.
  5. Check that outputs and logs are separated by array task ID (%a) and parent job ID (%A). Open one of the log files!
ls results/
ls logs/

How many files do you have in the results folder?

Do you have any *.err files?

Tip: set -euo pipefail

Make your pipelines easier to debug by adding this command to your scripts. It prevents silent errors and incomplete outputs.

set -euo pipefail
     -e # Exit immediately if any command returns a non-zero exit status
     -u # Treat the use of undefined variables as an error
     -o pipefail # Make a pipeline fail if any command in the pipeline fails, not just the last one
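The effect of these options is easiest to see in a small experiment that needs no scheduler. The sketch below runs each case in a child bash process, so the options do not affect your current shell; the variable name UNDEFINED_VARIABLE_FOR_DEMO is made up and assumed to be unset.

```bash
#!/bin/bash
# Compare the exit status of a failing pipeline with and without pipefail.
# "false | cat": the first command fails, the last one succeeds.
status_default=$(bash -c 'false | cat; echo $?')
status_pipefail=$(bash -c 'set -o pipefail; false | cat; echo $?')
echo "without pipefail: exit status $status_default"
echo "with pipefail:    exit status $status_pipefail"

# With -u, expanding an undefined variable is an error instead of an empty string.
if bash -c 'set -u; echo "$UNDEFINED_VARIABLE_FOR_DEMO"' 2>/dev/null; then
    unset_result="allowed"
else
    unset_result="rejected"
fi
echo "undefined variable under -u: $unset_result"
```

In the alignment scripts this matters because a failure in bwa-mem2 would otherwise be hidden by a successful samtools sort at the end of the pipe.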
Note: Task ID (SLURM_ARRAY_TASK_ID)

With --array=1-4%2, SLURM creates 4 tasks and runs at most 2 concurrently.

Each task gets its own SLURM_ARRAY_TASK_ID, used to pick one input file from fastq_list.txt.
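To see how a task ID selects its input, you can reproduce the lookup outside SLURM. This is a sketch under assumptions: it writes a small list to a temporary directory instead of using your real fastq_list.txt, and pick_fastq is a hypothetical helper that repeats the indexing used in align_array.sh.

```bash
#!/bin/bash
# Simulate how each array task picks one line from the file list.
tmpdir=$(mktemp -d)
printf '%s\n' data.fastq data2.fastq data3.fastq data4.fastq > "$tmpdir/fastq_list.txt"

# Read the list into a bash array (index 0 holds the first line)
mapfile -t fastqs < "$tmpdir/fastq_list.txt"

# Task IDs start at 1, array indices at 0, hence the "-1"
pick_fastq() {
    local task_id=$1
    echo "${fastqs[$((task_id-1))]}"
}

for task_id in 1 2 3 4; do
    fq=$(pick_fastq "$task_id")
    echo "task ${task_id}: ${fq} -> results/$(basename "$fq" .fastq).bam"
done

rm -rf "$tmpdir"
```

The array range must match the number of lines in the list: with 4 files, --array=1-4 covers every line exactly once.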

Bonus exercise: Job dependencies

Chained Jobs with Dependencies

In this exercise, you will build a simple 3-step pipeline using SLURM job dependencies.

  1. Prepare a second batch script called index_array.sh:
index_array.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=5g
#SBATCH --time=00:30:00
#SBATCH --array=1-4
#SBATCH --job-name=indexArray
#SBATCH --output=logs/index_%A_%a.out
#SBATCH --error=logs/index_%A_%a.err

set -euo pipefail

# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh
# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment

# Select the BAM file corresponding to the current array task
bams=(results/*.bam)
bam="${bams[$((SLURM_ARRAY_TASK_ID-1))]}"

# Index BAM file
samtools index "$bam"

exit 0

Then, create one last file, report_job.sh, that generates a simple summary file listing all BAM and BAI files produced by the pipeline.

report_job.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:10:00
#SBATCH --job-name=arrayReport
#SBATCH --output=logs/report_%j.out

set -euo pipefail

echo "BAM files" > results/summary.txt
ls -1 results/*.bam >> results/summary.txt

echo "" >> results/summary.txt

echo "BAI files" >> results/summary.txt
ls -1 results/*.bam.bai >> results/summary.txt

exit 0

The aim of this exercise is to submit all jobs at once so that:

  • indexing starts only after alignment succeeds
  • reporting starts only after indexing succeeds

How do we do that?

ALIGN_ID=$(sbatch --parsable align_array.sh)

INDEX_ID=$(sbatch --parsable \
    --dependency=afterok:${ALIGN_ID} \
    index_array.sh)

REPORT_ID=$(sbatch --parsable \
    --dependency=afterok:${INDEX_ID} \
    report_job.sh)

echo "ALIGN=${ALIGN_ID} INDEX=${INDEX_ID} REPORT=${REPORT_ID}"

squeue --me -n alignArray,indexArray,arrayReport
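The chain above relies on sbatch --parsable printing only the job ID. On multi-cluster setups the output can also carry the cluster name after a semicolon, which would break the dependency string. The sketch below strips that suffix; parse_job_id is a hypothetical helper and the sample values are made up, so it runs without a scheduler.

```bash
#!/bin/bash
# sbatch --parsable prints "jobid" or "jobid;cluster".
# Keep only the part before the first semicolon.
# parse_job_id is a made-up helper name for this illustration.
parse_job_id() {
    echo "${1%%;*}"
}

# Simulated sbatch outputs (illustrative values only)
parse_job_id "1234"
parse_job_id "1234;mycluster"
```

On a real cluster it could be used as ALIGN_ID=$(parse_job_id "$(sbatch --parsable align_array.sh)").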

After all jobs have completed, inspect the generated summary file:

cat results/summary.txt

How many lines does your text file contain?

Tip: afterok

--dependency=afterok:<JOBID> ensures that the next job starts only after the specified job completes successfully.

If the parent job fails, the dependent jobs remain pending with an unsatisfied dependency (squeue typically reports the reason DependencyNeverSatisfied) until you cancel them.
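afterok is one of several dependency types; afterany, for example, releases the job once the parent has ended regardless of its exit status. The sketch below only builds the option string, so it runs without SLURM. dependency_arg is a hypothetical helper and the job IDs are made up.

```bash
#!/bin/bash
# Build a --dependency option from a dependency type and one or more job IDs.
# dependency_arg is a made-up helper name; IDs below are illustrative.
dependency_arg() {
    local dep_type=$1
    shift
    local IFS=:          # join the remaining arguments with ":"
    echo "--dependency=${dep_type}:$*"
}

dependency_arg afterok 101        # start only if job 101 succeeded
dependency_arg afterany 101 102   # start once jobs 101 and 102 have ended
```

The resulting string can be passed to sbatch unquoted, for example sbatch $(dependency_arg afterok "$ALIGN_ID") index_array.sh.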

Bonus exercise 2: Interactive jobs

Open a tmux session and start an interactive Bash job using srun with a duration of 2 minutes. After launching the job, detach from the tmux session and use the squeue command to check how the job is listed. Finally, once the job time has elapsed, reattach if needed and terminate the session/job.

Solution

srun --cpus-per-task=1 --time=00:02:00 --pty bash
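When switching between tmux and an interactive job it is easy to lose track of which shell you are typing in. The sketch below reports the context from two environment variables: TMUX, which tmux sets inside its sessions, and SLURM_JOB_ID, which SLURM sets inside a job. session_context is a hypothetical helper name.

```bash
#!/bin/bash
# Print whether the current shell is inside tmux and/or inside a SLURM job.
# session_context is a made-up helper name for this illustration.
session_context() {
    local in_tmux="no"
    local job="none"
    if [ -n "${TMUX:-}" ]; then
        in_tmux="yes"
    fi
    if [ -n "${SLURM_JOB_ID:-}" ]; then
        job="$SLURM_JOB_ID"
    fi
    echo "tmux=${in_tmux} slurm_job=${job}"
}

session_context
```

Running it before and after the srun command should show slurm_job change from none to the job ID listed by squeue.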

Exercise 3: Job monitoring

Once your sbatch jobs have completed, look at the status information for these past jobs.

# all jobs from this session
sacct --user=ucloud
# display info about a specific job 
sacct -j <JOBID>

What is the State of all your jobs?
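With many array tasks, a count per state is quicker to read than the full table. The sketch below assumes pipe-delimited "JobID|State" lines, the layout printed by sacct --user="$USER" --noheader --parsable2 --format=JobID,State. The sample records are made up so the example runs without a scheduler, and count_states is a hypothetical helper.

```bash
#!/bin/bash
# Count how many job records are in each state.
# Reads "JobID|State" lines from standard input.
# count_states is a made-up helper name for this illustration.
count_states() {
    awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Made-up sample records in the assumed layout
sample='101_1|COMPLETED
101_2|COMPLETED
102|FAILED'

printf '%s\n' "$sample" | count_states
```

On the cluster, replace the sample with the sacct command itself piped into count_states. Adding -X to sacct limits the output to one line per job and leaves out the individual job steps.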

Copyright

CC-BY-SA 4.0 license