Queueing systems
The Slurm workload manager is installed and configured in the Terminal app, so we will use that app with the following settings:
- Enter a job name (descriptive of the task, e.g.: slurm myname)
- Select the time (in hours): 1h.
- Number of nodes: 2
- Machine type: select a 2-CPU standard node with 6 GB of memory.
- Select folders to use:
<Member Files: Username/hpcLaunch>/shared/HPCLab_workshop
- Additional Parameters.
- Enable tmux > true.
- Slurm cluster > true (alternatively, run the command init_slurm_cluster in any terminal).
- Initialization:
/shared/HPCLab_workshop/setup.sh
Job submission with SLURM
Here is a list of common SLURM commands:
sbatch → submit a shell script to the queue
squeue → see all the jobs in the queue
squeue -u USERNAME → see your jobs only
scancel JOBID → cancel the job with the specified ID
scancel -u USERNAME → cancel all your jobs
sacct --user=$USER → get accounting information (state, elapsed time, resource usage) about your jobs
srun → launch tasks within a job allocation. It can be used in batch scripts and on the command line
srun --pty bash -i → to run interactive jobs
sinfo → info about SLURM nodes and partitions
If you are unsure how to use a command, run man <SLURM_command> in the terminal.
In this exercise, you will prepare sequencing data, create a software environment, write a SLURM batch script, and submit an alignment job to the cluster queueing system.
- Create a new subdirectory called batchUCloud inside your hpcLaunch/day2 folder and move (cd) into it:
mkdir -p /work/hpcLaunch/day2/batchUCloud
cd /work/hpcLaunch/day2/batchUCloud
- Download the input data for the exercise:
# FASTQ files
wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L001_R1_001.fastq.gz \
-O ./data.fastq.gz
wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L001_R2_001.fastq.gz \
-O ./data2.fastq.gz
wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L002_R1_001.fastq.gz \
-O ./data3.fastq.gz
wget https://github.com/hartwigmedical/testdata/raw/master/100k_reads_hiseq/TESTX/TESTX_H7YRLADXX_S1_L002_R2_001.fastq.gz \
-O ./data4.fastq.gz
# Reference genome
wget http://genomedata.org/rnaseq-tutorial/fasta/GRCh38/chr22_with_ERCC92.fa \
-O ref.fasta
- Uncompress the FASTQ files:
gunzip data*.fastq.gz
- Create a Conda environment containing bwa-mem2 and samtools:
conda create -c conda-forge -c bioconda --prefix /work/hpcLaunch/day2/envs/alignment bwa-mem2 samtools
- Create a batch script called align.sh using a text editor (e.g. nano) and add the following content to the file:
align.sh
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=4g
#SBATCH --time=00:30:00
# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh
# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment
sleep 1m
# Index reference
bwa-mem2 index ref.fasta
# Align reads and sort BAM file
bwa-mem2 mem -t 2 ref.fasta \
data.fastq \
| samtools sort \
-@ 3 \
-n \
-O BAM \
> data.bam
exit 0
- Submit the batch script to SLURM:
sbatch align.sh
- Monitor the job and the cluster nodes.
Check available compute nodes:
sinfo -N -l
You should see output similar to:
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
node1 1 CLOUD* idle 2 1:2:1 6000 0 1 (null) none
Check your own currently submitted jobs:
squeue --me
Find the JOBID of your running or pending job(s) and note the state. Is the job running?
- What if you needed to cancel the job?
Type the command you would use to cancel the job you submitted, assuming JOBID=1.
Did the batch job run successfully?
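One way to answer this question is to look at the job's log file and at the BAM file it produced. The sketch below assumes that align.sh was submitted without an --output option, in which case sbatch writes both stdout and stderr to slurm-<JOBID>.out in the submission directory. The function name check_alignment_job is a hypothetical helper, not part of the workshop material.

```bash
#!/bin/bash
# Hypothetical helper: inspect the results of align.sh for a given job ID.
check_alignment_job() {
    local jobid="$1"
    local log="slurm-${jobid}.out"   # sbatch default when --output is not set

    if [ ! -f "$log" ]; then
        echo "log file $log not found (is the job still pending?)"
        return 1
    fi
    echo "Last lines of $log:"
    tail -n 5 "$log"

    # -s is true only if the file exists and is not empty
    if [ ! -s data.bam ]; then
        echo "data.bam is missing or empty"
        return 1
    fi

    # samtools quickcheck verifies the BAM header and end-of-file marker.
    # It is skipped if samtools is not on the PATH (environment not activated).
    if command -v samtools >/dev/null 2>&1; then
        if ! samtools quickcheck data.bam; then
            echo "data.bam looks truncated"
            return 1
        fi
    fi
    echo "data.bam looks complete"
}
```

Run it from the batchUCloud directory, for example check_alignment_job 1. A complete BAM file does not prove the alignment is biologically sensible, only that the job wrote its output to the end.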
In this exercise, you will run the same alignment operation on multiple FASTQ files in parallel using a SLURM job array.
- Create directories for output files and log files, then generate a list containing all FASTQ files:
mkdir -p results logs
# List of FASTQ files
ls *.fastq > fastq_list.txt
- Open a text editor and create a new batch script called align_array.sh:
align_array.sh
#!/bin/bash
#SBATCH --cpus-per-task=2
#SBATCH --mem=5g
#SBATCH --time=00:30:00
#SBATCH --array=1-4%2
#SBATCH --job-name=alignArray
#SBATCH --output=logs/align_%A_%a.out
#SBATCH --error=logs/align_%A_%a.err
set -euo pipefail
# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh
# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment
mapfile -t fastqs < fastq_list.txt
fq="${fastqs[$((SLURM_ARRAY_TASK_ID-1))]}"
sample=$(basename "$fq" .fastq)
bwa-mem2 mem -t "$SLURM_CPUS_PER_TASK" ref.fasta "$fq" \
| samtools sort -@ 1 -O BAM \
> "results/${sample}.bam"
exit 0
- Submit and monitor the array job:
sbatch align_array.sh
- Monitor the job.
- Check that outputs and logs are separated by array task ID (%a) and parent job ID (%A). Open one of the log files!
ls results/
ls logs/
How many files do you have in the results folder?
Do you have any *.err files?
Make your pipelines easier to debug by using this command to avoid silent errors and incomplete outputs:
set -euo pipefail
-e # Exit immediately if any command returns a non-zero exit status
-u # Treat the use of undefined variables as an error
-o pipefail # Make a pipeline fail if any command in the pipeline fails, not just the last one
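The effect of pipefail is easy to see on a toy pipeline. The sketch below is illustrative only: false stands in for a failing first stage (such as bwa-mem2 mem crashing) and cat stands in for a second stage that still exits cleanly (such as samtools sort).

```bash
#!/bin/bash
# Without pipefail, a pipeline reports only the exit status of its last command.
set +o pipefail
false | cat && status_default=0 || status_default=$?

# With pipefail, the pipeline fails if any command in it fails.
set -o pipefail
false | cat && status_pipefail=0 || status_pipefail=$?

# Restore the shell default so the rest of the session is unaffected.
set +o pipefail

echo "default=${status_default} pipefail=${status_pipefail}"
# prints: default=0 pipefail=1
```

In a batch script, the first case would let the job finish with a success state and a truncated output file, which is the silent error the option is meant to prevent.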
With --array=1-4%2, SLURM creates 4 tasks and runs at most 2 concurrently.
Each task gets its own SLURM_ARRAY_TASK_ID, used to pick one input file from fastq_list.txt.
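To see how the task ID selects a file, the selection logic from align_array.sh can be dry-run outside SLURM by setting SLURM_ARRAY_TASK_ID by hand. This is a minimal sketch: it uses empty placeholder files in a temporary directory rather than the real FASTQ data, and the variable n_tasks is introduced here for illustration.

```bash
#!/bin/bash
# Dry run of the file selection used in align_array.sh; nothing is submitted.
workdir=$(mktemp -d)
touch "$workdir/data.fastq" "$workdir/data2.fastq" \
      "$workdir/data3.fastq" "$workdir/data4.fastq"

# Same listing command as in the exercise, run inside the temporary directory.
(cd "$workdir" && ls *.fastq > fastq_list.txt)

mapfile -t fastqs < "$workdir/fastq_list.txt"
n_tasks=${#fastqs[@]}
echo "array range for this list: 1-${n_tasks}"

# SLURM sets SLURM_ARRAY_TASK_ID inside each task; here it is set by the loop.
for SLURM_ARRAY_TASK_ID in $(seq 1 "$n_tasks"); do
    # Bash arrays are 0-based, array task IDs here start at 1, hence the -1.
    fq="${fastqs[$((SLURM_ARRAY_TASK_ID-1))]}"
    sample=$(basename "$fq" .fastq)
    echo "task ${SLURM_ARRAY_TASK_ID}: ${fq} -> results/${sample}.bam"
done

rm -rf "$workdir"
```

The order of the lines in fastq_list.txt follows the shell's sort order, which can depend on the locale, so check the list rather than assuming which file a given task receives. The computed count can also replace the hard-coded range at submission time, for example sbatch --array=1-${n_tasks}%2 align_array.sh, since options given on the command line take precedence over #SBATCH directives.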
Chained Jobs with Dependencies
In this exercise, you will build a simple 3-step pipeline using SLURM job dependencies.
- Prepare a second batch script called index_array.sh:
index_array.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=5g
#SBATCH --time=00:30:00
#SBATCH --array=1-4
#SBATCH --job-name=indexArray
#SBATCH --output=logs/index_%A_%a.out
#SBATCH --error=logs/index_%A_%a.err
set -euo pipefail
# Initialise conda
source /work/HPCLab_workshop/miniconda3/etc/profile.d/conda.sh
# Activate environment
conda activate /work/hpcLaunch/day2/envs/alignment
# Select the BAM file corresponding to the current array task
bams=(results/*.bam)
bam="${bams[$((SLURM_ARRAY_TASK_ID-1))]}"
# Index BAM file
samtools index "$bam"
exit 0
Then, create one last file, report_job.sh, that generates a simple summary file listing all BAM and BAI files produced by the pipeline.
report_job.sh
#!/bin/bash
#SBATCH --cpus-per-task=1
#SBATCH --mem=1g
#SBATCH --time=00:10:00
#SBATCH --job-name=arrayReport
#SBATCH --output=logs/report_%j.out
set -euo pipefail
echo "BAM files" > results/summary.txt
ls -1 results/*.bam >> results/summary.txt
echo "" >> results/summary.txt
echo "BAI files" >> results/summary.txt
ls -1 results/*.bam.bai >> results/summary.txt
exit 0
The aim of this exercise is to submit all jobs at once so that:
- indexing starts only after alignment succeeds
- reporting starts only after indexing succeeds
How do we do that?
ALIGN_ID=$(sbatch --parsable align_array.sh)
INDEX_ID=$(sbatch --parsable \
--dependency=afterok:${ALIGN_ID} \
index_array.sh)
REPORT_ID=$(sbatch --parsable \
--dependency=afterok:${INDEX_ID} \
report_job.sh)
echo "ALIGN=${ALIGN_ID} INDEX=${INDEX_ID} REPORT=${REPORT_ID}"
squeue --me -n alignArray,indexArray,arrayReport
After all jobs have completed, inspect the generated summary file:
cat results/summary.txt
How many lines does your text file contain?
--dependency=afterok:
If the parent job fails, the dependent jobs remain pending and are marked with an unsatisfied dependency.
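Jobs stuck this way can be spotted through the reason column of squeue, where SLURM reports DependencyNeverSatisfied. The sketch below is one possible way to list them. The function name stuck_jobs is hypothetical, and the squeue call is guarded so the snippet does nothing harmful on a machine without SLURM.

```bash
#!/bin/bash
# Hypothetical helper: print the IDs of jobs whose dependency can never be met.
# Reads "JOBID REASON" lines, as produced by:
#   squeue --me --noheader --format="%i %r"
stuck_jobs() {
    awk '$2 ~ /DependencyNeverSatisfied/ { print $1 }'
}

if command -v squeue >/dev/null 2>&1; then
    # %i is the job ID and %r the reason the job is in its current state.
    squeue --me --noheader --states=PENDING --format="%i %r" | stuck_jobs
else
    echo "squeue not found: run this on the cluster"
fi
```

Each listed job can then be removed with scancel JOBID. Alternatively, submitting the dependent job with sbatch --kill-on-invalid-dep=yes asks SLURM to cancel it automatically when its dependency can no longer be satisfied.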
Open a tmux session and start an interactive Bash job using srun with a duration of 2 minutes. After launching the job, detach from the tmux session and use the squeue command to check how the job is listed. Finally, once the job time has elapsed, reattach if needed and terminate the session/job.
srun --cpus-per-task=1 --time=00:02:00 --pty bash
Once the sbatch jobs you submitted have completed, look at the status information for historical jobs.
# all jobs from this session
sacct --user=ucloud
# display info about a specific job
sacct -j <JOBID>
What is the State of all your jobs?
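The default sacct columns can be replaced with a selection that is more useful for this question. The sketch below wraps such a call in a hypothetical helper called job_summary. The field list is a suggestion, not a workshop requirement, and the call is guarded so that it fails cleanly where sacct is not installed.

```bash
#!/bin/bash
# Fields requested from sacct; %20 widens the JobName column to 20 characters.
SACCT_FIELDS="JobID,JobName%20,State,ExitCode,Elapsed,ReqMem,MaxRSS"

# Hypothetical helper: summarise one job, including its array tasks and steps.
job_summary() {
    local jobid="$1"
    if ! command -v sacct >/dev/null 2>&1; then
        echo "sacct not found: run this on the cluster" >&2
        return 1
    fi
    sacct -j "$jobid" --format="$SACCT_FIELDS"
}
```

Call it as job_summary <JOBID>, using the ID printed by sbatch. MaxRSS is reported on the job step lines (for example the .batch step) rather than on the line for the job allocation itself, so compare those values with ReqMem to judge whether the requested memory was reasonable.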