Chapter 2 Submitting Batch Jobs

Programs that take a very long time to complete are impractical to run interactively, whether on your local machine or on a cluster. Instead, you can submit them as batch jobs to a High-Performance Computing (HPC) cluster. This tutorial guides you through creating a SLURM job script and submitting it as a batch job with the sbatch command.

  1. Create a SLURM Job Script:

A SLURM job script is a Bash script that contains directives for the SLURM workload manager. These directives specify resources such as the number of nodes, CPU cores, memory, job duration, and more. Below is a sample SLURM job script:

YourFileName.slurm:

#!/bin/bash
#SBATCH -N 3 # Requests 3 nodes for the job
#SBATCH -c 24 # Requests 24 CPU cores per task
#SBATCH --mem-per-cpu=128G # Allocates 128 GB of memory per CPU core
#SBATCH --time=0-00:15:00 # 15 minutes
#SBATCH --output=my.stdout  # Directs the standard output to a file named "my.stdout"
#SBATCH --error=my.stderr # Directs the standard error to a file named "my.stderr"
#SBATCH --mail-user=abac123@case.edu # Specifies the email address to receive job notifications.
#SBATCH --mail-type=ALL # Sends email notifications for all events (job start, end, fail, etc.)
#SBATCH --job-name="just_a_test" # Names the job "just_a_test"

# Put commands for executing job below this line
# example:
module load Python 
python --version
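
As a quick sanity check on the directives above, note that memory requested with --mem-per-cpu scales with the number of CPUs requested. The sketch below simply multiplies the two values from the sample script; the variable names are illustrative and are not part of SLURM.

```shell
# Values copied from the sample script's #SBATCH lines
cpus_per_task=24     # from: #SBATCH -c 24
mem_per_cpu_gb=128   # from: #SBATCH --mem-per-cpu=128G

# --mem-per-cpu applies to every allocated CPU, so the two values multiply
mem_per_task_gb=$(( cpus_per_task * mem_per_cpu_gb ))
echo "Memory requested per ${cpus_per_task}-CPU task: ${mem_per_task_gb} GB"
```

Whether a request of this size can be satisfied depends on the nodes available on your cluster, so it may be worth adjusting these values to what your program actually needs.
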

  2. Save the Job Script:

Save the script with a .slurm extension. For example, save it as YourFileName.slurm.
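
Before submitting, it can help to confirm that the saved file at least parses as a shell script. The following is a minimal sketch, assuming bash is available; check_job_script is a hypothetical helper name, not a SLURM command. Because #SBATCH lines are ordinary comments to the shell, bash -n only checks the shell syntax and does not validate the directives themselves.

```shell
# Hypothetical helper: syntax-check a job script and count its #SBATCH lines
check_job_script() {
  script="$1"
  if [ ! -f "$script" ]; then
    echo "not found: $script"
    return 1
  fi
  # bash -n parses the file without executing any of its commands
  bash -n "$script" || return 1
  # grep -c prints the number of lines that start with #SBATCH
  count=$(grep -c '^#SBATCH' "$script" || true)
  echo "$script: syntax OK, $count #SBATCH directives"
}

# Run the check only if the example file exists in the current directory
if [ -f YourFileName.slurm ]; then
  check_job_script YourFileName.slurm
fi
```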

  3. Access the HPC Cluster:

Connect to the HPC cluster using cluster/_pioneer Shell Access.

  4. Navigate to the Directory Containing Your SLURM Script:

Use the cd command to navigate to the directory where you saved YourFileName.slurm.

  5. Submit the SLURM Job Script:

Use the sbatch command to submit your job script to the SLURM scheduler:

sbatch YourFileName.slurm
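
On success, sbatch prints a line of the form "Submitted batch job <id>". The sketch below shows one way to keep that job ID for later use; extract_job_id is a hypothetical helper, and the submission itself only runs when sbatch and the example script are both present.

```shell
# Hypothetical helper: return the last field of sbatch's confirmation line
extract_job_id() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# Submit only when sbatch is installed and the example script exists
if command -v sbatch >/dev/null 2>&1 && [ -f YourFileName.slurm ]; then
  submit_output=$(sbatch YourFileName.slurm)
  job_id=$(extract_job_id "$submit_output")
  echo "Job ID: $job_id"
fi
```
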

  6. Monitor the Job:

You can check the progress of the job in the Job/Active Jobs section.
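
If you prefer the command line, SLURM's squeue command lists queued and running jobs. This is a minimal sketch of that alternative, assuming squeue is on your PATH; show_my_jobs is a hypothetical wrapper name.

```shell
# Hypothetical wrapper: list jobs for a user (defaults to the current user)
show_my_jobs() {
  user="${1:-$(id -un)}"
  squeue -u "$user"
}

# Call it only when squeue is actually installed
if command -v squeue >/dev/null 2>&1; then
  show_my_jobs
fi
```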

  7. Check Job Output:

Once the job completes, check the output file (my.stdout in this example) for the results of your job. If the job failed, you can check the my.stderr file for the reason.
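
The check described above can be sketched as a small shell function. It assumes the file names my.stdout and my.stderr from the sample script; report_job_output is a hypothetical helper. Note that a non-empty my.stderr does not always mean the job failed, since some programs write warnings there.

```shell
# Hypothetical helper: show error output if there is any, else standard output
report_job_output() {
  dir="${1:-.}"
  if [ -s "$dir/my.stderr" ]; then
    # -s is true when the file exists and is not empty
    echo "my.stderr is not empty; showing its contents:"
    cat "$dir/my.stderr"
  elif [ -f "$dir/my.stdout" ]; then
    echo "No error output; showing my.stdout:"
    cat "$dir/my.stdout"
  else
    echo "No output files found in $dir"
  fi
}

# Inspect the current directory, where the sample script writes its files
report_job_output .
```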