Using SBATCH to Submit Jobs
SBATCH is a non-blocking command: it submits the job and returns immediately rather than waiting for the job to run. Even if the requested resources are not currently available, the job is placed in the queue and starts once resources become available. You can check the status of a job with squeue while it is pending or running, and with sacct at any time:
squeue --me
sacct -j YOUR_JOBID
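sacct can also report more detail about each job; for example (the field list here is just one reasonable choice, adjust it to what you care about):
sacct -j YOUR_JOBID --format=JobID,JobName,State,Elapsed,MaxRSS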
SBATCH is built around submitting a single batch file. You shouldn't need to pass any options on the command line beyond sbatch <batch file>, because every parameter can be set inside the file itself.
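For example, if your batch script is saved as my_job.sh (the file name is arbitrary), submitting it is a single command; sbatch prints the job ID it assigned, which you can then pass to squeue and sacct:
sbatch my_job.sh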
The following is an example of a batch script. Please note that the top of the script must start with #!/bin/bash (or whichever interpreter you need; if you aren't sure, use bash), immediately followed by #SBATCH <param> lines for your parameters. An example with common SBATCH parameters is below; this script will allocate 4 CPUs and one GPU in the GPU partition.
#!/bin/bash
#SBATCH -c 4 # Number of Cores per Task
#SBATCH --mem=8192 # Requested Memory
#SBATCH -p gpu # Partition
#SBATCH -G 1 # Number of GPUs
#SBATCH -t 01:00:00 # Job time limit
#SBATCH -o slurm-%j.out # %j = job ID
module load cuda/10 # Load the CUDA module
/modules/apps/cuda/10.1.243/samples/bin/x86_64/linux/release/deviceQuery # Run the bundled deviceQuery sample
This script queries the available GPUs and, because only one GPU was requested, reports a single device in the output file named by the -o parameter. Feel free to remove or modify any of the parameters in the script to suit your needs.
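Once the job finishes, the output ends up in the file named by the -o parameter (slurm-<job ID>.out here); using the same YOUR_JOBID placeholder as above:
cat slurm-YOUR_JOBID.out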
If you need to run the same type of job over many inputs, different parameters, or even just some number of iterations, you should look into using Job Arrays to simplify your workflow.
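As a rough sketch of what a job array might look like (the program name and input files below are hypothetical), each task in the array runs the same script with a different value of $SLURM_ARRAY_TASK_ID:
#!/bin/bash
#SBATCH -c 1 # Number of Cores per Task
#SBATCH --mem=2048 # Requested Memory
#SBATCH -t 00:30:00 # Job time limit
#SBATCH --array=1-10 # Run 10 tasks with indices 1 through 10
#SBATCH -o slurm-%A_%a.out # %A = array job ID, %a = array task index

# Each task handles one input file, e.g. input_1.txt through input_10.txt (hypothetical names)
./process_data input_${SLURM_ARRAY_TASK_ID}.txt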
Email
Slurm can send you emails based on the status of your job via the --mail-type option. Common mail types are BEGIN, END, FAIL, INVALID_DEPEND, and REQUEUE. See the sbatch man page for the full list.
Example:
srun --mail-type=BEGIN hostname
or:
#!/bin/bash
#SBATCH --mail-type=BEGIN
hostname
There is also the --mail-user option, which sets the email address that these notifications are sent to.
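Putting the two options together in a batch script might look like the following (the address below is a placeholder; substitute your own):
#!/bin/bash
#SBATCH --mail-type=END,FAIL # Email when the job ends or fails
#SBATCH --mail-user=you@example.edu # Placeholder address
hostname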
Time Limit Email - Preventing Loss of Work
When your job reaches its time limit, it will be killed, even if it's 99% of the way through its task. Without checkpointing, all those CPU hours will be for nothing and you will have to schedule the job all over again.
One way to prevent this is to check on your job's output as it approaches its time limit. You can specify --mail-type=TIME_LIMIT_80 to receive an email once the job has used 80% of its time limit, giving you a chance to review its progress and save your work before the job is killed.
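A minimal sketch of a script using this, assuming a four-hour limit and a placeholder email address and program name:
#!/bin/bash
#SBATCH -t 04:00:00 # Job time limit
#SBATCH --mail-type=TIME_LIMIT_80 # Email once 80% of the time limit has been used
#SBATCH --mail-user=you@example.edu # Placeholder address

./long_running_task # Hypothetical long-running program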