Using SRUN to Submit Jobs¶
Note
If you need to run a single application multiple times, or you are running a non-interactive application, you should generally use sbatch instead of srun: sbatch lets you specify parameters in the batch file itself, and it is non-blocking (see below).
srun is a so-called blocking command: it will not return your shell prompt until the command completes, and that includes any time spent waiting for the allocation, not just the time the job runs. For example, if you run srun /bin/hostname
and resources are available right away, the command runs immediately and its output is printed to your terminal. If resources are not available, the command blocks while your job is pending in the queue.
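A quick way to see the contrast is the following sketch; --wrap is a standard sbatch option that wraps a single command in a generated batch script:
# blocks: waits for an allocation, runs, prints the hostname to your terminal
srun /bin/hostname
# returns immediately: the job is queued, and output goes to slurm-<jobid>.out by default
sbatch --wrap="/bin/hostname"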
Note that, as with sbatch, you can run a batch file using srun.
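For example, assuming an executable script named my_script.sh (a placeholder name), srun will run it on the allocated node. Unlike sbatch, srun does not parse #SBATCH lines inside the script, so resource options must be given on the command line:
srun -c 2 --mem 4G ./my_script.sh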
The command syntax is srun [options] <executable> [args].
Options are where you specify the resources you want for the executable. The following are some of the options available; to see all available parameters, run man srun.
-c <num>
    Number of CPUs (threads) to allocate to the job per task
-n <num>
    Number of tasks to allocate (for MPI)
-G <num>
    Number of GPUs to allocate to the job
--mem <num>[K|M|G|T]
    Memory to allocate to the job (in MB by default)
-p <partition>
    Partition to submit the job to
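Several of these options can be combined in one request. A sketch, where the partition name gpu and the application ./my_app are assumptions for illustration:
srun -c 4 -n 1 -G 1 --mem 16G -p gpu ./my_app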
To run an interactive job (in this case a bash prompt), the command might look like this (--pty is the important option):
srun -c 6 -p cpu --pty bash
To run an application on the cluster that uses a GUI, you must use an interactive job and add the --x11 argument:
srun -c 6 -p cpu --pty --x11 xclock
Note
You cannot run an interactive/GUI job using the sbatch command; you must use srun.
Email¶
Slurm can send you emails based on the status of your job via the --mail-type option.
Common mail types are BEGIN, END, FAIL, INVALID_DEPEND, and REQUEUE; see the sbatch man page for the complete list.
Example:
srun --mail-type=BEGIN hostname
or:
#!/bin/bash
#SBATCH --mail-type=BEGIN
hostname
There is also the --mail-user option, which sets the address the notifications are sent to (by default they go to the submitting user).
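For example, to be notified at a specific address when the job finishes (the address is a placeholder):
srun --mail-type=END --mail-user=you@example.com /bin/hostname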
Time Limit Email - Preventing Loss of Work¶
When your job reaches its time limit, it will be killed, even if it's 99% of the way through its task. Without checkpointing, all those CPU hours will be for nothing and you will have to schedule the job all over again.
One way to prevent this is to check on your job's output as it approaches its time limit. You can specify --mail-type=TIME_LIMIT_80 to receive an email once your job has used 80% of its time limit, giving you a chance to inspect the output and save your work before the job is killed. (TIME_LIMIT_50, TIME_LIMIT_90, and TIME_LIMIT are also available.)
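A minimal batch script using this mail type (the time limit and application name are assumptions for illustration):
#!/bin/bash
#SBATCH --time=12:00:00
#SBATCH --mail-type=TIME_LIMIT_80
./long_running_app
With this, Slurm emails you once the job has consumed 80% of its 12-hour limit, leaving roughly 2.4 hours to save results or checkpoint.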