Job Scripts

Submitting a job to the cluster requires a job script. This is a shell script that uses special directives to define the task to be run and the resources it requires. The commands in the script execute programs just as if they were typed at a terminal. Because the job is run by the queuing system, the programs must execute without requiring interaction with the user (so you cannot run a graphical application from a job script).

Writing a job-submission script

  1. Create a new text file for your script (you can use any editor, such as Mousepad, vim, emacs, or nano); it is important that the file is saved as plain text. It is convenient to use a file extension such as .slr to remind you that it is a SLURM job script – for example, myjob.slr.

    The first line should specify the command interpreter to use (usually Bash):

    #!/bin/bash
    

    Each subsequent line of the script contains commands to run as part of the job. For example, the following would execute a minimal Python program:

    python3 -c 'print("Hello World!")'
    

    Add comments to your script to remind you what commands and parameters you used and why:

    # This is a comment
    
  2. Specify the hardware resources required for the job:

    #SBATCH --ntasks=1
    #SBATCH --mem=8000MB
    

    The --ntasks parameter indicates how many tasks (and hence CPU cores on the compute node) should be used to run the job. If your program is not parallelised, this should be 1. The --mem parameter specifies the total amount of memory (per node) required for the job. Choosing appropriate resources requires some prior knowledge or experience of the task and its requirements.

  3. Specify the time required for the job. The following would request 6 hours:

    #SBATCH --time=06:00:00
    

    The --time parameter specifies the maximum time the job is allowed to run (wall time). If your job exceeds this wall time, it will be terminated.

  4. Specify the cluster partition (queue) to use. This should be compatible with the requested job time (use sinfo to see the maximum job times for each partition).

    #SBATCH --partition=short
    
  5. Add any other optional SLURM parameters. For example, to make it easier to differentiate between multiple jobs in the queue, you can give the job a name:

    #SBATCH --job-name=myjob1
    
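As a sketch of how to check the partition limits mentioned in step 4, sinfo accepts a custom output format string (the partition names shown on your cluster will differ from any example):

```shell
# List each partition (%P) alongside its maximum job time (%l)
sinfo -o "%P %l"
```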

Complete script

An example of the complete job script, showing the essential components, is given below:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=8000MB
#SBATCH --time=06:00:00
#SBATCH --job-name=myjob1
#SBATCH --partition=short
python3 -c 'print("Hello World!")'
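Once saved, the script is submitted to the queuing system with sbatch, and your jobs can then be monitored with squeue. A minimal sketch, assuming the script above was saved as myjob.slr:

```shell
# Submit the job script to the scheduler
sbatch myjob.slr

# Show the status of your own jobs in the queue
squeue -u "$USER"
```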

Note

The current working directory will initially be the path from where the job script is submitted.
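If the job should run somewhere other than the submission directory, you can change directory explicitly near the top of the script. A sketch; SLURM_SUBMIT_DIR is the environment variable SLURM sets to the submission directory, and the target path here is a placeholder:

```shell
# Jobs start in the directory the script was submitted from,
# which SLURM also records in $SLURM_SUBMIT_DIR.
# To run elsewhere, change directory explicitly (placeholder path):
cd /path/to/workdir
```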

Specifying accounts

By default all jobs run under the aero_general account. To access some resources, your username must be associated with a different account and this account must be specified in the SLURM script used to run jobs requiring those resources. For example:

#SBATCH --account=my_account_name

where my_account_name should be updated appropriately. To see which accounts you are associated with, run:

sacctmgr show association user=myuserid

substituting your username for myuserid.
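To show just the account names rather than the full association table, a format specifier can be added, as in this sketch (again substituting your username for myuserid):

```shell
# List only the accounts associated with a user
sacctmgr show association user=myuserid format=Account
```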

GPU jobs

To request GPU resources, you must specify a generic resource (GRES) request. This has the form:

#SBATCH --gres=gpu:GPUTYPE:NUM

You should replace GPUTYPE with the type of GPU you wish to use:

  • nvidia_a40 for NVIDIA A40 GPUs

  • tesla_t4 for NVIDIA T4 GPUs

You should also replace NUM with the number of GPUs required. This should almost always be 1.
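For example, the following fragment would request a single NVIDIA A40 and then report the GPU allocated to the job (this assumes nvidia-smi is available on the GPU nodes):

```shell
#SBATCH --gres=gpu:nvidia_a40:1

# Confirm which GPU the job was allocated
nvidia-smi
```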