Job Scripts
Submitting a job to the cluster requires a job script. This is a shell script which uses special commands to define the task to be run and the resources required. The commands in the script execute programs as if they were being run in a terminal. As the job will be run by the queuing system, the programs must execute without requiring interaction with the user (so you cannot run a graphical application from a job script).
Writing a job-submission script
Create a new text file for your script (you can use any editor, such as Mousepad, vim, emacs or nano). However, it is important that the file is saved as a plain text file. It is convenient to use a file extension, such as .slr, to remind us that it is a SLURM job script – for example, myjob.slr.
The first line should contain the command interpreter to use (usually Bash):
#!/bin/bash
Each line of the script contains commands to run as part of the job. For example, the following would execute a minimal Python program:
python3 -c 'print("Hello World!")'
Add comments to your script to remind you what commands and parameters you used and why:
# This is a comment
Specify the hardware resources required for the job:
#SBATCH --ntasks=1
#SBATCH --mem=8000MB
The --ntasks parameter indicates how many CPU cores on the compute node should be used to run the task. If your program is not parallelised, this should be 1. The --mem parameter specifies the total amount of memory (per node) required for the job. Specifying resources requires some prior knowledge or experience of the task and its requirements.
Specify the time required for the job. The following command would request 6 hours:
#SBATCH --time=06:00:00
The --time parameter specifies the expected time the job will take to run (wall time). If your job exceeds this wall time, it will be terminated.
Specify the cluster partition (queue) to use. This should be compatible with the requested job time (use sinfo to see the maximum job times for each partition):
#SBATCH --partition=short
Add any other optional SLURM parameters. For example, to specify a name for the job, to make it easier to differentiate between multiple jobs in the queue, you can use:
#SBATCH --job-name=myjob1
Complete script
An example of the complete job script, showing the essential components, is given below:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=8000MB
#SBATCH --time=06:00:00
#SBATCH --job-name=myjob1
#SBATCH --partition=short
python3 -c 'print("Hello World!")'
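As a sketch of the workflow, the script above can be written to a file from the command line with a shell heredoc (any text editor works equally well; the filename myjob.slr follows the convention suggested earlier):

```shell
# Write the example job script to a plain text file
cat > myjob.slr << 'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=8000MB
#SBATCH --time=06:00:00
#SBATCH --job-name=myjob1
#SBATCH --partition=short
python3 -c 'print("Hello World!")'
EOF
```

The script is then submitted with sbatch myjob.slr, and its progress can be monitored with squeue -u $USER.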
Note
The current working directory will initially be the path from where the job script is submitted.
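If a job may need to return to (or confirm) the submission directory explicitly, SLURM exports it in the SLURM_SUBMIT_DIR environment variable when the job starts. A minimal sketch (the fallback to $PWD is only so the snippet also runs outside SLURM):

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
# SLURM sets SLURM_SUBMIT_DIR to the directory sbatch was run from;
# fall back to the current directory when not running under SLURM
cd "${SLURM_SUBMIT_DIR:-$PWD}"
echo "Running in: $(pwd)"
```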
Specifying accounts
By default, all jobs run under the aero_general account. To access some resources, your username must be associated with a different account, and this account must be specified in the SLURM script used to run jobs requiring those resources. For example:
#SBATCH --account=my_account_name
where my_account_name should be updated appropriately. To see which accounts you are associated with, run:
sacctmgr show association user=myuserid
substituting your username for myuserid.
GPU jobs
To request GPU resources, you must specify a generic resource (GRES) request. This has the form:
#SBATCH --gres=gpu:GPUTYPE:NUM
You should replace GPUTYPE with the type of GPU you wish to use:
nvidia_a40 for NVIDIA A40 GPUs
tesla_t4 for NVIDIA T4 GPUs
You should also replace NUM with the number of GPUs required. This should almost always be 1.
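Combining the GRES request with the directives from earlier, a complete GPU job script might look like the following sketch. The partition name gpu, the memory request and the wall time here are illustrative assumptions – check your site's partition names and limits with sinfo. One way to create the file from the command line is with a shell heredoc:

```shell
# Write an example GPU job script; partition, memory and wall time
# are illustrative assumptions -- adjust them for your site
cat > gpujob.slr << 'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --mem=16000MB
#SBATCH --time=02:00:00
#SBATCH --job-name=gpujob1
#SBATCH --partition=gpu
#SBATCH --gres=gpu:nvidia_a40:1
nvidia-smi
EOF
```

Here nvidia-smi simply reports the GPU allocated to the job; replace it with your own GPU program. Submit the script with sbatch gpujob.slr.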