Running advanced jobs on the HEC
The following sections highlight some of the more advanced features of the SGE commands. For an exhaustive list of all the features offered, please see the relevant online manual pages by logging on to the HEC and typing:

man <command>

This will display the man page for the specified command.
The job scheduler cannot predict how much memory your jobs will consume, so it is important for you to declare the amount required so that the scheduler can make sure that it is sent to a compute node with enough memory to support it. Jobs over 500M in size are classed as large memory jobs and must specify their memory requirements. It's important to declare memory usage as closely as possible - specifying far more memory than a job requires means that the scheduler reserves that memory for you whether you use it or not, resulting in it not being available to other users. Bear in mind that in order to guard against jobs with run-away memory leaks which might adversely affect other users, the scheduler will terminate any job which exceeds its requested memory requirement. As a rule of thumb: request slightly more memory than your job requires, but not too much.
For jobs greater than 500 megabytes, calculate the amount of memory your job requires as closely as possible and submit the job with:

qsub -l h_vmem=X myjob.com

Where X is the amount of memory to be reserved along with its unit, either M (megabytes) or G (gigabytes). E.g. 700M means 700 megabytes, 1.5G means 1.5 gigabytes. NOTE: Not specifying a unit may cause your job to crash.
Alternatively, you can add a similar directive directly into the job script by adding the line:

#$ -l h_vmem=X

So a job which requires 1.5 gigabytes would have the following in the job script:

#$ -l h_vmem=1.5G
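Putting this together, a complete job script for a 1.5 gigabyte serial job might look like the following sketch (this assumes the memory resource is requested via SGE's h_vmem complex, and my_program and input.dat are placeholder names):

```shell
#$ -q serial
#$ -N myjob
#$ -l h_vmem=1.5G

./my_program < input.dat
```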
For workloads such as Monte Carlo simulations and parameter studies, it is often necessary to run the same program multiple times, often with slightly different input parameters. Rather than create a unique job file for each run and submit each of them separately, SGE offers an array job option. When combined with a tailored job script, this allows you to submit multiple similar jobs with a single command.
Array jobs can be submitted by adding the -t directive to the job script, as in the following example:
#$ -q serial
#$ -N myjob
#$ -t 4-10:2
echo Job task $SGE_TASK_ID
./my_program < input.$SGE_TASK_ID.dat
This submits the job script myjob.com as a number of tasks, each task having its own unique index number. This index number can be used by the job script to perform slightly different actions each time, e.g. reading from a different input file (as in the above example), or passing a different set of parameters to your program for each task.
The number of tasks and the values of the task index numbers are controlled by the extra arguments following the -t directive. The format is x-y:z, where x is the first index number, y the last, and the optional :z gives the step increment. The above example submits the job script 4 times, with index numbers of 4, 6, 8 and 10 (i.e. first = 4, last = 10, step = 2). NOTE: Index numbers must always be positive integers.
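You can preview the set of index numbers a given -t range will generate by using the standard seq tool on the command line, which takes the first value, the increment and the last value:

```shell
# The task indices generated by "-t 4-10:2": first 4, step 2, last 10
seq 4 2 10
```

This prints 4, 6, 8 and 10 on separate lines, matching the tasks described above.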
The index number is available to the job script via the environment variable $SGE_TASK_ID, and can be used by the job script to alter what exactly is run for each task. In the above example, it is used to change the input file sent to the user application my_program. Successive tasks will read input.4.dat, input.6.dat, input.8.dat and input.10.dat.
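One way to check this logic before submitting is to set SGE_TASK_ID by hand in an ordinary shell session; in this sketch, echo stands in for the real program so nothing is actually run:

```shell
# Simulate a single array task: under SGE this variable is set automatically
SGE_TASK_ID=6

# The same expansion the job script performs for its input file name
input_file="input.$SGE_TASK_ID.dat"
echo "would run: ./my_program < $input_file"
```

This prints "would run: ./my_program < input.6.dat", confirming which file task 6 would read.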
The standard output and standard error files for each task will be unique; by default SGE will name the output file using the job name, the job ID, and the task ID.
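As a sketch of how those names are combined (assuming the usual SGE pattern of jobname.o<job ID>.<task ID> for standard output; check your own job directory for the exact scheme on your installation):

```shell
# Assumed default SGE naming for an array task's stdout file:
#   <job name>.o<job ID>.<task ID>
JOB_NAME=myjob
JOB_ID=204
SGE_TASK_ID=4
echo "$JOB_NAME.o$JOB_ID.$SGE_TASK_ID"
```

For task 4 of job 204 this prints myjob.o204.4; the matching stderr file would use .e in place of .o.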
Managing job arrays
Once you get the hang of writing flexible job scripts, job arrays make job submission much easier. They also make job management easier. Each task within the same job is given a different index number, but all tasks share the same job ID. Example output from qstat for an array job is given below; each task's index number appears in the ja-task-ID field:
job-ID prior   name  user     state submit/start at     queue         slots ja-task-ID
--------------------------------------------------------------------------------------
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node01 1     1
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node02 1     2
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node01 1     3
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node02 1     4
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node01 1     5
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node02 1     6
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node01 1     7
   204 0.50000 myjob testuser r     07/31/2013 10:48:02 serial@node02 1     8
   204 0.50000 myjob testuser qw    07/31/2013 10:48:01               1     9-100:1
Note that tasks still queued-and-waiting are listed together on a single line. In the above example, tasks 1 through 8 are running, while tasks 9 through 100 are still waiting to run. If you want to stop all the tasks of job ID 204 at once, you can use the normal qdel command:

qdel 204
Note: For large job arrays, it may take several minutes to kill all jobs
If you want to stop individual tasks, you can suffix the job ID with the individual task ID. To stop just task ID 4 of job ID 204 above, we do:

qdel 204.4
To stop task IDs 1-3 of job ID 204:

qdel 204.1-3
Managing short jobs
Care should be taken to avoid very short jobs - on the order of a few seconds to a few minutes - as these make very inefficient use of the cluster. It takes the system several seconds both to start and finish a job, and the scheduler itself works on 40-second cycles. Very short jobs therefore end up causing a lot of idle time on the system. To avoid this, consider bunching several short tasks together into a single job array element.
The example below gives a template for this type of solution. A job array originally consisting of 10,000 individual tasks, each of which ran for only a few seconds, has been converted into one containing just 10 tasks; each task contains a loop which executes 1,000 of the original tasks in sequence, based on the job task ID it receives:
#$ -q serial
#$ -N myjob
#$ -t 1-9001:1000
echo Value received: $SGE_TASK_ID
x=$SGE_TASK_ID
y=$((x + SGE_TASK_STEPSIZE - 1))
echo Running $x to $y
for z in `seq $x $y`; do
  echo Running task $z
  myprogram < input.$z.data > output.$z.data
done
How it works: the -t job directive sets up the job array to run from 1 to 9001 in steps of 1000. This results in 10 separate tasks, with SGE_TASK_ID taking the values 1, 1001, 2001, and so on up to 9001. The shell variable x is set to the SGE_TASK_ID value, and the shell variable y is set to the last value in the set of tasks to be run, using some simple shell arithmetic and the SGE_TASK_STEPSIZE environment variable, which is automatically set to the step size of the current array (1000 in this case). The for loop sets the shell variable z to each value between x and y in sequence, using the standard Unix tool seq. Each iteration of the loop runs myprogram with unique input and output files, based upon the value of z.
Note that while this example still produces 10,000 output files named output.$z.data for all values of z between 1 and 10,000, there are only 10 job tasks, so there will only be 10 stdout and stderr files.
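The loop-bound arithmetic can be checked outside the scheduler by supplying the two variables SGE would normally set. For the third task of the example above:

```shell
# Values SGE would set automatically for the third task of "-t 1-9001:1000"
SGE_TASK_ID=2001
SGE_TASK_STEPSIZE=1000

# The same calculation performed in the job script
x=$SGE_TASK_ID
y=$((x + SGE_TASK_STEPSIZE - 1))
echo "Running $x to $y"
```

This prints "Running 2001 to 3000", i.e. the third task handles original tasks 2001 through 3000.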