Compute grid¶
We have a small cluster on which you can launch tasks. This grid uses slurm, the same scheduler used at the Digital Research Alliance of Canada (formerly Compute Canada). Our resources are much more limited and the system configuration is different. However, the usage is very similar, and most of the information on the Alliance wiki's job submission page remains valid: https://docs.computecanada.ca/wiki/Running_jobs/en
Computing Grid Rules¶
- Don’t run jobs on the slurm machine
- Ensure that you request the amount of memory your job actually needs
- Ensure that you request the number of cores your job actually uses
- Ensure that your jobs can run for at least 10 minutes
- Use the right partition for your work
    - optimum for jobs less than 2 days (10 machines)
    - optimumlong for jobs less than 7 days (1 machine)
- You should use /scratch for heavy I/O
Server Nodes¶
Here is a brief description of the machines that are available in the compute grid.
- 11 Dell PowerEdge R740 machines
    - 512GB of memory
    - 2 Intel Xeon Gold 6258R CPU @ 2.70GHz (56 cores total)
Billing¶
The system keeps a billing record of the resources consumed. This billing does not incur any charges to users; it is used to track what each user has consumed and to calculate task priority.
For example, if you request 8 processors, they are reserved for you, so all 8 are counted as used and billed whether you actually use 8 or 1.
slurm¶
Task submission¶
For all slurm commands (squeue, sbatch, etc), you must connect by ssh to the slurm machine.
sbatch¶
Command to submit a job to the cluster. You can put the parameters in the submission script, pass them on the command line, or combine both.
To submit a job, simply do the following:
$ sbatch «slurm_script_filename»
Here are the most used parameters:
--cpus-per-task
: Number of processors that will be used by the program. It is important that this number matches what the program actually does. Some software, like cplex and gurobi, tries to use all processors unless restricted. For example, if only one processor is requested, cplex would still start 56 parallel threads, all running on that single processor, which is not the right way to do it.
--mem
: The memory the program needs. If this memory is exceeded, the task is cancelled.
--time
: The time required to complete the task. If this limit is exceeded, the task is cancelled.
--output
: The name of the file that will contain the output of the program. If the output is sent to /dev/null, no trace remains. If you need this output, be sure to minimize the amount of information displayed and set a real output file name instead.
--partition
: Partition to use. By default, this will be optimum, which allows tasks up to 2 days. The optimumlong partition allows tasks up to 7 days, but only one machine is available in this group.
--nodelist=
: Use this option if you need to specify particular compute nodes on which to run. Multiple nodes can be separated with commas or specified as a range such as optimum[01-03].
The settings you choose will be used to see what resources are available to launch the task. Although there is no limit as such, the more you ask, the longer you will wait because it will be more difficult to obtain these resources.
Here are examples of slurm scripts.
This script can be used to submit 1 job to the system.
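As an illustration, a single-job script could look like the sketch below. It is not the grid's official template: the resource values, the `job-%j.out` output name, and the program `./my_program` with its `--threads` flag are assumptions chosen for the example. The thread count is taken from `SLURM_CPUS_PER_TASK`, so the program does not start more threads than the cores reserved.

```shell
#!/bin/bash
#SBATCH --cpus-per-task=4        # cores reserved for the job (assumed value)
#SBATCH --mem=8G                 # memory limit; the job is cancelled beyond it
#SBATCH --time=0-02:00:00        # days-hours:minutes:seconds
#SBATCH --output=job-%j.out      # %j is replaced by the job ID
#SBATCH --partition=optimum

# Slurm sets SLURM_CPUS_PER_TASK inside the job; fall back to 1 elsewhere.
threads="${SLURM_CPUS_PER_TASK:-1}"

# Many OpenMP-based programs read this variable to limit their threads.
export OMP_NUM_THREADS="$threads"

# Hypothetical program: replace the echo with the real command line.
echo "would run: ./my_program --threads $threads"
```

Solvers such as cplex or gurobi have their own thread-limit parameter; the value of `threads` would be passed to it instead of the hypothetical `--threads` flag.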
If you have more than 1 job to submit, you should use the array option. This way, you submit 1 job, but its tasks are still executed separately.
For example, if counting the instances gives 8, the script uses the option --array=1-8. The two numbers have to match; otherwise you'll get unexpected behavior.
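The array mechanism can be sketched as follows. The instance names, the resource values, and the output file pattern are assumptions made for the illustration; only `--array` and the `SLURM_ARRAY_TASK_ID` variable come from slurm itself. Each task selects the instance whose position in the list equals its task index.

```shell
#!/bin/bash
#SBATCH --array=1-8              # must match the number of instances below
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=0-01:00:00
#SBATCH --output=array-%A_%a.out # %A is the array job ID, %a the task index

# Hypothetical list of 8 instances, one per line.
instances='inst1.dat
inst2.dat
inst3.dat
inst4.dat
inst5.dat
inst6.dat
inst7.dat
inst8.dat'

# Number of instances; it has to equal the upper bound of --array.
count=$(printf '%s\n' "$instances" | wc -l | tr -d ' ')

# Slurm sets SLURM_ARRAY_TASK_ID for each task; default to 1 outside slurm.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Pick the line whose number equals the task index.
instance=$(printf '%s\n' "$instances" | sed -n "${task_id}p")

echo "task $task_id of $count would process $instance"
```

If the list grew to 10 instances while `--array=1-8` stayed unchanged, the last two instances would silently never run, which is the kind of unexpected behavior a mismatch produces.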
To submit a Matlab program to the computing grid, the program must not run in graphical mode, and all the Matlab parameters should be provided in a file.
Large number of jobs submission¶
If you have to submit a large number of jobs to slurm, you need to understand the following.
ATTENTION
Submitting a large number of jobs incorrectly can overload the scheduler. Please use job arrays, as in the array example above.
If you submit multiple jobs through a custom submission script with a loop, consider pausing for 1 second between each sbatch call. Also make sure that each job takes at least 5 to 10 minutes to run, so that the scheduler has time to do all the work needed to set up and run the scheduled jobs.
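The pause between submissions can be sketched with a small helper. The function name `submit_with_pause`, the `SUBMIT_PAUSE` variable, and the file names in the usage comment are hypothetical; only `sbatch` and the 1-second pause come from the text above.

```shell
#!/bin/bash
# Submit each script given as an argument, pausing between submissions.
# SUBMIT_PAUSE is a hypothetical variable of this sketch (default: 1 second).
submit_with_pause() {
    pause="${SUBMIT_PAUSE:-1}"
    for script in "$@"; do
        sbatch "$script" || return 1   # stop at the first failed submission
        sleep "$pause"                 # give the scheduler time between jobs
    done
}

# Usage (hypothetical file names):
#   submit_with_pause job_a.sh job_b.sh job_c.sh
```

A job array remains the preferred option when the jobs differ only by their input.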
squeue¶
This command shows the jobs that are in the system. By default, without any options, you'll see everyone's tasks. Use the `-u` option to restrict the list to a given username. The `SQUEUE_FORMAT` environment variable changes the default view of the command. For example, the Alliance uses:
export SQUEUE_FORMAT="%.15i%.8u%.12a%.14j%.3t%.10L%.5D%.4C%.10b%.7m%N (%r)"
Example commands:
squeue -u username
squeue -u username -t RUNNING
squeue -u username -t PENDING
The first command displays all jobs for the specified user, the second only that user's running jobs, and the last only the jobs still waiting in the queue.
scancel¶
Program to cancel one or more tasks. Example:
scancel 1234
scancel -u user
sacct¶
This command can be used to get additional information on a specific task. The available fields can be listed with the `-e` option and then used with the `--format` option.
You can also see the status of completed jobs if they are still in the database. To see older jobs, you can specify a start date.
sacct --starttime 2022-04-17 --format=Account,User,JobID,Start,End,AllocCPUS,Elapsed,AllocTRES%30,CPUTime,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,NodeList,ExitCode,State%20
If you don't want to specify the format every time, you can use the environment variable `SACCT_FORMAT`. The following format is used at the Alliance.
export SACCT_FORMAT=Account,User,JobID,Start,End,AllocCPUS,Elapsed,AllocTRES%30,CPUTime,AveRSS,MaxRSS,MaxRSSTask,MaxRSSNode,NodeList,ExitCode,State%20
sinfo¶
This command shows the state of the compute grid and the maximum duration allowed in each partition. You can see, for example, whether machines are down, whether some are free, or whether machines have been taken out of the system for maintenance.
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
optimum* up 2-00:00:00 1 mix optimum01
optimum* up 2-00:00:00 9 idle optimum[02-10]
optimumlong up 7-00:00:00 1 idle optimum11
In this example, we see that all the machines are available and only 1 has tasks.
- idle: machines are fully available
- mix: machines are used partly
- drain: machines are under maintenance
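To check quickly how many machines are free, the output of sinfo can be summarized with awk. The helper below is a minimal sketch: the name `count_idle_nodes` is hypothetical, and it assumes the default column layout shown above, with NODES in the 4th column and STATE in the 5th.

```shell
#!/bin/bash
# Sum the NODES column (4th) of the lines whose STATE column (5th) is "idle".
# Reads sinfo's default output on standard input and skips the header line.
count_idle_nodes() {
    awk 'NR > 1 && $5 == "idle" { total += $4 } END { print total + 0 }'
}

# Usage on the slurm machine:
#   sinfo | count_idle_nodes
```

With the sample output above, this prints 10: the 9 idle machines of optimum plus the idle machine of optimumlong. States that slurm displays with a suffix, such as `idle*`, are not counted by this exact match.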
sstat¶
Similar to sacct, but only for currently running jobs, and it only works on your own jobs.
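For instance, the memory and CPU usage of a running job could be checked with a small wrapper like the one below. The function name `job_usage` and the choice of fields are assumptions of this sketch. The `.batch` suffix targets the batch step, which is where the commands of a script run when they are not launched through `srun`.

```shell
#!/bin/bash
# Show memory and CPU usage of the batch step of a running job.
# job_usage is a hypothetical helper name; the fields are a suggested subset.
job_usage() {
    sstat -j "$1.batch" --format=JobID,MaxRSS,AveCPU
}

# Usage (1234 is an example job ID):
#   job_usage 1234
```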
/scratch¶
There is a temporary directory, /scratch, available on the optimum machines as well as on the slurm frontend. You can use this space to copy your datasets before submitting your jobs.
This space has no backup so don't use it to store files you can't afford to lose.
Please use this space reasonably.
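Copying a dataset to /scratch could be done with a helper such as the following sketch. The function name `stage_to_scratch`, the per-user directory layout, and the `SCRATCH_ROOT` override are assumptions made for the illustration. Whether files copied on the frontend are visible from the optimum machines depends on how /scratch is set up, so check this before relying on it.

```shell
#!/bin/bash
# Copy a dataset into a per-user directory under /scratch and print that
# directory. The layout <root>/<user> is an assumption of this sketch.
stage_to_scratch() {
    src="$1"
    root="${SCRATCH_ROOT:-/scratch}"   # hypothetical override, useful for tests
    workdir="$root/${USER:-unknown}"
    mkdir -p "$workdir" || return 1
    cp -r "$src" "$workdir/" || return 1
    printf '%s\n' "$workdir"
}

# Usage (hypothetical paths):
#   workdir=$(stage_to_scratch "$HOME/datasets/instances")
#   ... run the job on the copy in "$workdir" ...
#   rm -rf "$workdir/instances"    # free the space once the job is done
```

Removing the copy afterwards keeps the shared space available for other users.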