Getting Started with Slurm
The Slurm batch-queueing system provides the mechanism by which all jobs are submitted to the Argo Cluster and scheduled to run on the compute nodes. Users cannot access a compute node directly unless Slurm has first started a job for them on that node; generally you will only log in to the login nodes.
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It has similarities with the Sun Grid Engine scheduler and the Univa Grid Engine Scheduler, which were used previously on Argo. It also shares similarities with the open source resource manager TORQUE. So if you've used other clusters, much of what you know will transfer.
Here we give basic examples of how to submit jobs to the cluster. An overview of all the commands that are part of Slurm can be found by typing man slurm at the Argo prompt and by visiting the Slurm documentation page.
Preparing your Data
Information on how you can copy your data from your local machine to the cluster can be found on the Uploading Data page. Please note that the /home directory is mounted read-only on the compute nodes, which means that you cannot write data to your home directory while running a job. A scratch directory has been created for you in order to store job results for a short period of time. Your scratch directory is located at /scratch/<userID> and can also be referenced through the $SCRATCH environment variable.
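As a minimal sketch (the directory and file names here are made up for illustration), a job would typically write its results under the scratch directory rather than under /home, which is read-only on the compute nodes; the fallback path lets the snippet run outside the cluster:

```shell
#!/bin/sh
# Sketch: write job output under the scratch directory, not under /home
# (read-only on the compute nodes). SCRATCH is assumed to be set by the
# cluster; fall back to a temporary directory when testing locally.
SCRATCH="${SCRATCH:-/tmp/scratch-demo}"

OUTDIR="$SCRATCH/myjob-results"   # hypothetical results directory
mkdir -p "$OUTDIR"
echo "result data" > "$OUTDIR/out.txt"
ls "$OUTDIR"
```

Remember that scratch space is only kept for a short period, so copy anything you want to keep back to permanent storage after the job finishes.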
Available Partitions on the Cluster
The tables below list the partitions available on each cluster, along with their time limits and allowed QOS values.
HOPPER
| Hopper Partition | Time Limit | Allowed QOS |
| --- | --- | --- |
| gpuq | 3-00:00:00 | gpu |
| normal | 5-00:00:00 | All |
| bigmem | 5-00:00:00 | All |
| gpuq-contrib | 3-00:00:00 | hantil, ksun |
| contrib | 5-00:00:00 | qtong, normal |
| interactive | 0-12:00:00 | interactive |
Argo
| Argo Partition | Time Limit | Allowed QOS |
| --- | --- | --- |
| gpuq | 5-00:00:00 | All |
| all-LoPri | 5-00:00:00 | All |
| all-HiPri | 0-12:00:00 | All |
| bigmem-HiPri | 0-12:00:00 | All |
| bigmem-LoPri | 5-00:00:00 | All |
| all-long | 10-00:00:00 | All |
| bigmem-long | 10-00:00:00 | All |
| contrib | 7-00:00:00 | contrib |
Submitting a Job
In batch mode with Slurm scripts
Jobs are submitted using the sbatch command. This command has many options; a summary of these can be found in the sbatch documentation. The various sbatch options, along with the program to be run, should be put inside a single bash script and passed to sbatch as shown below:
> sbatch myscript.slurm
Note: This command must be run from one of the two login nodes.
Here is a sample script called myscript.slurm that shows how to set some common sbatch parameters.
#!/bin/sh
## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>
## General partitions: all-HiPri, bigmem-HiPri -- (12 hour limit)
## all-LoPri, bigmem-LoPri, gpuq (5 days limit)
## Restricted: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q (10 day limit)
#SBATCH --partition=<PartitionName>
## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out   # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err    # Error file
## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu # Put your GMU email address here
## Specify how much memory your job needs. (2G is the default)
#SBATCH --mem=<X>G # Total memory needed per task (units: K,M,G,T)
## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Load the relevant modules needed for the job
module load <module_name>
## Run your program or script
<command(s) to run your program>
Note 2: In the above script, if "--mem=<X>G" is not specified then the default memory configuration of 2G per core is used; if you asked for 2 cores per node, then 4G per node will be assigned. If you need more than the default memory, use the "--mem" or "--mem-per-cpu" option. "--mem" and "--mem-per-cpu" are mutually exclusive.
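As a sketch of the two alternatives (the numbers here are illustrative, not recommendations), a batch-script fragment could request memory either as a per-node total or per CPU; note that only one of the two should be uncommented:

```shell
## Option A: total memory per node for the job.
#SBATCH --mem=8G

## Option B: memory per allocated CPU. With 2 tasks per node and 1 CPU per
## task, --mem-per-cpu=4G also yields 2 x 4G = 8G per node. Do NOT combine
## with --mem above; the two options are mutually exclusive.
##SBATCH --ntasks-per-node=2
##SBATCH --cpus-per-task=1
##SBATCH --mem-per-cpu=4G
```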
You can find a more detailed version of this script here: Template.slurm, which was designed to be a template for writing general Slurm submission scripts. It contains a number of optional commands, along with comments about how you might customize it for your own Slurm submission script.
Interactively with salloc
If you want to work with the same resources interactively instead, you can do so with the salloc command. The parameters that are usually specified in a batch script can be combined on a single salloc command line. Slurm will allocate the specified resources and connect you directly to a compute node.
salloc
Running salloc with no arguments will start an interactive session with the defaults: 1 CPU and 2GB of memory in the 'interactive' partition. To be more specific about the resources you need, add them to the command with the correct Slurm parameters:
salloc -p normal --nodes=1 --ntasks-per-node=12 --mem=5GB --time=0-0:30:00
The session on the compute node will persist until you either hit the set time limit or type in:
exit
Monitoring And Controlling Jobs
There are several Slurm commands used to monitor and control jobs submitted to Slurm; here we go over the main ones briefly. For all of these commands, typing command --help will show the relevant options they take and their usage.
squeue
The "squeue" command is used to view information about jobs in the Slurm scheduling queue. There are various options, among them the following are commonly used:
squeue # shows all users' jobs (running as well as pending)
squeue -u userID1{,userID2...} # shows jobs for users in the list of comma separated user ids
squeue -j jobID1{,jobID2...} # shows jobs according to the list of comma separated job ids
sacct
This command is useful for giving you a summary of your recently launched jobs. Some options include:
sacct # displays detailed info of all jobs today (running, pending or finished)
sacct -X # displays summary of jobs today (running, pending, or finished)
sacct -X -S <Month/Day> # displays a summary of all jobs since a certain date
More info about sacct can be found in the sacct documentation.
scancel
This command is used to cancel a running or queued job.
scancel jobID # kills the job as specified by the jobID
There are various options to restrict the scope of scancel. Please see the scancel documentation for details.
sstat
This command is used to display various status information of a running job/step. Key options include:
sstat jobID
Note: If no steps are associated with your job, you will get: sstat: error: no steps running for job jobID
More info about sstat can be found in the sstat documentation.
sview
This command is used to open a GUI for monitoring and changing the scheduler control options. However, some of the changes that can be made are restricted to system administrators only (such as creating new partitions or editing existing ones). Users need to log in to Argo using the "-Y" option (i.e., X forwarding) in order to load the sview GUI (see the Logging into Argo page for details). There is a command-line alternative to sview called sinfo, which reports information about various Slurm configuration settings. Please see the sinfo documentation for details.
scontrol
This command, among other things, can be used to control job parameters of running/queued jobs. Some of the main options are shown below:
scontrol hold jobID1{ jobID2...} # holds pending jobs so they will not be scheduled; won't stop already running jobs
scontrol release jobID1{ jobID2...} # releases jobs that were held by the previous command
scontrol suspend jobID and scontrol resume jobID can be used to suspend and resume currently running jobs. Another useful way to check job status/configuration is using the following command:
scontrol show job jobID
This will give detailed info about the running/pending jobID. If changes need to be made to a pending job, the user can use the following command:
scontrol update <jobSpecification>
For example:
scontrol show job 81
JobId=81 Name=Sleeper
GroupId=argo-user(242)
...
scontrol update JobID=81 Name=WakeUp
Note: Not all parameters can be changed (for example, the number of CPUs used). Please see the scontrol documentation for details.
More Advanced Job Options
Running Parallel Jobs
To see how to run distributed/shared memory jobs (like MPI, Pthread etc) please see the How to Run Parallel Jobs on Argo page.
Running Job Arrays
Job arrays are useful when you want to run the same program or script over and over again, with only minor differences between each run. Examples include running a stochastic simulation with different random-number seeds, or running a parameter sweep with minor changes to the parameters from one run to the next. Your program can retrieve a unique number associated with each particular run and then use that number to derive the parameter values that are most appropriate.
The basic form of the job array command is as follows:
#SBATCH --array=<start>-<end>
where <start> and <end> are the first and last indices of the array; Slurm launches one task per index.
These example programs (cpp_example_array_job, java_example_array_job, python_example_array_job, job_submission_script) show how the environment variable $SLURM_ARRAY_TASK_ID is used. Note that aside from the "--array" option, the rest of the job_submission_script describes the parameters and resources needed for a single run of the job.
There are a number of other forms and options that can be used with job arrays:
...
#Job arrays run multiple times
#Run this job n times starting at index 1
#SBATCH --array=<Job_Idx_Format>%<Job_Exe_Limit>
...
1. #SBATCH --array=1-30 # a range of indices: 1 through 30
2. #SBATCH --array=1,2,3,4 # an explicit, comma-separated list of indices
3. #SBATCH --array=1-30:2 # a range with a step of 2: 1,3,5,...,29
Additionally, you can limit how many tasks will run simultaneously by specifying the Job_Exe_Limit as shown below:
#SBATCH --array=1-31:2%4
In the above example, only 4 tasks run at a time, and each has an odd index since we are incrementing by 2. The "%4" tells the scheduler to run at most 4 tasks simultaneously. The Slurm workload manager sets an environment variable, $SLURM_ARRAY_TASK_ID, which takes its values from what is given to the --array option. For example, if --array=1-4 is specified, 4 tasks will run and $SLURM_ARRAY_TASK_ID will take the values 1 through 4. To use $SLURM_ARRAY_TASK_ID, your program must read the environment variable through whatever mechanism your programming language provides.
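As a minimal sketch of reading the task ID from within a program (the parameter lists here are hypothetical), a Python script can map $SLURM_ARRAY_TASK_ID to the settings for its run; the fallback default lets the script also run outside of Slurm:

```python
import os

# Slurm sets SLURM_ARRAY_TASK_ID for each task in a job array; default to 1
# so the script can also be tested outside of a Slurm job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))

# Hypothetical parameter sweep: one random seed and one learning rate per task.
seeds = [11, 22, 33, 44]
learning_rates = [0.1, 0.01, 0.001, 0.0001]

seed = seeds[(task_id - 1) % len(seeds)]
lr = learning_rates[(task_id - 1) % len(learning_rates)]

print(f"task {task_id}: seed={seed} lr={lr}")
```

The same pattern works in any language that can read environment variables (e.g., getenv() in C or System.getenv() in Java).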
The default names for the output and error files of job arrays are "slurm-%A_%a.out" and "slurm-%A_%a.err" respectively, where %A is replaced by the job ID and %a by the array index of each task. Please do not use %j in the output and error specification, as this forces the creation of a new job ID for each task, which is confusing and unnecessary.
To cancel one or more tasks of a job array, users can do the following:
scancel jobID # if jobID represents an array job then this cancels all of its tasks
scancel jobID_idx # cancels only the task with index idx
scancel jobID_[idx-idy] # cancels all tasks from index idx to idy
You can also use the scontrol command to modify/hold/update individual tasks of a job array. Details can be found here.
Common Slurm Environment Variables
| Variable | Description |
| --- | --- |
| $SLURM_JOB_ID | The job ID. |
| $SLURM_JOBID | Deprecated. Same as $SLURM_JOB_ID. |
| $SLURM_SUBMIT_DIR | The path of the job submission directory. |
| $SLURM_SUBMIT_HOST | The hostname of the node used for job submission. |
| $SLURM_JOB_NODELIST | The list of nodes assigned to the job. |
| $SLURM_NODELIST | Deprecated. Same as $SLURM_JOB_NODELIST. |
| $SLURM_CPUS_PER_TASK | Number of CPUs per task. |
| $SLURM_CPUS_ON_NODE | Number of CPUs on the allocated node. |
| $SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node. |
| $SLURM_CPUS_PER_GPU | Number of CPUs requested per allocated GPU. |
| $SLURM_MEM_PER_CPU | Memory per CPU. Same as --mem-per-cpu. |
| $SLURM_MEM_PER_GPU | Memory per GPU. |
| $SLURM_MEM_PER_NODE | Memory per node. Same as --mem. |
| $SLURM_GPUS | Number of GPUs requested. |
| $SLURM_NTASKS | Same as -n, --ntasks. The number of tasks. |
| $SLURM_NTASKS_PER_NODE | Number of tasks requested per node. |
| $SLURM_NTASKS_PER_SOCKET | Number of tasks requested per socket. |
| $SLURM_NTASKS_PER_CORE | Number of tasks requested per core. |
| $SLURM_NTASKS_PER_GPU | Number of tasks requested per GPU. |
| $SLURM_NPROCS | Same as -n, --ntasks. See $SLURM_NTASKS. |
| $SLURM_NNODES | Total number of nodes in the job's resource allocation. |
| $SLURM_TASKS_PER_NODE | Number of tasks to be initiated on each node. |
| $SLURM_ARRAY_JOB_ID | Job array's master job ID number. |
| $SLURM_ARRAY_TASK_ID | Job array ID (index) number. |
| $SLURM_ARRAY_TASK_COUNT | Total number of tasks in a job array. |
| $SLURM_ARRAY_TASK_MAX | Job array's maximum ID (index) number. |
| $SLURM_ARRAY_TASK_MIN | Job array's minimum ID (index) number. |
A full list of environment variables for Slurm can be found by visiting the Slurm documentation page on environment variables.
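As a small sketch (the helper function name is our own, not part of Slurm), these variables can be read from any language through its environment-variable interface; in Python, for example, with defaults so the code also runs outside of a Slurm job:

```python
import os

def slurm_env(name, default, cast=str):
    """Read a Slurm environment variable, falling back to a default when the
    script is not running under Slurm."""
    value = os.environ.get(name)
    return cast(value) if value is not None else default

job_id = slurm_env("SLURM_JOB_ID", default="none")
ntasks = slurm_env("SLURM_NTASKS", default=1, cast=int)
cpus_per_task = slurm_env("SLURM_CPUS_PER_TASK", default=1, cast=int)

print(f"job={job_id} ntasks={ntasks} cpus/task={cpus_per_task}")
```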