
Getting Started with SLURM

The Slurm batch-queueing system provides the mechanism by which all jobs are submitted to the ARGO Cluster and are scheduled to run on the compute nodes. Users cannot access a compute node directly unless Slurm has already started a job for them on that node. Generally you will only log in to the login nodes.

The Simple Linux Utility for Resource Management (Slurm) is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It has similarities with the Sun Grid Engine scheduler and the Univa Grid Engine Scheduler, which were used previously on ARGO. It also shares similarities with the open source resource manager TORQUE. So if you've used other clusters, much of what you know will transfer.

Here we give basic examples of how to submit jobs to the cluster. An overview of all the commands that are part of Slurm can be found by typing man slurm at the ARGO prompt or by visiting the Slurm documentation page.

Preparing your Data

Information on how you can copy your data from your local machine to the cluster can be found on the page Uploading Data. Please note that the /home directory is mounted as read-only on the compute nodes, which means that you cannot write data to your home directory while running a job. A scratch directory has been created for you in order to store job results for a short period of time. The location of your scratch directory is /scratch/<userID>. For your convenience this path is also stored in the environment variable $SCRATCH.
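
For example, a job can write its results under $SCRATCH rather than /home. Below is a minimal sketch; the directory and program names are only illustrative:

## Write job output under $SCRATCH instead of the read-only /home
mkdir -p $SCRATCH/my_experiment      # illustrative subdirectory name
cd $SCRATCH/my_experiment
./my_program > results.txt           # hypothetical program; its output lands in scratch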

Submitting a Job

Jobs are submitted using the sbatch command. This command has many options and a summary of these can be found in the sbatch documentation. The various sbatch options along with the program to be run should be put inside a single bash script and passed to sbatch as shown below:

> sbatch myscript.slurm

Note: This command must be run from one of the two login nodes!

Here is a sample script called myscript.slurm that shows how to set some common sbatch parameters.

#!/bin/sh

## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>

## General partitions: all-HiPri, bigmem-HiPri   --   (12 hour limit)
##                     all-LoPri, bigmem-LoPri, gpuq  (5 days limit)
## Restricted: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q  (10 day limit)
#SBATCH --partition=<PartitionName>

## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out  # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err   # Error file

## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL         # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu     # Put your GMU email address here

## Specify how much memory your job needs. (2G is the default)
#SBATCH --mem=<X>G        # Total memory needed per task (units: K,M,G,T)

## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM>  # Total time needed for job: Days-Hours:Minutes


## Load the relevant modules needed for the job
module load <module_name>

## Run your program or script
<command(s) to run your program>

NOTE 1: You must modify the <...> placeholders in the script above and replace them with appropriate values (including the < and >) for the script to work!

NOTE 2: In the above script, if the "--mem" option is not used then the default memory configuration of 2G per core applies. If you ask for 2 cores per node, then 4G per node will be assigned. If you need more than the default memory, then the "--mem-per-cpu" or "--mem" option should be used. "--mem" and "--mem-per-cpu" are mutually exclusive.
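
For example, either of the following lines would request roughly 8G for a single-node job using 4 cores (the values are only illustrative, and the two options must not be combined):

#SBATCH --mem=8G            # 8G in total per node for the job
## or, roughly equivalent for a 4-core job:
#SBATCH --mem-per-cpu=2G    # 2G for each allocated core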

You can find a more detailed version of this script here: Template.slurm, which was designed to be a template for writing general Slurm submission scripts. It contains a number of optional commands, along with comments about how you might customize it when writing your own Slurm submission script.

Using srun to run jobs

srun can be used to run parallel jobs on a cluster managed by Slurm. The parameters that are usually specified in a batch script can be combined in a single line with srun. For example, to run a threaded job the following format can be used, where the Python script "my_py3_script.py" is expected to spawn two threads:

srun --ntasks=1 --cpus-per-task=2 --output=/scratch/%u/slurm-%N-%j.out python my_py3_script.py

srun will first create a resource allocation in which to run the job and then run the script my_py3_script.py. An example to run an MPI job:

srun --nodes=2 --ntasks-per-node=4 --output=/scratch/%u/slurm-%N-%j.out ./my_mpi_program

In the above example, srun first allocates a total of 8 tasks across 2 nodes through Slurm and then executes the MPI-based parallel program. The srun command should only be used on login nodes.

Monitoring And Controlling Jobs

There are several Slurm commands which are used to monitor and control jobs submitted to Slurm. Here we go over them briefly. For all these commands, typing command --help will show the relevant options they take and their usage.

squeue

The "squeue" command is used to view information about jobs in the Slurm scheduling queue. There are various options, among them the following are commonly used:

squeue                            # shows all users' jobs (running as well as pending)
squeue -u userID1{,userID2...}    # shows jobs for users in the list of comma separated user ids
squeue -j jobID1{,jobID2...}      # shows jobs according to the list of comma separated job ids

Information about the other options can be found in the squeue documentation.
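
These can be combined with other standard squeue flags, for example to filter by job state or partition (the user ID below is a placeholder):

squeue -u <userID> -t PENDING     # shows only that user's pending jobs
squeue -u <userID> -p gpuq        # shows that user's jobs in the gpuq partition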

sacct

This command is useful for giving you a summary of your recently launched jobs. Some options include:

sacct                             # displays detailed info of all jobs today (running, pending or finished)
sacct -X                          # displays summary of jobs today (running, pending, or finished)
sacct -X -S <Month/Day>           # displays summary of all jobs since a certain date
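
The standard --format option can also be used to choose which columns are displayed; the field names below are just a common selection:

sacct -X --format=JobID,JobName,Partition,State,Elapsed,ExitCode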

More info about sacct can be found in the sacct documentation.

scancel

This command is used to cancel a running or queued job.

scancel jobID                     # kills the job as specified by the jobID

There are various options to restrict the usage of scancel. Please see the scancel documentation for details.
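
As an example, scancel's standard filtering flags can be used to cancel groups of jobs at once (the user ID and job name are placeholders):

scancel -u <userID>                    # cancels all jobs belonging to that user
scancel -u <userID> --state=PENDING    # cancels only that user's pending jobs
scancel --name=<jobName>               # cancels jobs with the given name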

sstat

This command is used to display various status information of a running job/step. Key options include:

sstat jobID

Note: If no steps are associated with your job you will get "sstat: error: no steps running for job jobID".
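
For a running batch job, the standard --format option can be used to select specific fields, for example CPU and memory usage of the batch step (the field names are a typical selection):

sstat -j jobID.batch --format=JobID,AveCPU,AveRSS,MaxRSS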

More info about sstat can be found in the sstat documentation.

sview

This command is used to open a GUI which can be used to monitor and change the scheduler control options. However, some of the changes that can be made are restricted to system administrators only (such as creating new partitions or editing existing ones). Users need to log into ARGO using the "-Y" option (i.e., with X forwarding enabled) in order to load the sview GUI (see the Logging into ARGO page for details). There is a command-line alternative to sview called sinfo, which can be used to get information about various Slurm configuration settings. Please see the sinfo documentation for details.
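
For example, the following standard sinfo commands show the overall partition state and the nodes of a particular partition:

sinfo                      # summary of all partitions and node states
sinfo -p <PartitionName>   # state of a specific partition
sinfo -N -l                # long, per-node listing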

scontrol

This command, among other things, can be used to control job parameters of running/queued jobs. Some of the main options are shown below:

scontrol hold jobID1{ jobID2...}       # holds pending jobs so they won't start; does not stop already running jobs
scontrol release jobID1{ jobID2...}    # releases jobs that were held by the previous command

Similarly, scontrol suspend jobID and scontrol resume jobID can be used to suspend and resume currently running jobs. Another useful way to check a job's status/configuration is the following command:

scontrol show job jobID

This will give detailed info about the running/pending jobID. If changes need to be made to a pending job, the user can use the following command:

scontrol update <jobSpecification>

Where <jobSpecification> is in the same format as the output generated by the show command, which can be fed back into the update command. For example, below is a sample job configuration which was the output of the show command:

scontrol show job 81
JobId=81 Name=Sleeper
GroupId=argo-user(242)
...
If we wish to change the name of the job, we can use scontrol as follows:

scontrol update JobID=81 Name=WakeUp

Note: not all parameters can be changed (for example, the number of cpus used). Please see the scontrol documentation for details.

More Advanced Job Options

Running Parallel Jobs

To see how to run distributed/shared memory jobs (like MPI, Pthread etc) please see the How to Run Parallel Jobs on ARGO page.

Running Job Arrays

Job arrays are useful when you want to run the same program or script over and over again, with only minor differences between each run. Examples include: running a stochastic simulation with a different random number seeds, or running a parameter sweep with minor changes to the parameters from one run to the next. Your program can retrieve a unique number associated with each particular run, and then use that number to derive the parameter values that you feel are most appropriate.

The basic form of the job array command is as follows:

#SBATCH --array=<start>-<end>

Where <start> and <end> are integers. This will cause your script to launch a series of sub-jobs, one for each value in the range from <start> to <end>. These numbers are called "array task IDs", and for any given sub-job, this number is stored in a system environment variable called $SLURM_ARRAY_TASK_ID. Your Slurm script and your program can retrieve the array task ID associated with the current sub-job, and then use that information to set parameters. There are two basic approaches to retrieving the array task ID: the first is for your Slurm submission script to pass it as a command-line parameter to your program, and the second is to have your program read the environment variable directly.
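
A minimal sketch of both approaches inside a submission script is shown below; the program name reuses the earlier Python example and is only illustrative:

#SBATCH --array=1-10

## Approach 1: pass the array task ID to the program as a command-line argument
python my_py3_script.py $SLURM_ARRAY_TASK_ID

## Approach 2: run the program unchanged and let it read the environment
## variable itself (e.g., os.environ["SLURM_ARRAY_TASK_ID"] in Python)
python my_py3_script.py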

These example programs (cpp_example_array_job, java_example_array_job, python_example_array_job, job_submission_script) show how the environment variable $SLURM_ARRAY_TASK_ID is used. Note that aside from the "--array" option, the rest of the job_submission_script describes the parameters and resources needed for a single run of the job.

There are a number of other forms and options that can be used with job arrays:

...
#Job arrays run multiple times
#Run this job n times starting at index 1 
#SBATCH --array=<Job_Idx_Format>%<Job_Exe_Limit>
...

Here <Job_Idx_Format> specifies the indices, which can be given in the following formats:

1. #SBATCH --array=1-30
2. #SBATCH --array=1,2,3,4
3. #SBATCH --array=1-30:2

In the third option the indices are incremented by 2 at a time. Additionally, you can limit how many jobs will run simultaneously by specifying <Job_Exe_Limit> as shown below:

#SBATCH --array=1-31:2%4

In the above example, only 4 jobs run at a time; each will have an odd index since we are incrementing by 2. The "%4" tells the scheduler to run only 4 jobs at a time. The Slurm workload manager sets up an environment variable $SLURM_ARRAY_TASK_ID which takes values from the range given to the --array option. For example, if --array=1-4 is specified, 4 jobs will run and $SLURM_ARRAY_TASK_ID will assume values 1 through 4. To utilize $SLURM_ARRAY_TASK_ID, your program needs to read the environment variable using whatever mechanism your programming language provides; one common pattern is sketched below.
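
In this sketch the task ID selects a line from a parameter file and derives a per-task random seed; the file name and program are hypothetical:

## Use the array task ID to pick the N-th line of a parameter file
PARAMS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" params.txt)

## Derive a per-task random seed from the task ID
SEED=$((1000 + SLURM_ARRAY_TASK_ID))

./my_simulation $PARAMS --seed $SEED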

The default names of the output and the error files of job arrays are "slurm-%A_%a.out" and "slurm-%A_%a.err" respectively. Here %A is replaced by the job ID and %a by the array task ID of each job instance. Please do not use %j in the output and error specification, as this will force the creation of a new job ID for each task, which is confusing and unnecessary.
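
If you want to set these explicitly, %A and %a can be combined with the other placeholders shown earlier, for example:

#SBATCH --output=/scratch/%u/%x-%A_%a.out
#SBATCH --error=/scratch/%u/%x-%A_%a.err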

To cancel one or more jobs of a job array, the user can do the following:

scancel jobID            # if jobID represents an array job, this cancels all its instances
scancel jobID_idx        # cancels only the instance with index idx
scancel jobID_[idx-idy]  # cancels all instances from idx to idy

Similarly, one can use the scontrol command to modify/hold/update individual instances of a job array; details can be found here.

See Also