Getting Started with Slurm
The Slurm batch-queueing system provides the mechanism by which all jobs are submitted to the Argo Cluster and scheduled to run on the compute nodes. Users cannot access a compute node directly unless Slurm has first started a job for them on that node; generally you will only log in to the login nodes.
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. It has similarities with the Sun Grid Engine scheduler and the Univa Grid Engine Scheduler, which were used previously on Argo. It also shares similarities with the open source resource manager TORQUE. So if you've used other clusters, much of what you know will transfer.
Here we give basic examples of how to submit jobs to the cluster. An overview of all the commands that are part of Slurm can be found by typing man slurm at the Argo prompt and by visiting the Slurm documentation page.
Preparing your Data
Information on how you can copy your data from your local machine to the cluster can be found on the Uploading Data page. Please note that the /home directory is mounted read-only on the compute nodes, which means that you cannot write data to your home directory while running a job. A scratch directory has been created for you in order to store job results for a short period of time. Your scratch directory is located at /scratch/<userID> and can also be referenced through the $SCRATCH environment variable.
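As a minimal sketch (the directory and file names here are made up for illustration), a job would typically write its results under the scratch directory rather than under /home, which is read-only on the compute nodes; the fallback path lets the snippet run outside the cluster:

```shell
#!/bin/sh
# Sketch: write job output under the scratch directory, not under /home
# (read-only on the compute nodes). SCRATCH is assumed to be set by the
# cluster; fall back to a temporary directory when testing locally.
SCRATCH="${SCRATCH:-/tmp/scratch-demo}"

OUTDIR="$SCRATCH/myjob-results"   # hypothetical results directory
mkdir -p "$OUTDIR"
echo "result data" > "$OUTDIR/out.txt"
ls "$OUTDIR"
```

Remember that scratch space is only kept for a short period, so copy anything you want to keep back to permanent storage after the job finishes.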
Available Partitions on the Cluster
The tables below list the partitions available on each cluster, along with their time limits and allowed QOS values.
HOPPER
| Hopper Partition | Time Limit | Allowed QOS |
| --- | --- | --- |
| gpuq | 3-00:00:00 | gpu |
| normal | 5-00:00:00 | All |
| bigmem | 5-00:00:00 | All |
| gpuq-contrib | 3-00:00:00 | hantil, ksun |
| contrib | 5-00:00:00 | qtong, normal |
| interactive | 0-12:00:00 | interactive |
Argo
| Argo Partition | Time Limit | Allowed QOS |
| --- | --- | --- |
| gpuq | 5-00:00:00 | All |
| all-LoPri | 5-00:00:00 | All |
| all-HiPri | 0-12:00:00 | All |
| bigmem-HiPri | 0-12:00:00 | All |
| bigmem-LoPri | 5-00:00:00 | All |
| all-long | 10-00:00:00 | All |
| bigmem-long | 10-00:00:00 | All |
| contrib | 7-00:00:00 | contrib |
Submitting a Job
In batch mode with Slurm scripts
Jobs are submitted using the sbatch command. This command has many options; a summary of these can be found in the sbatch documentation. The various sbatch options, along with the program to be run, should be put inside a single bash script and passed to sbatch as shown below:
> sbatch myscript.slurm
Note: This command must be run from one of the two login nodes.
Here is a sample script called myscript.slurm that shows how to set some common sbatch parameters.
#!/bin/sh
## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>
## General partitions: all-HiPri, bigmem-HiPri -- (12 hour limit)
## all-LoPri, bigmem-LoPri, gpuq (5 days limit)
## Restricted: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q (10 day limit)
#SBATCH --partition=<PartitionName>
## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out   # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err    # Error file
## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu # Put your GMU email address here
## Specify how much memory your job needs. (2G is the default)
#SBATCH --mem=<X>G # Total memory needed per task (units: K,M,G,T)
## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Load the relevant modules needed for the job
module load <module_name>
## Run your program or script
<command(s) to run your program>
Note 2: In the above script, if "--mem=<X>G" is not specified then the default memory configuration of 2G per core is used; if you asked for 2 cores per node, then 4G per node will be assigned. If you need more than the default memory, use the "--mem" or "--mem-per-cpu" option. "--mem" and "--mem-per-cpu" are mutually exclusive.
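As a sketch of the two alternatives (the numbers here are illustrative, not recommendations), a batch-script fragment could request memory either as a per-node total or per CPU; note that only one of the two should be uncommented:

```shell
## Option A: total memory per node for the job.
#SBATCH --mem=8G

## Option B: memory per allocated CPU. With 2 tasks per node and 1 CPU per
## task, --mem-per-cpu=4G also yields 2 x 4G = 8G per node. Do NOT combine
## with --mem above; the two options are mutually exclusive.
##SBATCH --ntasks-per-node=2
##SBATCH --cpus-per-task=1
##SBATCH --mem-per-cpu=4G
```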
You can find a more detailed version of this script here: Template.slurm, which was designed to be a template for writing general Slurm submission scripts. It contains a number of optional commands, along with comments about how you might customize it for your own Slurm submission script.
Interactively with salloc
If you want to work with the same resources interactively instead, you can do so with the salloc command. The parameters that are usually specified in a batch script can be combined on a single salloc command line. Slurm will allocate the specified resources and connect you directly to a compute node.
salloc
Running salloc with no arguments will start an interactive session with the defaults: 1 CPU and 2GB of memory in the 'interactive' partition. To be more specific about the resources you need, add them to the command with the correct Slurm parameters:
salloc -p normal --nodes=1 --ntasks-per-node=12 --mem=5GB --time=0-0:30:00
The session on the compute node will persist until you either hit the set time limit or type in:
exit
Monitoring And Controlling Jobs
There are several Slurm commands used to monitor and control jobs submitted to Slurm; here we go over the main ones briefly. For all of these commands, typing command --help will show the relevant options they take and their usage.
squeue
The "squeue" command is used to view information about jobs in the Slurm scheduling queue. There are various options, among them the following are commonly used:
squeue # shows all users' jobs (running as well as pending)
squeue -u userID1{,userID2...} # shows jobs for users in the list of comma separated user ids
squeue -j jobID1{,jobID2...} # shows jobs according to the list of comma separated job ids
sacct
This command is useful for giving you a summary of your recently launched jobs. Some options include:
sacct # displays detailed info of all jobs today (running, pending or finished)
sacct -X # displays summary of jobs today (running, pending, or finished)
sacct -X -S <Month/Day> # displays a summary of all jobs since a certain date
More info about sacct can be found in the sacct documentation.
scancel
This command is used to cancel a running or queued job.
scancel jobID # kills the job as specified by the jobID
There are various options to restrict the scope of scancel. Please see the scancel documentation for details.
sstat
This command is used to display various status information of a running job/step. Key options include:
sstat jobID
Note: If no steps are associated with your job, you will get: sstat: error: no steps running for job jobID
More info about sstat can be found in the sstat documentation.
sview
This command is used to open a GUI for monitoring and changing the scheduler control options. However, some of the changes that can be made are restricted to system administrators only (such as creating new partitions or editing existing ones). Users need to log in to Argo using the "-Y" option (i.e., X forwarding) in order to load the sview GUI (see the Logging into Argo page for details). There is a command-line alternative to sview called sinfo, which reports information about various Slurm configuration settings. Please see the sinfo documentation for details.
scontrol
This command, among other things, can be used to control job parameters of running/queued jobs. Some of the main options are shown below:
scontrol hold jobID1{ jobID2...} # holds pending jobs so they will not be scheduled; won't stop already running jobs
scontrol release jobID1{ jobID2...} # releases jobs that were held by the previous command
scontrol suspend jobID and scontrol resume jobID can be used to suspend and resume currently running jobs. Another useful way to check job status/configuration is using the following command:
scontrol show job jobID
This will give detailed info about the running/pending jobID. If changes need to be made to a pending job, the user can use the following command:
scontrol update <jobSpecification>
For example:
scontrol show job 81
JobId=81 Name=Sleeper
GroupId=argo-user(242)
...
scontrol update JobID=81 Name=WakeUp
Note: Not all parameters can be changed (for example, the number of CPUs used). Please see the scontrol documentation for details.
More Advanced Job Options
Running Parallel Jobs
To see how to run distributed/shared memory jobs (like MPI, Pthread etc) please see the How to Run Parallel Jobs on Argo page.
Running Job Arrays
Job arrays are useful when you want to run the same program or script over and over again, with only minor differences between each run. Examples include running a stochastic simulation with different random-number seeds, or running a parameter sweep with minor changes to the parameters from one run to the next. Your program can retrieve a unique number associated with each particular run and then use that number to derive the parameter values that are most appropriate.
The basic form of the job array command is as follows:
#SBATCH --array=<start>-<end>
where <start> and <end> are the first and last indices of the array; Slurm launches one task per index.
These example programs (cpp_example_array_job, java_example_array_job, python_example_array_job, job_submission_script) show how the environment variable $SLURM_ARRAY_TASK_ID is used. Note that aside from the "--array" option, the rest of the job_submission_script describes the parameters and resources needed for a single run of the job.
There are a number of other forms and options that can be used with job arrays:
...
#Job arrays run multiple times
#Run this job n times starting at index 1
#SBATCH --array=<Job_Idx_Format>%<Job_Exe_Limit>
...
1. #SBATCH --array=1-30 # a range of indices: 1 through 30
2. #SBATCH --array=1,2,3,4 # an explicit, comma-separated list of indices
3. #SBATCH --array=1-30:2 # a range with a step of 2: 1,3,5,...,29
Additionally, you can limit how many tasks will run simultaneously by specifying the Job_Exe_Limit as shown below:
#SBATCH --array=1-31:2%4
In the above example, only 4 tasks run at a time, and each has an odd index since we are incrementing by 2. The "%4" tells the scheduler to run at most 4 tasks simultaneously. The Slurm workload manager sets an environment variable, $SLURM_ARRAY_TASK_ID, which takes its values from what is given to the --array option. For example, if --array=1-4 is specified, 4 tasks will run and $SLURM_ARRAY_TASK_ID will take the values 1 through 4. To use $SLURM_ARRAY_TASK_ID, your program must read the environment variable through whatever mechanism your programming language provides.
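As a minimal sketch of reading the task ID from within a program (the parameter lists here are hypothetical), a Python script can map $SLURM_ARRAY_TASK_ID to the settings for its run; the fallback default lets the script also run outside of Slurm:

```python
import os

# Slurm sets SLURM_ARRAY_TASK_ID for each task in a job array; default to 1
# so the script can also be tested outside of a Slurm job.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "1"))

# Hypothetical parameter sweep: one random seed and one learning rate per task.
seeds = [11, 22, 33, 44]
learning_rates = [0.1, 0.01, 0.001, 0.0001]

seed = seeds[(task_id - 1) % len(seeds)]
lr = learning_rates[(task_id - 1) % len(learning_rates)]

print(f"task {task_id}: seed={seed} lr={lr}")
```

The same pattern works in any language that can read environment variables (e.g., getenv() in C or System.getenv() in Java).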
The default names for the output and error files of job arrays are "slurm-%A_%a.out" and "slurm-%A_%a.err" respectively, where %A is replaced by the job ID and %a by the array index of each task. Please do not use %j in the output and error specification, as this forces the creation of a new job ID for each task, which is confusing and unnecessary.
To cancel one or more tasks of a job array, users can do the following:
scancel jobID # if jobID represents an array job then this cancels all of its tasks
scancel jobID_idx # cancels only the task with index idx
scancel jobID_[idx-idy] # cancels all tasks from index idx to idy
You can also use the scontrol command to modify/hold/update individual tasks of a job array. Details can be found here.
Common Slurm Environment Variables
| Variable | Description |
| --- | --- |
| $SLURM_JOB_ID | The job ID. |
| $SLURM_JOBID | Deprecated. Same as $SLURM_JOB_ID. |
| $SLURM_SUBMIT_DIR | The path of the job submission directory. |
| $SLURM_SUBMIT_HOST | The hostname of the node used for job submission. |
| $SLURM_JOB_NODELIST | The list of nodes assigned to the job. |
| $SLURM_NODELIST | Deprecated. Same as $SLURM_JOB_NODELIST. |
| $SLURM_CPUS_PER_TASK | Number of CPUs per task. |
| $SLURM_CPUS_ON_NODE | Number of CPUs on the allocated node. |
| $SLURM_JOB_CPUS_PER_NODE | Count of processors available to the job on this node. |
| $SLURM_CPUS_PER_GPU | Number of CPUs requested per allocated GPU. |
| $SLURM_MEM_PER_CPU | Memory per CPU. Same as --mem-per-cpu. |
| $SLURM_MEM_PER_GPU | Memory per GPU. |
| $SLURM_MEM_PER_NODE | Memory per node. Same as --mem. |
| $SLURM_GPUS | Number of GPUs requested. |
| $SLURM_NTASKS | Same as -n, --ntasks. The number of tasks. |
| $SLURM_NTASKS_PER_NODE | Number of tasks requested per node. |
| $SLURM_NTASKS_PER_SOCKET | Number of tasks requested per socket. |
| $SLURM_NTASKS_PER_CORE | Number of tasks requested per core. |
| $SLURM_NTASKS_PER_GPU | Number of tasks requested per GPU. |
| $SLURM_NPROCS | Same as -n, --ntasks. See $SLURM_NTASKS. |
| $SLURM_NNODES | Total number of nodes in the job's resource allocation. |
| $SLURM_TASKS_PER_NODE | Number of tasks to be initiated on each node. |
| $SLURM_ARRAY_JOB_ID | Job array's master job ID number. |
| $SLURM_ARRAY_TASK_ID | Job array ID (index) number. |
| $SLURM_ARRAY_TASK_COUNT | Total number of tasks in a job array. |
| $SLURM_ARRAY_TASK_MAX | Job array's maximum ID (index) number. |
| $SLURM_ARRAY_TASK_MIN | Job array's minimum ID (index) number. |
A full list of environment variables for Slurm can be found by visiting the Slurm documentation page on environment variables.
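As a small sketch (the helper function name is our own, not part of Slurm), these variables can be read from any language through its environment-variable interface; in Python, for example, with defaults so the code also runs outside of a Slurm job:

```python
import os

def slurm_env(name, default, cast=str):
    """Read a Slurm environment variable, falling back to a default when the
    script is not running under Slurm."""
    value = os.environ.get(name)
    return cast(value) if value is not None else default

job_id = slurm_env("SLURM_JOB_ID", default="none")
ntasks = slurm_env("SLURM_NTASKS", default=1, cast=int)
cpus_per_task = slurm_env("SLURM_CPUS_PER_TASK", default=1, cast=int)

print(f"job={job_id} ntasks={ntasks} cpus/task={cpus_per_task}")
```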