How to run R

Running Serial R Jobs

To use the optimized version of R compiled with OpenBLAS, you must load the R module. Currently the most up-to-date R version is 3.5.1. Check which R modules are installed on the cluster (e.g. with module avail R) before submitting your script, and make sure you specify the R module version you want to use.

If you type module show R/3.5.1, you will see the following:

-------------------------------------------------------------------
/cm/shared/modulefiles/R/3.5.1:

module-whatis    Adds R with openblas library to your environment.
module           load openblas/0.2.20
prepend-path     PATH /cm/shared/apps/R/3.5.1/bin
prepend-path     LD_LIBRARY_PATH /cm/shared/apps/R/3.5.1/lib64
-------------------------------------------------------------------
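If you want to confirm that a particular R build is actually linked against OpenBLAS, one simple check (my own suggestion, not part of the module output above) is to start an interactive R session after loading the module and inspect sessionInfo(), which in R 3.4.0 and later reports the BLAS and LAPACK libraries in use:

## Run inside an interactive R session after "module load R/3.5.1"
si <- sessionInfo()
cat("BLAS:  ", si$BLAS, "\n")     # the path should point at an OpenBLAS library
cat("LAPACK:", si$LAPACK, "\n")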

You can submit batch R jobs with a Slurm submission script. At the end of your Slurm script, you can run your R script with a command of the form Rscript [options] <filename>.R. To find out what options can be passed to Rscript, type R --help after you have loaded the R module. You need to load the R module explicitly inside your Slurm job submission file. NOTE: R uses the ".RData" file in your current directory to load/save the workspace every time it starts/finishes a session. This can significantly slow down the execution of your job, depending on the size of the ".RData" file.

It is advisable to use the options "--no-restore --quiet --no-save" when starting a job. The option "--quiet" suppresses the start-up messages, "--no-restore" directs R not to restore anything from ".RData", and "--no-save" ensures that the workspace is not saved to ".RData" at the end of the R session. Given below is a sample Slurm job submission script for an R job:

#!/bin/sh

## Specify the name for your job, this is the job name by which Slurm will
## refer to your job.  This can be different from the name of your executable
## or the name of your script file
#SBATCH --job-name My_R_Job

## General partitions: all-LoPri, all-HiPri, bigmem-LoPri, bigmem-HiPri, gpuq
##    all-*     Will run jobs on (almost) any node available
##    bigmem-*  Will run jobs only on nodes with 512GB memory
##    *-HiPri   Will run jobs for up to 12 hours
##    *-LoPri   Will run jobs for up to 5 days
##    gpuq      Will run jobs only on nodes with GPUs (40, 50, 55, 56)
## Restricted partitions: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q
##                        Provide high priority access for contributors
#SBATCH --partition=all-HiPri

## Deal with output and errors.  Separate into 2 files (not the default).
## May help to put your result files in a directory: e.g. /scratch/%u/logs/...
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out  # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err   # Error file
#SBATCH --mail-type=BEGIN,END,FAIL         # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<userID>@gmu.edu       # Put your GMU email address here

## Specifying an upper limit on needed resources will improve your scheduling
## priority, but if you exceed these values, your job will be terminated.
## Check your "Job Ended" emails for actual resource usage info.
#SBATCH --mem=1M         # Total memory needed for your job (suffixes: K,M,G,T)
#SBATCH --time=0-00:02   # Total time needed for your job: Days-Hours:Minutes

## Load the relevant modules needed for the job
module load R/3.5.1

## Start the job
Rscript --no-restore --quiet --no-save RHello.R
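If you save this script under a name of your choosing, for example R_serial.slurm (the name here is just an illustration), you would submit it from a login node with sbatch R_serial.slurm.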

Be sure to put your email address in the appropriate place. Below is a very simple RHello.R program that you can run with the above script.

#!/usr/bin/env Rscript

## Just output the text "Hello, world!"
cat("Hello, world!\n")

Note that the values for --mem and --time are set extremely low here. You will want to increase them when you run your own programs.

Running Multi-threaded R Jobs

On the ARGO cluster, many of the R modules (including R/3.5.0 and R/3.5.1) are configured to use the OpenBLAS library. This means that R can take advantage of additional CPUs, if they are available, when performing linear algebra operations. To request additional CPUs for your job, use the --cpus-per-task Slurm parameter. You should also make sure that the environment variable OPENBLAS_NUM_THREADS is set appropriately; it should not exceed the number of cores requested, and will normally match it.

Add the following to your Slurm submission script, replacing <N> with the number of cores you are requesting:

#SBATCH --cpus-per-task <N>    # Request extra CPUs for threads
export OPENBLAS_NUM_THREADS=<N>

Be aware that <N> cannot exceed the number of cores on the node where your job will run. Making this value too high could cause Slurm to delay your job until an appropriate node becomes available, or to reject it completely.
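As a rough sanity check that the extra cores are actually being used, the short sketch below (an illustrative example, not part of the cluster documentation) times a large matrix multiplication; %*% is dispatched to the BLAS, so with OpenBLAS the elapsed time should drop as you increase --cpus-per-task and OPENBLAS_NUM_THREADS together:

## blas_timing.R -- illustrative only; the matrix size is arbitrary
n <- 4000
A <- matrix(rnorm(n * n), nrow = n)
## Matrix multiplication is handled by OpenBLAS, which can use
## up to OPENBLAS_NUM_THREADS threads
print(system.time(B <- A %*% A))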

Some R packages take advantage of the R parallel library to make use of additional CPU cores, e.g. parallelDist, Rdsm and RcppThread.

A common R programming practice with the parallel library is to set the number of cores for the parallel cluster using the parallel::detectCores() routine e.g.:

`mc.cores = parallel::detectCores()`

This can cause problems when running under Slurm: Slurm restricts your job to the cores you requested, but detectCores() returns the total number of cores on the node. Unless you are requesting a full node, this will overload the cores assigned to your job and may severely impact performance. The best practice when using the parallel library is to set mc.cores to the number of cores assigned by Slurm. This can be done by adding the code below to your R script:

# set the number of parallel workers according to Slurm
nworkers <- Sys.getenv("SLURM_CPUS_PER_TASK")
message("I have ", nworkers, " cores available")
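As a sketch of how that value might then be used (this assumes the base parallel package and a job submitted with --cpus-per-task; the as.integer() conversion and the fallback of 1 are my additions), you could size a pool of workers from the Slurm allocation instead of from detectCores():

## Sketch: size the worker pool from the Slurm allocation
library(parallel)

nworkers <- as.integer(Sys.getenv("SLURM_CPUS_PER_TASK", unset = "1"))

cl <- makeCluster(nworkers)                       # one worker per allocated core
squares <- parLapply(cl, 1:100, function(x) x^2)  # trivial example workload
stopCluster(cl)

## Or, using the fork-based interface (not available on Windows):
# squares <- mclapply(1:100, function(x) x^2, mc.cores = nworkers)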

Running Parallel R Jobs Using Rmpi

NOTE: Rmpi has been installed and tested for every version of R except R/3.4.1. All of them were compiled using the openmpi/3.1.3 module as a dependency.

Rmpi works with OpenMPI as a wrapper to spawn slave processes from within your R script file. Hence, OpenMPI needs to be loaded before using Rmpi. Detailed information about Rmpi can be found in the Rmpi documentation.

Below is a sample job submission script that shows how to submit Rmpi jobs:

#!/bin/sh

## Specify the name for your job, this is the job name by which Slurm will
## refer to your job.  This can be different from the name of your executable
## or the name of your script file.
#SBATCH --job-name=RmpiHello

## General partitions: all-LoPri, all-HiPri, bigmem-LoPri, bigmem-HiPri, gpuq
##    all-*     Will run jobs on (almost) any node available
##    bigmem-*  Will run jobs only on nodes with 512GB memory
##    *-HiPri   Will run jobs for up to 12 hours
##    *-LoPri   Will run jobs for up to 5 days
##    gpuq      Will run jobs only on nodes with GPUs (40, 50, 55, 56)
## Restricted partitions: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q
##                        Provide high priority access for contributors
#SBATCH --partition=all-HiPri

## Deal with output and errors.  Separate into 2 files (not the default).
## May help to put your result files in a directory: e.g. /scratch/%u/logs/...
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out  # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err   # Error file
#SBATCH --mail-type=BEGIN,END,FAIL         # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<userID>@gmu.edu       # Put your GMU email address here

## You can improve your scheduling priority by specifying upper limits on
## needed resources, but jobs that exceed these values will be terminated.
## Check your "Job Ended" emails for actual resource usage info as a guide.
#SBATCH --mem=1M        # Total memory needed per task (units: K,M,G,T)
#SBATCH --time=0-00:02  # Total time needed for job: Days-Hours:Minutes

## ----- Parallel Processes  -----
## Some libraries (MPI) implement parallelism using processes that communicate.
## This allows tasks to run on any set of cores in the cluster.  Programs can
## use this approach in combination with threads (if designed to).
#SBATCH --ntasks=<N>    # Number of processes you plan to launch

# Optional parameters.  Uncomment (remove one leading '#') to use.
##SBATCH --nodes=<N>            # If you want some control over how tasks are
                                #    distributed on nodes.
##SBATCH --ntasks-per-node=<N>  # If you want more control over how tasks are
                                #    distributed on nodes.

## Load the R module which also loads the OpenBLAS module
module load R/3.5.1
## To use Rmpi, you need to load the openmpi module
module load openmpi/3.1.3

# R still wants to write files to our current directory, despite using
# "--no-restore --quiet --no-save" below, so move someplace writable.
ORIG_DIR=$PWD
cd $SCRATCH

echo "Calling mpirun now!!!"

## Use "--no-restore --quiet --no-save" to be as quiet as possible.
## Note: You do not spawn the parallel processes directly through mpirun, but
##       instead from inside your R script, hence parameter -np is set to 1.
mpirun -np 1 Rscript --no-restore --quiet --no-save $ORIG_DIR/RmpiHello.R

Below is a parallel hello world Rmpi program that can be used to test the above script. Be sure to replace the nslaves value with the appropriate number on the line where the slaves are spawned.

## RmpiHello.R

## Load the R MPI package if it is not already loaded.
if (!is.loaded("mpi_initialize")) {
    library("Rmpi")
}

## Specify how many slave processes will be spawned.
## This must be 1 less than the number of tasks requested (master uses 1).
mpi.spawn.Rslaves(nslaves=<N>)   # Change this to match your Slurm script

## In case R exits unexpectedly, automatically clean up
## resources taken up by Rmpi (slaves, memory, etc...)
.Last <- function() {
    if (is.loaded("mpi_initialize")) {
        if (mpi.comm.size(1) > 0) {
            print("Please use mpi.close.Rslaves() to close slaves.")
            mpi.close.Rslaves()
        }
        print("Please use mpi.quit() to quit R")
        .Call("mpi_finalize")
    }
}

## Tell all slaves to return a message identifying themselves
mpi.remote.exec(paste("Hello, World from process", mpi.comm.rank(), "of", mpi.comm.size()))

## Tell all slaves to close down, and exit the program
mpi.close.Rslaves()
mpi.quit()

Note: The number of slaves requested must be one less than the number of tasks requested, as shown in the above scripts.
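If you would rather not hard-code the slave count, one possible variation (my own suggestion, not from the original example) is to derive it inside RmpiHello.R from the SLURM_NTASKS environment variable that Slurm sets for the job:

## Spawn (ntasks - 1) slaves, reserving one task for the master process
ntasks <- as.integer(Sys.getenv("SLURM_NTASKS", unset = "2"))
mpi.spawn.Rslaves(nslaves = ntasks - 1)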

R Packages

For information about installing and managing R packages on ORC systems, please see the page: How to manage R-Packages.