Differences Between ARGO and HOPPER

ARGO and Hopper have many similarities, as well as important differences that users must be aware of.

Similarities

Shared Storage

Both Argo and Hopper share the same filesystems. That allows users to work seamlessly across the two clusters without having to transfer files between them.

  • /home - is mounted on both and subject to the same quota limits
  • /scratch - is mounted on both and subject to the same purging policies
  • /projects - is mounted on both
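For example, on a login node of either cluster you can confirm that the shared filesystems are mounted and check your usage with standard tools. This is just a sketch; it assumes your directories follow the /home/$USER and /scratch/$USER layout, and the exact quota commands may differ between the clusters.

$ df -h /home /scratch /projects        # confirm the shared filesystems are mounted
$ du -sh /home/$USER /scratch/$USER     # check how much space your directories use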

SLURM for Job Scheduling

While the names of the partitions and scheduling policies differ, the fact that they both use SLURM will make the migration to Hopper easier.

Modules for Software Provisioning

Both use modules to provide users dynamic access to software installed on the clusters. However, some differences between the Environment Modules system used on Argo and Lmod used on Hopper require users to make some adjustments.

Differences

Hardware

       ARGO                                                      Hopper
CPUs   Different generations of Intel (Sandy Bridge - Skylake)   Intel Cascade Lake
       AMD Opteron                                               AMD Zen2 (Rome) and soon Zen3 (Milan)
GPUs   24x NVIDIA V100                                           8x NVIDIA A100
       20x NVIDIA K80

Software

               ARGO         Hopper
OS             CentOS 7.x   CentOS 8.x
Linux Kernel   3.x          4.x
Base Compiler  gnu/4.8.5    gnu/8.3.1, gnu/9.3.0

Container Use

ARGO

Container use on ARGO has not been extensive.

Hopper

We will provide Singularity containers of many applications as an alternative or complement to applications you might have been running natively.

You will also be able to use Podman to build Docker containers or convert them to Singularity containers to run on Hopper and the NVIDIA DGX A100 server without needing a privileged account.
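One possible workflow, sketched below, is to build an image with rootless Podman, export it, and convert it to a Singularity image file (SIF). It assumes Singularity 3.x; the image name myimage and the Dockerfile are placeholders.

$ podman build -t myimage:latest .                            # build from a Dockerfile in the current directory
$ podman save -o myimage.tar localhost/myimage:latest         # export the image to a tar archive
$ singularity build myimage.sif docker-archive://myimage.tar  # convert the archive to a SIF
$ singularity exec myimage.sif python --version               # run a command inside the container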

Interactive Sessions: Open OnDemand

ARGO

Most of the interactive/graphical computing on ARGO was done using user-managed Jupyter Notebooks, MATLAB, R, etc., with port forwarding and/or X11 forwarding.

Hopper

Open OnDemand (OOD) is installed on Hopper to enable users to run those same applications from a web interface. It allows you to run Jupyter Notebooks (Python, R, Julia and more), MATLAB, Mathematica and a growing list of applications. You can also open a remote desktop session for smooth data visualization.

To access the OOD web server, point your browser to https://ondemand.orc.gmu.edu. At the moment, you would need to be on campus or GMU's private network (VPN) to access it.

Please see the documentation on Hopper's OOD installation for more details.

SLURM

Partition Names

The partition names and defaults on the Hopper cluster are different from the ARGO cluster. Please see the table below for partition equivalence.

ARGO Partition       Timelimit     Allowed QoS      Hopper Partition   Timelimit    Allowed QoS
gpuq                 5-00:00:00    All              gpuq               3-00:00:00   gpu
[all,bigmem]-LoPri   5-00:00:00    All              normal             5-00:00:00   All
[all,bigmem]-HiPri   12:00:00      All              normal             5-00:00:00   All
[all,bigmem]-long    10-00:00:00   All              normal             5-00:00:00   All
[CDS,CS]_q           10-00:00:00   [cds,cs]qos      contrib*           7-00:00:00   qtong,normal
COS_q                10-00:00:00   phyqos           contrib*           7-00:00:00   qtong,normal
EMH_q                10-00:00:00   contrib          contrib*           7-00:00:00   qtong,normal
-                    -             -                interactive        12:00:00     interactive
-                    -             -                debug              1:00:00      All

In short, the equivalence between Argo and Hopper partitions is summarized in the table below.

ARGO                                                                     Hopper
gpuq                                                                     gpuq
all-LoPri, all-HiPri, bigmem-HiPri, bigmem-LoPri, all-long, bigmem-long  normal
CDS_q, COS_q, CS_q, EMH_q                                                contrib*

Warning

* Being a contributor on Argo does not automatically entitle users to submit to the contrib partition on Hopper and enjoy the privileges it grants; to be considered a contributor on Hopper, they need to have contributed to the Hopper condo model. However, all users can submit jobs to the contrib partition, with the caveat that their jobs can be preempted by a contributor's job at any time.

The newly-introduced interactive and debug partitions have no equivalents on Argo.

You can use the sinfo command to get information about the partitions.

  • debug - short test jobs can be submitted to the debug partition for a quick turnaround.
  • normal - this is the default partition.
  • interactive - jobs submitted via Open OnDemand (OOD) with --qos=interactive run in this partition.
  • gpuq - GPU jobs need to be submitted to the gpuq partition with --qos=gpu.
  • contrib - contributors to our condo model can submit to the contrib partition with --qos=<group_account_name>. Non-contributors can submit jobs to the contrib partition with --qos=normal, but their jobs can be preempted by ones from contributors.

Nota Bene: having access to a contributor partition on Argo does not automatically grant access to the contrib partition on Hopper. Unless you have a valid Hopper contributor QoS, your jobs in the contrib partition are subject to preemption and may be killed.
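As a sketch, a batch script for a GPU job on Hopper might start with the following #SBATCH directives; the job name, GPU count and program are placeholders, and contributor jobs would use --partition=contrib with their group's QoS instead.

#!/bin/bash
#SBATCH --job-name=gpu-test      # placeholder job name
#SBATCH --partition=gpuq         # GPU partition on Hopper
#SBATCH --qos=gpu                # QoS required for the gpuq partition
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --ntasks=1

srun ./my_gpu_program            # placeholder program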

Default Partitions and Time Limits

ARGO

The default partition is all-HiPri with a 12-hour walltime limit.

Hopper

The default partition is normal with a 5-day walltime limit.

Please note the different walltime limits for the other partitions above.
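Rather than relying on these defaults, it is good practice to request a walltime explicitly. A minimal sketch (the requested time and program are placeholders):

#!/bin/bash
#SBATCH --partition=normal     # the default partition on Hopper
#SBATCH --time=2-00:00:00      # request 2 days instead of relying on the 5-day default
#SBATCH --ntasks=1

srun ./my_program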

QoS (Quality of Service)

ARGO

$ sacctmgr list qos format=Name,Priority,MaxWall,MaxTres,MaxTRESPU
      Name   Priority     MaxWall       MaxTRES     MaxTRESPU
---------- ---------- ----------- ------------- -------------
    normal          0                   cpu=300 cpu=350,gres+
    cdsqos        100                   cpu=300 cpu=350,gres+
    phyqos        100                   cpu=300 cpu=350,gres+
     csqos        100                   cpu=800 cpu=320,gres+
  statsqos        100                   cpu=300 cpu=350,gres+
     hhqos        100                   cpu=300 cpu=350,gres+
     gaqos        100                   cpu=300 cpu=350,gres+
     esqos        100                   cpu=300 cpu=350,gres+
   testqos          0                   cpu=500 cpu=350,gres+
   contrib        100                   cpu=300 cpu=350,gres+

Hopper

$ sacctmgr list qos format=Name,Priority,MaxWall,MaxTres,MaxTRESPU
      Name   Priority     MaxWall       MaxTRES     MaxTRESPU
---------- ---------- ----------- ------------- -------------
    normal          0                                 cpu=600
     qtong          1
interacti+          0                                  cpu=12
  orc-test          0
       gpu          0
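To see which accounts and QoS values your own user is allowed to use on either cluster, you can query your SLURM associations. The format fields below are just an example selection.

$ sacctmgr show associations user=$USER format=Cluster,Account,User,QOS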

Preemption

ARGO

Preemption is enforced on the all-HiPri partition.

Hopper

Preemption is enforced on the contrib partition.

Contributors to our condo model can submit to the contrib partition with --qos=<name>. That will entitle them to higher priority over non-contributors within that partition. Non-contributors can submit jobs to the contrib partition with --qos=normal, but their jobs can be preempted (cancelled and requeued) by ones from contributors. So, please make sure to checkpoint your jobs regularly or have some other means of restarting them from an interrupted state.
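A minimal sketch of a preemption-tolerant job script is shown below. It assumes your application can write and resume from its own checkpoint file; the program name and checkpoint logic are placeholders.

#!/bin/bash
#SBATCH --partition=contrib
#SBATCH --qos=normal          # non-contributor: the job may be preempted and requeued
#SBATCH --requeue             # allow SLURM to requeue the job after preemption
#SBATCH --open-mode=append    # keep appending to the same output file across restarts

# Restart from the latest checkpoint if one exists, otherwise start fresh
if [ -f checkpoint.dat ]; then
    srun ./my_program --restart checkpoint.dat
else
    srun ./my_program
fi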

Modules

Even though both clusters use modules to provision software, the Environment Modules system used on Argo and the Lmod system used on Hopper have some key differences.

Flat vs. Hierarchical Structure

ARGO

ARGO uses a flat module scheme that displays all modules available on the system at all times. When you search for a package via module avail <package_name>, you will see modules matching <package_name> regardless of their dependencies.

Hopper

Hopper uses a hierarchical module scheme that displays only modules that are compatible with the particular compiler and MPI library you have loaded at a given time to avoid incompatibilities. Therefore, searching a package via module avail <package_name> will not necessarily show you all available versions of that package. A more comprehensive way of searching for packages is using the module spider <package_name> command. That will report all available packages matching <package_name>.

Info

On Hopper, you cannot load more than one version of the same application. For example, you cannot load python/3.6.8 and python/3.8.6 at the same time; loading one will automatically unload the other. In the rare cases where it is absolutely necessary to load more than one compiler or MPI library at a time, you can set export LMOD_EXPERT=1 to enable that feature.

When you list the available modules on Hopper, you will see applications grouped as follows

  • Core - essential system modules
  • Independent - packages without particular compiler or MPI library dependence. These are often applications packaged as pre-compiled static binaries
  • <COMPILER> - packages built using the given <COMPILER>
  • <COMPILER>_<MPI-library> - packages built using the given <COMPILER> and <MPI-library>

As will be discussed below, the default compiler and MPI library on Hopper are GNU/9.3.0 and OpenMPI/4.0.4. Therefore, you will see these groups of modules when executing module avail

  • Core
  • Independent
  • GNU-9.3.0
  • GNU-9.3.0_OpenMPI-4.0.4

Base/Default Modules

ARGO

No modules are loaded by default at login on Argo unless you explicitly load them via your startup scripts (~/.bashrc, ~/.profile, ~/.bash_profile).

$ module list

No Modulefiles Currently Loaded.

Hopper

Since modules available to you depend on the compiler and MPI library loaded in your environment, Hopper loads a set of default modules including a default compiler (gnu9/9.3.0) and MPI library (openmpi4/4.0.4).

$ module list

Currently Loaded Modules:
  1) autotools   2) prun/2.0   3) gnu9/9.3.0   4) ucx/1.8.0   5) libfabric/1.10.1   6) openmpi4/4.0.4   7) ohpc  8) hosts/hopper

Warning

If you load modules or set environment variables in your startup scripts on Argo, you will likely get errors and warning messages when logging into Hopper because those modules do not exist on Hopper under the same names. To avoid these issues, you can wrap some of the logic in your startup scripts so it behaves differently based on the cluster. Such logic would look like this:

# load the proper set of modules based on the cluster
export CLUSTER=$(sacctmgr -n show cluster format=Cluster | xargs)
export CNODE=$(hostname -s)
if [ "${CLUSTER}" == "argo" ]; then
  module load <ARGO_MODULE_NAME...>
  source <ARGO_FILE...>
  export ARGO_ENVIRONMENT=...
  ...
elif [ "${CLUSTER}" == "hopper" ] && [ "${CNODE}" == "dgx-a100-01" ]; then
  module load <DGX_MODULE_NAME...>
  source <DGX_FILE...>
  export DGX_ENVIRONMENT=...
  ...
else
  module load <HOPPER_MODULE_NAME...>
  source <HOPPER_FILE...>
  export HOPPER_ENVIRONMENT=...
  ...
fi

Module naming

ARGO

Modules are generally named as <package_name>/<package_version> on ARGO

Hopper

Depending on the source of the package, Lmod modules on Hopper can have longer names and aliases.

  • Spack-built packages are named <package_name>/<package_version>-<two-character-hash>
  • Some packages have useful aliases such as <package_name>/<package_version>-<two-character-hash> (mixed-precision)
  • Some important packages such as compilers, MPI and math libraries have global aliases appearing at the top when you execute module avail

Searching for modules

Info

In Lmod running on Hopper, module spider searches the whole module tree to find matching modules, whereas module avail will only show modules built with your currently loaded compiler and MPI library.

How you search for modules on Hopper is very different from Argo. Let's take the package nwchem to demonstrate this key difference.

ARGO

You can easily see that there are two versions of nwchem.

$ module avail nwchem
------------------------ /cm/shared/modulefiles ----------

nwchem/intel/6.8.1 nwchem/intel/7.0.2

Hopper

You can initially see that there is one version of nwchem built using GNU/9.3.0 compiler and OpenMPI/4.0.4 MPI library.

$ module avail nwchem

---------------------- GNU-9.3.0_OpenMPI-4.0.4 -----------------
   nwchem/7.0.2-m4

Use "module spider" to find all possible modules and extensions.
Use "module keyword key1 key2 ..." to search for all possible modules matching any of the "keys".

However, you are not seeing any other versions that are built with other compilers and MPI libraries. That's where module spider comes in.

$ ml spider nwchem

------------------------------
  nwchem:
------------------------------
     Versions:
        nwchem/6.8.1-ip
        nwchem/7.0.2-m4
        nwchem/7.0.2-mr

--------------------------------
  For detailed information about a specific "nwchem" package (including how to load the modules) use the module's full name.
  Note that names that have a trailing (E) are extensions provided by other modules.

You can now see there are three versions of nwchem on Hopper.

If you try to load one of these modules directly, you may get an error message like this:

$ module load nwchem/6.8.1-ip

LMod has detected the following error: These module(s) or extension(s) exist but cannot be loaded as requested: "nwchem/6.8.1-ip"
   Try: "module spider nwchem/6.8.1-ip" to see how to load the module(s).

To see how you can load any one of them, you can run module spider on any particular version.

$ module spider nwchem/6.8.1-ip

---------------------------------
  nwchem: nwchem/6.8.1-ip
---------------------------------

    You will need to load all module(s) on any one of the lines below before the "nwchem/6.8.1-ip" module is available to load.

      intel/2020.2  impi/2020.2

    Help:
      High-performance computational chemistry software

The output above is telling us that to load this module, you would need the compiler and MPI library it was built with, namely intel/2020.2 and impi/2020.2.

$ module load intel/2020.2 impi/2020.2

LMod is automatically replacing "gnu9/9.3.0" with "intel/2020.2".

LMod is automatically replacing "openmpi4/4.0.4" with "impi/2020.2".

$ module load nwchem/6.8.1-ip

$ module list

Currently Loaded Modules:
  1) use.own   2) slurm-tools/1.0   3) autotools   4) prun/2.0   5) ohpc   6) intel/2020.2   7) impi/2020.2   8) nwchem/6.8.1-ip

Basic LMod Usage

The table below summarizes the most commonly used Lmod commands. Please note that you can use ml as a shortcut for module.

Module Command              Description
ml avail                    List available modules
ml list                     Show modules currently loaded
ml load/add package         Load a selected module*
ml +package                 Load a selected module*
ml unload/rm package        Unload a previously loaded module
ml -package                 Unload a previously loaded module
ml swap package1 package2   Unload package1 and load package2*
ml purge                    Unload all loaded modules
ml reset                    Reset loaded modules to system defaults
ml display/show package     Display the contents of a selected module
ml spider                   List all modules (not just available ones)
ml spider package           Display the description of a selected module
ml keyword key              Search for available modules by keyword
ml, module help             Display help/usage information for modules
ml use path                 Add path to MODULEPATH (the module search path)
ml unuse path               Remove path from MODULEPATH (the module search path)

Info

* We have enabled the autoswap feature in Lmod such that loading a package while a conflicting package is loaded will automatically swap the modules. For example, trying to load python/3.8.6 while python/3.7.6 is loaded will automatically swap python/3.7.6 for python/3.8.6. Without the autoswap feature, you would have had to manually unload python/3.7.6 and load python/3.8.6.
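For example (the version numbers follow the example above; the exact messages Lmod prints may differ):

$ module load python/3.7.6
$ module load python/3.8.6    # automatically swaps python/3.7.6 for python/3.8.6
$ module list                 # only python/3.8.6 remains loaded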

Click on the following clip to get the basic look and feel of Lmod on Hopper.

asciicast