NAMD on the DGX A100
Users can run NAMD on the DGX A100 as either a native or a containerized application.
Different versions of NAMD are available on the Hopper cluster and on the DGX A100 GPU server integrated with it.
To take advantage of the GPUs, you can either use the Singularity containers we provide under /containers/dgx/Containers
or the native, CUDA-enabled NAMD 2.14 application we provision through modules.
If you prefer to use CPUs only, you can load the CPU-only NAMD 2.14 module as you would on Argo.
Both approaches, running native NAMD and running containerized NAMD, are outlined below.
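To see which NAMD builds are provisioned as modules, you can query the module system on the DGX node. A minimal sketch, assuming the standard module commands used elsewhere on Hopper (the module names are listed in the sections below):

module avail namd
# then load either the GPU-enabled or the CPU-only build:
module load namd/2.14-verbs-smp-cuda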
Using the Native Application
2.14 GPU-enabled
- Installation: /opt/sw/dgx-a100/apps/namd/2.14/NAMD_2.14_Linux-x86_64-verbs-smp-CUDA
- Module: namd/2.14-verbs-smp-cuda
- Example directory: /opt/sw/app-tests/NAMD/dgx/native/2.14-CUDA-enabled
- Sample batch submission file: /opt/sw/app-tests/NAMD/dgx/native/2.14-CUDA-enabled/run.slurm (a sketch of such a script is shown below)
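For reference, a minimal batch script for the GPU-enabled module might look like the sketch below. The partition name, core/GPU counts, and input file (the APOA1 benchmark) are placeholders rather than verified settings; the run.slurm file listed above is the authoritative example.

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1
#SBATCH --partition=gpuq          # placeholder partition name

module load namd/2.14-verbs-smp-cuda

# verbs-smp builds are started through charmrun; ++local keeps all PEs on this node
charmrun ++local +p${SLURM_CPUS_PER_TASK} namd2 +idlepoll +devices 0 apoa1.namd > apoa1.log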
2.14 CPU-only
- Installation: /opt/sw/other/apps/namd/2.14/NAMD_2.14_Linux-x86_64-verbs-smp
- Module: namd/2.14-verbs-smp
- Example directory: /opt/sw/app-tests/NAMD/hopper/native/2.14-CPUonly
- Sample batch submission file: /opt/sw/app-tests/NAMD/hopper/native/2.14-CPU-only/run.slurm
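The CPU-only module follows the same pattern, minus the GPU request and the +devices argument; again just a hedged sketch with placeholder partition, core count, and input names:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --partition=normal        # placeholder partition name

module load namd/2.14-verbs-smp
charmrun ++local +p${SLURM_CPUS_PER_TASK} namd2 apoa1.namd > apoa1.log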
Using the Containerized Application
We currently provide the following GPU/CUDA-enabled containers from NVIDIA:
3.0-alpha GPU-enabled
- Container: /containers/dgx/Containers/namd/namd_3.0-alpha3-singlenode.sif
- Example directory: /opt/sw/app-tests/NAMD/dgx/containerized/3.0
- Sample batch submission files: /opt/sw/app-tests/NAMD/dgx/containerized/3.0/run*gpu*cores.slurm (a sketch of a container launch is shown below)
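Within a batch script, the container is launched with Singularity's --nv flag so that the A100 GPUs are visible inside it. The sketch below is an assumption-laden example, not the exact contents of the run*gpu*cores.slurm files: it presumes the image exposes a namd3 binary and that the APOA1 inputs sit in the working directory.

SIF=/containers/dgx/Containers/namd/namd_3.0-alpha3-singlenode.sif

# --nv maps the host NVIDIA driver and GPUs into the container
singularity exec --nv $SIF namd3 +p1 +devices 0 apoa1.namd > 1gpus-1cores.log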
2.13 GPU-enabled
- Container: /containers/dgx/Containers/namd/namd_2.13-singlenode.sif
- Example directory: /opt/sw/app-tests/NAMD/dgx/containerized/2.13
- Sample batch submission files: /opt/sw/app-tests/NAMD/dgx/containerized/2.13/run*gpu*cores.slurm
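The 2.13 container is driven the same way, except that it provides the namd2 binary and, as the timings below show, benefits from more CPU cores per GPU; the binary name and input file are again assumptions, so check the example directory above for the authoritative invocation:

SIF=/containers/dgx/Containers/namd/namd_2.13-singlenode.sif

singularity exec --nv $SIF namd2 +p16 +idlepoll +devices 0 apoa1.namd > 1gpus-16cores.log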
Benchmarks
Timings
Containerized GPU
NAMD/3.0-alpha] grep -m 1 "Benchmark time" *log |sort -n -k 4 |column -t
1gpus-1cores.log:Info: Benchmark time: 1 CPUs 0.0014479 s/step 119.345 ns/day 0 MB memory
1gpus-2cores.log:Info: Benchmark time: 2 CPUs 0.00138185 s/step 125.05 ns/day 0 MB memory
1gpus-4cores.log:Info: Benchmark time: 4 CPUs 0.00136058 s/step 127.005 ns/day 0 MB memory
1gpus-8cores.log:Info: Benchmark time: 8 CPUs 0.00134193 s/step 128.77 ns/day 0 MB memory
1gpus-16cores.log:Info: Benchmark time: 16 CPUs 0.0013593 s/step 127.124 ns/day 0 MB memory
1gpus-32cores.log:Info: Benchmark time: 32 CPUs 0.00142356 s/step 121.386 ns/day 0 MB memory
NAMD/2.13] grep -m 1 "Benchmark time" *log |sort -n -k 4 |column -t
1gpus-1cores.log:Info: Benchmark time: 1 CPUs 0.0129561 s/step 0.149955 days/ns 531.312 MB memory
1gpus-2cores.log:Info: Benchmark time: 2 CPUs 0.00804309 s/step 0.0930913 days/ns 534.602 MB memory
1gpus-4cores.log:Info: Benchmark time: 4 CPUs 0.00537852 s/step 0.0622514 days/ns 540.238 MB memory
1gpus-8cores.log:Info: Benchmark time: 8 CPUs 0.0042346 s/step 0.0490116 days/ns 547.203 MB memory
1gpus-16cores.log:Info: Benchmark time: 16 CPUs 0.00265262 s/step 0.0307016 days/ns 566.164 MB memory
1gpus-32cores.log:Info: Benchmark time: 32 CPUs 0.0025912 s/step 0.0299907 days/ns 596.684 MB memory
Native GPU
NAMD/2.14-CUDA-enabled] grep -m 1 "Benchmark time" *log |sort -n -k 4|column -t
1gpus-8cores.log:Info: Benchmark time: 8 CPUs 0.0034427 s/step 0.039846 days/ns 582.902 MB memory
2gpus-8cores.log:Info: Benchmark time: 8 CPUs 0.00342256 s/step 0.0396129 days/ns 878.598 MB memory
1gpus-16cores.log:Info: Benchmark time: 16 CPUs 0.00267346 s/step 0.0309429 days/ns 594.891 MB memory
2gpus-16cores.log:Info: Benchmark time: 16 CPUs 0.0021852 s/step 0.0252917 days/ns 912.77 MB memory
1gpus-32cores.log:Info: Benchmark time: 32 CPUs 0.00239043 s/step 0.027667 days/ns 611.156 MB memory
2gpus-32cores.log:Info: Benchmark time: 32 CPUs 0.00192482 s/step 0.022278 days/ns 941.094 MB memory
CPU-only
NAMD/2.14-CPUonly] grep -m 1 "Benchmark time" *log |sort -n -k 4 |column -t
1cores.log:Info: Benchmark time: 1 CPUs 0.486551 s/step 5.63137 days/ns 435.906 MB memory
2cores.log:Info: Benchmark time: 2 CPUs 0.247587 s/step 2.86559 days/ns 461.664 MB memory
4cores.log:Info: Benchmark time: 4 CPUs 0.133817 s/step 1.54881 days/ns 675.102 MB memory
8cores.log:Info: Benchmark time: 8 CPUs 0.0682242 s/step 0.789632 days/ns 740.727 MB memory
16cores.log:Info: Benchmark time: 16 CPUs 0.0364464 s/step 0.421833 days/ns 1299.59 MB memory
32cores.log:Info: Benchmark time: 32 CPUs 0.0202405 s/step 0.234265 days/ns 2446.89 MB memory
48cores.log:Info: Benchmark time: 48 CPUs 0.0141087 s/step 0.163296 days/ns 3602.52 MB memory
Conclusions
- GPU runs are one to two orders of magnitude faster than CPU-only runs. However, the speedup from using multiple GPUs is small, at least for the APOA1 example.
- NAMD 3.0 offloads almost all computation to the GPU, while NAMD 2.13/2.14 split the computation between the CPU and the GPU.
- NAMD 3.0 is 2-3 times faster than NAMD 2.13/2.14.
- When using NAMD 3.0, there is therefore little reason to request many CPU cores.
Caveats
- The NAMD 3.0-alpha container has impressive performance, but it is unclear whether it is suitable for production work (http://www.ks.uiuc.edu/Research/namd/alpha/3.0alpha/).
- We have not tested multi-GPU or multi-node runs with the containers or the native applications; we will add such tests and examples later.
- Since the DGX is the only server with GPUs on Hopper, we will need to take extra steps to make sure it is used equitably. Please refrain from reserving more GPUs than you truly need.
- We realize the native (2.14) and containerized (2.13, 3.0) applications are different versions; 2.13 and 3.0 are the only ones available as containers. Please let us know if you need anything other than 2.14 as a native application.