FAQ

List of FAQs based on review of OS tickets

1. I need to use GPUs for my computations. Of the available GPU resources on the Hopper cluster, which node(s) or partition(s) are appropriate for my jobs?

We recommend submitting jobs to the partitions that have the A100:80GB nodes unless your memory needs are on the order of 1TB or more.

There are 24 A100:80GB nodes with 512GB of memory each, whereas there are only 2 A100:40GB nodes with 1TB+ of memory each. If your memory requirement is not on the order of 1TB, using the A100:80GB nodes will result in shorter wait times for your job to start.

Additionally, there are plans to partition the A100:80GB nodes into smaller slices, which will further increase their availability for jobs and reduce wait times.

2. I need to use GPUs for my computations. What are the 2-3 most important criteria I should consider in deciding which GPU nodes are most appropriate for my jobs?

You should have a good estimate of at least the following two items: the amount of memory (RAM) the job needs and the time it needs to complete. These determine the appropriate partition(s) for the job; a sketch of the corresponding Slurm directives is shown below.
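As an illustrative sketch only (the partition name, resource values, and program name below are placeholders, not Hopper-specific settings), these two estimates typically translate into Slurm directives like the following:

```bash
#!/bin/bash
#SBATCH --job-name=gpu-job
#SBATCH --partition=<gpu-partition>   # placeholder: a GPU partition listed by 'sinfo'
#SBATCH --gres=gpu:1                  # request one GPU
#SBATCH --mem=50GB                    # estimated memory (RAM) for the job
#SBATCH --time=0-04:00:00             # estimated run time (day-hours:minutes:seconds)

./my_gpu_program                      # placeholder: the command that runs your computation
```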

3. I am getting an out-of-memory error for my job. How do I resubmit the job to avoid this error?

In your Slurm script, increase the amount of memory requested via the appropriate Slurm directive (for example, #SBATCH --mem=50GB).

4. How do I determine the amount of memory my job needs before submitting a time-intensive batch job? How do I use this information to select the appropriate node(s)?

To determine the amount of memory your job needs, we suggest examining your code to estimate the size of the arrays, the number of iterations, etc., that must fit in memory for the job to run. This is also good general practice for any program you write or use for your work. You can also measure the memory a short test run actually uses, as sketched below.
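If Slurm job accounting is enabled on the cluster, you can measure memory empirically: run a short or scaled-down test job first, then query its peak memory use. This is only a sketch; the job ID 123456 is a placeholder, and the seff utility may or may not be installed on the cluster.

```bash
# Peak resident memory (MaxRSS) and run time of a finished test job
$ sacct -j 123456 --format=JobID,JobName,MaxRSS,Elapsed

# If available, seff summarizes memory and CPU efficiency for a finished job
$ seff 123456
```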

5. My job requires a large amount of memory (>500GB). Which partition(s) or node(s) should I use?

For jobs requiring a large amount of memory, we suggest using nodes of the 'bigmem' partition.

To list all the partitions available and their corresponding nodes, you can use the command:

$ sinfo

To determine the maximum amount of memory available on a specific node, for example, the amd069 node on the bigmem partition, use the following command:

$ scontrol show node amd069 | grep mem | tail -n 1 | tr "," "\n" | sed -n '2 p'
mem=185G
Note: The output also shows the correct format for specifying the memory requirement in your Slurm script.
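For illustration (the memory value below is a placeholder; use the value your job actually needs and that the node can provide), a large-memory job could then request the bigmem partition like this:

```bash
#SBATCH --partition=bigmem   # partition with the large-memory nodes
#SBATCH --mem=600G           # requested memory, in the format shown by scontrol
```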

6. My job is currently pending (PD) for almost 2 days. I do not know how long the job will take. How should I specify the time option in the Slurm script to avoid long wait times?

The time option, in the format day-hours:minutes:seconds, is specified by using the following directive in the slurm script:

#SBATCH --time=0-00:30:00

Most partitions on the Hopper cluster have a default time limit of 3 or 5 days for a submitted job to complete. If no time is specified in the Slurm script, the job is given the partition's default limit. It is therefore recommended that you specify the time you estimate your job will take, especially if it is significantly less than the maximum; shorter, more accurate time limits make it easier for the scheduler to backfill your job into scheduling gaps, which reduces its wait time.
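To pick a realistic --time value, you can list each partition's maximum time limit with sinfo's format option, where %P prints the partition name and %l its time limit:

$ sinfo -o "%P %l"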

7. Is Python installed on the Hopper cluster?

Yes, Python is installed. Only Python 3 versions are available. To list the available versions, use the command:

$ module avail python

Then use the following command to load Python for your use:

$ module load python/<version>

NOTE: The versions available for the gnu9 and gnu10 compilers are different. To see the versions available for the gnu10 compiler, first load the gnu10 module and then load the Python version of your choice:

```bash
$ module load gnu10
$ module avail python
$ module load python/<version>
```
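Putting this together, a minimal batch script for a Python job might look like the sketch below; the module version, resource values, and the script name my_script.py are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=python-job
#SBATCH --time=0-01:00:00        # estimated run time
#SBATCH --mem=8GB                # estimated memory

module load gnu10                # load the compiler toolchain first (for gnu10 builds)
module load python/<version>     # placeholder: a version listed by 'module avail python'

python my_script.py              # placeholder: your Python script
```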

8. Is R installed on the Hopper cluster?

Yes, R is installed. To list the available versions, use the command:

$ module avail r

Use the following command to load R for your use:

$ module load r/<version>

Also, the RStudio server is available both as a module from the command-line interface (CLI) and as a GUI-based application on Open OnDemand (OOD). To access the OOD web server, point your browser to: https://ondemand.orc.gmu.edu

You will need to authenticate with your GMU NetID and password.
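For batch (non-interactive) R jobs, a minimal Slurm script might look like the sketch below; the module version, resource values, and the script name analysis.R are placeholders.

```bash
#!/bin/bash
#SBATCH --job-name=r-job
#SBATCH --time=0-02:00:00       # estimated run time
#SBATCH --mem=16GB              # estimated memory

module load r/<version>         # placeholder: a version listed by 'module avail r'

Rscript analysis.R              # placeholder: your R script
```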

9. Is Matlab installed on the Hopper cluster?

Yes, Matlab is installed. To list the available versions, use the command:

$ module avail matlab

Use the following command to load Matlab for your use:

$ module load matlab/<version>
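For batch Matlab jobs, a minimal Slurm script might look like the sketch below; the module version, resource values, and the script name mysim are placeholders, and the -batch flag assumes a reasonably recent Matlab release (older releases use matlab -nodisplay -r instead).

```bash
#!/bin/bash
#SBATCH --job-name=matlab-job
#SBATCH --time=0-02:00:00          # estimated run time
#SBATCH --mem=16GB                 # estimated memory

module load matlab/<version>       # placeholder: a version listed by 'module avail matlab'

matlab -batch "mysim"              # runs mysim.m without the desktop GUI
```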

10. Do you have a quota for each user? How can I check my quota usage?

For the $HOME directory of each user, the quota is 60 GB of file space. You can check your current usage with the following command:

$ du -sh $HOME

PhD students or their advisors can request additional space on the /projects filesystem. Usage there should not exceed 1 TB per student.

A /scratch/$USER directory is available to each user for temporary storage, such as job results. We perform occasional sweeps of this filesystem, removing any files older than 90 days (about 3 months).
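If you want to see which of your scratch files a 90-day sweep would remove, a quick check (assuming the /scratch/$USER layout described above) is:

$ find /scratch/$USER -type f -mtime +90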

11. How do I submit jobs?

Jobs are submitted through Slurm. Slurm is a workload manager for Linux that manages job submission, deletion, and monitoring. The command for submitting a batch job is:

$ sbatch <slurm script>
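As a minimal end-to-end sketch (the script name and job ID below are illustrative), a batch job can be submitted, monitored, and cancelled as follows:

```bash
# Submit the batch script; sbatch prints the assigned job ID
$ sbatch my_job.slurm
Submitted batch job 123456

# Monitor your pending and running jobs
$ squeue -u $USER

# Cancel a job by its ID if needed
$ scancel 123456
```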