Python Virtual Environments On Hopper
There are different approaches in the Python ecosystem for creating virtual environments (VE):
pipenv. Among these, the methods
virtualenv are quite similar, and either one can be chosen. The key distinction between these two lies in how they handle Python executables within the virtual environment folder:
venvcreates a virtual environment without copying Python executables.
virtualenvcreates a virtual environment by copying Python executables.
Before utilizing Python Virtual Environments on Hopper, please take note of the following:
- Python environments need to be established separately on Hopper. A Python Virtual environment built on Hopper should not be expected to function on any other system.
- For Hopper, it is recommended to employ
venvfor creating virtual environments, following the process detailed below.
Starting an interactive Session on Hopper GPUs
If you intend to run the Python Virtual Environments on the Hopper DGX-A100 GPU nodes, the first step is to get directly on the
GPU node by starting an interactive session with the
salloc -p gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:A100.40gb:1 --mem=50GB
|Type of GPU||SLURM setting||No. of GPUs on Node||No. of CPUs||RAM|
|DGX A100 40GB||--gres=gpu:A100.40gb:nGPUs||8||128||1TB|
If you don't want to use DGX nodes you can also use the below command to start an interactive session on the contrib-gpuq partition
salloc -p contrib-gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:1 --mem=5GB
sinfocommand to view the list of nodes available, the time restriction for each node, and the available partitions.
Once you have the interactive session started, you should load the necessary modules. For Python on the DGX nodes,
use the module
python/3.8.6-ff which has been built to run across both the CPU nodes and GPU nodes. Currently,
other Python modules will not work across both the GPU and CPU nodes because of the differences in architecture.
Creating a Python Virtual Environment
venv is a Python module available in the standard library and compatible with most Python versions. It simplifies the creation of isolated environments for Python projects. Unlike other environment management tools,
venv ensures that libraries are not shared with other virtual environments by default. You also have the option to prevent access to globally installed libraries if desired.
When you utilize
venv to create a virtual environment, it establishes an isolated space where you can install your project's dependencies independently. Unlike tools like Virtualenv,
venv does not rely on copying the Python interpreter binary into the virtual environment directory. Instead, it often employs symbolic links to the existing Python interpreter on your system. As a result, the virtual environment created by
venv contains symbolic links to the Python executable, optimizing disk space usage.
venv is a built-in module within Python's standard library that offers a simple method for establishing isolated environments tailored to Python projects. It avoids library sharing between virtual environments and enables efficient management of project-specific dependencies.
Python Virtual Environment Setup for Hopper Cluster
To ensure that your Python Virtual Environment runs consistently across all nodes, it's important to use modules built for this purpose. Before creating the Python Virtual Environment, follow these steps:
1 - First switch modules to GNU 10 compilations:
module load gnu10/10.3.0-ya
2 - Check and load python module
module avail python module load python/3.8.6-pi
python -m venv py-env source py-env/bin/activate python -m pip install --upgrade pip
4 - Remove system python module and install modules
module unload python pip install sklearn
[user@hopper2 ~]$ python -m venv py-env [user@hopper2 ~]$ source py-env/bin/activate (py-env) [user@hopper2 ~]$ python -m pip install --upgrade pip Collecting pip Downloading pip-23.2.1-py3-none-any.whl (2.1 MB) |████████████████████████████████| 2.1 MB 13.6 MB/s Installing collected packages: pip Attempting uninstall: pip Found existing installation: pip 20.2.1 Uninstalling pip-20.2.1: Successfully uninstalled pip-20.2.1 Successfully installed pip-23.2.1 (py-env) [user@hopper2 ~]$ module unload python (py-env) [user@hopper2 ~]$ pip install sklearn Collecting sklearn Downloading sklearn-0.0.post7.tar.gz (3.6 kB) Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Building wheels for collected packages: sklearn Building wheel for sklearn (pyproject.toml) ... done Created wheel for sklearn: filename=sklearn-0.0.post7-py3-none-any.whl size=2951 sha256=c62d78ad5864da7dddeea4f58c96b35c6bab2584dec8748ce5687b788491f2eb Stored in directory: /home/smudund/.cache/pip/wheels/bc/86/46/dd4e366dc5e1303b4d6927d2a603a1ae7f979d488a5d202330 Successfully built sklearn Installing collected packages: sklearn Successfully installed sklearn-0.0.post7 (py-env) [user@hopper2 ~]$ deactivate [user@hopper2 ~]$
Things to remember before creating a virtual environment
- To create a Python Virtual Environment from a login node, a Python module must first be loaded using the ‘module load’ command.
On Hopper, you need to make sure you have loaded one of the correct Python modules compiled under GNU 10 to run across all nodes. Do this by first loading the gnu10 module and then the Python modules under it:
module load gnu10 module load python/<version>
- Using “pip install
” within an activated Virtual Environment installs the package in the Virtual Environment’s home directory and is available only to that Virtual Environment. However, if you use “pip install --user ” within an activated Virtual Environment, then the package is installed both in the Virtual Environment’s home directory and in /home/$USER/.local directory. By being installed in the /home/$USER/.local directory that package is then available to all Virtual Environments you create. So, the use of “--user” option with pip install is not recommended.
Using a Python Virtual Environment in a SLURM submission script
Below is a sample SLURM submission script. Save the information into
update the timing information, the
<N_GPUs> to reflect the
number of CPU cores and GPUs you need (referring to the table above) and submit it by entering
#!/bin/bash #SBATCH --partition=gpuq #SBATCH --qos=gpu #SBATCH --job-name=gpu_basics #SBATCH --output=gpu_basics.%j.out #SBATCH --error=gpu_basics.%j.out #SBATCH --nodes=1 #SBATCH --ntasks-per-node=<N_CPU_CORES> #SBATCH --gres=gpu:A100.80gb:<N_GPUs> #SBATCH --mem=<MEMORY> #SBATCH --export=ALL #SBATCH -time=0-01:00:00 # set echo umask 0022 # to see ID and state of GPUs assigned nvidia-smi ## Load the necessary modules module load gnu10 module load python ## Execute your script python main.py
Adding your Python Virtual Environment as a Kernel in JupyterLabs
Python Virtual Environments created on Hopper can be added as kernels to the JupyterLab sessions started under Open OnDemand. To see your Python Virtual Environment as a kernel, first, activate the virtual environment from the command line:
pip install ipykernel
Python -m ipykernel install --user --name=env-name