Python Virtual Environments
There are many virtual environment (VE) tools in the Python world: pyenv, pyvenv, venv, virtualenv, virtualenvwrapper, and pipenv. The two methods covered here (venv and virtualenv) work similarly to one another, and either one can be chosen. The main difference between them is that virtualenv copies the python executable into the VE folder, while venv only creates a symbolic link to it.
Before using Python Virtual Environments, please note that:
- Python environments have to be built separately on ARGO and on HOPPER. A Python virtual environment built on ARGO (for example) will not work on HOPPER.
- On ARGO, we recommend that you use virtualenv as described below.
- On HOPPER, we recommend that you use venv. The process is similar to what is described below.
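The basic venv workflow looks like the following sketch (the environment name my-venv and the python3 invocation are placeholders; on the clusters you would first load the python module you want, as shown in the sections below):

```shell
# Sketch of the venv workflow; "my-venv" is a placeholder name.
python3 -m venv my-venv                     # create the environment
source my-venv/bin/activate                 # activate it
pip freeze                                  # prints nothing: no packages inherited
python -c 'import sys; print(sys.prefix)'   # path now points inside my-venv
deactivate                                  # return to the normal shell
```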
Starting an Interactive Session on Hopper GPUs
If you intend to run a Python virtual environment on the Hopper DGX A100 GPUs, the first step is to get directly onto a GPU node by starting an interactive session with the salloc command:
salloc -p gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:A100.40gb:1 --mem=50GB
Once on the GPU node, load the python/3.8.6-ff module, which has been built to run across both the CPUs and GPUs. Currently, other python modules will not work across both the GPU and CPU nodes because of the differences in architecture.
Creating a Python Virtual Environment (using virtualenv)
Virtualenv is written in pure python and works everywhere. It creates isolated environments and doesn't share libraries with other virtual environments. It can also optionally be set up to ignore (not access) globally installed libraries. This method copies the Python interpreter binary into the virtual environment in order to trick it into thinking it is isolated. So, in this case, the virtual environment has a physical copy of the Python executable in the virtual environment directory, unlike venv, which only creates a symbolic link.
Example: install without --system-site-packages using python 3.6.4
[user@ARGO-1 ~]$ module load python/3.6.4
Warning - this python version currently lacks sqlite support
[user@ARGO-1 ~]$ python -m virtualenv test-site-virtualenv-3.6.4-no-sys-pack
Using base prefix '/cm/shared/apps/python/3.6.4'
New python executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python3
Also creating executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python
Installing setuptools, pip, wheel...done.
[user@ARGO-1 ~]$ module unload python/3.6.4
[user@ARGO-1 ~]$ source test-site-virtualenv-3.6.4-no-sys-pack/bin/activate
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip freeze
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ python
Python 3.6.4 (default, Jun 7 2018, 10:05:32)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
>>>
Installing packages in a VE
Packages can be easily installed in a python-VE using pip. First, the VE must be sourced, and then pip can be used. Remember, this has to be done from a login node. The following example installs the sklearn package.
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip install sklearn
Collecting sklearn
Collecting scikit-learn (from sklearn)
Downloading [https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl) (5.3MB)
100% |████████████████████████████████| 5.3MB 110kB/s
Collecting scipy>=0.13.3 (from scikit-learn->sklearn)
Using cached [https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl)
Requirement already satisfied: numpy>=1.8.2 in ./test-site-venv-3.6.4-no-sys-pack/lib/python3.6/site-packages (from scikit-learn->sklearn)
Installing collected packages: scipy, scikit-learn, sklearn
Successfully installed scikit-learn-0.20.0 scipy-1.1.0 sklearn-0.0
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
Quitting/Exiting the python-VE
To go back to the normal shell and exit python-VE use the command ‘deactivate’.
[user@ARGO-1 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ deactivate
[user@ARGO-1 ~]$
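A quick way to check whether a VE is currently active is the VIRTUAL_ENV variable, which activate sets and deactivate clears. A minimal sketch (my-venv is a placeholder environment name):

```shell
# Sketch: checking activation state; "my-venv" is a placeholder.
source my-venv/bin/activate
echo "$VIRTUAL_ENV"              # prints the VE's path while active
which python                     # resolves to my-venv/bin/python
deactivate
echo "${VIRTUAL_ENV:-not set}"   # prints "not set" after deactivating
```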
Things to remember before creating a virtual environment
Python virtual environments must be created from a login node. Creation of a VE from compute nodes will fail because compute nodes do not have write access to the /home/$USER directory, which is where python wants to create VE folders and config files. Let’s see what happens when we try to create a VE from a compute node.
[user@NODE008 ~]$ python -m venv test-site-venv-3.6.4-no-sys-pack-compute-node
Error: [Errno 30] Read-only file system: '/home/user/test-site-venv-3.6.4-no-sys-pack-compute-node'
[user@NODE008 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@NODE008 ~]$ python
Python 3.6.4 (default, Jun 7 2018, 10:05:32)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
NOTE: Using "pip install" to add packages must also be done from a login node; on a compute node it will fail for the same reason, because pip cannot write to the read-only /home file system.
Using a Python VE in a Slurm submission script
Here is a sample script that you can use for submitting jobs for Python scripts that use a Python virtual environment.
#!/bin/sh
## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>
## General partitions: all-HiPri, bigmem-HiPri -- (12 hour limit)
## all-LoPri, bigmem-LoPri, gpuq (5 days limit)
## Restricted: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q (10 day limit)
#SBATCH --partition=<PartitionName> # Default is all-HiPri
## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err # Error file
## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu # Put your GMU email address here
## Specify how much memory your job needs. (2G is the default)
#SBATCH --mem=<X>G # Total memory needed per task (units: K,M,G,T)
## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
## Load the relevant modules needed for the job
module load python/<version>
source <path/to/your/VE>/bin/activate
## Run your program or script
python <your_script.py>
Note: you need to be sure to replace all of the <...> placeholder values in the #SBATCH lines above. Leaving the angle-bracket placeholders in these commands will generate errors.
The key is the "source .../bin/activate" command that activates the virtual environment. This command must come after the appropriate python module is loaded, and before the python script is executed.
Limitations of Python Virtual Environments
Some Modules are Incompatible with Virtual Environments
Some of our modules were implemented using a virtual environment. This means that you cannot use them in a virtual environment of your own. The TensorFlow modules are one example. If you activate a VE that you have created, and then load a TensorFlow module, your VE will be deactivated (even though it will still be shown on your command prompt).
Here is a current list of modules that were implemented using a virtual environment:
- intel/python/2018-10-p27
- intel/python/2018-10-p36
- keras/2.2.0-py27
- keras/2.2.0-py36
- pytorch/0.4.1-py27
- pytorch/0.4.1-py36
- pytorch/1.0.1-py36
- tensorflow/cpu/1.12.0-py27
- tensorflow/cpu/1.12.0-py36
- tensorflow/cpu/1.8.0-py27
- tensorflow/cpu/1.8.0-py36
- tensorflow/gpu/1.12.0-py27
- tensorflow/gpu/1.12.0-py36
- tensorflow/gpu/1.8.0-py27
- tensorflow/gpu/1.8.0-py36
An Alternative to Virtual Environments
You cannot use any of the modules listed above along with a virtual environment. This means that if you need to install additional python packages, then you either won't be able to use those modules, or you won't be able to use a VE. Because most people need to install an additional package or two, our general advice is simply: don't use the above modules, and instead install all of the python software that you need into a VE (including pytorch, tensorflow or keras).
However, if you feel you need to use one of those modules, and you need to install additional python packages as well, we recommend this alternative approach.
Let's say that you are working on project A, and you need to install a set of packages for use with TensorFlow. Then later, for project B you need to install another set of packages, but they conflict with the packages in project A (perhaps they contain different versions of the same package). A VE seems like just what you need here, but unfortunately you cannot use one. We offer the following alternative in this case.
It is possible to install packages in any directory using pip's -t (--target) option. For example, set up projectA:
pip install <package1> -t ~/python-packages/projectA
...
pip install <packageN> -t ~/python-packages/projectA
Similarly, set up projectB:
pip install <packageX> -t ~/python-packages/projectB
...
Now you need to modify your PYTHONPATH environment variable to include the project you are working on and to exclude the project you are not. For example, you might add the following lines to your .bashrc file:
export PYTHONPATH=~/python-packages/projectA:$PYTHONPATH
#export PYTHONPATH=~/python-packages/projectB:$PYTHONPATH
You should always make sure that at least one of them is commented out.
You may even want to check that your $PYTHONPATH variable contains only
one (and the right one!) of these by using the echo $PYTHONPATH
command before you run a job.
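The effect of the switch can be sketched as follows. Here, mypkg.py is a hypothetical module standing in for packages installed with "pip install -t"; it exists only to show which project directory python resolves imports from:

```shell
# Sketch: two project dirs standing in for "pip install -t" targets;
# mypkg.py is a hypothetical module used only to show which dir wins.
mkdir -p ~/python-packages/projectA ~/python-packages/projectB
echo 'VERSION = "A"' > ~/python-packages/projectA/mypkg.py
echo 'VERSION = "B"' > ~/python-packages/projectB/mypkg.py

export PYTHONPATH=~/python-packages/projectA
python -c 'import mypkg; print(mypkg.VERSION)'   # prints A

export PYTHONPATH=~/python-packages/projectB
python -c 'import mypkg; print(mypkg.VERSION)'   # prints B
```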
Alternatively, instead of modifying the PYTHONPATH in your .bashrc
file, you could place the appropriate "export" command in your SLURM
submission script. The advantage to this approach is that the right
variable will always be set when you submit a job. The main disadvantage
is that it will be difficult to do interactive testing on the command
line. Whatever you choose, we recommend that you not place these
commands in both your ".bashrc" and your SLURM submission script.
This approach may require a little more care on your part than using a VE, but at least it solves the problem.