Python Virtual Environments

There are many tools in the Python world for creating virtual environments (VEs): pyenv, pyvenv, venv, virtualenv, virtualenvwrapper, and pipenv. The two methods covered here (venv and virtualenv) work similarly to one another, and either one can be chosen. The main practical difference is that virtualenv copies the python executable into the VE folder, while venv does not.

Before using Python Virtual Environments, please note that:

  • Python environments have to be built separately on ARGO and on HOPPER. A Python Virtual environment built on ARGO (for example) will not work on HOPPER.
  • On ARGO, we recommend that you use virtualenv as described below.
  • On HOPPER, we recommend that you use venv. The process is similar to what is described below.
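As a quick sketch of the venv workflow (shown here with a generic python3; on the clusters you would first load the appropriate python module with `module load`, and the environment name here is arbitrary):

```shell
# Create a virtual environment in the current directory (venv ships with Python 3.3+)
python3 -m venv my-test-ve

# Activate it; the shell prompt gains a "(my-test-ve)" prefix
source my-test-ve/bin/activate

# "python" now resolves to the VE's own interpreter
which python

# Leave the VE and return to the normal shell
deactivate
```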

Starting an interactive Session on Hopper GPUs

If you intend to run Python Virtual Environments on the Hopper DGX-A100 GPUs, the first step is to get directly onto a GPU node by starting an interactive session with the salloc command:

salloc -p gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:A100.40gb:1 --mem=50GB
Once the interactive session has started, you should load the necessary modules. For Python on the DGX nodes, use the module python/3.8.6-ff, which has been built to run on both the CPUs and GPUs. Currently, other python modules will not work across both the GPU and CPU nodes because of differences in architecture.

Creating a Python Virtual Environment (using virtualenv)

Virtualenv is written in pure python and works everywhere. It creates isolated environments that do not share libraries with other virtual environments, and it can optionally be set up to ignore (not access) globally installed libraries. This method copies the Python interpreter binary into the virtual environment so that the environment is self-contained. So, in this case, the virtual environment holds a physical copy of the Python executable in its directory, unlike venv, which creates only a symbolic link.
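You can see venv's symlink behaviour directly (a small illustration using a throwaway environment; with virtualenv, the same check would show a regular file instead of a link):

```shell
# Create a throwaway venv environment
python3 -m venv /tmp/demo-ve

# With venv, bin/python is a symbolic link back to the base interpreter
ls -l /tmp/demo-ve/bin/python

# readlink -f shows the real interpreter it points to
readlink -f /tmp/demo-ve/bin/python
```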

Example: install without --system-site-packages using python 3.6.4

[user@ARGO-1 ~]$ module load python/3.6.4
Warning - this python version currently lacks sqlite support
[user@ARGO-1 ~]$ python -m virtualenv test-site-virtualenv-3.6.4-no-sys-pack
Using base prefix '/cm/shared/apps/python/3.6.4'
New python executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python3
Also creating executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python
Installing setuptools, pip, wheel...done.
[user@ARGO-1 ~]$ module unload python/3.6.4
[user@ARGO-1 ~]$ source test-site-virtualenv-3.6.4-no-sys-pack/bin/activate
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip freeze
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ 
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ python
Python 3.6.4 (default, Jun  7 2018, 10:05:32) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
>>>
Without the --system-site-packages option, the VE does not see the system-wide installed packages, so the ‘pip freeze’ command produced no output and importing NumPy fails. You will have to install NumPy into the VE to be able to use it. For more information on pip command-line options, see the pip documentation. Another important thing to note is that once the virtual environment is created, you no longer need the base python module loaded to use the VE.
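The same isolation can be reproduced with a scratch environment (a sketch; the environment name and location are arbitrary):

```shell
# Create a VE without --system-site-packages
python3 -m venv /tmp/scratch-ve
. /tmp/scratch-ve/bin/activate

# No packages are visible: prints nothing
pip freeze

# numpy is not importable until you install it into the VE
python -c "import numpy" 2>/dev/null || echo "numpy missing: run 'pip install numpy'"

deactivate
```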

Installing packages in a VE

Packages can be easily installed in a python-VE using pip. First source the VE, then use pip. Remember, this has to be done from a login node. See the following example installing the sklearn package.

(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip install sklearn
Collecting sklearn
Collecting scikit-learn (from sklearn)
 Downloading [https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl) (5.3MB)
   100% |████████████████████████████████| 5.3MB 110kB/s 
Collecting scipy>=0.13.3 (from scikit-learn->sklearn)
 Using cached [https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl)
Requirement already satisfied: numpy>=1.8.2 in ./test-site-venv-3.6.4-no-sys-pack/lib/python3.6/site-packages (from scikit-learn->sklearn)
Installing collected packages: scipy, scikit-learn, sklearn
Successfully installed scikit-learn-0.20.0 scipy-1.1.0 sklearn-0.0
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$

Quitting/Exiting the python-VE

To exit the python-VE and go back to the normal shell, use the command ‘deactivate’.

[user@ARGO-1 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ deactivate
[user@ARGO-1 ~]$ 

Things to remember before creating a virtual environment

Python virtual environments must be created from a login node. Creating a VE from a compute node will fail because compute nodes do not have write access to the /home/$USER directory, which is where python creates the VE folders and config files. Here is what happens when we try to create a VE from a compute node:

[user@NODE008 ~]$ python -m venv test-site-venv-3.6.4-no-sys-pack-compute-node
 Error: [Errno 30] Read-only file system: '/home/user/test-site-venv-3.6.4-no-sys-pack-compute-node' 
However, once created, the VE can be sourced from a compute node, and python can be used without loading the python module via the ‘module load’ command. In the following example, an already-created VE is sourced from a compute node, making python available:

[user@NODE008 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@NODE008 ~]$ python
Python 3.6.4 (default, Jun  7 2018, 10:05:32) 
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
To create a python-VE from a login node, a python module must first be loaded using the ‘module load’ command.

NOTE: Using “pip install <package>” within an activated VE installs the package in the VE’s home directory, where it is available only to that VE. However, if you use “pip install --user <package>” within an activated VE, then the package is installed both in the VE’s home directory and in the /home/$USER/.local directory. Because it lands in /home/$USER/.local, that package is then visible to all VEs you create. So use of the “--user” option with pip install is not recommended.
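A quick way to see where “--user” installs would land is to ask python’s site module (an illustration; the exact path depends on your Python version):

```shell
# Even inside an activated VE, a --user install targets this per-user directory:
python3 -c "import site; print(site.getusersitepackages())"
# e.g. /home/user/.local/lib/python3.6/site-packages
```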

Using a Python VE in a Slurm submission script

Here is a sample script that you can use for submitting jobs for Python scripts that use a Python virtual environment.

#!/bin/sh

## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>

## General partitions: all-HiPri, bigmem-HiPri   --   (12 hour limit)
##                     all-LoPri, bigmem-LoPri, gpuq  (5 days limit)
## Restricted: CDS_q, CS_q, STATS_q, HH_q, GA_q, ES_q, COS_q  (10 day limit)
#SBATCH --partition=<PartitionName>        # Default is all-HiPri

## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out  # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err   # Error file

## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL         # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu     # Put your GMU email address here

## Specify how much memory your job needs. (2G is the default)
#SBATCH --mem=<X>G        # Total memory needed per task (units: K,M,G,T)

## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM>  # Total time needed for job: Days-Hours:Minutes


## Load the relevant modules needed for the job
module load python/<version>
source <path/to/your/VE>/bin/activate

## Run your program or script
python <your_script.py>

Note: be sure to replace all values in angle brackets (including the "<" and ">" themselves) with values appropriate for you. Any "<" or ">" symbols that remain in #SBATCH lines will generate errors.

The key is the "source .../bin/activate" command, which activates the virtual environment. It must come after the appropriate python module is loaded and before the python script is executed.
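For instance, a filled-in version of the template might look like this (the job name, memory, time, and script name are illustrative; the VE path matches the environment created earlier on this page):

```shell
#!/bin/sh
#SBATCH --job-name=sklearn-test
#SBATCH --partition=all-HiPri
#SBATCH --output=/scratch/%u/%x-%N-%j.out
#SBATCH --error=/scratch/%u/%x-%N-%j.err
#SBATCH --mem=4G
#SBATCH --time=0-01:00

## Load the python module the VE was built against, then activate the VE
module load python/3.6.4
source ~/test-site-venv-3.6.4-no-sys-pack/bin/activate

## Run the (illustrative) script
python my_script.py
```

Submit the script from a login node with sbatch.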

Limitations of Python Virtual Environments

Some Modules are Incompatible with Virtual Environments

Some of our modules were implemented using a virtual environment. This means that you cannot use them in a virtual environment of your own. The TensorFlow modules are one example. If you activate a VE that you have created and then load a TensorFlow module, your VE will be deactivated (even though it will still be shown in your command prompt).

Here is a current list of modules that were implemented using a virtual environment:

  • intel/python/2018-10-p27
  • intel/python/2018-10-p36
  • keras/2.2.0-py27
  • keras/2.2.0-py36
  • pytorch/0.4.1-py27
  • pytorch/0.4.1-py36
  • pytorch/1.0.1-py36
  • tensorflow/cpu/1.12.0-py27
  • tensorflow/cpu/1.12.0-py36
  • tensorflow/cpu/1.8.0-py27
  • tensorflow/cpu/1.8.0-py36
  • tensorflow/gpu/1.12.0-py27
  • tensorflow/gpu/1.12.0-py36
  • tensorflow/gpu/1.8.0-py27
  • tensorflow/gpu/1.8.0-py36

An Alternative to Virtual Environments

You cannot use any of the modules listed above together with a virtual environment. This means that if you need to install additional python packages, then either you won't be able to use those modules, or you won't be able to use a VE. Because most people need to install an additional package or two, our general advice is simply: don't use the above modules, and instead install all of the python software you need into a VE (including pytorch, tensorflow, or keras).

However, if you feel you need to use one of those modules, and you need to install additional python packages as well, we recommend this alternative approach.

Let's say that you are working on project A, and you need to install a set of packages for use with TensorFlow. Then later, for project B you need to install another set of packages, but they conflict with the packages in project A (perhaps they contain different versions of the same package). A VE seems like just what you need here, but unfortunately you cannot use one. We offer the following alternative in this case.

It is possible to install packages into any directory using the pip -t option. To set up project A, create a directory for storing those packages (e.g. ~/python-packages/projectA). Then install the appropriate packages there:

pip install <package1> -t ~/python-packages/projectA
...
pip install <packageN> -t ~/python-packages/projectA

Similarly, set up projectB:

pip install <packageX> -t ~/python-packages/projectB
...

Now you need to modify your PYTHONPATH environment variable to include the project you are working on and exclude the project you are not. For example, you might add the following lines to your .bashrc file:

export PYTHONPATH=~/python-packages/projectA:$PYTHONPATH
#export PYTHONPATH=~/python-packages/projectB:$PYTHONPATH

You should always make sure that at least one of them is commented out. You may even want to check that your $PYTHONPATH variable contains only one (and the right one!) of these by using the echo $PYTHONPATH command before you run a job.
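The effect of the export can be verified directly: any directory on PYTHONPATH appears on python's module search path (a sketch using a temporary directory in place of ~/python-packages/projectA):

```shell
# Create a stand-in package directory and put it on PYTHONPATH
mkdir -p /tmp/python-packages/projectA
export PYTHONPATH=/tmp/python-packages/projectA:$PYTHONPATH

# The directory now appears on python's sys.path
python3 -c "import sys; print('/tmp/python-packages/projectA' in sys.path)"
# prints: True
```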

Alternatively, instead of modifying PYTHONPATH in your .bashrc file, you could place the appropriate "export" command in your SLURM submission script. The advantage of this approach is that the right variable will always be set when you submit a job. The main disadvantage is that interactive testing on the command line becomes harder. Whichever you choose, we recommend that you not place these commands in both your ".bashrc" and your SLURM submission script.

This approach may require a little more care on your part than using a VE, but at least it solves the problem.