Python Virtual Environments
The Python world offers many tools related to virtual environments (VEs): pyenv, pyvenv, venv, virtualenv, virtualenvwrapper, and pipenv. This page covers two of them, venv and virtualenv, which work similarly to one another; either one can be chosen. The main difference between them, as used in the examples on this page, is that virtualenv copies the Python executable into the Virtual Environment folder, while venv creates a symbolic link to it instead.
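To see the copy-versus-link difference on your own system, the sketch below uses the standard-library venv module to build two throwaway environments, one with symbolic links and one with copies (the same choice the python -m venv --copies option makes), and reports whether each environment's bin/python is a symbolic link. It assumes a Linux directory layout; the helper names interpreter_is_symlink and build_env are illustrative, not part of any library.

```python
import os
import tempfile
import venv
from pathlib import Path


def interpreter_is_symlink(env_dir):
    """Return True if the environment's bin/python is a symbolic link (Linux layout assumed)."""
    return os.path.islink(Path(env_dir) / "bin" / "python")


def build_env(env_dir, use_symlinks):
    """Create a bare environment (no pip, to keep it fast) and report how python was placed."""
    venv.create(env_dir, symlinks=use_symlinks, with_pip=False)
    return interpreter_is_symlink(env_dir)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        linked = build_env(os.path.join(tmp, "linked-env"), use_symlinks=True)
        copied = build_env(os.path.join(tmp, "copied-env"), use_symlinks=False)
        print("symlinks=True  -> bin/python is a symlink:", linked)
        print("symlinks=False -> bin/python is a symlink:", copied)
```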
Before using Python Virtual Environments, please note that:
- Python Virtual Environments must be built separately on Argo and Hopper. An environment built on Argo, for example, will not work on Hopper.
- On Argo, we recommend that you use virtualenv as described below.
- On Hopper, we recommend that you use venv. The process is similar to what is described below.
Starting an Interactive Session on Hopper GPUs
If you intend to run Python Virtual Environments on the Hopper DGX-A100 GPU nodes, the first step is to get directly onto a GPU node by starting an interactive session with the salloc command:
salloc -p gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:A100.40gb:1 --mem=50GB
Type of GPU | Slurm setting | No. of GPUs per node | No. of CPUs per node | RAM per node |
---|---|---|---|---|
A100 80GB | --gres=gpu:A100.80gb:nGPUs | 4 | 64 | 500GB |
DGX A100 40GB | --gres=gpu:A100.40gb:nGPUs | 8 | 128 | 1TB |
In the Slurm setting, replace nGPUs with the number of GPUs you need on the node.
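The table can be encoded in a small helper that builds the --gres option and rejects a GPU count larger than one node provides. This is a hypothetical convenience sketch, not a cluster-provided tool: the GPU type strings and per-node GPU counts come from the table, and the helper name gres_flag is an assumption.

```python
# GPUs per node, taken from the table above (Slurm GPU type -> GPUs on one node).
GPUS_PER_NODE = {
    "A100.80gb": 4,
    "A100.40gb": 8,
}


def gres_flag(gpu_type, n_gpus):
    """Build a --gres option, refusing GPU counts a single node cannot provide."""
    if gpu_type not in GPUS_PER_NODE:
        raise ValueError(f"unknown GPU type {gpu_type!r}; expected one of {sorted(GPUS_PER_NODE)}")
    max_gpus = GPUS_PER_NODE[gpu_type]
    if not 1 <= n_gpus <= max_gpus:
        raise ValueError(f"{gpu_type} nodes have {max_gpus} GPUs; requested {n_gpus}")
    return f"--gres=gpu:{gpu_type}:{n_gpus}"


if __name__ == "__main__":
    # Rebuilds the salloc example above: one 40GB A100 on the gpuq partition.
    cmd = ["salloc", "-p", "gpuq", "-q", "gpu", "-n", "1", "--ntasks-per-node=1",
           gres_flag("A100.40gb", 1), "--mem=50GB"]
    print(" ".join(cmd))
```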
If you don't need a DGX node, you can instead use the following command to start an interactive session on the contrib-gpuq partition:
salloc -p contrib-gpuq -q gpu -n 1 --ntasks-per-node=1 --gres=gpu:1 --mem=5GB
Use the sinfo command to view the list of available nodes, the time restriction for each node, and the available partitions.
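Once the session starts, you can confirm from Python that you are inside the allocation and see which GPUs it exposes. This is a minimal sketch assuming Slurm sets SLURM_JOB_ID and SLURM_JOB_NODELIST in the session and, where the cluster's GPU configuration does so, CUDA_VISIBLE_DEVICES; the helper name allocation_summary is illustrative.

```python
import os


def allocation_summary(environ=None):
    """Summarize the Slurm allocation visible to this process, if any."""
    env = os.environ if environ is None else environ
    visible = env.get("CUDA_VISIBLE_DEVICES", "")
    return {
        "job_id": env.get("SLURM_JOB_ID"),                # unset outside salloc/sbatch
        "nodes": env.get("SLURM_JOB_NODELIST"),           # node(s) assigned to the job
        "gpu_ids": [g for g in visible.split(",") if g],  # [] when no GPUs are exposed
    }


if __name__ == "__main__":
    summary = allocation_summary()
    if summary["job_id"] is None:
        print("Not inside a Slurm allocation (SLURM_JOB_ID is unset).")
    else:
        print(summary)
```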
Once the interactive session has started, load the necessary modules. For Python on the DGX nodes, use the module python/3.8.6-ff, which has been built to run across both the CPU nodes and the GPU nodes. Currently, other Python modules will not work across both the GPU and CPU nodes because of differences in architecture.
Creating a Python Virtual Environment (using virtualenv)
Virtualenv is written in pure Python and works everywhere. It creates isolated environments that don't share libraries with other virtual environments. By default, an environment does not access globally installed libraries; the --system-site-packages option gives it access to them if preferred. The version of virtualenv used in the example below copies the Python interpreter binary into the virtual environment, so the environment directory holds a physical copy of the Python executable, unlike venv, which uses a symbolic link.
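A quick way to confirm that the interpreter you are running belongs to an isolated environment is to compare sys.prefix with the prefix of the base installation. The sketch below is a minimal check using only the standard library; the function name in_virtual_environment is illustrative.

```python
import sys


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment."""
    # Legacy virtualenv releases (before 20.0) record the base install in sys.real_prefix;
    # venv and newer virtualenv releases keep it in sys.base_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


if __name__ == "__main__":
    print("prefix:", sys.prefix)
    print("inside a virtual environment:", in_virtual_environment())
```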
Example: install without --system-site-packages using Python 3.6.4
[user@argo-1 ~]$ module load python/3.6.4
Warning - this python version currently lacks sqlite support
[user@ARGO-1 ~]$ python -m virtualenv test-site-virtualenv-3.6.4-no-sys-pack
Using base prefix '/cm/shared/apps/python/3.6.4'
New python executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python3
Also creating an executable in /home/user/test-site-virtualenv-3.6.4-no-sys-pack/bin/python
Installing setuptools, pip, wheel...done.
[user@Argo-1 ~]$ module unload python/3.6.4
[user@ARGO-1 ~]$ source test-site-virtualenv-3.6.4-no-sys-pack/bin/activate
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip freeze
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
(test-site-virtualenv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ python
Python 3.6.4 (default, Jun 7 2018, 10:05:32)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
>>>
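The transcript shows the new environment ignoring globally installed packages such as numpy. For environments created by venv (or by virtualenv 20 and later), that choice is recorded in the pyvenv.cfg file at the top of the environment directory; older virtualenv releases, such as the one that produced the transcript above, record it differently. The sketch below creates an environment with the standard-library venv module (the equivalent of python -m venv without --system-site-packages) and reads the setting back. The helper names read_pyvenv_cfg and sees_system_packages are hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir):
    """Parse an environment's pyvenv.cfg ("key = value" lines) into a dict."""
    settings = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def sees_system_packages(env_dir):
    # venv writes "include-system-site-packages = false" unless system_site_packages=True.
    return read_pyvenv_cfg(env_dir).get("include-system-site-packages") == "true"


if __name__ == "__main__":
    env_dir = Path(tempfile.mkdtemp()) / "no-sys-pack-env"
    venv.create(env_dir, system_site_packages=False, with_pip=False)
    print("global packages visible:", sees_system_packages(env_dir))
```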
Installing packages in a Virtual Environment
Packages can be easily installed in a Python Virtual Environment using pip. First activate (source) the Virtual Environment, then run pip. Remember, this must be done from a login node. The following example installs scikit-learn. It uses the old sklearn alias, which has since been deprecated on PyPI; today, use pip install scikit-learn instead.
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ pip install sklearn
Collecting sklearn
Collecting scikit-learn (from sklearn)
Downloading [https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/0c/b2/05be9b6da9ae4a4c54f537be22e95833f722742a02b1e355fdc09363877c/scikit_learn-0.20.0-cp36-cp36m-manylinux1_x86_64.whl) (5.3MB)
100% |████████████████████████████████| 5.3MB 110kB/s
Collecting scipy>=0.13.3 (from scikit-learn->sklearn)
Using cached [https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl](https://files.pythonhosted.org/packages/a8/0b/f163da98d3a01b3e0ef1cab8dd2123c34aee2bafbb1c5bffa354cc8a1730/scipy-1.1.0-cp36-cp36m-manylinux1_x86_64.whl)
Requirement already satisfied: numpy>=1.8.2 in ./test-site-venv-3.6.4-no-sys-pack/lib/python3.6/site-packages (from scikit-learn->sklearn)
Installing collected packages: scipy, scikit-learn, sklearn
Successfully installed scikit-learn-0.20.0 scipy-1.1.0 sklearn-0.0
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
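Running pip as python -m pip with the environment's own interpreter removes any doubt about which environment receives the package, even when a different pip appears earlier on PATH. The sketch below builds such a command for a given environment directory; the helper pip_install_cmd and the script name install_into_env.py are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def pip_install_cmd(env_dir, *packages):
    """Build a pip install command that runs with the environment's own interpreter."""
    python = Path(env_dir) / "bin" / "python"
    return [str(python), "-m", "pip", "install", *packages]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (from a login node): python install_into_env.py <env_dir> scikit-learn
    subprocess.run(pip_install_cmd(sys.argv[1], *sys.argv[2:]), check=True)
```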
Quitting/Exiting the Python Virtual Environment
To exit the Python Virtual Environment and return to the normal shell, use the deactivate command.
[user@ARGO-1 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$
(test-site-venv-3.6.4-no-sys-pack) [user@ARGO-1 ~]$ deactivate
[user@ARGO-1 ~]$
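Activation is tracked in the shell through the VIRTUAL_ENV environment variable: the activate script exports it and deactivate removes it. Running <env>/bin/python directly still uses the environment but does not set VIRTUAL_ENV. The minimal, hypothetical helper below reports which environment, if any, is currently activated.

```python
import os


def active_environment(environ=None):
    """Return the path of the shell-activated environment, or None after deactivate."""
    env = os.environ if environ is None else environ
    # The activate script exports VIRTUAL_ENV; deactivate unsets it again.
    return env.get("VIRTUAL_ENV") or None


if __name__ == "__main__":
    print(active_environment() or "no virtual environment is activated")
```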
Things to remember before creating a virtual environment
- To create a Python Virtual Environment from a login node, a Python module must first be loaded using the ‘module load’ command.
On Hopper, make sure you load one of the Python modules compiled with GNU 10 so that it runs across all nodes. Do this by first loading the gnu10 module and then a Python module under it:
module load gnu10
module load python/<version>
- On Argo, Python Virtual Environments must be created from a login node. Creating a Virtual Environment from a compute node will fail because compute nodes do not have write access to the /home/$USER directory, which is where the Virtual Environment folders and configuration files are created. Here is what happens when we try to create a Virtual Environment from a compute node:
[user@NODE008 ~]$ python -m venv test-site-venv-3.6.4-no-sys-pack-compute-node
Error: [Errno 30] Read-only file system: '/home/user/test-site-venv-3.6.4-no-sys-pack-compute-node'
[user@NODE008 ~]$ source test-site-venv-3.6.4-no-sys-pack/bin/activate
(test-site-venv-3.6.4-no-sys-pack) [user@NODE008 ~]$ python
Python 3.6.4 (default, Jun 7 2018, 10:05:32)
[GCC 7.3.1 20180303 (Red Hat 7.3.1-5)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
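Creating the environment fails with a read-only file system error, but an environment that was already created from a login node can still be activated and used on the compute node, since that only requires reading from /home/$USER.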
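Before creating an environment, you can check whether its parent directory is writable from the current node. This sketch assumes that write access to the parent directory is what matters (as with the read-only /home/$USER on Argo compute nodes); the function name can_create_env_at and the ~/my-new-env path are hypothetical.

```python
import os
from pathlib import Path


def can_create_env_at(env_dir):
    """Check whether a new environment directory could be created at env_dir."""
    parent = Path(env_dir).expanduser().absolute().parent
    # On a read-only file system, os.access(..., os.W_OK) returns False even when
    # the directory permissions would otherwise allow writing.
    return parent.is_dir() and os.access(parent, os.W_OK | os.X_OK)


if __name__ == "__main__":
    candidate = "~/my-new-env"  # hypothetical environment name
    if can_create_env_at(candidate):
        print("OK to create", candidate)
    else:
        print("Cannot write there; create the environment from a login node instead.")
```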
- Using “pip install <package>” within an activated Virtual Environment installs the package in that Virtual Environment’s own directory, so it is available only to that Virtual Environment. Using “pip install --user <package>” within an activated Virtual Environment behaves differently: in an environment created without --system-site-packages, pip refuses the install because user site-packages are not visible there, and in an environment that includes system site-packages, the package is installed in the /home/$USER/.local directory rather than in the Virtual Environment. A package in /home/$USER/.local is visible to the base Python installation and to every Virtual Environment that can see user site-packages, which defeats the isolation. So, the use of the “--user” option with pip install is not recommended.
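To check whether packages from /home/$USER/.local can leak into an interpreter, inspect the standard-library site module, which is also what python -m site prints. Inside an environment created by venv without --system-site-packages, ENABLE_USER_SITE should be False; setting PYTHONNOUSERSITE=1 disables the user site directory for any interpreter. The helper name user_site_report is illustrative.

```python
import site
import sys


def user_site_report():
    """Report whether ~/.local site-packages can leak into this interpreter."""
    user_site = site.getusersitepackages()
    return {
        # True: enabled; False: disabled (-s, PYTHONNOUSERSITE, or an isolated venv);
        # None: disabled for security reasons (e.g. mismatched user/group IDs).
        "enabled": site.ENABLE_USER_SITE,
        "user_site": user_site,
        "on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_site_report().items():
        print(f"{key}: {value}")
```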
Using a Python Virtual Environment in a Slurm submission script
Here is a sample script that you can use for submitting jobs for Python scripts that use a Python Virtual Environment.
#!/bin/bash
## Give your job a name to distinguish it from other jobs you run.
#SBATCH --job-name=<MyJobName>
#SBATCH --partition=<PartitionName>
## Separate output and error messages into 2 files.
## NOTE: %u=userID, %x=jobName, %N=nodeID, %j=jobID, %A=arrayID, %a=arrayTaskID
#SBATCH --output=/scratch/%u/%x-%N-%j.out # Output file
#SBATCH --error=/scratch/%u/%x-%N-%j.err # Error file
## Slurm can send you updates via email
#SBATCH --mail-type=BEGIN,END,FAIL # ALL,NONE,BEGIN,END,FAIL,REQUEUE,..
#SBATCH --mail-user=<GMUnetID>@gmu.edu # Put your GMU email address here
## Specify how much memory your job needs.
#SBATCH --mem=<X>G # Total memory needed per task (units: K,M,G,T)
## Specify how much time your job needs. (default: see partition above)
#SBATCH --time=<D-HH:MM> # Total time needed for job: Days-Hours:Minutes
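## Load the relevant modules needed for the job
## (on Hopper, run "module load gnu10" before loading the Python module)
module load python/<version>
## Activate the Python Virtual Environment
source <path/to/your/VE>/bin/activate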
## Run your program or script
python <your_script.py>
Note: be sure to replace all values in angle brackets (<...>) with your own; placeholders left in the #SBATCH commands will generate errors.
The key is the "source .../bin/activate" command that activates the virtual environment. This command must come after the appropriate Python module is loaded, and before the Python script is executed.
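Because a leftover placeholder such as <MyJobName> is an easy mistake, the hypothetical checker below scans a job script and reports any remaining angle-bracket placeholders with their line numbers. It is a sketch that assumes placeholders contain no spaces, so ordinary redirections like "cmd < in > out" are not flagged; unusual shell syntax could still produce false positives. The script and file names in the usage comment are hypothetical.

```python
import re
import sys

# Matches placeholders such as <MyJobName> or <path/to/your/VE>; whitespace is excluded
# so shell redirections like "sort < in.txt > out.txt" are not reported.
PLACEHOLDER = re.compile(r"<[^<>\s]+>")


def unreplaced_placeholders(script_text):
    """Return (line_number, placeholder) pairs still present in a job script."""
    found = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for match in PLACEHOLDER.finditer(line):
            found.append((number, match.group()))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_job_script.py my_job.slurm
    with open(sys.argv[1]) as handle:
        problems = unreplaced_placeholders(handle.read())
    for number, placeholder in problems:
        print(f"line {number}: {placeholder} has not been replaced")
    sys.exit(1 if problems else 0)
```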
Adding your Python Virtual Environment as a Kernel in JupyterLab
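Python Virtual Environments created on Hopper can be added as kernels to JupyterLab sessions started under Open OnDemand. To see your Python Virtual Environment as a kernel, first activate the virtual environment from the command line:
source test-env/bin/activate
Then install ipykernel in the environment and register the environment as a kernel, replacing env-name with the name you want the kernel to have:
pip install ipykernel
python -m ipykernel install --user --name=env-name
Here, --user installs the kernel specification under your home directory; it is unrelated to the pip install --user option discussed above.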
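To confirm the registration, jupyter kernelspec list shows the installed kernels. The sketch below does something similar by reading the kernel.json files directly, assuming the default per-user location on Linux (~/.local/share/jupyter/kernels, used by ipykernel install --user when JUPYTER_DATA_DIR and XDG_DATA_HOME are not set); the function name user_kernels is hypothetical.

```python
import json
from pathlib import Path


def user_kernels(kernels_dir=None):
    """Map each installed kernel name to the Python interpreter it launches."""
    if kernels_dir is None:
        # Assumed default per-user kernel location on Linux.
        kernels_dir = Path.home() / ".local" / "share" / "jupyter" / "kernels"
    kernels = {}
    for spec_file in sorted(Path(kernels_dir).glob("*/kernel.json")):
        spec = json.loads(spec_file.read_text())
        # argv[0] is the interpreter; for an ipykernel install it points into the environment.
        kernels[spec_file.parent.name] = spec["argv"][0]
    return kernels


if __name__ == "__main__":
    for name, python in user_kernels().items():
        print(f"{name}: {python}")
```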