Managing Python Packages with Conda Environments on the Cluster

Conda environments are an excellent way to manage Python packages and libraries in the cluster environment. Previously, a central Anaconda3 module was provided, but it often caused path issues and prevented other modules from working correctly. Instead, you can use Miniconda, a minimal version of Anaconda that includes just conda and its dependencies. It is also very small and can be downloaded directly to your /home directory.

The following instructions walk through downloading and installing Miniconda in your home directory on the cluster and running it.

Installing Maker in a Python virtual environment with Miniconda3

  1. Download Miniconda3:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh

  2. Install it in your preferred path (a quick check that the install worked is shown after this list):

bash miniconda.sh -b -p $HOME/miniconda
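
Once the installer finishes, you can confirm the new conda works before going further (a minimal sanity check; the path assumes the -p $HOME/miniconda prefix used above):

$HOME/miniconda/bin/conda --version    # prints a version string such as "conda 24.x" on success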

You can now create a custom conda environment. In the steps below, a conda environment for the package 'Maker' is created from a conda environment file.

  1. Activate the base conda environment. Setting PYTHONNOUSERSITE prevents packages installed under ~/.local from leaking into the environment:

source $HOME/miniconda/bin/activate
export PYTHONNOUSERSITE=true

  2. Download the environment yml file and use it to create your virtual environment for Maker. The file maker_3.01.03_environment.yml was generated from tests on the cluster, so it should include all the necessary libraries. Download it with:

wget https://wiki.orc.gmu.edu/mkdocs/maker/maker_3.01.03_environment.yml
Then create the virtual environment with:
conda env create -f maker_3.01.03_environment.yml
  3. You should now be able to activate it with:
conda activate maker_3.01.03
  4. Alternatively, you can simply create an empty environment with

conda create -n maker_env

then activate it and install the necessary libraries or use Python with

conda activate maker_env

python
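
With either approach, you can confirm the environment is active and self-contained (a quick sanity check; maker_3.01.03 is the environment name from the yml file above, and the path shown assumes the $HOME/miniconda install prefix):

conda env list    # the active environment is marked with an asterisk
which python      # should point to $HOME/miniconda/envs/maker_3.01.03/bin/python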

Running in batch mode with SLURM

Below is an example SLURM script (run.slurm) that runs Maker in this conda environment. You can modify the SLURM parameters to match your needs:

Download the run.slurm script:

wget https://wiki.orc.gmu.edu/mkdocs/maker/run.slurm

The script contains:
#!/bin/bash
#SBATCH --job-name=maker_test
#SBATCH --partition=normal
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=64
#SBATCH --output=maker_test_%j.out
#SBATCH --error=maker_test_%j.err
#SBATCH --mem=50GB
#SBATCH --time=0-12:00:00

### Load modules
module unload openmpi4
module load gnu10 openmpi

### Activate virtual environment
source /home/$USER/miniconda/bin/activate
conda activate maker_3.01.03

### Set environment variables:
export LIBDIR=/home/$USER/miniconda/envs/maker_3.01.03/share/RepeatMasker/Libraries

### Execute program: launch one MPI rank per SLURM task requested above
mpiexec -n ${SLURM_NTASKS_PER_NODE} maker [OPTIONS]

Run this with the sbatch command:

sbatch run.slurm
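
sbatch prints the job ID on submission; while the job runs, you can follow its standard output in the file named by the --output directive (the job ID below is illustrative):

tail -f maker_test_123456.out    # substitute your actual job ID for 123456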

If you ssh to the node on which the job is running, you should see that it now uses all the available CPUs for maker. To see which nodes the job is running on, use the squeue command:

squeue --me
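
For example (node042 is a hypothetical node name; use the one shown in the NODELIST column of the squeue output):

ssh node042        # hypothetical node name taken from NODELIST
top -u $USER       # the maker processes should be busy on all requested CPUs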