
Persistent Python Virtual Environments

When you use a remote GPU environment to train a model or run other GPU workloads, you need a persistent place to keep multiple virtual environments, installed packages, and downloaded models. The recommendations below use the /data/common directory on your GPU environment for persistence.

 

Python Virtual Environments

We recommend using Python virtual environments so that different projects can use different Python and library versions.

To make a Python virtual environment persistent on your GPU instance, create it under your /data/xxxxxx directory if you have shared storage attached:

python -m venv /data/xxxxxx/myenv

Activate the virtual environment after creating it:

source /data/xxxxxx/myenv/bin/activate

Creating the environment under /data/xxxxxx keeps it on persistent storage.
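As a quick sanity check, the sketch below tests whether the currently active virtual environment lives under a given directory. It relies on the VIRTUAL_ENV variable, which the activate script sets. The function name venv_is_persistent and the /data default are assumptions made for illustration, not part of any tool.

```shell
# Hypothetical helper: succeeds only when an activated venv
# lives under the given prefix (assumed default: /data).
venv_is_persistent() {
    prefix="${1:-/data}"
    # VIRTUAL_ENV is set by "source .../bin/activate"; empty if no venv is active.
    case "${VIRTUAL_ENV:-}" in
        "$prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

After activating the environment, running venv_is_persistent /data && echo "venv is on persistent storage" prints the message only if the active environment is under /data.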

 

Export these environment variables on your GPU instance so that temporary files and caches are also written to /data/common:

# Temporary files
export TMPDIR=/data/common/tmp
# pip cache
export PIP_CACHE_DIR=/data/common/pip-cache
# PyTorch cache (e.g., model weights)
export TORCH_HOME=/data/common/torch-cache
# Hugging Face cache directories
export HF_HOME=/data/common/hf_home
export HF_DATASETS_CACHE=/data/common/hf_datasets_cache
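# Note: recent transformers releases deprecate TRANSFORMERS_CACHE in favor of HF_HOME; it is kept here for older versions
export TRANSFORMERS_CACHE=/data/common/transformers_cache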
# General XDG-compliant cache directory
export XDG_CACHE_HOME=/data/common/hf_xdg_cache
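Some tools fail if the directory a variable points to does not exist yet (TMPDIR in particular), and exported variables apply only to the current shell session. The sketch below wraps directory creation and the exports into one function that can be called at the start of each session. The function name setup_cache_dirs and its optional root argument are assumptions made for illustration; the directory names are the ones listed above.

```shell
# Hypothetical helper: create the cache directories under a root
# directory and export the variables that point at them.
setup_cache_dirs() {
    cache_root="${1:-/data/common}"   # default taken from the paths above

    # Create every directory first so tools never see a missing path.
    mkdir -p "$cache_root/tmp" "$cache_root/pip-cache" \
             "$cache_root/torch-cache" "$cache_root/hf_home" \
             "$cache_root/hf_datasets_cache" \
             "$cache_root/transformers_cache" \
             "$cache_root/hf_xdg_cache" || return 1

    export TMPDIR="$cache_root/tmp"
    export PIP_CACHE_DIR="$cache_root/pip-cache"
    export TORCH_HOME="$cache_root/torch-cache"
    export HF_HOME="$cache_root/hf_home"
    export HF_DATASETS_CACHE="$cache_root/hf_datasets_cache"
    export TRANSFORMERS_CACHE="$cache_root/transformers_cache"
    export XDG_CACHE_HOME="$cache_root/hf_xdg_cache"
}
```

Calling setup_cache_dirs with no argument uses /data/common; passing another directory is useful for trying it out elsewhere.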

Now you can install any libraries you need. With the virtual environment active, packages are installed into it, and the caches use the directories set above, so they persist on your GPU instance. For example:

pip install --upgrade pip
pip install torch transformers datasets