Update CUDA Version from Inside a Pod

The following steps describe how to update the CUDA version inside a pod by installing a newer toolkit and repointing the /usr/local/cuda symlink. They can be executed from within a deployed pod after you SSH into it using the details given on the Team > GPUaaS tab.

These steps assume you have a running pod with GPU access via the NVIDIA device plugin and a base CUDA image (e.g., nvidia/cuda:11.7.1-base-ubuntu20.04), and that you want to update to a newer CUDA version (e.g., CUDA 12.2). You'll install the new CUDA toolkit and update the /usr/local/cuda symlink.

1. Access the Pod

  • SSH into the pod using the connection details shown on the Team > GPUaaS tab
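    For example (the user, host, and port are placeholders for those details):
    ssh -p <port> <user>@<host>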

2. Verify Current CUDA Version

  • Check the current CUDA version:
    nvcc --version
    This shows the version of the CUDA compiler (e.g., CUDA 11.7). Note that nvcc may be missing from the -base CUDA images, which ship only the minimal runtime; in that case, check which /usr/local/cuda-<version> directory the /usr/local/cuda symlink points to.

  • Check the host driver version (visible inside the pod):
    nvidia-smi
    Ensure the driver supports the desired CUDA version (e.g., CUDA 12.2 requires driver >= 535.54.03).
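
  • Optionally, print just the driver version (handy for scripting):
    nvidia-smi --query-gpu=driver_version --format=csv,noheader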

3. Download the Desired CUDA Toolkit

  • Inside the pod, download the CUDA toolkit installer for the desired version. For example, to install CUDA 12.2 on Ubuntu 20.04:
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run

  • Note: You need wget or curl installed. If not available, install them:
    apt-get update && apt-get install -y wget

[Note: If apt-get fails due to a read-only filesystem or lack of permissions, you may need to adjust the container’s security context or use a different base image with these tools pre-installed]
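
  • Alternatively, if curl is already available, the same download works with:
    curl -fLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run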

4. Install the CUDA Toolkit

  • Run the CUDA installer to install the new toolkit. By default, it installs to /usr/local/cuda-12.2 (or similar). Do not repoint the existing /usr/local/cuda symlink yet:
    sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --installpath=/usr/local/cuda-12.2

    • --toolkit: Installs only the CUDA toolkit (not the driver, which is provided by the host and exposed to the pod via the NVIDIA device plugin).

    • --silent: Runs the installer non-interactively.

    • --installpath: Specifies the installation directory.

  • If the filesystem is read-only or you lack permissions, you may need to:

    • Mount a writable directory (e.g., /tmp) and install there, then move the files (see the sketch after this list).

    • Adjust the pod’s security context to allow privileged operations (see below).
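
    For example, a minimal sketch of the /tmp workaround above (paths are illustrative, and it assumes the target under /usr/local is writable or backed by a mounted volume):
      sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --installpath=/tmp/cuda-12.2
      cp -a /tmp/cuda-12.2 /usr/local/cuda-12.2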

5. Update the /usr/local/cuda Symlink

  • Update the /usr/local/cuda symlink to point to the new CUDA version:
    ln -sfn /usr/local/cuda-12.2 /usr/local/cuda

  • Verify the symlink:
    ls -l /usr/local/cuda
    It should point to /usr/local/cuda-12.2.
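    The output should look something like this (date and ownership will differ):
    lrwxrwxrwx 1 root root 20 <date> /usr/local/cuda -> /usr/local/cuda-12.2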

6. Update Environment Variables

Ensure the new CUDA version is used by updating environment variables like PATH and LD_LIBRARY_PATH:
export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

[Note: To persist these changes beyond the current session, add them to the shell profile (e.g., ~/.bashrc) or your application configuration. These changes are lost when the pod restarts unless they are baked into the image.]
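
For example, one way to persist them for future shell sessions in this pod (assuming bash is the login shell) is to append them to ~/.bashrc:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc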

7. Verify the New CUDA Version

  • Check the updated CUDA version:
    nvcc --version
    It should now show CUDA 12.2 (or the installed version).

  • Run a test command to ensure GPU functionality:
    nvidia-smi
    If the command fails or shows errors, there may be a mismatch between the CUDA version and the host driver.
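
  • Optionally, compile and run a minimal CUDA program to confirm the new toolchain works end to end. This is only a sketch; the file name /tmp/cuda_check.cu is arbitrary:

cat > /tmp/cuda_check.cu <<'EOF'
// Minimal sanity check: counts the CUDA devices visible via the runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA device(s)\n", count);
    return 0;
}
EOF
/usr/local/cuda/bin/nvcc /tmp/cuda_check.cu -o /tmp/cuda_check
/tmp/cuda_check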