Update CUDA Version from Inside a Pod

The following steps describe how to update the CUDA version inside a pod by installing a newer toolkit and repointing the /usr/local/cuda symlink. They can be executed from within a deployed pod after you SSH into it using the details given on the Team > GPUaaS tab.

These steps assume you have a running pod with GPU access via the NVIDIA device plugin and a base CUDA image (e.g., nvidia/cuda:11.7.1-base-ubuntu20.04), and that you want to update to a newer CUDA version (e.g., CUDA 12.2). You'll install the new CUDA toolkit and update the /usr/local/cuda symlink.

1. Access the Pod

  • SSH into the pod using the connection details shown on the Team > GPUaaS tab
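    For example (the user, host, and port are placeholders for those details):
    ssh -p <port> <user>@<host>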

2. Verify Current CUDA Version

  • Check the current CUDA version:
    nvcc --version
    This shows the version of the CUDA compiler (e.g., CUDA 11.7). Note that nvcc may be missing from the -base CUDA images, which ship only the minimal runtime; in that case, check which /usr/local/cuda-<version> directory the /usr/local/cuda symlink points to.

  • Check the host driver version (visible inside the pod):
    nvidia-smi
    Ensure the driver supports the desired CUDA version (e.g., CUDA 12.2 requires driver >= 535.54.03).
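
  • Optionally, print just the driver version (handy for scripting):
    nvidia-smi --query-gpu=driver_version --format=csv,noheader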

3. Download the Desired CUDA Toolkit

  • Inside the pod, download the CUDA toolkit installer for the desired version. For example, to install CUDA 12.2 on Ubuntu 20.04:
    wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run

  • Note: You need wget or curl installed. If not available, install them:
    apt-get update && apt-get install -y wget

[Note: If apt-get fails due to a read-only filesystem or lack of permissions, you may need to adjust the container’s security context or use a different base image with these tools pre-installed]
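
  • Alternatively, if curl is already available, the same download works with:
    curl -fLO https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run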

4. Install the CUDA Toolkit

  • Run the CUDA installer to install the new toolkit. By default, it installs to /usr/local/cuda-12.2 (or similar). Do not repoint the existing /usr/local/cuda symlink yet:
    sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --installpath=/usr/local/cuda-12.2

    • --toolkit: Installs only the CUDA toolkit (not the driver, which is provided by the host and exposed to the pod via the NVIDIA device plugin).

    • --silent: Runs the installer non-interactively.

    • --installpath: Specifies the installation directory.

  • If the filesystem is read-only or you lack permissions, you may need to:

    • Mount a writable directory (e.g., /tmp) and install there, then move the files (see the sketch after this list).

    • Adjust the pod’s security context to allow privileged operations (see below).
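
    For example, a minimal sketch of the /tmp workaround above (paths are illustrative, and it assumes the target under /usr/local is writable or backed by a mounted volume):
      sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --installpath=/tmp/cuda-12.2
      cp -a /tmp/cuda-12.2 /usr/local/cuda-12.2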

5. Update the /usr/local/cuda Symlink

  • Update the /usr/local/cuda symlink to point to the new CUDA version:
    ln -sfn /usr/local/cuda-12.2 /usr/local/cuda

  • Verify the symlink:
    ls -l /usr/local/cuda
    It should point to /usr/local/cuda-12.2.
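    The output should look something like this (date and ownership will differ):
    lrwxrwxrwx 1 root root 20 <date> /usr/local/cuda -> /usr/local/cuda-12.2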

6. Update Environment Variables

Ensure the new CUDA version is used by updating environment variables like PATH and LD_LIBRARY_PATH:
export PATH=/usr/local/cuda/bin:$PATH

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

[Note: To persist these changes beyond the current session, add them to the shell profile (e.g., ~/.bashrc) or your application configuration. These changes are lost when the pod restarts unless they are baked into the image.]
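
For example, one way to persist them for future shell sessions in this pod (assuming bash is the login shell) is to append them to ~/.bashrc:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc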

7. Verify the New CUDA Version

  • Check the updated CUDA version:
    nvcc --version
    It should now show CUDA 12.2 (or the installed version).

  • Run a test command to ensure GPU functionality:
    nvidia-smi
    If the command fails or shows errors, there may be a mismatch between the CUDA version and the host driver.
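
  • Optionally, compile and run a minimal CUDA program to confirm the new toolchain works end to end. This is only a sketch; the file name /tmp/cuda_check.cu is arbitrary:

cat > /tmp/cuda_check.cu <<'EOF'
// Minimal sanity check: counts the CUDA devices visible via the runtime API.
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Found %d CUDA device(s)\n", count);
    return 0;
}
EOF
/usr/local/cuda/bin/nvcc /tmp/cuda_check.cu -o /tmp/cuda_check
/tmp/cuda_check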