The following steps document how to update the CUDA version by repointing the /usr/local/cuda symlink inside a pod. They can be executed from within a deployed pod after you SSH into it using the details given on the Team > GPUaaS tab.
These steps assume a running pod with GPU access via the NVIDIA device plugin, a CUDA base image (e.g., nvidia/cuda:11.7.1-base-ubuntu20.04), and that you want to update to a newer CUDA version (e.g., CUDA 12.2). You’ll install the new CUDA toolkit alongside the existing one and then update the /usr/local/cuda symlink to point to it.
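Before starting, it can help to see which CUDA toolkits are already installed and where the symlink currently points. A quick check, assuming the default /usr/local layout:
ls -ld /usr/local/cuda*   # lists installed toolkit directories and the current symlink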
1. Access the Pod
SSH into the pod using the connection details from the Team > GPUaaS tab.
2. Verify Current CUDA Version
Check the current CUDA version:
nvcc --version
This shows the version of the CUDA compiler (e.g., CUDA 11.7).
Check the host driver version (visible inside the pod):
nvidia-smi
Ensure the driver supports the desired CUDA version (e.g., CUDA 12.2 requires driver >= 535.54.03).
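If nvcc is not on the PATH (common with -base CUDA images, which do not ship the toolkit), nvidia-smi is still available through the device plugin. One way to read just the driver version for comparison against the requirement, using standard nvidia-smi query flags:
nvidia-smi --query-gpu=driver_version --format=csv,noheader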
3. Download the Desired CUDA Toolkit
Inside the pod, download the CUDA toolkit installer for the desired version. For example, to install CUDA 12.2 on Ubuntu 20.04:
wget https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run
Note: You need wget or curl installed. If not available, install them:
apt-get update && apt-get install -y wget
[Note: If apt-get fails due to a read-only filesystem or lack of permissions, you may need to adjust the container’s security context or use a different base image with these tools pre-installed]
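Related to the note above, a quick way to confirm up front whether the target location is writable (assuming /usr/local as the install prefix) is a simple touch test:
if touch /usr/local/.cuda_write_test 2>/dev/null; then
    rm -f /usr/local/.cuda_write_test
    echo "/usr/local is writable"
else
    echo "/usr/local is read-only or permission denied"
fi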
4. Install the CUDA Toolkit
Run the CUDA installer to install the new toolkit. By default, it installs to /usr/local/cuda-12.2 (or similar). Avoid overwriting the existing /usr/local/cuda symlink yet:
sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --toolkitpath=/usr/local/cuda-12.2
--toolkit: Installs only the CUDA toolkit (not the driver, since the host driver is managed by the NVIDIA device plugin).
--silent: Runs the installer non-interactively.
--toolkitpath: Specifies the installation directory.
If the filesystem is read-only or you lack permissions, you may need to:
Mount a writable directory (e.g., /tmp) and install there, then move the files into place (see the sketch after this list).
Adjust the pod’s security context to allow privileged operations (see below).
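As a rough sketch of the writable-directory approach, the toolkit can be installed under /tmp first and then copied into place. The copy step assumes /usr/local is (or becomes) writable, e.g., via an emptyDir or hostPath mount; adjust the paths to your pod:
sh cuda_12.2.0_535.54.03_linux.run --toolkit --silent --toolkitpath=/tmp/cuda-12.2
cp -a /tmp/cuda-12.2 /usr/local/cuda-12.2   # assumes a writable /usr/local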
5. Update the /usr/local/cuda Symlink
Update the /usr/local/cuda symlink to point to the new CUDA version:
ln -sfn /usr/local/cuda-12.2 /usr/local/cuda
Verify the symlink:
ls -l /usr/local/cuda
It should point to /usr/local/cuda-12.2.
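For a non-interactive check (for example in a setup script), readlink can confirm the target; the expected path below assumes the install directory used in step 4:
readlink -f /usr/local/cuda   # expected output: /usr/local/cuda-12.2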
6. Update Environment Variables
Ensure the new CUDA version is used by updating environment variables such as PATH and LD_LIBRARY_PATH:
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
[Note: The export commands above only affect the current session. To persist them across sessions, add them to the shell profile (e.g., ~/.bashrc) or application configuration. These changes are still lost when the pod restarts unless they are baked into the image.]
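One way to persist the variables for future shell sessions, assuming bash and a writable home directory, is to append them to ~/.bashrc:
echo 'export PATH=/usr/local/cuda/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc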
7. Verify the New CUDA Version
Check the updated CUDA version:
nvcc --version
It should now show CUDA 12.2 (or the installed version).
Run a test command to ensure GPU functionality:
nvidia-smi
If the command fails or shows errors, there may be a mismatch between the CUDA version and the host driver.
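As an optional end-to-end check of the new toolkit, a minimal CUDA program can be compiled with the freshly linked nvcc and run against the GPU. This is a sketch, not part of the required steps: the temporary file paths are arbitrary, and it assumes a host compiler such as gcc is present in the image. The program only queries the runtime version and device count.
cat > /tmp/cuda_check.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Report the CUDA runtime version linked at compile time (e.g., 12020 for CUDA 12.2).
    int runtime_version = 0;
    cudaRuntimeGetVersion(&runtime_version);

    // Count GPUs visible to the container via the NVIDIA device plugin.
    int device_count = 0;
    cudaError_t err = cudaGetDeviceCount(&device_count);
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Runtime version: %d, visible GPUs: %d\n", runtime_version, device_count);
    return 0;
}
EOF
nvcc /tmp/cuda_check.cu -o /tmp/cuda_check && /tmp/cuda_check
If this compiles and prints a runtime version matching the newly installed toolkit, the symlink and environment variables are set up correctly.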