
GPUaaS Node Preparation

A GPUaaS Node can have different roles:

  • Controller

  • Worker

  • Storage

The first node added to GPUaaS 2.0 is assigned both the Controller and Worker roles.

You can add a separate node with sufficient disk space and assign it the Storage role.

Optionally, the Controller node can also take the Storage role if specified.

Note, however, that GPUaaS 2.0 currently supports only one storage node per region.

You can add one or more GPUaaS nodes as dedicated Worker nodes.

GPUaaS Node Preparation

Ensure that the server hostname contains only lowercase characters.

For example, gpu01, not GPU01.
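A quick way to check this before adding the node is sketched below. It is a minimal sketch assuming a POSIX shell; the function name is_lowercase_hostname is hypothetical and not part of GPUaaS. If the check fails, hostnamectl set-hostname (on systemd-based distributions) can be used to rename the host.

```bash
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when NAME, or this host's node name
# if no argument is given, contains no uppercase characters.
is_lowercase_hostname() {
    name=${1:-$(uname -n)}
    case "$name" in
        *[[:upper:]]*) return 1 ;;  # found an uppercase character
        *) return 0 ;;
    esac
}

# Example: check the current host and suggest a fix if needed.
if is_lowercase_hostname; then
    echo "hostname OK: $(uname -n)"
else
    echo "hostname contains uppercase characters: $(uname -n)" >&2
    echo "rename it, for example: sudo hostnamectl set-hostname gpu01" >&2
fi
```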

Create a service account (for example, hai) on all nodes:

bash
root@gpu01:~# cat /etc/passwd | grep hai
hai:x:1001:1001::/home/hai:/bin/bash
root@gpu01:~#
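If the account does not exist yet, it can be created with the standard useradd tool. The sketch below assumes hai as the account name and /bin/bash as its login shell, matching the passwd entry above; ensure_user, run and the DRY_RUN switch are hypothetical helpers that let you preview the command before running it as root on each node.

```bash
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: create USER with a home directory and /bin/bash as
# login shell, unless the account already exists.
ensure_user() {
    user=${1:-hai}
    if id "$user" >/dev/null 2>&1; then
        echo "user $user already exists"
    else
        run sudo useradd -m -s /bin/bash "$user"
    fi
}
```

For example, DRY_RUN=1 ensure_user hai prints the useradd command, and ensure_user hai runs it.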

The service account should have sudo privileges (full root access) with the NOPASSWD tag enabled:

bash
root@gpu01:~# sudo -U hai -l | tail -2
User hai may run the following commands on gpu01:
    (ALL : ALL) NOPASSWD: ALL
root@gpu01:~#
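One common way to grant this is a drop-in file under /etc/sudoers.d, validated with visudo before it takes effect. This is a sketch, assuming that /etc/sudoers includes /etc/sudoers.d (the default on many distributions, including Ubuntu); the helpers sudoers_line and install_sudoers and the file name /etc/sudoers.d/hai are illustrative choices, so adapt them to your site's policy.

```bash
#!/bin/sh
# Build the sudoers rule that gives USER full root access without a password.
sudoers_line() {
    printf '%s ALL=(ALL:ALL) NOPASSWD: ALL\n' "$1"
}

# Hypothetical helper (run as root): write the rule to a temporary file,
# check its syntax with visudo, then install it with mode 0440.
# Note: sudo ignores sudoers.d files whose names contain a "." or end in "~".
install_sudoers() {
    user=$1
    tmp=$(mktemp)
    sudoers_line "$user" > "$tmp"
    if visudo -cf "$tmp"; then
        install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$user"
        rc=$?
    else
        echo "invalid sudoers rule, not installed" >&2
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}
```

After running install_sudoers hai from a root shell, the sudo -U hai -l check above should list (ALL : ALL) NOPASSWD: ALL.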

Storage Node Preparation

The Storage Node should allocate disk space for two different needs:

1. Persistent/Shared Storage

Assign one or more block storage devices (physical disks, RAID arrays, or external storage) for persistent storage and create an LVM Volume Group on top of them. With hai_storage as the example Volume Group name, vgs should show output similar to the following:

bash
root@gpu01:~# vgs
  VG          #PV #LV #SN Attr   VSize   VFree  
  hai_storage   1   0   0 wz--n- <18.51t <18.51t
root@gpu01:~# 
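If the Volume Group does not exist yet, it can be created with the standard LVM tools: pvcreate initializes each device as a Physical Volume and vgcreate builds the Volume Group on top of them. In the sketch below, /dev/sdb and /dev/sdc are placeholder device names and create_vg is a hypothetical helper. Both commands may overwrite existing data or signatures on the listed devices, so preview them with DRY_RUN=1 first.

```bash
#!/bin/sh
# Print the command instead of executing it when DRY_RUN=1.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi; }

# Hypothetical helper: create Volume Group VG from one or more block devices.
# Usage: create_vg VG DEVICE [DEVICE...]
create_vg() {
    vg=$1
    shift
    if [ "$#" -eq 0 ]; then
        echo "usage: create_vg VG DEVICE [DEVICE...]" >&2
        return 1
    fi
    run sudo pvcreate "$@" || return 1   # initialize each device as an LVM physical volume
    run sudo vgcreate "$vg" "$@"         # build the volume group on top of them
}
```

For example, DRY_RUN=1 create_vg hai_storage /dev/sdb /dev/sdc prints the two commands; drop DRY_RUN=1 to run them, then confirm the result with sudo vgs.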

2. Ephemeral Storage

Kubernetes uses ephemeral storage on every node where pods are scheduled, so a filesystem for it must be provisioned on each GPUaaS node.

The size of the ephemeral storage filesystem depends on the number of GPU cards and the node's role. If the root filesystem already has sufficient space, the following steps can be skipped.
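To judge whether the root filesystem already has enough room, you can check the free space of the filesystem that currently holds /var/lib, where kubelet and containerd keep their data by default. This guide does not give a required size, so REQUIRED_GB below is only a placeholder to set for your GPU count and node role; free_gb is a hypothetical helper.

```bash
#!/bin/sh
# Hypothetical helper: print the free space, in whole GiB, of the filesystem
# holding PATH (default /var/lib). -P gives POSIX output, -k uses 1 KiB blocks.
free_gb() {
    df -Pk "${1:-/var/lib}" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Placeholder value, not a GPUaaS recommendation: set it for your node.
REQUIRED_GB=${REQUIRED_GB:-100}
avail=$(free_gb /var/lib)
if [ "$avail" -ge "$REQUIRED_GB" ]; then
    echo "/var/lib has ${avail} GiB free: a separate ephemeral filesystem may not be needed"
else
    echo "/var/lib has ${avail} GiB free: consider a dedicated ephemeral filesystem"
fi
```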

To create ephemeral storage, perform the following steps on each node.

Assuming /dev/md0 is the device assigned to ephemeral storage, format it as ext4 (this erases any existing data on the device):

bash
$ sudo mkfs.ext4 /dev/md0 

Create the mount point and mount the filesystem:

bash
$ sudo mkdir -p /var/lib/ephemeral && sudo mount /dev/md0 /var/lib/ephemeral

Make the filesystem mount persistent:

bash
$ sudo vi /etc/fstab

### Add the following entry to /etc/fstab
/dev/md0 /var/lib/ephemeral ext4 defaults 0 0
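Instead of editing /etc/fstab interactively, the entry can be appended non-interactively and only once, which helps when preparing several nodes. add_fstab_entry below is a hypothetical helper; the fstab path is a parameter so it can be tried on a scratch file first. After changing the real /etc/fstab, sudo mount -a (or sudo findmnt --verify on recent util-linux) can confirm the file is usable.

```bash
#!/bin/sh
# Hypothetical helper: append LINE to FSTAB unless an identical line exists.
# Usage: add_fstab_entry LINE [FSTAB]   (FSTAB defaults to /etc/fstab; run as root)
# The match is an exact whole-line text comparison, so spacing must be identical.
add_fstab_entry() {
    line=$1
    fstab=${2:-/etc/fstab}
    if grep -qxF -- "$line" "$fstab" 2>/dev/null; then
        echo "already present: $line"
    else
        printf '%s\n' "$line" >> "$fstab"
    fi
}
```

For example, from a root shell: add_fstab_entry "/dev/md0 /var/lib/ephemeral ext4 defaults 0 0"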

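Create the bind mount source directories under /var/lib/ephemeral, and the target directories /var/lib/kubelet and /var/lib/containerd if they do not exist yet (mount --bind requires an existing target):

bash
$ sudo mkdir -p /var/lib/ephemeral/kubelet /var/lib/ephemeral/containerd /var/lib/kubelet /var/lib/containerd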

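Bind-mount the directories under /var/lib/ephemeral onto /var/lib/kubelet and /var/lib/containerd. While a bind mount is active, anything already stored in the target directory is hidden, so this step is best done before containerd and kubelet have written data on the node.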

bash
$ sudo mount --bind /var/lib/ephemeral/kubelet /var/lib/kubelet ; sudo mount --bind /var/lib/ephemeral/containerd /var/lib/containerd

Add these bind mounts to /etc/fstab for persistence:

bash
$ sudo vi /etc/fstab

### Add the following entries to /etc/fstab
/var/lib/ephemeral/kubelet /var/lib/kubelet none bind,nofail,x-systemd.requires=/var/lib/ephemeral 0 0
/var/lib/ephemeral/containerd /var/lib/containerd none bind,nofail,x-systemd.requires=/var/lib/ephemeral 0 0
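Because both bind entries follow the same pattern, they can be generated rather than typed by hand. The sketch below uses a hypothetical helper, bind_fstab_lines, that prints one entry per directory name with the same options as the entries above: nofail lets boot continue if a bind mount cannot be made, and x-systemd.requires= makes systemd mount /var/lib/ephemeral first.

```bash
#!/bin/sh
# Hypothetical helper: print an fstab bind-mount entry for each NAME, mapping
# $BASE/NAME onto /var/lib/NAME (BASE defaults to /var/lib/ephemeral).
bind_fstab_lines() {
    base=${BASE:-/var/lib/ephemeral}
    for name in "$@"; do
        printf '%s/%s /var/lib/%s none bind,nofail,x-systemd.requires=%s 0 0\n' \
            "$base" "$name" "$name" "$base"
    done
}
```

For example, bind_fstab_lines kubelet containerd | sudo tee -a /etc/fstab appends both entries; run it only once per node.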

Verify the mounts and filesystem:

bash
$ sudo mount | grep -E 'kubelet|containerd' ; df -ThP /var/lib/ephemeral
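The check can also be scripted, for example to run on every node after a reboot. The sketch below reads the Linux mount table (/proc/mounts by default, or the file named by MOUNTS) and reports whether each expected path is currently a mount point; is_mounted and check_ephemeral_mounts are hypothetical helpers. For bind mounts, /proc/mounts lists the underlying device (such as /dev/md0) as the source rather than the directory; findmnt /var/lib/kubelet typically shows the bound subdirectory as well.

```bash
#!/bin/sh
# Hypothetical helper: succeed if PATH appears as a mount point (second field)
# in TABLE (default: $MOUNTS, else /proc/mounts). Paths with spaces are escaped
# in that file and are not handled here.
is_mounted() {
    table=${2:-${MOUNTS:-/proc/mounts}}
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' "$table"
}

# Report the mounts created during ephemeral storage preparation.
check_ephemeral_mounts() {
    rc=0
    for mp in /var/lib/ephemeral /var/lib/kubelet /var/lib/containerd; do
        if is_mounted "$mp"; then
            echo "OK: $mp"
        else
            echo "MISSING: $mp"
            rc=1
        fi
    done
    return "$rc"
}
```

Running check_ephemeral_mounts on a prepared node should print three OK lines and exit with status 0.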