A GPUaaS Node can have different roles:
Controller
Worker
Storage
The first node added to GPUaaS 2.0 is assigned the Controller and Worker roles.
You can add a separate node with sufficient disk space and assign it the Storage role.
Optionally, the Controller node can also take the Storage role if specified.
Note, however, that GPUaaS 2.0 currently supports only one storage node per region.
You can add one or more GPUaaS nodes as dedicated worker nodes.
GPUaaS Node Preparation
Ensure that the server hostname contains only lowercase characters.
e.g. gpu01, not GPU01
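The hostname rule can be checked before a node is added. The following is a minimal sketch, not part of the GPUaaS tooling; the function name is_lowercase_hostname is hypothetical, and the check only looks for uppercase characters.

```shell
# Hypothetical helper: succeeds when the given name has no uppercase characters.
is_lowercase_hostname() {
    [ -n "$1" ] && [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Check the current node name (short form, without any domain suffix).
node="$(uname -n)"
node="${node%%.*}"
if is_lowercase_hostname "$node"; then
    echo "OK: hostname $node is lowercase"
else
    echo "Rename required: hostname $node contains uppercase characters"
fi
```

If the check reports a problem, rename the host with your distribution's usual tool before continuing.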
Create a service account (e.g. hai) on all nodes:
root@gpu01:~# cat /etc/passwd | grep hai
hai:x:1001:1001::/home/hai:/bin/bash
The service account should have sudo privileges (full root access) with the NOPASSWD flag enabled.
root@gpu01:~# sudo -U hai -l | tail -2
User hai may run the following commands on gpu01:
(ALL : ALL) NOPASSWD: ALL
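One way to reach the state shown above is sketched below. It assumes a Linux node with useradd, visudo and install available, and uses a drop-in file under /etc/sudoers.d; the function names are hypothetical and the guide itself does not prescribe this method. The block only defines the functions, so nothing changes until create_service_account is called as root.

```shell
# Hypothetical helper: print the sudoers rule granting passwordless sudo.
sudoers_entry() {
    printf '%s ALL=(ALL:ALL) NOPASSWD: ALL\n' "$1"
}

# Hypothetical helper: create the account and install the rule (run as root).
create_service_account() {
    user="$1"
    # Create the user with a home directory and bash shell if it is missing.
    id "$user" >/dev/null 2>&1 || useradd -m -s /bin/bash "$user"
    # Write the rule to a temporary file and validate it before installing.
    tmp="$(mktemp)"
    sudoers_entry "$user" > "$tmp"
    if visudo -cf "$tmp"; then
        install -m 0440 -o root -g root "$tmp" "/etc/sudoers.d/$user"
    fi
    rm -f "$tmp"
}
```

For example, running create_service_account hai as root on each node and then repeating the sudo -U hai -l check should show the NOPASSWD rule.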
Storage Node Preparation
The storage node must allocate disk space for two distinct needs:
1. Persistent/Shared Storage
Assign one or more block storage devices (physical disks, RAID arrays, or external storage) to persistent storage and create a Volume Group on top of them. The output of vgs should then look similar to the following, with hai_storage as the example Volume Group name:
root@gpu01:~# vgs
VG #PV #LV #SN Attr VSize VFree
hai_storage 1 0 0 wz--n- <18.51t <18.51t
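The Volume Group shown above can be built with the standard LVM commands pvcreate and vgcreate. The sketch below only prints the commands so they can be reviewed first; the function name plan_storage_vg is hypothetical and /dev/sdb is an assumed device name that must be replaced with the device(s) on your storage node.

```shell
# Hypothetical helper: print the LVM commands that would build a Volume Group
# from the given devices. Nothing is executed.
plan_storage_vg() {
    vg="$1"
    shift
    # Each device is first initialised as an LVM physical volume.
    for dev in "$@"; do
        echo "pvcreate $dev"
    done
    # The Volume Group is then created on top of all physical volumes.
    echo "vgcreate $vg $*"
}

# /dev/sdb is an assumption; substitute the real device name(s).
plan_storage_vg hai_storage /dev/sdb
```

After reviewing the printed commands, run them as root and confirm the result with vgs.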
2. Ephemeral Storage
Kubernetes uses ephemeral storage on every node where pods are scheduled, so a filesystem must be provisioned on each GPUaaS node.
The size of the ephemeral storage filesystem depends on the number of GPU cards and the node's role. If the root filesystem has sufficient space, the following steps can be skipped.
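To judge whether the root filesystem already has enough room, its free space can be read with df. This is a small sketch; the function name avail_kib is hypothetical, and the required size is not stated here, so the output has to be compared against your own sizing.

```shell
# Hypothetical helper: print the available space, in KiB, on the filesystem
# that holds the given path (POSIX df output, second line, fourth column).
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Report free space on the root filesystem.
echo "Available on /: $(avail_kib /) KiB"
```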
Create ephemeral storage on each node using the steps below.
Assuming /dev/md0 is the disk assigned for ephemeral storage, format the volume as ext4:
$ sudo mkfs.ext4 /dev/md0
Create the mount point and mount the filesystem:
$ sudo mkdir -p /var/lib/ephemeral ; sudo mount /dev/md0 /var/lib/ephemeral
Make the filesystem mount persistent:
$ sudo vi /etc/fstab
### Add below entry to the config file
/dev/md0 /var/lib/ephemeral ext4 defaults 0 0
Create the bind mount points:
$ sudo mkdir -p /var/lib/ephemeral/kubelet /var/lib/ephemeral/containerd /var/lib/kubelet /var/lib/containerd
Bind-mount the directories under /var/lib/ephemeral onto /var/lib/kubelet and /var/lib/containerd:
$ sudo mount --bind /var/lib/ephemeral/kubelet /var/lib/kubelet ; sudo mount --bind /var/lib/ephemeral/containerd /var/lib/containerd
Add these bind mounts to /etc/fstab for persistence:
$ sudo vi /etc/fstab
### Add below entry to the config file
/var/lib/ephemeral/kubelet /var/lib/kubelet none bind,nofail,x-systemd.requires=/var/lib/ephemeral 0 0
/var/lib/ephemeral/containerd /var/lib/containerd none bind,nofail,x-systemd.requires=/var/lib/ephemeral 0 0
Verify the mounts and filesystem:
$ sudo mount | grep -E 'kubelet|containerd' ; df -ThP /var/lib/ephemeral
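The bind-mount entries and the final check from the steps above can also be scripted, which helps when the same preparation is repeated on many nodes. The following is a sketch under the same assumptions as the steps (paths under /var/lib/ephemeral); the function names are hypothetical, and check_ephemeral_mounts relies on the mountpoint utility from util-linux.

```shell
# Hypothetical helper: print an fstab bind-mount line in the same form as the
# entries used in the steps above.
bind_fstab_entry() {
    printf '%s %s none bind,nofail,x-systemd.requires=/var/lib/ephemeral 0 0\n' "$1" "$2"
}

# Hypothetical helper: report whether each expected path is a mount point.
check_ephemeral_mounts() {
    for path in /var/lib/ephemeral /var/lib/kubelet /var/lib/containerd; do
        if mountpoint -q "$path"; then
            echo "mounted: $path"
        else
            echo "NOT mounted: $path"
        fi
    done
}

# Print the two entries for review before adding them to /etc/fstab.
bind_fstab_entry /var/lib/ephemeral/kubelet /var/lib/kubelet
bind_fstab_entry /var/lib/ephemeral/containerd /var/lib/containerd
```

Calling check_ephemeral_mounts after a reboot is a quick way to confirm that the fstab entries took effect.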