hosted.ai provides two distinct ways to deliver compute and GPU resources: VM Instances and GPUaaS Instances.
Although both options rely on the same underlying hardware, they operate on completely different layers of the platform. This page explains how they work, how they differ, and when each should be used.
VM Instances (Virtual Machines with GPU Passthrough)
VM Instances are traditional virtual machines, powered by KVM virtualization. Here, a physical GPU is mapped directly and exclusively to a VM.
Key characteristics
1:1 GPU passthrough
Each VM receives a full, dedicated GPU. No sharing, no partitioning.

Full operating system access
Users can manage their own OS environment, including kernels, drivers, and system services.

Storage persistence
Anything stored on the VM’s disks, including the OS and any other data, remains available until the instance or its volume is explicitly deleted.

High isolation
GPU and compute resources are not shared with other tenants.

Predictable performance
Since the entire GPU is allocated to a single VM, workloads receive consistent and stable performance.
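Under the hood, this exclusive mapping is typically expressed at the hypervisor level as a VFIO passthrough entry that hands one host PCI device to one VM. A generic libvirt sketch (the PCI address is illustrative, not a hosted.ai value):

```xml
<!-- Generic libvirt VFIO passthrough: the host GPU at this PCI address
     is attached exclusively to a single VM. Address is illustrative. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

Because the device is claimed by the guest, no other VM or host process can use that GPU until the instance releases it, which is what makes the performance predictable.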
When to use VM Instances
VM Instances are ideal when users need:
Complete control over the OS
Custom GPU drivers or libraries
Long running workloads
Workloads that must not share GPUs with other tenants
Environments similar to dedicated bare-metal servers
GPUaaS Instances (Containers with access to GPU Pools)
GPUaaS is a Kubernetes-based compute layer designed for running containerised workloads.
Instead of provisioning a VM, users launch pods that are assigned GPU resources from a GPU Pool.
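Conceptually, launching a GPUaaS workload looks like a standard Kubernetes pod that requests a GPU resource from the scheduler. A minimal sketch, assuming the common NVIDIA device-plugin resource name (the pod name, image, and resource key are illustrative, not hosted.ai-specific):

```yaml
# Generic Kubernetes pod requesting one GPU; the scheduler places it on
# a node whose GPU pool has capacity. Names here are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: trainer
      image: example.com/cuda-app:latest
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU drawn from the pool
```

The key difference from a VM is that the user declares a resource request and the platform decides which physical GPU (or GPU share) backs it.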
What GPU Pools are
Providers group one or more GPUs into Pools and define a GPU scheduling mode (read more about the supported GPU scheduling modes here), along with supporting parameters such as the Time Quantum when the Pool is configured to use temporal scheduling.
This allows the same physical GPU to be dedicated to one user or shared across multiple users, depending on how the Pool is configured.
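Temporal scheduling can be pictured as round-robin time slicing: each tenant runs on the GPU for one Time Quantum, then hands it to the next tenant in the queue. A minimal sketch (tenant names and quantum values are illustrative, not hosted.ai parameters):

```python
from collections import deque

def simulate_temporal_schedule(tenants, time_quantum_ms, total_ms):
    """Round-robin time-slicing of one GPU: each tenant runs for one
    time quantum, then the GPU moves to the next tenant in the queue."""
    queue = deque(tenants)
    gpu_time = {t: 0 for t in tenants}
    elapsed = 0
    while elapsed < total_ms:
        tenant = queue.popleft()
        slice_ms = min(time_quantum_ms, total_ms - elapsed)
        gpu_time[tenant] += slice_ms
        elapsed += slice_ms
        queue.append(tenant)  # back of the line until the next round
    return gpu_time

# Four tenants sharing one GPU with a 250 ms quantum over one second:
share = simulate_temporal_schedule(["a", "b", "c", "d"], 250, 1000)
# each tenant receives 250 ms of GPU time
```

A smaller quantum gives more responsive sharing at the cost of more frequent context switches, which is why the quantum is a Pool-level tuning parameter.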
Key characteristics
SSH access to a Linux environment with additional port-service mapping
Users access GPUaaS instances in the same way as they access VMs, and can expose up to 4 network ports on an external IP to map services to other users.

Container-based workloads
Users connect to an isolated pod rather than a VM.

Flexible GPU sharing and overcommit
GPU Pools can be configured with a sharing ratio that allows either exclusive 1:1 mapping of GPUs or sharing across multiple tenants with up to 10x overcommit.

Fast startup
Kubernetes orchestrates pods for rapid deployment and rapid teardown.

Pod lifecycle control
Pods can be started, stopped, restarted, and deleted.

InfiniBand fabric support
When the InfiniBand interconnect fabric is enabled, users can scale up to 32 GPUs in one location with full memory coherency.

Furiosa NPU support
Pods can also be assigned Furiosa NPU resources in addition to GPUs.
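The sharing ratio determines how many concurrent workloads a Pool can schedule. A minimal sketch of the arithmetic (the function name and values are illustrative):

```python
def pool_capacity(physical_gpus: int, sharing_ratio: int) -> int:
    """Maximum concurrent 1-GPU pods a Pool can schedule when each
    physical GPU may be shared by up to `sharing_ratio` tenants."""
    return physical_gpus * sharing_ratio

pool_capacity(8, 10)  # an 8-GPU pool at 10x overcommit -> 80 pods
pool_capacity(8, 1)   # the same pool with exclusive 1:1 mapping -> 8 pods
```

At a ratio of 1 the Pool behaves like dedicated passthrough; higher ratios trade per-tenant performance for density and cost.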
When to use GPUaaS Instances
GPUaaS is the right fit when you need to provide:
Fast spin-up/spin-down of environments
Multi-tenant sharing of the same GPU resource
More control of the installed software environment
Inference or batch style GPU processing
Flexible and affordable GPU resources