hosted.ai provides two distinct ways to deliver compute and GPU resources: VM Instances and GPUaaS Instances.
Although both options rely on the same underlying hardware, they operate on completely different layers of the platform. This page explains how they work, how they differ, and when each should be used.
VM Instances (Virtual Machines with GPU Passthrough)
VM Instances are traditional virtual machines, powered by KVM virtualization. Here, a physical GPU is mapped directly and exclusively to a VM.
Key characteristics
1:1 GPU passthrough
Each VM receives a full, dedicated GPU. No sharing, no partitioning.

Full operating system access
Users can manage their own OS environment, including kernels, drivers, and system services.

Storage persistence
Anything stored on the VM’s disks, including the OS and any other data, remains available until the instance or its volume is explicitly deleted.

High isolation
GPU and compute resources are not shared with other tenants.

Predictable performance
Since the entire GPU is allocated to a single VM, workloads receive consistent and stable performance.
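Under the hood, this exclusive mapping is typically expressed at the hypervisor level as a VFIO passthrough entry that hands one host PCI device to one VM. A generic libvirt sketch (the PCI address is illustrative, not a hosted.ai value):

```xml
<!-- Generic libvirt VFIO passthrough: the host GPU at this PCI address
     is attached exclusively to a single VM. Address is illustrative. -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x3b' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

Because the device is claimed by the guest, no other VM or host process can use that GPU until the instance releases it, which is what makes the performance predictable.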
When to use VM Instances
VM Instances are ideal when users need:
Complete control over the OS
Custom GPU drivers or libraries
Long running workloads
Workloads that must not share GPUs with other tenants
Environments similar to dedicated bare-metal servers
GPUaaS Instances (Containers with access to GPU Pools)
GPUaaS is a Kubernetes-based compute layer designed for running containerised workloads.
Instead of provisioning a VM, users launch pods that are assigned GPU resources from a GPU Pool.
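Conceptually, launching a GPUaaS workload looks like a standard Kubernetes pod that requests a GPU resource from the scheduler. A minimal sketch, assuming the common NVIDIA device-plugin resource name (the pod name, image, and resource key are illustrative, not hosted.ai-specific):

```yaml
# Generic Kubernetes pod requesting one GPU; the scheduler places it on
# a node whose GPU pool has capacity. Names here are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  containers:
    - name: trainer
      image: example.com/cuda-app:latest
      resources:
        limits:
          nvidia.com/gpu: 1   # one GPU drawn from the pool
```

The key difference from a VM is that the user declares a resource request and the platform decides which physical GPU (or GPU share) backs it.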
What GPU Pools are
Providers group one or more GPUs into Pools and define a GPU scheduling mode (read more about the supported GPU scheduling modes here), along with supporting parameters such as the Time Quantum when the Pool is configured to use temporal scheduling.
This allows the same physical GPU to be dedicated to one user or shared across multiple users, depending on how the Pool is configured.
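Temporal scheduling can be pictured as round-robin time slicing: each tenant runs on the GPU for one Time Quantum, then hands it to the next tenant in the queue. A minimal sketch (tenant names and quantum values are illustrative, not hosted.ai parameters):

```python
from collections import deque

def simulate_temporal_schedule(tenants, time_quantum_ms, total_ms):
    """Round-robin time-slicing of one GPU: each tenant runs for one
    time quantum, then the GPU moves to the next tenant in the queue."""
    queue = deque(tenants)
    gpu_time = {t: 0 for t in tenants}
    elapsed = 0
    while elapsed < total_ms:
        tenant = queue.popleft()
        slice_ms = min(time_quantum_ms, total_ms - elapsed)
        gpu_time[tenant] += slice_ms
        elapsed += slice_ms
        queue.append(tenant)  # back of the line until the next round
    return gpu_time

# Four tenants sharing one GPU with a 250 ms quantum over one second:
share = simulate_temporal_schedule(["a", "b", "c", "d"], 250, 1000)
# each tenant receives 250 ms of GPU time
```

A smaller quantum gives more responsive sharing at the cost of more frequent context switches, which is why the quantum is a Pool-level tuning parameter.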
Key characteristics
SSH access to a Linux environment with additional port-service mapping
Users access GPUaaS instances in the same way as they access VMs, and can expose up to 4 network ports on an external IP to map services to other users.

Container-based workloads
Users connect to an isolated pod rather than a VM.

Flexible GPU sharing and overcommit
GPU Pools can be configured with a sharing ratio that allows either exclusive 1:1 mapping of GPUs or sharing across multiple tenants with up to 10x overcommit.

Fast startup
Kubernetes orchestrates pods for rapid deployment and rapid teardown.

Pod lifecycle control
Pods can be started, stopped, restarted, and deleted.

InfiniBand fabric support
When the InfiniBand interconnect fabric is enabled, users can scale up to 32 GPUs in one location with full memory coherency.

Furiosa NPU support
Pods can also be assigned Furiosa NPU resources in addition to GPUs.
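The sharing ratio determines how many concurrent workloads a Pool can schedule. A minimal sketch of the arithmetic (the function name and values are illustrative):

```python
def pool_capacity(physical_gpus: int, sharing_ratio: int) -> int:
    """Maximum concurrent 1-GPU pods a Pool can schedule when each
    physical GPU may be shared by up to `sharing_ratio` tenants."""
    return physical_gpus * sharing_ratio

pool_capacity(8, 10)  # an 8-GPU pool at 10x overcommit -> 80 pods
pool_capacity(8, 1)   # the same pool with exclusive 1:1 mapping -> 8 pods
```

At a ratio of 1 the Pool behaves like dedicated passthrough; higher ratios trade per-tenant performance for density and cost.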
When to use GPUaaS Instances
GPUaaS is the right fit when you need to provide:
Fast spin-up/spin-down of environments
Multi-tenant sharing of the same GPU resource
More control of the installed software environment
Inference or batch style GPU processing
Flexible and affordable GPU resources