Understanding GPU accelerated VM and GPUaaS Instances

hosted.ai provides two distinct ways to deliver compute and GPU resources: VM Instances and GPUaaS Instances.

Although both options rely on the same underlying hardware, they operate on completely different layers of the platform. This page explains how they work, how they differ, and when each should be used.

VM Instances (Virtual Machines with GPU Passthrough)

VM Instances are traditional virtual machines, powered by KVM virtualization. Here, a physical GPU is mapped directly and exclusively to a VM.

Key characteristics

  • 1:1 GPU passthrough
    Each VM receives a full, dedicated GPU. No sharing, no partitioning

  • Full operating system access
    Users can manage their own OS environment, including kernels, drivers and system services

  • Storage persistence
    Anything stored on the VM’s disks, including the OS and any other data, remains available until the instance or its volume is explicitly deleted

  • High isolation
    GPU and compute resources are not shared with other tenants

  • Predictable performance
    Since the entire GPU is allocated to a single VM, workloads receive consistent and stable performance
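Because the whole GPU is passed through, standard tooling inside the VM sees it as a local device. As an illustrative sketch (not hosted.ai-specific), the output of `nvidia-smi -L` inside a VM Instance can be parsed to confirm that exactly one dedicated GPU is visible; the sample output string below is hypothetical:

```python
import re

def list_gpus(nvidia_smi_output: str) -> list[str]:
    """Parse the output of `nvidia-smi -L` into a list of GPU model names."""
    gpus = []
    for line in nvidia_smi_output.splitlines():
        # Lines look like: "GPU 0: <model name> (UUID: GPU-...)"
        m = re.match(r"GPU \d+: (.+?) \(UUID: ", line.strip())
        if m:
            gpus.append(m.group(1))
    return gpus

# Inside a VM Instance, `nvidia-smi -L` would typically report exactly
# one dedicated GPU (sample output for illustration):
sample = "GPU 0: NVIDIA H100 80GB HBM3 (UUID: GPU-5c8d9e0f-1234)"
assert list_gpus(sample) == ["NVIDIA H100 80GB HBM3"]
```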

When to use VM Instances

VM Instances are ideal when users need:

  • Complete control over the OS

  • Custom GPU drivers or libraries

  • Long-running workloads

  • Workloads that must not share GPUs with other tenants

  • Environments similar to dedicated bare-metal servers

GPUaaS Instances (Containers with access to GPU Pools)

GPUaaS is a Kubernetes based compute layer designed for running containerised workloads.
Instead of provisioning a VM, users launch pods that are assigned GPU resources from a GPU Pool.
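In Kubernetes terms, a pod typically claims a GPU through a resource limit. The sketch below builds a minimal pod manifest; the resource name `nvidia.com/gpu` follows the standard NVIDIA device plugin convention, and the image and names are illustrative assumptions rather than hosted.ai specifics:

```python
# Illustrative pod manifest requesting one GPU. The resource name
# "nvidia.com/gpu" is the standard NVIDIA device-plugin resource;
# the actual resource names and pool selectors on hosted.ai may differ.
pod_manifest = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "gpu-workload"},
    "spec": {
        "containers": [
            {
                "name": "trainer",
                "image": "nvcr.io/nvidia/pytorch:24.05-py3",
                "resources": {"limits": {"nvidia.com/gpu": 1}},
            }
        ]
    },
}

limits = pod_manifest["spec"]["containers"][0]["resources"]["limits"]
assert limits["nvidia.com/gpu"] == 1
```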

What GPU Pools are

Providers group one or more GPUs into Pools and define a GPU scheduling mode (read more about the supported GPU scheduling modes here), along with accompanying parameters such as the Time Quantum when the Pool is configured for temporal scheduling.

This allows the same physical GPU to be dedicated to a single user or shared across multiple users, depending on how the Pool is configured.
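A Pool definition can be thought of as a small configuration object. The sketch below is a hypothetical model for illustration; the real provider-side field names, scheduling-mode identifiers and Time Quantum units are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GPUPool:
    """Hypothetical sketch of a GPU Pool definition (field names assumed)."""
    name: str
    gpu_count: int
    scheduling_mode: str                 # e.g. "exclusive" or "temporal"
    time_quantum_ms: Optional[int] = None  # only meaningful for temporal scheduling

    def validate(self) -> None:
        # Temporal scheduling slices GPU time between tenants, so it
        # needs a Time Quantum to define the slice length.
        if self.scheduling_mode == "temporal" and self.time_quantum_ms is None:
            raise ValueError("temporal scheduling requires a Time Quantum")

pool = GPUPool("shared-a100s", gpu_count=8,
               scheduling_mode="temporal", time_quantum_ms=250)
pool.validate()  # raises if the configuration is inconsistent
```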

Key characteristics

  • SSH access to a Linux environment with additional port mapping
    Users access GPUaaS Instances in the same way as they access VMs, and can expose up to 4 network ports on an external IP to make services available to other users

  • Container-based workloads
    Users connect to an isolated pod rather than a VM

  • Flexible GPU sharing and overcommit
    GPU Pools can be configured with a sharing ratio, from exclusive 1:1 GPU mapping up to sharing a single GPU across as many as 10 tenants

  • Fast startup
    Kubernetes orchestrates pods for rapid deployment and rapid teardown

  • Pod lifecycle control
    Pods can be started, stopped, restarted and deleted

  • InfiniBand fabric support
    When an InfiniBand interconnect fabric is enabled, users can scale up to 32 GPUs in one location with full memory coherency

  • Furiosa NPU support
    In addition to GPUs, Pools can expose Furiosa NPU accelerators to pods
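The sharing ratio above translates directly into pool capacity: a pool's maximum number of concurrent tenant slots is its GPU count multiplied by the ratio. A minimal sketch (the function name and the 1–10 bounds reflect the sharing range described above):

```python
def pool_capacity(gpu_count: int, sharing_ratio: int) -> int:
    """Maximum concurrent tenant slots in a GPU Pool.

    sharing_ratio=1 gives exclusive 1:1 mapping; the platform allows
    a single GPU to be shared by up to 10 tenants.
    """
    if not 1 <= sharing_ratio <= 10:
        raise ValueError("sharing ratio must be between 1 and 10")
    return gpu_count * sharing_ratio

# An 8-GPU pool shared 10:1 can serve up to 80 concurrent tenant slots.
assert pool_capacity(8, 10) == 80
# Exclusive 1:1 mapping leaves capacity equal to the GPU count.
assert pool_capacity(4, 1) == 4
```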

When to use GPUaaS Instances

GPUaaS is the right fit when you need to provide:

  • Fast spin up/Spin down of environments

  • Multi-tenant sharing of the same GPU resource

  • More control of the installed software environment

  • Inference or batch style GPU processing

  • Flexible and affordable GPU resources