GPU

Overview

To meet diverse workload needs, choose between virtual machines with dedicated GPU pass-through for maximum control and isolation, or container-based GPU pools for shared, elastic GPU resources.

Virtual Machines with GPU Pass-through

Leverage KVM Virtualization for Isolated Environments

Utilize KVM virtualization to create robust, isolated virtual machines. This allows you to run a complete operating system, separate from other workloads on the same physical hardware, providing a stable foundation for your applications.

Assign Physical GPUs Directly for Maximum Performance

Map a physical GPU directly and exclusively to a virtual machine. This gives the VM sole access to the device and bypasses the hypervisor's emulation layer, delivering near-native performance for demanding tasks.
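With KVM and libvirt, pass-through is typically configured by binding the GPU to the `vfio-pci` driver on the host and attaching it to the guest as a host device. A minimal sketch of the relevant libvirt domain XML, using a placeholder PCI address (`0000:65:00.0`), might look like:

```xml
<!-- libvirt domain XML fragment: attach the host GPU at PCI address
     0000:65:00.0 (placeholder) to the guest via VFIO pass-through -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x65' slot='0x00' function='0x0'/>
  </source>
</hostdev>
```

This assumes the host has an IOMMU enabled (e.g. `intel_iommu=on` or `amd_iommu=on` on the kernel command line) and that the GPU sits in its own IOMMU group so it can be detached from the host cleanly.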

Guarantee Exclusive GPU Access

Provide each virtual machine with a full, dedicated GPU or multiple GPUs. This configuration eliminates resource sharing, guaranteeing your VM has exclusive access to its assigned hardware for predictable performance.

Maintain Full Operating System Control

Gain complete operating system access to manage kernels, drivers, packages, and system services. This allows for deep customization and specific configurations required for your unique software stack.

Full OS Control

VMs provide complete OS and driver control, essential for custom setups. GPU Pools offer a different approach with containerized resource management.

GPU Pools (GPU-as-a-Service)

Deploy Containerized Workloads with Kubernetes

Build and manage containerized workloads on a Kubernetes-based compute layer. This architecture enables efficient resource management and elastic scaling for your applications.

Allocate GPU Resources Granularly to Pods

Launch pods that receive GPU resources directly from a GPU pool. This method allows for precise allocation of GPU power to individual containerized applications, optimizing resource utilization.
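As a sketch, a pod requests GPU capacity through the standard Kubernetes extended-resource mechanism. The resource name `nvidia.com/gpu` assumes an NVIDIA device plugin is deployed in the cluster; the pod name and image below are illustrative:

```yaml
# Pod requesting one GPU from the pool via the device-plugin resource
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference              # hypothetical name
spec:
  containers:
  - name: app
    image: nvcr.io/nvidia/pytorch:24.01-py3   # example image
    resources:
      limits:
        nvidia.com/gpu: 1          # scheduler places the pod on a node with a free GPU
```

Because GPUs are requested like any other resource limit, the scheduler handles placement automatically and the container sees only the device(s) it was allocated.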

Configure GPU Pools for Flexible Resource Management

Group one or more GPUs into pools and define scheduling modes and performance isolation profiles. This configuration allows for controlled sharing and utilization of GPUs within the pool, optimizing resource distribution.

Implement Efficient Scheduling Modes

Configure scheduling modes, such as temporal (time-slicing) or spatial (partitioning), to manage how GPU resources are distributed. For time-slicing modes, set the time quantum to control how finely compute time is divided among multiple users.
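For example, if a pool is backed by the NVIDIA Kubernetes device plugin, temporal sharing can be expressed as a time-slicing configuration that advertises each physical GPU as multiple schedulable replicas. The replica count of 4 here is illustrative:

```yaml
# NVIDIA k8s-device-plugin sharing config: each physical GPU is
# time-sliced and exposed as 4 allocatable nvidia.com/gpu resources
version: v1
sharing:
  timeSlicing:
    resources:
    - name: nvidia.com/gpu
      replicas: 4
```

Raising the replica count increases potential utilization (and overcommitment), at the cost of each workload receiving a smaller share of GPU time.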

Enable Resource Sharing and Elasticity

Dedicate a GPU to a single user or share it across multiple users within a pool. This flexibility allows for overcommitment of GPU capacity, maximizing utilization and providing elastic resource management.

Choosing the Right Offering

Select Virtual Machines for Complete OS and Driver Control

Use virtual machines with GPU pass-through when your workload requires complete operating system control, including custom GPU drivers or libraries. This option is ideal for long-running workloads, applications demanding high isolation, or environments that need to replicate dedicated bare-metal servers.

Select GPU Pools for Elastic, Containerized Workloads

Use GPU Pools for containerized workloads that benefit from flexible or elastic GPU sharing. This option is optimized for fast startup and scaling, making it suitable for AI inference tasks and short-duration jobs requiring rapid deployment and tear-down.

Compare Isolation Levels

Virtual machines provide higher isolation with dedicated GPUs, ensuring no sharing with other tenants. GPU Pools offer flexible sharing configurations, allowing providers to share resources across multiple users and potentially oversell capacity for greater elasticity.

Compare Control and Flexibility Trade-offs

Virtual machines offer complete control over the operating system and its environment for deep customization. GPU Pools prioritize flexibility and rapid deployment through their container-based, Kubernetes-orchestrated architecture, enabling fast startup and scaling for dynamic workloads.