Key Features and Improvements
1️⃣ Credit-Based Scheduler
The Credit-Based Scheduler introduces a dynamic, fair, and efficient scheduling mechanism for task execution across namespaces. It prioritises namespaces by accumulated credits, aims for efficient resource utilisation, and handles a range of task scenarios robustly.
Core Capabilities
Real-time resource consumption monitoring
Automated pre-emption of lower-credit tasks
Low-latency performance for inferencing tasks
-> Key Features & Implementation Details
Credit Accumulation and Utilisation
Namespaces accumulate credits when tasks complete before their allocated Time Quantum (TQ) or when idle during a cycle.
Credits are capped at 5 × TQ to prevent excessive accumulation.
Namespaces with higher credits are prioritised for scheduling, and credits are reduced as TQ is consumed.
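The credit rules above can be sketched as follows. This is a minimal illustration, not the scheduler's actual implementation: the `Namespace` class, the `settle_cycle` and `pick_next` names, the TQ value of 100, and the 1:1 exchange rate between time and credits are all assumptions made for the example. Only the 5 × TQ cap and the "higher credits first" rule come from the notes.

```python
from dataclasses import dataclass

TQ = 100.0           # assumed Time Quantum length (arbitrary units)
CREDIT_CAP = 5 * TQ  # cap from the release notes: 5 x TQ


@dataclass
class Namespace:
    name: str
    credits: float = 0.0

    def settle_cycle(self, used: float) -> None:
        """Update credits after one cycle; `used` is 0 when the namespace was idle."""
        used = min(max(used, 0.0), TQ)
        # Credits are reduced as TQ is consumed (assumed 1:1, never below zero).
        self.credits = max(self.credits - used, 0.0)
        # Unused time (early completion or idling) earns credits, up to the cap.
        self.credits = min(self.credits + (TQ - used), CREDIT_CAP)


def pick_next(namespaces: list[Namespace]) -> Namespace:
    """Return the namespace with the most credits (first one wins a tie)."""
    return max(namespaces, key=lambda ns: ns.credits)


if __name__ == "__main__":
    idle, busy = Namespace("idle"), Namespace("busy")
    for _ in range(6):
        idle.settle_cycle(0)   # idle every cycle: hits the cap
        busy.settle_cycle(TQ)  # consumes the full quantum: earns nothing
    print(idle.credits, busy.credits, pick_next([busy, idle]).name)
```

Under these assumptions, a namespace that idles for six cycles stops accumulating at the cap, while one that always consumes its full quantum stays at zero and is scheduled after it.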
Benefits
✔ Fair usage enforcement
✔ Low-latency performance
2️⃣ Cluster Panel Support on KVM
A unified interface for managing VMs on KVM clusters, with support for GPU passthrough.
Key Features
Support for clusters of compute servers.
End-to-end management of VMs with GPU passthrough from the user panel, admin panel, and compute cluster panel.
Billing for consumed VM resources.
Simple package installation on Ubuntu 24.04, with cluster management through a centralised admin panel.
Software-defined integrated storage technology.
-> Key Supported features
Full VM lifecycle support via supported template list [create/delete/start/stop/restart/resize]
Public IP address assignment
Virtual console support for the end users
GPU & PCI Device Passthrough support
Impact
→ Deployment of dedicated GPU VMs
→ Service-based instance provisioning
→ Persistent storage for the root file system
🔜 Planned Enhancements
Enhanced Storage Redundancy
Cluster Panel Controller Failover
Stats Reporting
VM Live Migration
3️⃣ Enhanced Billing & Resource Limit Enforcement
Full overhaul of billing logic for precision, trust, and control.
Included Improvements
Dynamic time- and timezone-aware Billing engine
Enhanced self-service Billing transparency
Summarised team Billing consumption APIs
Hierarchical Global & Regional Billing Enhancements
Outcome
🔹 Enhanced Billing transparency for end users
🔹 Higher granularity for Billing
🔹 Improved developer experience when working with Billing APIs
4️⃣ Automated End-User Application Port Exposure
End-user self-service publication of network ports to internal or external clients.
Capabilities Added
Automated endpoint assignment & routing
End-user self-service visibility, control & removal
Dynamic linking to GPU-accelerated services from the User Panel
User Value
-> Rapid delivery of production-ready applications
-> Multi-tenant services at scale
Improvements to existing features
Task ID | Issue Summary | Customer Benefits |
|---|---|---|
HAI-4027 | Hide Workspaces and their count when no cluster is attached, and hide the create instances button when the user cannot create a KVM instance. | Prevents confusion and keeps the interface clear by showing workspace information only when it is relevant. The create instances button is not shown when the user cannot create a KVM instance. |
HAI-3627 | Add an API endpoint that returns all of a team's GPUaaS subscription details, with an all-regions filter. | Saves time and effort by eliminating the need to check each region individually, making subscription management faster and easier. |
HAI-3382 | Pagination implemented for the Currencies module (objects). | Pagination lets users navigate through all available currencies without performance constraints. |
HAI-1997 | Expose Hourly Billing to users. | Users can now monitor and report resource consumption at an hourly level for more accurate invoicing and granular cost insights. The retention flexibility helps optimise storage while supporting detailed billing analysis. |
Bug fixes
Task ID | Release Notes |
|---|---|
HAI-4026 | Fixed the issue where pod audit logs were being duplicated in the API response. With this fix, the API now returns only unique pod audit logs. |
HAI-3971 | Fixed a bug in vGPU resource allocation where the system incorrectly calculated available vGPUs, resulting in "Insufficient resources" errors for valid subscription requests. The calculation now correctly checks if a single instance can fit, ensuring accurate vGPU availability reporting and successful subscriptions for larger instance types. |
HAI-3964 | The status-based filtering now works as expected, allowing users to retrieve logs accurately without encountering 500 errors. |
HAI-3839 | The system now soft-deletes old pricing records when new rates are introduced, ensuring billing uses only the latest rate active during the usage period. |
HAI-3828 | Resolved the issue where HostedAI was used as the reply-to for user panel emails rather than the value configured by a Full Administrator. |
HAI-3822 | Fixed the issue where Shared Volumes may still show in the Teams > Shared Storage section although they have been deleted. |
HAI-3783 | Resolved the issue where externally exposed services were unreachable because the new port was not added to the team's NetworkPolicy. |
HAI-3767 | Logic has been updated to ensure only the latest price per resource is shown in pricing policies. |
HAI-3757 | Resolved an issue where the Workspaces section displayed an empty list even though workspaces existed. |
HAI-3732 | Resolved an issue where users could subscribe to a GPUaaS pool while an unsubscription was still in progress, leading to inconsistent subscription states between the user and admin panels. |
HAI-3731 | Added the security level information for GPUaaS pools in the Admin panel after pool creation. |
HAI-3728 | On a new install, the default image policy now automatically includes the latest images, resolving an issue where the default policy contained no images. |
HAI-3727 | Made changes to make it easier for users to identify and resolve subscription issues, for example when no resources are available in the resource policy. |
HAI-3725 | Made changes to ensure all GPUaaS nodes are reliably registered, allowing users to add and initialise multiple nodes efficiently without manual workarounds. |
HAI-3724 | We have fixed an issue where removing a GPUaaS pool caused existing billing statistics to disappear for teams. The fix ensures billing statistics are retained even after a pool is deleted, allowing users and teams to maintain accurate historical billing records. |
HAI-3721 | The UI logic was updated to show each power action (start/stop/reboot) only when appropriate for the pod’s current state. |
HAI-3713 | Resolved an issue where GPU Workload Images were incorrectly displayed under the “Included Images” column after creating an Image Policy. |
HAI-3704 | Fixed an issue where users were unable to create a new Pricing Policy due to an “Invalid Request Data” error, even when all required fields were filled correctly. |
HAI-3678 | Resolved an issue where network policies in multi-node environments blocked pod ingress and egress traffic across nodes, preventing SSH access to pods via tunnel interfaces. |
HAI-3646 | Prevented nodes from leaving the cluster if they have GPUs assigned to a pool. Now, any attempt to remove a node with GPUs assigned to a pool will be blocked, maintaining cluster integrity. |
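
The "can a single instance fit" check described for HAI-3971 can be illustrated with a small sketch. The notes do not describe the underlying data model, so everything here is an assumption: free vGPUs are tracked per node, an instance must be placed entirely on one node, and the function and parameter names are hypothetical.

```python
from typing import Mapping


def can_fit_instance(free_vgpus_by_node: Mapping[str, int], vgpus_per_instance: int) -> bool:
    """Return True if at least one node has enough free vGPUs for one instance.

    Assumption: an instance cannot be split across nodes.
    """
    if vgpus_per_instance <= 0:
        raise ValueError("vgpus_per_instance must be positive")
    return any(free >= vgpus_per_instance for free in free_vgpus_by_node.values())


def max_instances(free_vgpus_by_node: Mapping[str, int], vgpus_per_instance: int) -> int:
    """Count how many whole instances fit, node by node (same assumption as above)."""
    if vgpus_per_instance <= 0:
        raise ValueError("vgpus_per_instance must be positive")
    return sum(free // vgpus_per_instance for free in free_vgpus_by_node.values())


if __name__ == "__main__":
    free = {"node-a": 4, "node-b": 1}  # hypothetical inventory
    print(can_fit_instance(free, 4))   # a 4-vGPU instance fits on node-a
    print(max_instances(free, 2))      # whole 2-vGPU instances across both nodes
```

Checking fit per placement unit, rather than against a single aggregate figure, is one way to keep availability reporting consistent with what can actually be scheduled for larger instance types.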