
From Batch to AI-Native: How Volcano 1.14 Unifies Training, Inference & Agent Workloads
Running massive AI training jobs, LLM inference workloads, and bursty AI agents on the same Kubernetes cluster is a recipe for wasted GPU capacity, fragmented resource allocation, and skyrocketing cloud costs. The problem isn't just deployment—it's intelligent scheduling that prevents idle resources while maintaining low-latency performance for unpredictable agent workloads.
Jesse Stutler, Maintainer at Volcano, explains how Volcano 1.14 is evolving from a batch scheduling tool into an AI-native unified scheduling platform. With its new multi-scheduler architecture, topology-aware scheduling, and KV cache awareness, Volcano handles the full AI lifecycle—training, inference, and agents—on a single cluster without sacrificing performance or burning through GPU budgets.
Key Topics Covered:
Multi-scheduler architecture with dynamic sharding for batch and agent workloads
Topology-aware scheduling for hyper-node bin packing and network domain optimization
AgentCube: Kubernetes-native platform for bursty, short-lived AI agent sessions
Katana: AI inference routing with KV cache awareness, prefix caching, and speculative decoding
Colocation strategies using cgroup v2 to increase deployment density and GPU utilization
Read the full story & transcript at www.tfir.io
#Kubernetes #AIScheduling #Volcano #GPUOptimization #KubeCon #LLMInference #AIAgents #CloudCost #MachineLearning #OpenSource