AI workloads bring a new class of cost and efficiency challenges: GPU underutilization, unpredictable token consumption, and the complexity of multi-model agentic environments. Together, these make it hard to see what you’re spending and why, but the tools and practices to fix that exist today.
This session walks through a practical framework for taking control of AI infrastructure costs in three stages: visibility, optimization, and sustainability. Drawing on Capital One’s real-world experience, we cover how to instrument observability for AI workloads, which KPIs matter for GPU utilization and model performance, and how to use intelligent routing and token optimization to reduce waste without sacrificing capability. Attendees will leave with a working monitoring approach, concrete cost optimization techniques, and automation strategies they can apply immediately.