How do you run 100+ production microservices on Spot Instances without waking up at 3 AM? This talk reveals the “Safety Net” strategy: Karpenter for dynamic rightsizing, and Kubernetes-native primitives (PodAntiAffinity and PodDisruptionBudgets) as the guardrails that keep those savings safe. I’ll prove that you don’t need a massive team to save 70%. You need a “Janitor” Lambda to clean up cloud waste and an architecture designed to be interrupted.
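As a rough illustration of the two guardrails, the sketch below builds a PodDisruptionBudget and a preferred pod anti-affinity rule as plain Python dicts following the Kubernetes `policy/v1` and pod `affinity` schemas, then prints them as JSON, which `kubectl apply -f` accepts. The service name `checkout`, the `app` label, and `maxUnavailable: 1` are hypothetical choices, not values from the talk.

```python
import json


def pdb_manifest(app: str, max_unavailable: int = 1) -> dict:
    """PodDisruptionBudget: caps how many pods of `app` a voluntary
    disruption (e.g. a node drain) may evict at the same time."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app}-pdb"},
        "spec": {
            "maxUnavailable": max_unavailable,
            "selector": {"matchLabels": {"app": app}},
        },
    }


def spread_affinity(app: str) -> dict:
    """Preferred pod anti-affinity: asks the scheduler to put replicas of
    `app` on different nodes, so losing one Spot node removes few replicas."""
    return {
        "podAntiAffinity": {
            "preferredDuringSchedulingIgnoredDuringExecution": [
                {
                    "weight": 100,
                    "podAffinityTerm": {
                        "labelSelector": {"matchLabels": {"app": app}},
                        "topologyKey": "kubernetes.io/hostname",
                    },
                }
            ]
        }
    }


if __name__ == "__main__":
    # Hypothetical service name. The affinity block belongs under the
    # Deployment's spec.template.spec.affinity field.
    print(json.dumps(pdb_manifest("checkout"), indent=2))
    print(json.dumps(spread_affinity("checkout"), indent=2))
```

A PDB only governs evictions made through the Eviction API, such as a drain; it cannot stop AWS from reclaiming a Spot node, which is why spreading replicas across nodes matters as well.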
This session provides a blueprint for building a “Safe Spot” environment:
Infrastructure as Code (Terraform): Automating Karpenter NodePools with wide instance diversification (e.g., the C, M, and R families) so workloads ride out regional Spot capacity stockouts.
The “Janitor” Lambda: A custom Python-based automation that identifies “Zombie” resources, converts gp2 EBS volumes to gp3 for an immediate saving of up to 20% per GB, and alerts owners of idle RDS instances.
Monitoring & Reporting: Using Kubecost to move from “Total Bill” visibility to per-microservice “Unit Economics” (Cost per Pod-Hour).
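One way the gp2-to-gp3 step of the Janitor might look, as a minimal sketch assuming a Python Lambda with boto3 and an execution role allowed to call `ec2:DescribeVolumes` and `ec2:ModifyVolume`. The `DRY_RUN` environment variable and the function names are hypothetical, and the zombie-resource and idle-RDS checks are not shown.

```python
"""Hypothetical sketch of the gp2 -> gp3 step of a "Janitor" Lambda."""

import os

# Hypothetical switch: only DRY_RUN=false modifies volumes; anything else reports only.
DRY_RUN = os.environ.get("DRY_RUN", "true").lower() != "false"


def select_gp2_volumes(volumes: list[dict]) -> list[str]:
    """Return IDs of gp2 volumes from a describe_volumes-style list.
    Kept pure so it can be tested without AWS access."""
    return [v["VolumeId"] for v in volumes if v.get("VolumeType") == "gp2"]


def list_gp2_volumes(ec2) -> list[dict]:
    """Page through describe_volumes, filtered server-side to gp2."""
    volumes = []
    paginator = ec2.get_paginator("describe_volumes")
    for page in paginator.paginate(
        Filters=[{"Name": "volume-type", "Values": ["gp2"]}]
    ):
        volumes.extend(page["Volumes"])
    return volumes


def lambda_handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime

    ec2 = boto3.client("ec2")
    volume_ids = select_gp2_volumes(list_gp2_volumes(ec2))
    if not DRY_RUN:
        for volume_id in volume_ids:
            # Elastic Volumes: the volume stays attached and usable while
            # its type changes.
            ec2.modify_volume(VolumeId=volume_id, VolumeType="gp3")
    return {"dry_run": DRY_RUN, "gp2_volumes": volume_ids}
```

Defaulting to a dry run keeps the first deployment report-only. Note that a large gp2 volume whose baseline exceeds gp3's defaults (3,000 IOPS, 125 MiB/s) may need `Iops` or `Throughput` passed to `modify_volume` explicitly to avoid a performance drop.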
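The unit-economics metric itself is simple arithmetic: the total cost attributed to a service divided by the pod-hours it ran. The sketch below assumes cost rows have already been exported (for example from Kubecost) into a hypothetical `Allocation` shape; it does not call any Kubecost API, and the sample figures are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Allocation:
    """One exported cost row (hypothetical field names)."""
    service: str
    cost_usd: float   # cost attributed to this row
    pod_hours: float  # pod runtime covered by this row


def cost_per_pod_hour(rows: list[Allocation]) -> dict[str, float]:
    """Sum cost and pod-hours per service, then divide."""
    cost: dict[str, float] = defaultdict(float)
    hours: dict[str, float] = defaultdict(float)
    for row in rows:
        cost[row.service] += row.cost_usd
        hours[row.service] += row.pod_hours
    # Skip services with no recorded runtime to avoid dividing by zero.
    return {s: cost[s] / hours[s] for s in cost if hours[s] > 0}


if __name__ == "__main__":
    rows = [
        Allocation("checkout", 12.0, 48.0),
        Allocation("checkout", 6.0, 24.0),
        Allocation("search", 30.0, 60.0),
    ]
    print(cost_per_pod_hour(rows))  # {'checkout': 0.25, 'search': 0.5}
```

Summing cost and pod-hours before dividing weights each row by its runtime, whereas averaging per-row ratios would let a short-lived row skew the result.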