Wednesday
Room 5
16:20 - 17:20
(UTC±00)
Talk (60 min)
Prompt Failures and Latency Spikes: Observability for AI
Logs and metrics are great, until you're trying to debug an AI agent that just replied “I'm sorry, Dave, I can't do that.” Observability in AI systems isn't just about uptime; it's about understanding prompts, latencies, retries, hallucinations, token usage, and model behavior under pressure.

This session dives into how we used OpenLIT, a lightweight CNCF tool for observing LLM and agent-based workloads. We'll explore how we tracked and debugged prompt chains across orchestration layers, visualized token usage and latency distributions at each step of an AI workflow, monitored agent reasoning patterns, and built performance dashboards that go beyond CPU and memory into model selection, response variability, and input size trends.

Whether you're building RAG systems, multi-step AI agents, or just trying to figure out why your chatbot is always apologizing, this talk shows how OpenLIT brings real observability into your AI stack, without needing a PhD in tracing or LLM internals.
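To give a flavor of the instrumentation the talk covers, here is a minimal sketch of wiring OpenLIT into a Python service. It assumes the `openlit` SDK with its one-line `init()` entry point and an OTLP-compatible backend; the endpoint, model name, and prompt below are illustrative, not taken from the talk.

```python
# Minimal sketch: auto-instrumenting an LLM call with OpenLIT.
# Assumes `pip install openlit openai` and an OTLP-compatible
# collector listening at the endpoint below (value is illustrative).
import openlit
from openai import OpenAI

# One call patches supported LLM SDKs and starts emitting
# OpenTelemetry traces and metrics: latency, token usage,
# model name, and per-request prompt/response metadata.
openlit.init(otlp_endpoint="http://127.0.0.1:4318")

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# This request is now traced end to end; its spans land in the
# same OTLP backend as the rest of your service telemetry.
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Open the pod bay doors."}],
)
print(response.choices[0].message.content)
```

The point of the one-line init is that prompt-level telemetry rides the same OpenTelemetry pipeline as your existing infrastructure metrics, so token and latency dashboards sit next to CPU and memory ones.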