Abstract
Observability is supposed to help you tame complexity, but your Observability stack can quickly become just as complex as the systems it's meant to watch. For most teams, the answer is to pay someone else to deal with it. But bills grow, auditors ask awkward questions, and sometimes you just run out of road with your SaaS provider. In those instances, you have to turn to running it yourself.
Drawing on a decade of experience building, maintaining, and operating self-hosted monitoring and Observability stacks, I will explain in this talk what it actually means to run your own stack, what the tooling landscape looks like, where it shines, and where the open source world lags behind the SaaS experience.
Along the way, I'll cover options for all your telemetry types, with concrete recommendations on what to use and what to avoid, and insights on how to tie them together into one coherent debugging canvas, with a look at where the Observability world is going next.
Interview:
What is your session about, and why is it important for senior software developers?
My session is about self-hosted Observability, and the options and unique challenges it presents. More than that, it's an insight into what's going on under the hood in an Observability system, and it aims to put that in context so that engineers can better understand and work with their telemetry going forward.
Why is it critical for software leaders to focus on this topic right now, as we head into 2026?
Especially in the world of AI, our systems are becoming more complex by the day. Observability is the answer to tackling that complexity, but you have to do it right, which means knowing how to get the best out of your telemetry systems.
What are the common challenges developers and architects face in this area?
Lots of developers struggle with tying telemetry together into a cohesive debugging strategy. Try as we might, the "three pillars" idea still persists, and it is suboptimal for modern distributed systems.
What's one thing you hope attendees will implement immediately after your talk?
Ways to tie together telemetry of different types. In particular, "exemplars" are an underused aspect of modern metrics systems.
Speaker
Colin Douch
Site Reliability Engineer @DuckDuckGo
Colin currently works as an SRE at DuckDuckGo, orchestrating and inventing solutions to better serve DuckDuckGo's increasingly large portfolio of services, serving search queries and AI chats from around the world. He formerly headed up the Observability Team at Cloudflare and has been working, advising, and researching in the Monitoring and Observability space for close to 10 years, gaining a wide perspective on the difficulties that modern companies, big and small, face in properly introspecting their systems. Originally from New Zealand, he now lives in the UK and regularly speaks at conferences to share insights from the practical side of Observability engineering.