Abstractions are what allow us to build the complex applications that we all use day-to-day. For example, it's rare for us to care about the precise details of on-disk storage when building an application — that's why databases exist!
Debugging is different, though. It forces us to break through those abstractions in order to understand what the computer is really doing.
In this talk, we'll explore the aftermath of a complex outage in a Postgres cluster. We'll retrace the steps we took to reliably reproduce the failure in a local environment and pull out lessons about debugging complex systems along the way. At one point, we'll dive into the depths of how Postgres represents data on disk, and realise that even unfamiliar layers of a system don't need to be scary.
Interview:
What's the focus of your work these days?
I work as an infrastructure engineer at a company called PlanetScale, where we build a managed MySQL database platform. My work focuses on building the infrastructure underneath the database that helps it run reliably, with as much automation and as little manual intervention as possible.
What's the motivation for your talk at QCon London 2023?
The motivation for my talk is to help people get better at debugging really complex problems. I think it's really easy to become very skilled in other parts of writing software and lag behind in terms of debugging skills. It's something I feel very passionate about as an infrastructure engineer because a lot of my job is figuring out why things are going wrong. I actually don't think it's that hard; I think people often find it tricky because it's outside of their comfort zone. In my talk I go through a really complex example of a failure, but show that it's really just about applying the same step-by-step approach.
How would you describe your main persona and target audience for this session?
Anyone who builds and runs software in production. In terms of level, I'd say probably mid-level to senior and above. I try to avoid assuming any domain knowledge in the talk, though I do assume a base level of programming knowledge.
Is there anything specific that you'd like people to walk away with after watching your session?
I'd like people to believe that they can do things that they're not so familiar with. The specific example I use in the talk is a database clustering outage. It's about following a methodical debugging process, in which we had to dive all the way down into the binary on-disk format of the database, which is a scary place that you don't normally go.
Speaker
Chris Sinjakli
Infra Engineer @planetscaledata
Chris enjoys working on the strange parts of computing where software and systems meet. He especially likes the challenges of databases and distributed systems.
All his programs are made from organic, hand-picked, artisanal keypresses.