Shielding the Core: Architecting Resilience with Multi-Layer Defenses

Abstract

High-demand events can cause sudden traffic spikes that overwhelm even well-designed systems. In ticketing platforms, millions of users — alongside increasingly sophisticated automated agents — may arrive simultaneously, placing extreme pressure on backend services.

At SeatGeek, we observed that even elastic infrastructure has limits: autoscaling takes time to react, and systems must survive while capacity catches up. To address this gap, we designed a layered shielding architecture that distributes defensive responsibilities across multiple parts of the platform.

At the edge, caching, shielding, and admission control mechanisms such as queueing absorb traffic bursts before they reach the origin. API gateways enforce fairness through rate limiting and request validation. Deeper in the stack, Kubernetes-native networking policies and platform controls help contain failures and protect service boundaries.
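As one illustration of the gateway-level rate limiting mentioned above, here is a minimal token-bucket sketch. It is not taken from SeatGeek's implementation. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions chosen for the example.

```python
import time
from typing import Callable


class TokenBucket:
    """Hypothetical per-client limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate            # steady-state requests per second
        self.capacity = capacity    # largest burst that is let through
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A gateway would typically keep one bucket per client key (API token, IP address, or session) and answer a rejected request with HTTP 429 rather than forwarding it to the origin.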

This layered approach allows the system to shed load early, protect critical services, and degrade gracefully during extreme demand. But resilience is not static: traffic patterns evolve, new bottlenecks emerge, and systems must continuously adapt through observability and feedback signals.
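Shedding load early while protecting critical services can be sketched as a priority-aware admission check. The priority tiers, the threshold values, and the `should_shed` function below are hypothetical and are not described in the talk abstract. A real system would derive the utilization signal from its own observability data.

```python
from enum import IntEnum
from typing import Dict, Optional


class Priority(IntEnum):
    CRITICAL = 0     # e.g. checkout or payment paths
    NORMAL = 1       # e.g. browsing and search
    BACKGROUND = 2   # e.g. prefetching, analytics beacons


# Assumed utilization levels (0.0 to 1.0) at which each tier starts being rejected.
DEFAULT_THRESHOLDS: Dict[Priority, float] = {
    Priority.BACKGROUND: 0.70,
    Priority.NORMAL: 0.85,
    Priority.CRITICAL: 0.98,
}


def should_shed(priority: Priority, utilization: float,
                thresholds: Optional[Dict[Priority, float]] = None) -> bool:
    """Return True when a request of this priority should be rejected.

    Lower-priority traffic is dropped first, so critical paths keep
    working as the system approaches saturation.
    """
    limits = thresholds if thresholds is not None else DEFAULT_THRESHOLDS
    return utilization >= limits[priority]
```

With these assumed thresholds, at 75% utilization only background requests are rejected, and critical requests are accepted until the system is almost saturated.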

In this talk, we will explore the architecture and operational lessons behind building multi-layer shields that protect core systems under internet-scale traffic, and share practical insights for designing resilient platforms that can withstand traffic stampedes without bringing down the entire ecosystem.

Interview:

What is your session about, and why is it important for senior software developers?

This talk explores how to design resilient systems that can withstand extreme traffic spikes without collapsing. Using real-world examples from ticketing platforms, I will show how distributing defensive responsibilities across layers — edge, gateway, and platform infrastructure — helps protect critical services during sudden demand surges. Senior engineers often operate systems where scaling alone is not enough; resilience requires intentional architecture and operational controls. The session focuses on practical patterns that help systems degrade gracefully rather than fail catastrophically.

Why is it critical for software leaders to focus on this topic right now, as we head into 2026?

Traffic patterns are becoming less predictable as automated agents, AI-driven clients, and global user demand increase system pressure. At the same time, modern platforms rely on complex distributed architectures in which small failures can quickly cascade. Leaders need to design systems that assume sudden spikes and evolving traffic behavior. Building resilience through layered defenses and clear operational signals is becoming essential for maintaining reliability at scale.

What are the common challenges developers and architects face in this area?

A common misconception is that cloud elasticity alone solves scalability problems. In reality, autoscaling takes time to react, and systems often experience instability before capacity catches up. Teams also struggle to identify truly critical services, manage noisy-neighbor effects in shared infrastructure, and detect early signals of system stress. Designing architectures that shed load early and protect the core system requires coordination across multiple platform layers.

What's one thing you hope attendees will implement immediately after your talk?

I hope attendees rethink where traffic control happens in their systems. Instead of relying solely on backend scaling, they should introduce earlier defenses — such as caching, admission control, and rate limiting — to absorb pressure before it reaches core services. Even small changes at the edge or gateway layer can dramatically improve system stability during traffic spikes.
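To make the idea of admission control concrete, here is a minimal in-memory waiting-room sketch. It is an assumption-laden illustration, not the speaker's design: the `WaitingRoom` class and its methods are hypothetical, and a production queue would live at the edge and keep its state in a shared store.

```python
from collections import deque
from typing import Deque, List, Optional, Set


class WaitingRoom:
    """Hypothetical FIFO admission control: at most `max_active` users reach the origin."""

    def __init__(self, max_active: int) -> None:
        self.max_active = max_active
        self.active: Set[str] = set()
        self.waiting: Deque[str] = deque()

    def arrive(self, user_id: str) -> bool:
        """Return True if the user is admitted, False if they are queued."""
        if user_id in self.active:
            return True
        # Admit directly only when there is room and nobody is already waiting.
        if len(self.active) < self.max_active and not self.waiting:
            self.active.add(user_id)
            return True
        if user_id not in self.waiting:
            self.waiting.append(user_id)
        return False

    def position(self, user_id: str) -> Optional[int]:
        """1-based place in the queue, or None if the user is not waiting."""
        for index, waiting_id in enumerate(self.waiting, start=1):
            if waiting_id == user_id:
                return index
        return None

    def leave(self, user_id: str) -> List[str]:
        """Free the user's slot and admit queued users in arrival order."""
        self.active.discard(user_id)
        admitted: List[str] = []
        while self.waiting and len(self.active) < self.max_active:
            next_user = self.waiting.popleft()
            self.active.add(next_user)
            admitted.append(next_user)
        return admitted
```

The point of the sketch is that the origin only ever sees `max_active` concurrent users, however many arrive at once. Everyone else waits in arrival order, which also keeps the experience fair.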

What makes QCon stand out as a conference for senior software professionals?

QCon focuses on real engineering experience rather than hype or vendor-driven content. Speakers share lessons learned from operating large-scale systems in production, including the trade-offs and failures behind architectural decisions. This creates an environment where senior engineers can learn from peers facing similar challenges. The emphasis on practical insight and honest technical discussion makes QCon particularly valuable.


Speaker

Anderson Parra

Staff Software Engineer @SeatGeek

Anderson Parra is a Staff Software Engineer on SeatGeek’s Cloud Platform team, where he works on the infrastructure that powers high-demand ticket onsales. His work focuses on building resilient systems that can withstand internet-scale traffic and on designing layered defenses across edge, API gateways, and Kubernetes platforms to protect core services while preserving a fair user experience.

Over the past 18+ years, Ander has built and operated large-scale distributed systems handling massive traffic and data volumes for companies in Brazil, Ireland, Germany, the UK, and the United States. He has worked across a wide range of technologies, including Java, Scala, Go, Ruby, Python, JavaScript, and Lua, with a strong focus on platform engineering and distributed systems.

Anderson holds a master’s degree in distributed systems based on his research, “A Lightweight Reconfiguration Solution for Paxos”.
 


Date

Wednesday Mar 18 / 03:55PM GMT (50 minutes)

Location

Churchill (Ground Fl.)

From the same track

Session resilience

How to Find Resilience Bugs in Systems that Don't Exist

Wednesday Mar 18 / 10:35AM GMT

Building correct distributed systems takes thinking outside the box, and the fastest way to do that is to think inside a different box. One different box is "formal methods", the discipline of mathematically verifying software and systems.

Hillel Wayne

Author of "Logic for Programmers" and "Learn TLA+"

Session decentralized

Spritely: Infrastructure for the Future of the Internet

Wednesday Mar 18 / 11:45AM GMT

Let's take back the internet! Learn about Spritely's work to re-decentralize the net with new foundational technologies that put users in control.

Christine Lemmer-Webber

Executive Director @Spritely Institute, Co-Author of ActivityPub

David Thompson

CTO @Spritely Institute

Session architecture

Understanding Progressive Collapse: How To Avoid A Cascading Failure

Wednesday Mar 18 / 01:35PM GMT

Small things going wrong can quickly snowball. The cascading failure is often a nightmare scenario for any system. An initial problem that seems minor in isolation can kick off a chain reaction of ever-increasing failures, potentially leading to catastrophic results.

Sam Newman

Microservice, Cloud, CI/CD Expert, Author of "Building Microservices" and "Monolith to Microservices", 20+ Years Experience as a Developer

Session

Keeping the Nation On-Air: How We Think About Resilience at the BBC

Wednesday Mar 18 / 02:45PM GMT

At the heart of the BBC is delivering value to all, serving audiences across the UK and the world on TV, radio, and online with trusted and impartial news and high-quality British content.

Tom Everest

Head of Department for Architecture and Supply Chain @BBC