Summary
Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconlondon.com with any comments or concerns.
Introduction:
The presentation opens with an analogy comparing a tube journey to navigating distributed systems: just as a tube ride can stall unexpectedly, a distributed request can slow down because of propagation delays and its dependence on many components.
Main Content:
- Latency as a Design Constraint: The talk treats latency as a design constraint, assigning a latency budget to each component in the system so that no single hop becomes a bottleneck.
- Trade-offs and Techniques:
  - Parallelism: Safe parallelism helps manage fan-out scenarios where one request spawns calls down multiple paths in the system.
  - Retry and Timeout Strategies: Setting appropriate retry limits and timeouts so that retries do not amplify latency.
  - Data Handling: Reduce payload sizes by selecting only the data callers need, avoiding unnecessary transmission.
  - Observability: Establish trace-driven observability to measure and diagnose system delays effectively.
- Challenges in Distributed Systems:
  - Managing database tail latency and avoiding hot spots.
  - Strategies for data partitioning that ensure even traffic distribution across shards.
- Human Elements and Engineering Culture:
  - Ensuring team alignment on latency goals, with regular discussion of incidents and improvements.
  - Using service level objectives (SLOs) and service level indicators (SLIs) for performance tracking.
- Future Outlook: Potential future enhancements such as predictive caching and adaptive routing.
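The latency-budget idea above can be illustrated with a short sketch. This is not code from the talk; it is a minimal Python illustration, assuming a hypothetical `call_dependency` helper, in which a monotonic-clock deadline is carried through the request and each hop derives its timeout from whatever budget remains:

```python
import time

# Hypothetical sketch: carry one deadline through the call chain and
# derive each hop's timeout from the time remaining in the budget.
TOTAL_BUDGET_S = 0.100  # sub-100ms end-to-end target

def remaining(deadline: float) -> float:
    """Seconds left in the request's latency budget."""
    return max(0.0, deadline - time.monotonic())

def call_dependency(name: str, deadline: float, reserve_s: float = 0.010) -> str:
    """Call a downstream service with a timeout capped by the budget.

    `reserve_s` keeps headroom for serialization and the response path.
    """
    timeout = remaining(deadline) - reserve_s
    if timeout <= 0:
        raise TimeoutError(f"budget exhausted before calling {name}")
    # A real client would pass `timeout` to its RPC/HTTP call here.
    return f"{name} called with timeout {timeout * 1000:.0f}ms"

deadline = time.monotonic() + TOTAL_BUDGET_S
print(call_dependency("inventory-service", deadline))
```

Refusing a call whose budget is already spent fails fast instead of adding one more doomed, slow hop to the chain.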
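The parallelism point can likewise be sketched. In this illustration (not from the talk), `fetch_price` and `fetch_stock` are stand-ins for real dependency calls; fanning out concurrently makes the request cost roughly the slowest hop rather than the sum of all hops:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Stand-ins for real downstream calls (hypothetical names).
def fetch_price(item_id: str) -> dict:
    return {"item": item_id, "price": 42.0}

def fetch_stock(item_id: str) -> dict:
    return {"item": item_id, "in_stock": True}

def fan_out(item_id: str, timeout_s: float = 0.05) -> dict:
    """Issue independent dependency calls in parallel under one timeout."""
    result: dict = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {
            pool.submit(fetch_price, item_id): "price",
            pool.submit(fetch_stock, item_id): "stock",
        }
        # as_completed raises if any call outlives the shared timeout.
        for fut in as_completed(futures, timeout=timeout_s):
            result[futures[fut]] = fut.result()
    return result

print(fan_out("sku-123"))
```

The shared `timeout_s` ties the whole fan-out back to the request's budget instead of timing out each call independently.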
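The retry discipline described above, capped attempts so retries cannot amplify load, can be sketched as follows (an illustration, not the speaker's code; `flaky` is a made-up stand-in for an idempotent dependency call):

```python
import random
import time

def call_with_retries(op, max_attempts=3, base_delay_s=0.005):
    """Retry `op` on timeout, with a hard attempt cap and jittered backoff.

    Only safe for idempotent operations; otherwise a retry can duplicate work.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up: unbounded retries amplify latency and load
            # Full jitter de-correlates retry waves across clients.
            time.sleep(random.uniform(0, base_delay_s * 2 ** (attempt - 1)))

attempts = []
def flaky():
    """Fails twice, then succeeds, to exercise the retry path."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("slow dependency")
    return "ok"

print(call_with_retries(flaky))
```

The attempt cap and jitter together are what keep a transient slowdown from snowballing into a retry storm.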
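On partitioning and hot spots, one common approach (not necessarily the one used in the talk) is hash-based key routing, which spreads skewed keys such as monotonically increasing IDs evenly across shards:

```python
import hashlib

NUM_SHARDS = 8  # illustrative shard count

def shard_for(key: str) -> int:
    """Route a key to a shard via a stable hash for even distribution."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

print(shard_for("user-12345"))
```

Because the mapping is deterministic, reads and writes for the same key always land on the same shard, while unrelated keys scatter uniformly.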
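Finally, the trace-driven observability point, answering "where did the milliseconds go", reduces to timing each hop as a span. A toy sketch (a real system would export spans to a tracing backend rather than a dict):

```python
import time
from contextlib import contextmanager

# Collected per-hop timings in milliseconds (stand-in for a trace exporter).
spans: dict = {}

@contextmanager
def span(name: str):
    """Record how long the wrapped hop took, even if it raises."""
    start = time.monotonic()
    try:
        yield
    finally:
        spans[name] = (time.monotonic() - start) * 1000  # ms

with span("db_lookup"):
    time.sleep(0.004)  # simulate a slow hop
print(f"db_lookup took {spans['db_lookup']:.1f}ms")
```

With every hop wrapped this way, a slow request decomposes into per-hop millisecond costs, which is exactly the signal needed to tell "is it us or a dependency".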
Conclusion:
The session closes with practical steps for refining API design and a call to build a culture of performance excellence through habit.
Overall, the presentation provides a detailed framework for designing APIs in distributed systems that meet sub-100ms latency targets while remaining reliable and performant under varying operational conditions.
This is the end of the AI-generated content.
Abstract
A “simple” API request rarely stays simple. In distributed systems, one call quickly turns into fan-out across gateways, services, caches, and databases — and your p99 becomes the sum of every hop and every flaky dependency. Worse, it’s often not a clean outage; it’s grey failures and intermittent slowdowns that are hard to reproduce and easy for customers to feel.
In this session, I’ll share a practical playbook for designing sub-100ms APIs when fan-out is unavoidable. We’ll start with latency budgets, so performance becomes a design constraint, not a hope. Then we’ll cover the patterns that keep tail latency predictable: safe parallelism, timeouts and retries that don’t amplify failure, idempotency, bulkheads/circuit breakers with fallbacks, and caching strategies where invalidation is treated as a correctness problem. We’ll close with trace-driven observability — the minimal signals that let you quickly answer: where did the milliseconds go, what changed, and is it us or a dependency?
Main takeaways:
- How to budget latency across service boundaries and enforce it with guardrails
- How to use timeouts/retries/idempotency + bulkheads without creating new p99 spikes
- How to use traces + a few key metrics to pinpoint the slow hop fast
Speaker
Saranya Vedagiri
Senior Staff Engineer @eBay
Saranya Vedagiri is a Senior Staff Engineer at eBay, where she designs and operates large-scale distributed systems with a focus on reliability and low-latency performance. Her work spans API design, service-to-service communication, caching strategies, and resilience patterns that keep critical flows fast under real production traffic. Saranya is passionate about performance as a product feature, engineering culture, and mentoring teams to build systems that stay predictably fast as they scale.