From Fan-Out to Fast: Sub-100ms API Design in Distributed Systems

Summary

Disclaimer: This summary has been generated by AI. It is experimental, and feedback is welcomed. Please reach out to info@qconlondon.com with any comments or concerns.

Introduction:

The presentation opens with an analogy between riding the Tube and navigating distributed systems. Just as a Tube journey is interrupted by unexpected pauses, a request in a distributed system picks up latency from propagation delays and from its dependence on multiple components.

Main Content:

  • Latency as a Design Constraint: The talk emphasizes treating latency as a design constraint by assigning a latency budget for each component in the system to avoid bottlenecks.
  • Trade-offs and Techniques:
    • Parallelism: Implementing safe parallelism can help manage fan-out scenarios where one request leads to multiple paths in the system.
    • Retry and Timeout Strategies: Set explicit retry limits and timeouts so that retries do not amplify latency.
    • Data Handling: Reduce payload sizes by intelligently selecting the data to send, preventing unnecessary data transmission.
    • Observability: Establishing trace-driven observability to measure and diagnose system delays effectively.
  • Challenges in Distributed Systems:
    • Managing database tail latency and avoiding hot spots.
    • Strategies for data partitioning to ensure even traffic distribution across shards.
  • Human Elements and Engineering Culture:
    • Ensuring team alignment on latency goals and regular discussion of incidents and improvements.
    • Using service level objectives (SLOs) and service level indicators (SLIs) for performance tracking.
  • Future Outlook: Discusses potential future enhancements like predictive caching and adaptive routing.
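
The latency-budget and safe-parallelism points above can be illustrated with a minimal sketch. It is not taken from the talk: the `Deadline` class, the `call_with_budget` helper, the stand-in `fake_service`, and the 100 ms / 50 ms figures are all assumptions chosen for illustration. Each downstream call is capped by both its own budget and whatever remains of the request budget, and a slow call degrades to a fallback instead of stalling the whole response.

```python
import asyncio
import time


class Deadline:
    """Tracks how much of a request's latency budget is left (hypothetical helper)."""

    def __init__(self, budget_ms: float):
        self._expires = time.monotonic() + budget_ms / 1000.0

    def remaining_s(self) -> float:
        return max(0.0, self._expires - time.monotonic())


async def call_with_budget(name, coro_fn, deadline, per_call_ms, fallback=None):
    """Run one downstream call, capped by its own budget and the request's."""
    timeout = min(per_call_ms / 1000.0, deadline.remaining_s())
    try:
        return name, await asyncio.wait_for(coro_fn(), timeout=timeout)
    except asyncio.TimeoutError:
        # The call overran its budget: serve the fallback instead of waiting.
        return name, fallback


async def fake_service(delay_ms, value):
    """Stand-in for a real dependency; sleeps, then returns a value."""
    await asyncio.sleep(delay_ms / 1000.0)
    return value


async def handle_request():
    deadline = Deadline(budget_ms=100)  # assumed end-to-end budget
    # Fan out in parallel so the wait is the slowest call, not the sum of calls.
    results = await asyncio.gather(
        call_with_budget("price", lambda: fake_service(5, 42), deadline, 50),
        call_with_budget("reviews", lambda: fake_service(500, ["great"]),
                         deadline, 50, fallback=[]),
    )
    return dict(results)


if __name__ == "__main__":
    print(asyncio.run(handle_request()))
```

In this sketch the slow "reviews" dependency is cut off at its 50 ms cap and replaced by an empty list, while "price" returns normally. Whether a fallback is acceptable for a given field is a product decision that the code cannot make.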

Conclusion:

The session closes with practical steps for refining API design and a call to build a culture in which excellence is a habit.

Overall, the presentation provides a detailed framework for creating APIs in distributed systems that adhere to sub-100ms latency targets while maintaining reliability and performance under varying operational conditions.

This is the end of the AI-generated content.


Abstract

A “simple” API request rarely stays simple. In distributed systems, one call quickly turns into fan-out across gateways, services, caches, and databases — and your p99 becomes the sum of every hop and every flaky dependency. Worse, it’s often not a clean outage; it’s grey failures and intermittent slowdowns that are hard to reproduce and easy for customers to feel.

In this session, I’ll share a practical playbook for designing sub-100ms APIs when fan-out is unavoidable. We’ll start with latency budgets, so performance becomes a design constraint, not a hope. Then we’ll cover the patterns that keep tail latency predictable: safe parallelism, timeouts and retries that don’t amplify failure, idempotency, bulkheads/circuit breakers with fallbacks, and caching strategies where invalidation is treated as a correctness problem. We’ll close with trace-driven observability — the minimal signals that let you quickly answer: where did the milliseconds go, what changed, and is it us or a dependency?
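
As one possible reading of "timeouts and retries that don't amplify failure" combined with idempotency, the sketch below bounds the number of attempts, backs off with jitter, stops as soon as the remaining budget cannot cover another try, and reuses a single idempotency key across attempts. The names `call_with_retries` and `TransientError`, and the default values, are hypothetical and do not come from the session.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Hypothetical error type for failures that are safe to retry."""


def call_with_retries(op, budget_s, max_attempts=3, base_backoff_s=0.005,
                      clock=time.monotonic, sleep=time.sleep):
    """Retry op(idempotency_key) while both attempts and budget remain."""
    deadline = clock() + budget_s
    # The same key is sent on every attempt so the server can deduplicate.
    key = str(uuid.uuid4())
    for attempt in range(1, max_attempts + 1):
        try:
            return op(key)
        except TransientError:
            if attempt == max_attempts:
                raise  # attempts exhausted: surface the failure
            # Full jitter: a random wait up to an exponentially growing cap.
            wait = random.uniform(0, base_backoff_s * 2 ** (attempt - 1))
            if clock() + wait >= deadline:
                raise  # not enough budget left for another try
            sleep(wait)
```

A caller might use `call_with_retries(lambda key: client.charge(order, idempotency_key=key), budget_s=0.05)`, where `client.charge` is an assumed API. The key point of the design is that retries are charged against the caller's budget rather than extending it.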

Main takeaways:

  • How to budget latency across service boundaries and enforce it with guardrails
  • How to use timeouts/retries/idempotency + bulkheads without creating new p99 spikes
  • How to use traces + a few key metrics to pinpoint the slow hop fast
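
To make the last takeaway concrete, here is a rough sketch of answering "where did the milliseconds go" from trace data. The `Span` record is a simplified, assumed shape rather than any tracing library's type, and the self-time calculation is only approximate when child spans run in parallel.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """Simplified span record (assumed shape, not a real tracing API)."""
    span_id: str
    parent_id: Optional[str]
    service: str
    duration_ms: float


def self_times(spans):
    """Duration of each span minus the time spent in its direct children."""
    child_total = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms
    # Clamp at zero: children that ran in parallel can sum past the parent.
    return {s.span_id: max(0.0, s.duration_ms - child_total[s.span_id])
            for s in spans}


def slowest_hop(spans):
    """Return (service, self_time_ms) for the span with the most self time."""
    times = self_times(spans)
    worst = max(spans, key=lambda s: times[s.span_id])
    return worst.service, times[worst.span_id]


if __name__ == "__main__":
    trace = [
        Span("a", None, "gateway", 95),
        Span("b", "a", "catalog", 20),
        Span("c", "a", "pricing", 70),
        Span("d", "c", "db", 60),
    ]
    print(slowest_hop(trace))
```

Ranking by self time rather than total duration keeps a parent span from being blamed for time that was actually spent in one of its dependencies.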

Speaker

Saranya Vedagiri

Senior Staff Engineer @eBay

Saranya Vedagiri is a Staff Engineer at eBay, where she designs and operates large-scale distributed systems with a focus on reliability and low-latency performance. Her work spans API design, service-to-service communication, caching strategies, and resilience patterns that keep critical flows fast under real production traffic. Saranya is passionate about performance as a product feature, engineering culture, and mentoring teams to build systems that stay predictably fast as they scale.

Date

Monday Mar 16 / 10:35AM GMT (50 minutes)

Location

Whittle (3rd Fl.)

Topics

Architecture, Distributed Systems, Scalable and Reliable

From the same track

Session Platform Engineering

APIs for Agents: Rethinking API Programs in the MCP Era

Monday Mar 16 / 01:35PM GMT

As API programs mature, a familiar gap emerges: some teams operate with strong standards, reusable platforms, and clear governance, while others rely on informal guidance and best-effort consistency.

Jim Gough

Distinguished Engineer, API Platform Lead Architect @Morgan Stanley, Co-Author of Optimizing Java

Andreea Niculcea

Vice President @Morgan Stanley

Session architecture

Managing Asynchronous APIs at Scale

Monday Mar 16 / 05:05PM GMT

When event-driven architectures are small, teams can reason about events through word-of-mouth. They know who publishes what, who consumes it, and how messages flow through the system. Teams manage their own infrastructure or raise tickets to request changes.

Ian Cooper

Senior Principal Engineer @Just Eat Takeaway

Session Observability

Uncorking Queueing Bottlenecks with OpenTelemetry

Monday Mar 16 / 11:45AM GMT

Queues are the backbone of scalable, asynchronous systems, but they can easily create a tangled web of complexity. When things slow down, the bottleneck could be anywhere, from producer lag to consumer exhaustion, and standard metrics often fail to show the full picture.

Julian Wreford

Team Lead of Operability Team @Gearset, Software Engineer Turned Accidental SRE

Oli Lane

Engineering Team Lead @Gearset, Focusing on Engineering Culture, Observability, and Platform Reliability

Session AI

Enchant Your AI and APIs with eBPF Magic 🪄

Monday Mar 16 / 03:55PM GMT

It is a common occurrence to see applications thrown over the fence, landing somewhere in production without a second thought about their lifecycle or how they may need maintaining in the future to connect to more efficient API endpoints.

Dan Finneran

Principal Community Advocate at Isovalent @Cisco

Session

Unconference: Connecting Systems

Monday Mar 16 / 02:45PM GMT