Tales of Kafka @Cloudflare: Lessons Learnt on the Way to 1 Trillion Messages

Cloudflare uses Kafka to decouple microservices and communicate the creation, change or deletion of various resources via a common data format in a fault-tolerant manner. This decoupling is one of many factors that enables Cloudflare engineering teams to work on multiple features and products concurrently.
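
To make the idea of a "common data format" concrete, here is a minimal sketch of a shared event envelope that a producing service could serialize into a Kafka message value and a consuming service could decode. This is an illustration only, not Cloudflare's actual schema: the names `ResourceEvent`, `Action`, and every field are assumptions, and JSON is used purely for simplicity.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class Action(str, Enum):
    """Hypothetical lifecycle actions a service might announce."""
    CREATED = "created"
    CHANGED = "changed"
    DELETED = "deleted"


@dataclass(frozen=True)
class ResourceEvent:
    """Hypothetical shared envelope; field names are assumptions."""
    resource_type: str
    resource_id: str
    action: Action
    payload: dict

    def to_bytes(self) -> bytes:
        # Serialize to JSON bytes suitable for use as a message value.
        body = asdict(self)
        body["action"] = self.action.value
        return json.dumps(body, sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ResourceEvent":
        # Decode a message value; raises if a field is missing
        # or the action is not one of the known values.
        body = json.loads(raw.decode("utf-8"))
        return cls(
            resource_type=body["resource_type"],
            resource_id=body["resource_id"],
            action=Action(body["action"]),
            payload=body["payload"],
        )
```

Because producers and consumers agree only on the envelope, neither needs to know about the other's internals, which is the decoupling the abstract describes.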

We learnt a lot about Kafka on the way to one trillion messages, and built some interesting internal tools to ease adoption and improve resiliency. This talk explores both the lessons and the tools.

Interview:

What's the focus of your work these days?

Andrea: I work as a Systems Engineer at Cloudflare. I research and help prioritize what's needed to make other developers' lives easier. I design, implement and maintain platforms used by other teams to do their jobs effectively. For example, we maintain SDKs for Kafka and platforms that allow emitting alerts, emails, audit logs and so on.

What's the motivation for your talk at QCon London 2023?

We wanted to share more about what we do for productivity and organizational culture. I've also never spoken at such a big conference before, so it's an amazing chance for me to become more confident.

How would you describe your main persona and target audience for this session?

I think there will be a lot of people with a lot of experience: senior engineers, CTOs, tech leads.

Is there anything specific that you'd like people to walk away with after watching your session?

A better understanding of how to set up for success when deciding to integrate with Kafka, especially in a big company where there's a lot of traffic. Many of the points in our presentation also apply to different worlds, different platforms and systems. I think everyone can get something from it. 


Speaker

Andrea Medda

Senior Systems Engineer @Cloudflare

Andrea is a Senior Systems Engineer at Cloudflare. He loves using cutting-edge technology and approaches to solve real customer problems.

Andrea loves Golang and distributed systems. He has a pet friend, Maru, who loves to pop into his meetings.

Speaker

Matt Boyle

Engineering Manager @Cloudflare

Matthew Boyle is an experienced technical leader in the field of distributed systems, specializing in using Go.

He has worked at huge companies such as Cloudflare and General Electric, as well as exciting high-growth startups such as Curve and Crowdcube.

Matt has been writing Go for production since 2018 and often shares blog posts and fun trivia about Go over on Twitter (@MattJamesBoyle).

Date

Monday Mar 27 / 02:55PM BST (50 minutes)

Location

Fleming (3rd Fl.)

Topics

Microservices, Kafka, resilience, case study

From the same track

Session

Building High-Fidelity Data Streams

Monday Mar 27 / 01:40PM BST

Low latency data streaming technology and practices remain a hot and trending topic among data engineers today. At its core, it promises to deliver data in near real time in order to provide snappy data-driven user experiences.

Sid Anand

Fellow, Cloud & Data Platform @Walmart, Apache Airflow Committer/PMC, Ex-Netflix, LinkedIn, eBay, Etsy, & PayPal

Session Microservices

Banking on Thousands of Microservices

Monday Mar 27 / 05:25PM BST

Monzo has built an entire banking platform from scratch composed of many microservices; it serves over 7 million customers daily with an organisationally lean engineering team. All aspects of the bank are deployed hundreds of times a day (even on Fridays!).

Suhail Patel

Staff Engineer @Monzo Focused on Designing and Operating Distributed Systems, Previously @Citymapper

Session scalability

Scaling Google's Global Cloud L7 Load Balancer

Monday Mar 27 / 10:35AM BST

We'll take a look at Google's Global Cloud L7 Balancer, how it's put together and how we've scaled it to meet the reliability and performance demands of our Cloud customers.

James Spooner

Principal Engineer, Load Balancing @Google

Session scalability

Zoom: Why Does It Work?

Monday Mar 27 / 04:10PM BST

During the pandemic, Zoom had to scale massively to support the big move from working in the office every day to meeting online for both business and private use. How did Zoom manage this scaling dilemma? And when you join a Zoom call, how does that actually work?

Ian Sleebe

Senior Solutions Architect @Zoom

Session

Unconference: Architectures You've Always Wondered About

Monday Mar 27 / 11:50AM BST

What is an unconference? An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.

Shane Hastie

Global Delivery Lead @SoftEd, Lead Editor for Culture & Methods @InfoQ