Innovations in Data Engineering

Data engineering has become an indispensable function in most software engineering organizations today. As a discipline, it has broadened to encompass all practices, systems, and architectures involved in storing and serving data for a myriad of needs. From OLTP systems that power user experiences to the analytics systems that power business & user insights to all of the connective tissue that keeps data consistent between these systems, data engineers have their hands full managing complex systems and architectures. The promise of the modern data stack was to simplify these architectures and reduce the operational burden many of us still wrestle with today. But what really works? Which technologies and practices live up to their promises? What patterns and technologies have stood the test of time? What are some pitfalls that you need to be aware of? Come to this track to learn from data engineers facing & solving these problems today.


From this track

Session

A New Era for Database Design with TigerBeetle

Monday Mar 27 / 10:35AM BST

Joran Greef

Founder and CEO @TigerBeetle

Session Apache Pinot

Speed of Apache Pinot at the Cost of Cloud Object Storage with Tiered Storage

Monday Mar 27 / 11:50AM BST

For real-time analytics, you need systems that can provide ultra-low latency (milliseconds) and extremely high throughput (hundreds of thousands of queries per second).

Neha Pawar

Founding Engineer @StarTree

Session Microservices

Change Data Capture for Microservices

Monday Mar 27 / 01:40PM BST

Microservices represent complex business domains in the form of loosely coupled systems, but these don't exist in isolation: services need to propagate data changes among each other in a reliable and scalable way.

Gunnar Morling

Senior Staff Software Engineer @Decodableco

Session transactions

Amazon DynamoDB Distributed Transactions at Scale

Monday Mar 27 / 02:55PM BST

NoSQL databases are popular for their high availability, high scalability, and predictable performance.

Akshat Vig

Senior Principal Engineer NoSQL databases @awscloud

Session raft

Multi-Region Data Streaming with Redpanda

Monday Mar 27 / 04:10PM BST

Real-time data streaming platforms such as Redpanda have become a mission-critical component in enterprise infrastructure. Multi-region deployments of streaming applications can provide important benefits, such as improved resiliency, better performance, and cost reduction.

Michał Maślanka

Software Engineer @Redpanda

Session processing techniques

In-Process Analytical Data Management with DuckDB

Monday Mar 27 / 05:25PM BST

Analytical data management systems have long been monolithic monsters far removed from the action by ancient protocols. Redesigning them to move into the application process greatly streamlines data transfer, deployment, and management.

Hannes Mühleisen

Co-founder and CEO @duckdblabs

Track Host

Sid Anand

Fellow, Cloud & Data Platform @Walmart, Apache Airflow Committer/PMC, Ex-Netflix, LinkedIn, eBay, Etsy, & PayPal

Sid recently joined Walmart (i.e. Walmart Global Tech) as a fellow to work on all things data. Prior to joining Walmart Global Tech, Sid served as the Chief Architect and Head of Engineering for Datazoom, where he and his team built high-fidelity, low-latency data streaming systems. Prior to joining Datazoom, Sid served as PayPal's Chief Data Engineer, where he helped build systems, platforms, teams, and processes, all with the aim of building access to the hundreds of petabytes of data under PayPal's management. Prior to joining PayPal, Sid held senior technical positions at Netflix, LinkedIn, eBay, & Etsy, to name a few. He earned his BS and MS degrees in CS from Cornell University, focusing on Distributed Systems.

Outside of work, Sid advises early-stage companies and several conferences. Once an active committer on Apache Airflow, he is now mostly a fan.

Sid's body of work includes, but is not limited to:

  • The world's first cloud-based streaming video service -- Sid was the first engineer to work on the cloud at Netflix
  • LinkedIn's Federated Search Typeahead (a.k.a. auto-complete)
  • LinkedIn's (Big Data) Self-service Marketing Analytics tool
  • PayPal's DBaaS - an internal self-service system to provision & manage heterogeneous databases
  • PayPal's CDC - an internal self-service CDC system to stream DB updates to nearline applications
  • eBay-over-Skype : Following the Skype acquisition, Sid built a P2P version of eBay offers
  • eBay's Best Match Search Ranking Engine powered by an In-Memory Database
  • eBay's Fuzzy-match name/email Search
  • Agari's Data Platform : Batch & Streaming Predictive Data Platform as a Service
  • Datazoom's Platform : High-fidelity, Low-latency Streaming Data Platform as a Service