The online world we interact with today is increasingly powered by data and by the insights extracted from it. Our ever-growing thirst for data insights and data-driven behavior (e.g., ML-based systems) is driving our industry to collect data more often and from an increasingly varied set of sources. As data volumes grow, scale becomes a challenge. To complicate matters further, customers want reliable access to high-quality data and insights, which adds availability and data quality to our list of requirements. More often than not, customers require low latency as well, usually meaning the time it takes for raw data to be converted into usable insights or production-grade models. Last but not least, access patterns and use cases dictate the form data takes when it is served.
Depending on how the data will be used, the medium used to store and serve it varies widely. OLTP/OLAP DBs, caches, object stores, search engines, graph DBs, data streams, vector DBs, and the like represent the many forms data takes to suit its many uses. Come to this track to learn about new technologies, practices, and trends shaping the way you will work with data.
From this track
Introducing Tansu.io -- Rethinking Kafka for Lean Operations
Tuesday Mar 17 / 10:35AM GMT
What if Kafka brokers were ephemeral, stateless and leaderless with durability delegated to a pluggable storage layer?
Peter Morgan
Founder @tansu.io
The Rise of the Streamhouse: Idea, Trade-Offs, and Evolution
Tuesday Mar 17 / 11:45AM GMT
Over the last decade, streaming architectures have largely been built around topic-centric primitives—logs, streams, and event pipelines—then stitched together with databases, caches, OLAP engines, and (increasingly) new serving systems.
Giannis Polyzos
Principal Streaming Architect @Ververica
Anton Borisov
Principal Data Architect @Fresha
From S3 to GPU in One Copy: Rethinking Data Loading for ML Training
Tuesday Mar 17 / 01:35PM GMT
ML training pipelines treat data as static. Teams spend weeks preprocessing datasets into WebDataset or TFRecords, and when they want to experiment with curriculum learning or data mixing, they reprocess everything from scratch.
Onur Satici
Staff Engineer @SpiralDB
Ontology‐Driven Observability: Building the E2E Knowledge Graph at Netflix Scale
Tuesday Mar 17 / 02:45PM GMT
As Netflix scales hundreds of client platforms, microservices, and infrastructure components, correlating user experience with system performance has become a hard data problem, not just an observability one.
Prasanna Vijayanathan
Engineer @Netflix
Renzo Sanchez-Silva
Engineer @Netflix
Building a Control Plane for Production AI
Tuesday Mar 17 / 03:55PM GMT
Details coming soon.
Unconference: Modern Data Engineering
Tuesday Mar 17 / 05:05PM GMT