RocksDB, a key-value store built on Log-Structured Merge-Tree data structures and originally open-sourced by Facebook, has played a significant role in shaping data systems over the past decade. However, it hasn’t seen widespread adoption in analytics databases, mainly because it lacks native support for tight columnar encoding formats.
This talk will explore the journey of building a modern analytical database, Rockset, on top of RocksDB. We’ll discuss a key insight that enabled us to bring columnar encoding into RocksDB, not only achieving performance parity with column-oriented databases but also allowing real-time updates. Additionally, we will highlight the architectural advantages of deploying RocksDB in the cloud, showing how we achieved compute-storage and compute-compute separation by using cloud object storage for durability and a multi-tenant hot storage layer for performance. Finally, we will share lessons learned from operating Rockset and RocksDB in production.
Interview:
What's the focus of your work these days?
I am an engineer at Rockset, a search and analytics database, and my primary responsibility these days is query performance. My work ranges from low-level optimizations of hot inner loops, to thinking about the performance of the system at a higher level, to building tooling that helps us debug performance issues more quickly.
What's the motivation for your talk at QCon London 2024?
We solved a number of challenges while building Rockset that I am happy to share with the audience. We picked RocksDB as our underlying key-value store, which brings important architectural advantages when deployed in the cloud and makes it easy to build compute-storage and compute-compute separation. However, off-the-shelf RocksDB is not performant for analytical queries due to the lack of tight columnar encodings. This talk will explore both aspects: why RocksDB is great for the cloud, and how we made it perform well for analytical workloads.
How would you describe your main persona and target audience for this session?
This talk will be technical and will assume attendees have good knowledge of the architecture and design of data-intensive systems. The target audience will be builders and people who spend a lot of their time thinking about data, systems, and performance.
Is there anything specific that you'd like people to walk away with after watching your session?
I hope they'll gain new insights into how to build data-intensive systems in the cloud, a deeper understanding of RocksDB, and some tricks for using it as a building block of a system where performance matters.
Speaker
Igor Canadi
Founding Engineer and Architect @Rockset, Previously at RocksDB and Facebook
Igor Canadi is a Founding Engineer at Rockset, a modern cloud-native search and analytics database, where he is responsible for the data indexing and distributed SQL engine. Previously, Igor was a Software Engineer at Facebook, where he developed RocksDB, an open-source key-value store widely deployed in the data industry, and contributed to Facebook's core GraphQL infrastructure. In his free time, he enjoys sailing and snowboarding.