Modern data analytics workflows rely on scaling out to huge numbers of users and compute nodes. Managing database installations to handle this scale can be unsustainably complex and expensive. Is it instead possible to get rid of all this complexity and build a database with just a client-side library and object storage?
At Man Group we have evolved over time from managing one of the largest MongoDB installations in Europe to a serverless model where users interact directly with object storage using ArcticDB, our own database engine. What we've learnt along the way should be of interest to anyone working in distributed computing, not just database development.
We will focus on topics such as:
- Our choices around ACID and our core data structures
- How to manage global state with lock-free techniques such as CRDTs
- How we manage to work with relatively high-latency commodity object storage
- How object storage has evolved over time, and how advanced it is becoming
Interview:
What is the focus of your work?
I work on ArcticDB, a client-side database engine optimised for timeseries data that was developed from scratch at Man Group and is now its own business. A lot of my work on the project has been on an optional set of server-side processes to manage data replication and streaming data ingestion.
What’s the motivation for your talk?
ArcticDB has an unusual serverless architecture where users use our library to interact directly with object storage, with no co-ordinating server. We've learnt a lot about distributed computing and working with object storage by building this, and I want to share some interesting techniques and design choices that we've used.
Who is your talk for?
The concepts in my talk should be interesting to anyone working on distributed computing problems, whether for database development or not. It should be particularly interesting to people who work with object storage like S3. We will discuss specific design choices we've made and why, especially around our data structures and data format, so a good audience would be senior developers and technical architects who make similar decisions in their own projects.
What do you want someone to walk away with from your presentation?
New ideas about how to make useful software in a serverless architecture and an appreciation of how powerful modern object storage technologies are.
What do you think is the next big disruption in software?
It might not be as big a disruption as AI, but it will be interesting to see how columnar file formats evolve and whether a successor to Parquet will emerge as the de facto standard.
Speaker

Alex Seaton
Staff Engineer @ArcticDB, Previously Working on Quant Trading Systems @Man Group
Alex Seaton is an engineer working on ArcticDB at Man Group. ArcticDB is a high-performance dataframe database that is optimised for timeseries data and data-science workflows, and scales to petabytes of data and thousands of simultaneous users. At ArcticDB his focus has been on data replication and tick streaming infrastructure. Before joining ArcticDB, Alex built trade execution and market data systems.