Reliable Data Flows and Scalable Platforms: Tackling Key Data Challenges

There are a few common and mostly well-known challenges when architecting for data. For example, many data teams struggle to move data in a stable and reliable way from operational systems to analytics systems. At the same time, they must manage complex and often costly infrastructure landscapes. These issues prevent companies from effectively leveraging their data for business purposes.

Drawing from real-world experience, this presentation will explore how we address these challenges by building reliable and scalable data platforms at reasonable cost. It will also cover solutions that help operational teams provide their data and observe how it flows into analytics systems. In addition to discussing architectures and design considerations, the presentation will highlight tools and techniques used to implement these platforms.

Interview:

What is the focus of your work?

In one sentence, I would say: availability of data for analytical use, regardless of whether it is a dashboard, machine learning or GenAI. In addition to data platforms and infrastructure, this often includes the transfer of data from operational systems to analytical systems, which is still often neglected and holds many pitfalls.

What’s the motivation for your talk?

Many data initiatives fail to obtain reliable and consistent data. There are many different approaches to solving this problem, some of which are borrowed from well-known software engineering best practices. I would like to present a few of these solutions and, above all, show how we have implemented them in practice, without letting costs explode.

Who is your talk for?

Of course, the talk is primarily aimed at data architects and data engineers. But I also invite all software engineers to take a look at the topic. They too are increasingly coming into contact with the supply of data and can make a big difference. Plus, knowing what's possible there could make their lives easier.

What do you want someone to walk away with from your presentation?

Ideas on how to provide data in a better way, meaning more stable and more correct. Tangible options for which tools and processes can be used to implement this, without sinking into complexity and, ultimately, costs.

What do you think is the next big disruption in software?

I believe that one of the big issues will be the reduction of complexity. Nowadays, modern IT landscapes are a patchwork of barely comprehensible building blocks that are somehow held together. In addition to maintenance, this also makes it difficult to test and implement new features in order to quickly adapt to the market. In this context, automation (e.g. through AI) will play a major role on both the business and technical side.

What was one interesting thing that you learned from a previous QCon?

I was already familiar with DuckDB, but in 2023, through Hannes Mühleisen's presentation, I understood what possibilities the technology offers and what diverse applications it enables. That was the first time I could imagine the changes in data architectures made possible by the new, lean processing frameworks.


Speaker

Matthias Niehoff

Head of Data and Data Architecture @codecentric AG, iSAQB Certified Professional for Software Architecture

Matthias Niehoff works as Head of Data and Data Architecture for codecentric AG and supports customers in the design and implementation of data architectures. His focus is on the infrastructure and organization needed to help data and ML projects succeed.


From the same track

Session Data Engineering

Building a Global Scale Data Platform with Cloud-Native Tools

Wednesday Apr 9 / 01:35PM BST

As businesses increasingly operate in hybrid and multi-cloud environments, managing data across these complex setups presents unique challenges and opportunities. This presentation provides a comprehensive guide to building a global-scale data platform using cloud-native tools.


George Hantzaras

Engineering Director, Core Platforms @MongoDB, Open Source Ambassador, Published Author

Session

Achieving Precision in AI: Retrieving the Right Data Using AI Agents

Wednesday Apr 9 / 11:45AM BST

In the race to harness the power of generative AI, organizations are discovering a hidden challenge: precision.


Adi Polak

Director, Advocacy and Developer Experience Engineering @Confluent, Author of "Scaling Machine Learning with Spark" and "High Performance Spark 2nd Edition"

Session Data Architecture

Beyond the Warehouse: Why BigQuery Alone Won’t Solve Your Data Problems

Wednesday Apr 9 / 03:55PM BST

Many organizations mistake the adoption of a data warehouse, like BigQuery, as the golden ticket to solving all their data challenges. But without a robust data strategy and architecture, you’re simply shifting chaos into the cloud.


Sarah Usher

Data & Backend Engineer, Community Director, Mentor

Session

The Data Backbone of LLM Systems

Wednesday Apr 9 / 02:45PM BST

Any LLM application has four dimensions you must carefully engineer: the code, data, models and prompts. Each dimension influences the others, which is why you must learn how to track and manage each. The trick is that every dimension has particularities requiring unique strategies and tooling.


Paul-Emil Iusztin

Senior ML/AI Engineer, MLOps, Founder @Decoding ML