Pitfalls of Unified Memory Models in GPUs

Modern GPUs offer support for so-called unified memory, providing a universal address space for both CPUs and GPUs. Whilst attractive from a programming perspective, the use of unified memory can often introduce performance regressions and a surprising level of additional complexity.
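For readers unfamiliar with the programming model the abstract refers to, the sketch below is a minimal, assumed example using the CUDA runtime API (the kernel and buffer names are illustrative, not taken from the talk). It shows why unified memory is attractive: a single cudaMallocManaged allocation is valid on both the CPU and the GPU, so no explicit copies are written, while the driver migrates pages on demand behind the scenes, which is exactly where the hidden costs discussed in the talk can appear.

// Minimal sketch (assumed CUDA runtime API): vector addition driven by unified memory.
// cudaMallocManaged returns a pointer usable from both the CPU and the GPU; the driver
// migrates pages on demand, which is convenient but can hide page faults and migrations.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    float *a, *b, *c;
    // One allocation call per buffer; no explicit cudaMemcpy is needed anywhere.
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));

    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }  // CPU touches the pages first

    add<<<(n + 255) / 256, 256>>>(a, b, c, n);                 // GPU access faults the pages over
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);                               // reading on the CPU migrates pages back
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}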

This presentation explores the use of unified memory on modern GPUs, the low-level details of how unified memory is realized on an x86-64 system, and some of the tools that can be used to understand exactly what's happening on your GPU.

Interview:

What's the focus of your work these days?

I spend most of my time trying to understand mathematical objects and algorithms before implementing them as efficiently as I can on different hardware. This work spans all levels of the computing hierarchy.

What's the motivation for your talk at QCon London 2024?

To help the audience understand how unified memory can both help and hinder software development, as well as the tools that can be used to profile and understand the code that runs on their GPUs.

How would you describe your main persona and target audience for this session?

There will be something in this talk for a range of personas, provided that they're interested in GPUs. The talk will cover the high-level ideas behind how a GPU works before we dive into the deep details. This talk will likely not appeal to highly experienced GPU programmers; I would like to reach an audience that is curious about how programming a GPU differs from programming a CPU.

Is there anything specific that you'd like people to walk away with after watching your session?

Software engineering is a constant balance between complexity and performance. I would like everyone to walk away with the knowledge that this relationship is not necessarily linear; indeed, we can sometimes reduce complexity without harming the overall performance of our software. On the other hand, it is sometimes necessary to make our programs slightly more complicated in order to achieve peak performance.


Speaker

Joe Rowell

Founding Engineer @poolside.ai, Low-Level Performance Engineer, Cryptographer and PhD Candidate @RHUL

Joe Rowell is a Founding Engineer at poolside. Prior to joining poolside, Joe studied for a PhD in Cryptography at Royal Holloway, University of London under Martin Albrecht. Joe's work focuses on efficiency at all layers of the software stack from both theoretical and practical angles.


From the same track

Session

A Walk Along the Complexity-Performance Curve

Monday Apr 8 / 10:35AM BST

Software performance and complexity are related. It’s common for refactoring to introduce unanticipated regressions, and for performance optimisations to attract scrutiny in code review; how much performance improvement is worth a perceived loss of readability?


Richard Startin

Senior Software Engineer @Datadog

Session

Panel: What Does the Future of Computing Look Like

Monday Apr 8 / 05:05PM BST

The future of computing promises to be revolutionary. This panel dives into cutting-edge advancements that will redefine how we interact with technology. We'll explore groundbreaking concepts and discuss their potential to transform our world.


Julia Lawall

Senior Scientist @INRIA


Matt Fleming

CTO @Nyrkiö, Former Linux Kernel Maintainer @Intel and @SUSE


Joe Rowell

Founding Engineer @poolside.ai, Low-Level Performance Engineer, Cryptographer and PhD Candidate @RHUL

Session

Opening the Box: Diagnosing Operating-System Task-Scheduler Behavior on Highly Multicore Machines

Monday Apr 8 / 11:45AM BST

An operating-system task scheduler is responsible for placing tasks on cores and for selecting which task is allowed to run at what time. As such, the scheduler is a critical component of any operating system and has a major impact on application performance.


Julia Lawall

Senior Scientist @INRIA

Session

Practical Benchmarking: How To Detect Performance Changes in Noisy Results

Monday Apr 8 / 03:55PM BST

Finding statistically significant changes in performance results has always been challenging, but now that most of our code runs on hardware and infrastructure we don't own, we need methods and tools for detecting performance changes in noisy data.


Matt Fleming

CTO @Nyrkiö, Former Linux Kernel Maintainer @Intel and @SUSE

Session

Unconference: Performance Engineering Unleashed

Monday Apr 8 / 02:45PM BST

An unconference is a participant-driven meeting. Attendees come together, bringing their challenges and relying on the experience and know-how of their peers for solutions.