Understanding and Tuning System Performance with CPU Hardware Counters

Abstract

Counters are fundamental to monitoring: how many requests were processed, how many CPU-seconds consumed, how many bytes sent over a network. Very likely you are already monitoring your applications and operating systems via the hundreds or thousands of counters they expose.

But did you know that modern CPUs have counters right inside the silicon circuitry? They count things like the number of cache hits, RAM accesses, or instructions processed. These can give an amazingly detailed view of how the system is performing.
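
As a rough illustration of why such counts are useful, raw counter values are often combined into ratios such as instructions per cycle (IPC) or a cache miss ratio. The sketch below is not taken from the talk: the `CounterSample` type and all the numbers in it are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter deltas over one measurement interval (hypothetical values)."""
    cycles: int
    instructions: int
    cache_references: int
    cache_misses: int


def instructions_per_cycle(s: CounterSample) -> float:
    # Higher IPC generally means the pipeline is stalling less often.
    return s.instructions / s.cycles if s.cycles else 0.0


def cache_miss_ratio(s: CounterSample) -> float:
    # Fraction of cache references that missed.
    return s.cache_misses / s.cache_references if s.cache_references else 0.0


# Made-up numbers, purely for illustration.
sample = CounterSample(cycles=2_000_000, instructions=3_000_000,
                       cache_references=100_000, cache_misses=5_000)
print(f"IPC: {instructions_per_cycle(sample):.2f}")          # IPC: 1.50
print(f"cache miss ratio: {cache_miss_ratio(sample):.1%}")   # cache miss ratio: 5.0%
```

A ratio like IPC is usually easier to compare across runs or machines than the raw counts, which scale with how long the measurement ran.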

Historically it was impossible to access CPU hardware counters on cloud servers, as they were hidden by the hypervisor. But in the last few years, CPU manufacturers and cloud providers have added interfaces to bridge this gap.

This talk looks at:

  • What are CPU hardware counters?
  • What kinds of CPU counters are most useful?
  • Using hardware counters as metrics, and via profiles.
  • Specific tools: Prometheus Node Exporter, cAdvisor, Linux perf, PerfGo
  • Real-world examples where CPU hardware counters enable performance improvements.
  • We will focus on the Linux operating system.
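
One way to picture the step from a tool to a metric is to parse the machine-readable output of Linux `perf stat`. This is a minimal sketch, assuming the comma-separated layout produced by `perf stat -x,`, where the first three fields are the counter value, its unit, and the event name. The exact columns can vary between perf versions, and the `SAMPLE` text and the `parse_perf_csv` helper are hypothetical, not real measurements or part of any tool. On virtual machines where a counter is not exposed, perf reports a placeholder such as `<not supported>` instead of a number, which the sketch collects separately.

```python
import csv
import io

# Hypothetical text shaped like `perf stat -x, -e cycles,instructions,cache-misses`.
SAMPLE = """\
2000000,,cycles,1000000,100.00,,
3000000,,instructions,1000000,100.00,1.50,insn per cycle
<not supported>,,cache-misses,0,100.00,,
"""


def parse_perf_csv(text):
    """Return (counts, unavailable): event -> int, plus events with no value."""
    counts, unavailable = {}, []
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines, and anything too short to be a record.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        value, event = row[0], row[2]
        if value.isdigit():
            counts[event] = int(value)
        else:
            # e.g. "<not supported>" or "<not counted>"
            unavailable.append(event)
    return counts, unavailable


counts, unavailable = parse_perf_csv(SAMPLE)
print(counts)
print(unavailable)
```

Keeping the unavailable events in their own list makes it easy to tell a counter the hypervisor hides from one that genuinely read zero.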

By the end of the session you should:

  • Know what data is available via CPU hardware counters.
  • Be inspired to install some tools to collect and visualise the data.
  • Understand the limitations on shared cloud infrastructure.

Who is this session for?

  • Those interested in system performance.
  • Software engineers or observability architects.
  • Ideally, those familiar with modern CPU architectures: pipelining, multi-level caches, branch prediction.

Speaker

Bryan Boreham

Distinguished Engineer @Grafana Labs, Member of the Prometheus Team, Expert in Distributed Systems and Computer Performance

Bryan Boreham is a Distinguished Engineer at Grafana Labs, working on highly scalable storage for metrics, logs, traces and profiles. A contributor to many Open Source projects since 1988, Bryan is a member of the Prometheus team and a former maintainer of the CNCF Cortex and CNI projects. Earlier, he led interest rate trading systems at Barclays Investment Bank, where performance engineering was business-critical.

His work spans distributed systems, deep performance analysis, and understanding how software really behaves under load.

From the same track

Session Deterministic Simulation Testing

A Deterministic Simulation Testing (DST) Journey: From WASM in Go to State Machines in Rust

Monday Mar 16 / 10:35AM GMT

Deterministic simulation testing finds bugs by exploring random execution paths, injecting failures, and letting you replay any failure with a single starting seed.

Alfonso Subiotto

Software Engineer @Polar Signals

Session WebAssembly

Designing Language-Agnostic Plugin Systems With Webassembly and Extism

Monday Mar 16 / 01:35PM GMT

Imagine a world where anyone could write plugins and extensions, in any language, that interoperate with the application regardless of your stack. Extism makes that real by using WebAssembly.

Shivay Lamba

Developer Experience Engineer @Qualcomm, Google Summer of Code Admin @Jenkins

Session Platform Engineering

Fixing the AI Infra Scale Problem by Stuffing 1M Sandboxes in a Single Server

Monday Mar 16 / 03:55PM GMT

The past year has seen an absolute explosion in the use of AI and agents in particular, a trend that is guaranteed to accelerate going forward.

Felipe Huici

CEO and Co-Founder @Unikraft, Founder and Maintainer of the Linux Foundation Unikraft Open Source Project

Session

Use<’lifetimes> For<’what>

Monday Mar 16 / 02:45PM GMT

As Rust has become more ergonomic, lifetimes have become more nuanced. By thinking of lifetimes as sets of loans, rather than using the traditional "regions of code" definition, this talk explores advanced lifetime concepts such as variance and higher-ranked lifetimes.

Ethan Brierley

Senior Engineer @TrueLayer and Co-Organiser of Rust London

Session

Unconference: Native Languages

Monday Mar 16 / 11:45AM GMT