Abstract
Counters are fundamental to monitoring: how many requests were processed, how many CPU-seconds consumed, how many bytes sent over a network. Very likely you are already monitoring your applications and operating systems via the hundreds or thousands of counters they expose.
But did you know that modern CPUs have counters right inside the silicon circuitry? They count things like the number of cache hits, RAM accesses, or instructions processed. These can give an amazingly detailed view of how the system is performing.
Historically it was impossible to access CPU hardware counters on cloud servers, as they were hidden by the hypervisor. But in the last few years, CPU manufacturers and cloud providers have added interfaces to bridge this gap.
This talk looks at:
- What are CPU hardware counters?
- What kinds of CPU counters are most useful?
- Using hardware counters as metrics, and via profiles.
- Specific tools: Prometheus Node Exporter, cAdvisor, Linux perf, PerfGo.
- Real-world examples where CPU hardware counters enable performance improvements.
- We will focus on the Linux operating system.
By the end of the session you should:
- Know what data is available via CPU hardware counters.
- Be inspired to install some tools to collect and visualise the data.
- Understand the limitations on shared cloud infrastructure.
Who is this session for?
- Those interested in system performance.
- Software engineers or observability architects.
- Ideally, those familiar with modern CPU architectures: pipelining, multi-level caches, branch prediction.
Speaker
Bryan Boreham
Distinguished Engineer @Grafana Labs, Member of the Prometheus Team, Expert in Distributed Systems and Computer Performance
Bryan Boreham is a Distinguished Engineer at Grafana Labs, working on highly scalable storage for metrics, logs, traces and profiles. A contributor to many Open Source projects since 1988, Bryan is a member of the Prometheus team and a former maintainer of the CNCF Cortex and CNI projects. Earlier, he led interest rate trading systems at Barclays Investment Bank, where performance engineering was business-critical.
His work spans distributed systems, deep performance analysis, and understanding how software really behaves under load.