Case Study: Kernel update triggered high load from Go-based observability
This example shows how a production infrastructure problem can be investigated methodically, resolved safely and turned into clearer operational practice.
Context
A customer experienced a sharp increase in load attributed to the Docker daemon (dockerd) shortly after applying a kernel update. Because the platform included Go-based application containers, those containers became the first suspected cause.
The customer disabled the Go-based app containers, expecting the Dockerd load to drop. It did not. That made the incident harder to reason about because the obvious suspect had already been removed from the equation.
The problem
- High load was attributed to the Docker daemon (dockerd) after a kernel update, creating a strong but unproven link between Docker, the update and the application stack.
- Go-based application containers were suspected, but disabling them did not resolve the Dockerd load.
- The remaining load source was not obvious because Netdata, another Go-based component, was still running and interacting with Docker metrics outside the disabled app containers.
- The customer needed a structured process-level review rather than more trial-and-error service restarts.
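A process-level review of this kind can start by ranking processes by CPU share instead of restarting services. The Python sketch below is illustrative only and is not the tooling used in this engagement. It assumes input in the shape produced by `ps -eo comm,pcpu` (a header line, then command name and %CPU per row); the function name `cpu_by_command` and the sample figures are hypothetical. Note that `ps` reports CPU averaged over each process's lifetime, so a sampling tool may be a better source for short spikes.

```python
from collections import defaultdict


def cpu_by_command(ps_output: str) -> list[tuple[str, float]]:
    """Sum %CPU per command name and return totals, highest first.

    Expects text shaped like `ps -eo comm,pcpu` output: one header line,
    then rows ending in a numeric %CPU column.
    """
    totals: dict[str, float] = defaultdict(float)
    for line in ps_output.strip().splitlines()[1:]:  # skip the header row
        parts = line.rsplit(None, 1)  # command names may contain spaces
        if len(parts) != 2:
            continue
        command, pcpu = parts
        try:
            totals[command.strip()] += float(pcpu)
        except ValueError:
            continue  # ignore rows whose last column is not numeric
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Invented sample data, for illustration only (not figures from the incident).
SAMPLE = """COMMAND %CPU
dockerd 40.0
netdata 35.5
netdata 2.5
sshd 0.1
"""

for command, total in cpu_by_command(SAMPLE):
    print(f"{command:<12} {total:6.1f}")
```

On a live host the same function could be fed the captured output of `ps -eo comm,pcpu`. Grouping by command name makes it easier to see when a monitoring agent, rather than an application container, sits next to the Docker daemon at the top of the list.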
Our approach
- Reviewed host load, Dockerd behaviour, process-level CPU usage, container activity and service behaviour instead of assuming the application containers were still responsible.
- Separated actual container workload from Docker daemon activity and host-level services so the investigation did not stop at Docker alone.
- Identified that the apparent Dockerd load was being driven by Netdata, a Go-based observability agent still running on the host.
- Explained why the issue had been easy to miss: the customer had focused on Go application containers, but Netdata itself is Go-based, so disabling the app containers did not remove every Go workload or Docker-related monitor from the server.
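The separation of container workload, Docker daemon activity and host-level services described above can be sketched as a small classifier. This is a minimal Python sketch under stated assumptions, not the method used in the investigation: it assumes the contents of `/proc/<pid>/cgroup` are passed in as text, that Docker containers appear under a `/docker/<64-hex-id>` path (cgroupfs driver) or a `docker-<64-hex-id>.scope` unit (systemd driver), and that the daemon processes are named `dockerd` and `containerd`. The function name and category labels are hypothetical.

```python
import re

# Assumed cgroup path shapes for processes running inside Docker containers.
_CONTAINER_PATTERNS = (
    re.compile(r"/docker/[0-9a-f]{64}"),         # cgroupfs driver layout
    re.compile(r"docker-[0-9a-f]{64}\.scope"),   # systemd driver layout
)

# Assumed command names for the Docker daemon side of the host.
_DAEMON_COMMANDS = {"dockerd", "containerd"}


def classify_process(command: str, cgroup_text: str) -> str:
    """Label a process as 'container', 'docker-daemon' or 'host-service'.

    `command` is the process name; `cgroup_text` is the content of its
    /proc/<pid>/cgroup file.
    """
    if any(pattern.search(cgroup_text) for pattern in _CONTAINER_PATTERNS):
        return "container"
    if command in _DAEMON_COMMANDS:
        return "docker-daemon"
    # Everything else runs on the host, including monitoring agents.
    return "host-service"


# Illustrative inputs only; the container ID is a placeholder.
container_id = "a" * 64
print(classify_process("app", f"0::/system.slice/docker-{container_id}.scope"))
print(classify_process("dockerd", "0::/system.slice/docker.service"))
print(classify_process("netdata", "0::/system.slice/netdata.service"))
```

Under these assumptions, an agent such as Netdata installed directly on the host falls into the host-level category, which is why stopping application containers would leave it running.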
Hands-on outcomes
Want help with a similar issue?
Send the symptoms, affected system, recent changes and the impact on your organisation. We will suggest the most appropriate route: emergency engineering assistance, a fixed-scope engineering fix, an infrastructure review or a wider project.