
Case Study: Kernel update triggered high load from Go-based observability

This example shows how a production infrastructure problem can be investigated methodically, improved safely and turned into clearer operational practice.

Context

A customer saw a sharp increase in load attributed to the Docker daemon (Dockerd) shortly after applying a kernel update. Because the platform ran Go-based application containers, those containers became the first suspects.

The customer disabled the Go-based app containers, expecting the Dockerd load to drop. It did not. That made the incident harder to reason about because the obvious suspect had already been removed from the equation.

The problem

  • High load was attributed to Dockerd after a kernel update, creating a strong but unproven link between Docker, the update and the application stack.
  • Go-based application containers were suspected, but disabling them did not resolve the Dockerd load.
  • The remaining load source was not obvious because Netdata, which also runs Go-based collectors, was still running and collecting Docker metrics independently of the disabled app containers.
  • The customer needed a structured process-level review rather than more trial-and-error service restarts.

Our approach

  • Reviewed host load, Dockerd behaviour, process-level CPU usage, container activity and service behaviour instead of assuming the application containers were still responsible.
  • Separated actual container workload from Docker daemon activity and host-level services so the investigation did not stop at Docker alone.
  • Identified that the apparent Dockerd load was being driven by Netdata, an observability agent still running on the host.
  • Explained why the issue had been easy to miss: the customer had focused on the Go application containers, but Netdata also runs Go-based collectors, so disabling the app containers did not remove every Go workload or Docker-related monitor from the server.
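The process-level review above can be sketched in Go. This is a minimal, Linux-only illustration, not the tooling used in the investigation: it samples /proc twice, ranks processes by their share of total host CPU time over the interval, and labels each one as running inside a Docker container or on the host. The helper names (parseStat, classifyCgroup, snapshot), the two-second interval and the cgroup-path heuristic are assumptions made for this example; cgroup paths differ between the cgroupfs and systemd drivers and between container runtimes.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strconv"
	"strings"
	"time"
)

// procSample holds one process's cumulative CPU time (utime+stime, in clock ticks).
type procSample struct {
	comm  string
	ticks uint64
	scope string // "container" or "host" (heuristic, see classifyCgroup)
}

// parseStat extracts comm and utime+stime from a /proc/<pid>/stat line.
// comm is wrapped in parentheses and may contain spaces or ')', so split on the LAST ')'.
func parseStat(line string) (string, uint64, error) {
	open, end := strings.IndexByte(line, '('), strings.LastIndexByte(line, ')')
	if open < 0 || end < open {
		return "", 0, fmt.Errorf("malformed stat line")
	}
	fields := strings.Fields(line[end+1:]) // fields[0] is field 3 (state)
	if len(fields) < 13 {
		return "", 0, fmt.Errorf("too few fields")
	}
	utime, err := strconv.ParseUint(fields[11], 10, 64) // field 14
	if err != nil {
		return "", 0, err
	}
	stime, err := strconv.ParseUint(fields[12], 10, 64) // field 15
	if err != nil {
		return "", 0, err
	}
	return line[open+1 : end], utime + stime, nil
}

// classifyCgroup is an ASSUMED heuristic: Docker's cgroupfs driver uses /docker/<id>,
// its systemd driver uses docker-<id>.scope. dockerd itself (docker.service) counts as host.
func classifyCgroup(content string) string {
	for _, line := range strings.Split(content, "\n") {
		if strings.Contains(line, "/docker/") ||
			(strings.Contains(line, "/docker-") && strings.HasSuffix(line, ".scope")) {
			return "container"
		}
	}
	return "host"
}

// totalCPUTicks sums user..steal on the aggregate "cpu" line of /proc/stat
// (guest fields are skipped because they are already included in user/nice).
func totalCPUTicks() (uint64, error) {
	data, err := os.ReadFile("/proc/stat")
	if err != nil {
		return 0, err
	}
	fields := strings.Fields(strings.SplitN(string(data), "\n", 2)[0])
	if len(fields) < 2 || fields[0] != "cpu" {
		return 0, fmt.Errorf("unexpected /proc/stat format")
	}
	var sum uint64
	for i := 1; i < len(fields) && i <= 8; i++ {
		v, err := strconv.ParseUint(fields[i], 10, 64)
		if err != nil {
			return 0, err
		}
		sum += v
	}
	return sum, nil
}

// snapshot records CPU ticks and scope for every PID currently listed in /proc.
func snapshot() map[int]procSample {
	out := make(map[int]procSample)
	dirs, _ := filepath.Glob("/proc/[0-9]*")
	for _, dir := range dirs {
		pid, err := strconv.Atoi(filepath.Base(dir))
		if err != nil {
			continue
		}
		stat, err := os.ReadFile(filepath.Join(dir, "stat"))
		if err != nil {
			continue // process exited between listing and reading
		}
		comm, ticks, err := parseStat(string(stat))
		if err != nil {
			continue
		}
		cg, _ := os.ReadFile(filepath.Join(dir, "cgroup"))
		out[pid] = procSample{comm: comm, ticks: ticks, scope: classifyCgroup(string(cg))}
	}
	return out
}

func main() {
	t0, err := totalCPUTicks()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/stat (Linux only):", err)
		return
	}
	before := snapshot()
	time.Sleep(2 * time.Second) // sampling interval (assumption)
	after := snapshot()
	t1, err := totalCPUTicks()
	if err != nil || t1 <= t0 {
		fmt.Fprintln(os.Stderr, "no CPU time elapsed between samples")
		return
	}
	type row struct {
		pid   int
		s     procSample
		share float64 // percent of ALL CPUs over the interval
	}
	var rows []row
	for pid, a := range after {
		b, ok := before[pid]
		if !ok || a.comm != b.comm || a.ticks < b.ticks {
			continue // new process or PID reuse
		}
		rows = append(rows, row{pid, a, 100 * float64(a.ticks-b.ticks) / float64(t1-t0)})
	}
	sort.Slice(rows, func(i, j int) bool { return rows[i].share > rows[j].share })
	for i := 0; i < len(rows) && i < 10; i++ {
		r := rows[i]
		fmt.Printf("%7d %-16s %-9s %5.1f%%\n", r.pid, r.s.comm, r.s.scope, r.share)
	}
}
```

Run it on the affected host; root may be needed if /proc is mounted with hidepid. A ranking like this narrows the candidates rather than proving causation: CPU that Dockerd spends answering API requests is charged to Dockerd, not to the client making the requests. If Dockerd ranks high and the top host-scope entries include an observability agent (Netdata's Go collectors run as a separate go.d.plugin process), that agent is worth examining before making further container changes.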

Hands-on outcomes

  • Root cause identified: the continued high Dockerd load was traced to Netdata rather than to the disabled application containers.
  • Assumption corrected: the kernel update and the Go application containers were treated as clues, not conclusions.
  • Reduced unnecessary changes: the customer avoided further disruptive container changes once the host-level observability process was identified.
  • Straightforward handover: the findings showed why Dockerd appeared responsible, how Netdata was causing the pressure, and why observability tools should be included in future load investigations.

Relevant technologies and keywords

These are the main technologies, solutions and search terms connected to this case study.

Linux, Kernel update, High load, Go, Docker containers, Netdata, Observability, CPU usage, Performance tuning, Troubleshooting, Process analysis

Want help with a similar issue?

Send us the symptoms, the affected system, recent changes and the impact on your organisation. We will suggest the most appropriate route: emergency engineering assistance, a fixed-scope engineering fix, an infrastructure review or a wider project.
