Apr 12, 2022

Why and How to Audit Kubernetes Changes (Updated)

The term "root cause analysis" makes most people think of complex machine-learning algorithms.

But the core idea is very simple. What's more, root cause analysis is becoming increasingly important as digital systems grow more complex.

For example, imagine an application that experiences CPU throttling. Back in the old days, the cause was simple: the server didn't have enough CPUs, and you needed to either optimize the application or move it to a bigger server. In a modern Kubernetes cluster, however, neither the cause nor the fix is obvious.

What is root cause analysis?

Imagine the following: you plug in an electric kettle, a fuse blows, and the power goes out. You obviously suspect the kettle.

Root cause found!

Why root cause analysis matters in Kubernetes

Most Kubernetes clusters are frenetic, energetic places. Every now and then, someone on one of the many teams working on the cluster updates a line of YAML. Long-sleeping Kubernetes controllers then wake up and work slavishly to make the intentions conveyed in that YAML true. They spin up pods, rapidly pulling images, mounting volumes, and even terminating other pods if necessary, all in a mad rush to make status equal spec.

But when something goes wrong, can we turn back the arrow of time and see which human-made change triggered the problem?

Indeed we can! Let's look at a few ways to do so.

Sources of truth

The four standard ways of monitoring changes to a Kubernetes cluster are:

Instrument your CI/CD pipeline
Use GitOps
Use the Kubernetes Audit API
Connect to the API Server and listen for changes

The order of the list above is deliberate. Each method sees more types of changes than the previous one.
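To make the third option more concrete, here is a minimal sketch of pulling human-relevant changes out of an audit log. It assumes the API server is configured to write audit events as JSON lines, one event per line, and it relies on the event fields `stage`, `verb`, `user.username`, `objectRef`, and `requestReceivedTimestamp`. The function name `mutating_changes` and the sample events (user, deployment name, timestamps) are hypothetical and exist only for illustration.

```python
import json
from typing import Iterable, Iterator

# Verbs that modify cluster state; read-only verbs (get, list, watch) are skipped.
MUTATING_VERBS = {"create", "update", "patch", "delete", "deletecollection"}


def mutating_changes(lines: Iterable[str]) -> Iterator[dict]:
    """Yield a short summary for each completed mutating request in an audit log."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # A request is logged at several stages; keep only the final one
        # so that each change is reported once.
        if event.get("stage") != "ResponseComplete":
            continue
        if event.get("verb") not in MUTATING_VERBS:
            continue
        ref = event.get("objectRef", {})
        yield {
            "time": event.get("requestReceivedTimestamp"),
            "user": event.get("user", {}).get("username"),
            "verb": event["verb"],
            "resource": ref.get("resource"),
            "namespace": ref.get("namespace"),
            "name": ref.get("name"),
        }


# Hypothetical sample events, standing in for lines read from an audit log file.
_REF = {"resource": "deployments", "namespace": "default", "name": "web"}
_USER = {"username": "alice@example.com"}
SAMPLE_LOG = [
    json.dumps({"stage": "ResponseComplete", "verb": "get", "user": _USER,
                "objectRef": _REF,
                "requestReceivedTimestamp": "2022-04-12T10:00:00.000000Z"}),
    json.dumps({"stage": "RequestReceived", "verb": "patch", "user": _USER,
                "objectRef": _REF,
                "requestReceivedTimestamp": "2022-04-12T10:01:00.000000Z"}),
    json.dumps({"stage": "ResponseComplete", "verb": "patch", "user": _USER,
                "objectRef": _REF,
                "requestReceivedTimestamp": "2022-04-12T10:01:00.000000Z"}),
]

if __name__ == "__main__":
    for change in mutating_changes(SAMPLE_LOG):
        print(change)
```

Filtering on the final stage and on mutating verbs is a design choice for this sketch: it drops read traffic and duplicate stage entries, leaving one record per change with who made it, when, and to which object.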
