Sep 24, 2022

You Can't Have Both High Utilization and High Reliability

Everyone wants high utilization and high reliability. The hard truth about Kubernetes is that you need to pick one or the other.

Here is a question I asked on LinkedIn and Twitter that demonstrates the dilemma:

A Kubernetes pod uses 2 CPU on average and occasionally spikes to 3 CPU. What should its resource allocation look like?

The question is simple: how much CPU should you request for a pod that usually needs two CPUs but sometimes needs three? Let's look at a few strategies and their tradeoffs.

Overcommit with CPU request = 2, limit = 3

This is the naive strategy. At first glance, it seems good. You set aside 2 CPUs for your pod but let it access 3 CPUs when necessary.

There are two problems with this approach:

  1. The 3rd CPU might not be available when you need it.
  2. CPU limits are usually a bad practice on Kubernetes. They throttle the pod once it hits the limit, even when the node has idle CPU, which can hurt P99 latency.

Setting a request of 2 and a limit of 3 means you are optimizing for utilization at the expense of reliability!
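To make the first problem concrete, here is a minimal sketch in Python. It relies on the fact that the scheduler places pods by their requests, not their limits. The 8-CPU node size is a made-up number for illustration, and the sketch assumes the pods that are not spiking are running at their average of 2 CPUs.

```python
# Hypothetical numbers for illustration; only the 2 and 3 CPU figures come from the example.
NODE_CPUS = 8      # assumed node size
CPU_REQUEST = 2    # what the scheduler reserves per pod
CPU_SPIKE = 3      # what a pod wants during a spike (also its limit)

# The scheduler packs pods by request, so this many pods fit on the node.
pods_on_node = NODE_CPUS // CPU_REQUEST


def spike_demand(spiking_pods: int) -> int:
    """Total CPU wanted when `spiking_pods` pods spike and the rest use their request."""
    calm_pods = pods_on_node - spiking_pods
    return spiking_pods * CPU_SPIKE + calm_pods * CPU_REQUEST


for n in range(pods_on_node + 1):
    demand = spike_demand(n)
    status = "ok" if demand <= NODE_CPUS else "contended"
    print(f"{n} spiking pods: demand {demand} of {NODE_CPUS} CPUs -> {status}")
```

Under these assumptions, a fully packed node has no headroom at all: as soon as a single pod spikes, total demand exceeds what the node can supply.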

Underutilize with CPU request = 3

Now we're talking! For the first time, you are guaranteed 3 CPUs during a spike. A CPU request of 3 is a hard promise by Kubernetes that you will get 3 CPUs when you need them.

The downside is decreased utilization and wasted compute capacity. The node will be underutilized whenever the pod isn't spiking. If all the pods in your cluster behaved like this spiky pod, you would have about 67% utilization (2 of every 3 requested CPUs) most of the time! That's a lot of wasted compute capacity!

In short, a CPU request of 3 chooses reliability over utilization.
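The utilization figure is just the ratio of average usage to requested CPU. A quick check in Python, using only the numbers from the example (the function and variable names are illustrative):

```python
CPU_REQUEST = 3.0    # CPUs reserved for the pod
AVERAGE_USAGE = 2.0  # CPUs the pod uses when it is not spiking


def utilization(used: float, requested: float) -> float:
    """Fraction of the requested CPU that is actually in use."""
    return used / requested


steady_state = utilization(AVERAGE_USAGE, CPU_REQUEST)
wasted_cpus = CPU_REQUEST - AVERAGE_USAGE
print(f"utilization: {steady_state:.0%}, idle CPUs per pod: {wasted_cpus:g}")
```

That works out to roughly two thirds utilization, with one full CPU per pod sitting idle whenever the pod is not spiking.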
