April 8, 2023

Fairness, Kubernetes Pricing, and Burstable CPUs

Well, that blew up fast.

We benchmarked node efficiency on a bunch of Kubernetes providers. After reading the responses, I have more questions than I started with. The hot issue is how burstable nodes should be scheduled.

What are burstable instances?

Here’s the gist of it:

  1. AWS, GCP, and Azure all have “burstable” node types.
  2. Burstable nodes guarantee 1 CPU, but let you temporarily “burst” to 2 CPUs.
  3. There are variants with different CPU counts.

It’s important to read the fine print to understand how a burstable node actually performs.

Usually, instead of saying “1 vCPU that can burst to 2 vCPUs”, vendors write something like “2 vCPUs with a 20% baseline”, meaning 0.4 vCPUs “normally” that can burst up to 2 vCPUs.
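To make that conversion concrete, here’s a tiny sketch. (The helper is hypothetical, not any vendor’s API.)

```python
# Hypothetical helper: translate "N vCPUs with an X% baseline" into
# the sustained and burst capacities it actually implies.
def burstable_capacity(vcpus, baseline_pct):
    """Return (sustained_vcpus, burst_vcpus)."""
    return vcpus * baseline_pct / 100, vcpus

sustained, burst = burstable_capacity(2, 20)
print(sustained, burst)  # 0.4 2
```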

When can burstable instances burst?

There are conditions for bursting. After all, if you could burst all the time, this would be a regular node.


  1. Nodes need to first under-use CPU to burst later
  2. By under-using CPU, you gain credits
  3. By bursting, you use credits

There are variations on this. T3 instances on AWS let you burst forever, but at a substantial cost. (AWS calls it a “small additional charge”, but my calculation says that bursting increases the price by 48-239% - see Appendix 1.)

How do cloud vendors implement burstable nodes?

How does this work physically? Obviously cloud vendors are running multiple virtual machines on the same hardware. They over-allocate vCPUs to regular CPUs and then… magic? It’s not clear how they handle the scenario where everyone bursts at once. I’m new to the concept of burstable CPUs, so if there’s a public explanation somewhere please send it to me.

Until then, I’ve speculated in Appendix 2.

The big controversy: How should burstable CPUs be scheduled?

Back to the point of this post. Think hard before continuing: how do you schedule Kubernetes Pods to burstable CPUs?

There’s no great answer here. For simplicity, imagine a 1vCPU node that can burst to 2vCPUs. GKE takes the approach that this is fundamentally a 1vCPU node. So they schedule 1vCPU worth of requests to the node and mark the remaining 1vCPU as unavailable. Then they get weird numbers in node efficiency benchmarks.

On the other hand, EKS and AKS schedule the node as if it had 2 vCPUs.
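The practical difference is how many Pods land on the node. Here’s a simplified sketch of the two policies, ignoring kubelet and system reserves (the function and numbers are mine, for illustration):

```python
# Simplified sketch of the two scheduling policies (ignores kubelet
# and system reserves; names are mine, not provider APIs).
def schedulable_pods(allocatable_vcpus, request_millicores):
    """How many identical Pods fit, by CPU requests alone."""
    return int(allocatable_vcpus * 1000 // request_millicores)

baseline_vcpus, burst_vcpus = 1.0, 2.0
gke_style = schedulable_pods(baseline_vcpus, 500)  # treat node as 1 vCPU
eks_style = schedulable_pods(burst_vcpus, 500)     # treat node as 2 vCPUs
print(gke_style, eks_style)  # 2 4
```

Same node, same Pods, but one policy packs twice as many requests onto it.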

Which is better?

I’m honestly not sure:

  • AKS is violating the assumption that Pods are guaranteed their requests. A B4ms isn’t going to guarantee those cores if Pods end up needing them. That breaks a hard guarantee.
  • EKS will bill you exorbitantly if you have accurate CPU requests. They schedule nodes to the burst level, then charge you extra if you behave as scheduled.
  • GKE will be… OK? If you set requests according to peak CPU, you won’t benefit from the ability to burst. If you set requests according to average performance, you will benefit. I think GKE’s behavior will surprise end users the most, but maybe it’s the lesser evil?

By the way, we haven’t even touched on memory here. As you can see in the original post, up to 46% of memory is reserved by cloud vendors!

Regardless, it’s been an interesting weekend. I got yelled at online, but learned a bunch. I’ll call it a success.

Below are two appendixes and then closing notes, including a survey I ran on how people expect burstable instances to behave. If you don’t care for the appendixes, jump to the end.

Appendix 1 - Calculating the burst cost of T3.medium on AWS

A T3.medium has 2 vCPUs. The on-demand price is $0.0418/hour.

The docs say:

“In the cases that the T3 instances needs to run at higher CPU utilization for a prolonged period, it can do so for a small additional charge of $0.05 per vCPU-hour.”

Imagine you use the full 2 vCPUs constantly. You’re penalized $0.10 per hour in total. (2 vCPUs, each at $0.05.) The cost of the T3.medium goes from $0.0418/hour to $0.1418/hour. I wouldn’t call a 3.39x price multiple (a 239% increase) “a small additional charge”.

We can calculate this another way that’s less obvious, but that a sophisticated buyer might use. You buy a T3.medium and stay within its baseline. You’re essentially getting 0.4 vCPUs for $0.0418/hour. (T3.medium’s baseline is 20%, meaning you’re expected to use only 20% of the 2 vCPUs on average. That’s 0.4 vCPUs.) So you’re paying $0.1045 per vCPU-hour. If you get hit by the penalty, the price becomes $0.1545 per vCPU-hour ($0.1045 + $0.05). That’s a 47.8% price hike. Still not a “small additional charge” by any means.
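Both calculations can be checked in a few lines of Python (prices taken from above):

```python
# Reproduce both Appendix 1 calculations (prices from the post).
on_demand = 0.0418            # $/hour, t3.medium on-demand
surcharge = 0.05              # $/vCPU-hour when bursting past the baseline
vcpus, baseline = 2, 0.20

# View 1: constant full burst on both vCPUs.
burst_total = on_demand + surcharge * vcpus  # $0.1418/hour
multiple = burst_total / on_demand

# View 2: price per "effective" vCPU at the 20% baseline.
per_vcpu = on_demand / (vcpus * baseline)    # $0.1045/vCPU-hour
hike = surcharge / per_vcpu

print(round(multiple, 2), round(hike * 100, 1))  # 3.39 47.8
```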

Appendix 2 - How does bursting really work?

What if all the virtual machines running on the same physical machine need to burst at the same time? I don’t know how cloud vendors handle this, but I can speculate. A few options:

  1. They don’t
  2. Hypervisor migrations
  3. Half of each physical host runs the cloud provider’s own workloads. These get paused if necessary.

Another idea is to fill the host with 50% spot instances, but I don’t think that’s airtight, because spot instances are still guaranteed a 60-120 second notice before interruption.

Closing Notes

All this theory is good, but what do end users expect? I surveyed LinkedIn followers to find out.

Who are these people who answered the poll? LinkedIn lets you see, so I checked. Participants are DevOps, SREs, and software engineers at real-world companies.

I’m not sure what’s most fair from cloud vendors. For now, it’s back to work on robusta.dev for me. We have a few announcements to make at KubeCon and they need final touches.

Update: announcements have been made! We released an open-source CLI for determining requests and limits based on historical Prometheus data.
