Each cloud provider offers many possible node types, which makes comparing them difficult. We chose to benchmark both 2-CPU and 4-CPU nodes. We did not pick memory sizes separately; memory was whatever came with the CPU type we chose.
For self-managed and dev clusters, we just benchmarked clusters we had on hand.
To make analyzing the results easy, we onboarded all our clusters to a single Robusta.dev account. Then we opened the Node page and sorted by available CPU.
This let us track all clusters on a single page.
We then exported the data to a spreadsheet, attached below.
First, the raw results.
CPU and memory efficiency were calculated as (Total - Reserved) / Total. Higher scores are better.
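As a minimal sketch of that formula, the calculation can be written as a small Python function. The function name and the node figures in the example are hypothetical and chosen only for illustration; they are not values from our benchmark.

```python
def efficiency(total: float, reserved: float) -> float:
    """Return the fraction of a resource left for workloads: (total - reserved) / total."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= reserved <= total:
        raise ValueError("reserved must be between 0 and total")
    return (total - reserved) / total


# Hypothetical node (illustrative numbers, not benchmark data):
# 4 CPUs with 0.5 CPU reserved, 16 GiB of memory with 4 GiB reserved.
cpu_efficiency = efficiency(total=4.0, reserved=0.5)
memory_efficiency = efficiency(total=16.0, reserved=4.0)

print(f"CPU efficiency: {cpu_efficiency:.1%}")
print(f"Memory efficiency: {memory_efficiency:.1%}")
```

A node that reserves nothing scores 1.0 (100%), and the score drops toward 0 as the reserved share grows.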
A few things are immediately obvious:
Memory efficiency is much worse than CPU efficiency, regardless of cloud provider.
AKS and GKE are the worst offenders, with AKS reserving up to 46% of memory, and GKE reserving up to 53% of CPU.
We only measured one node type for AWS, but it was pretty good!
Some clusters, like Rancher and CIVO, supposedly have 100% efficiency. I'm not sure that's a good thing! Reserving some resources for the node is important to guarantee stability. Are they reserving resources in some other way (e.g. using DaemonSets for system workloads or using static pods), or are they playing it risky? Not sure.