Minimal Viable Platforms

Everyone is talking about “platforms” these days. But where should you start, and what are the success criteria?

In this post, I’ll describe the Platform-shaped problems your organization might have - and how to solve them fast.

This post is for companies with up to 1000 developers. Above that scale, you likely have a multi-year Platform initiative and a large team. Your motivation for building a platform will usually be less about improving developer experience, and more about establishing a security and regulatory baseline with less effort. You may find value in this post, but it is written from the perspective of smaller companies.

A warning: too many companies try to build Platforms without defining the success criteria or the concrete problems they’re out to solve. This is a dangerous approach. You cannot possibly “improve developer experience” without a detailed map of where devs burn time today. Likewise, you cannot reduce cognitive load without a granular understanding of which technologies devs in your org know well and which they have yet to learn.

Let's list some common problems and how to solve each one with an MVPlatform (Minimal Viable Platform).

Problem 1: Too many URLs to remember

What does the problem look like? Your company uses EKS, Vault, Terraform, 5 observability tools, PagerDuty, Sentry, GitHub, Confluence, Jenkins, and a myriad of other tools. Devs can’t remember the URLs and don’t think to open the tools when they’re relevant.

What’s the MVPlatform? Create a Confluence page and put a big table in it, with teams or microservices as rows, tools like Sentry as columns, and links in each cell. It’s not perfect, but you can’t beat this for ROI.
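Once the table grows past a handful of rows, you can generate it instead of hand-editing it. Here’s a minimal Python sketch, assuming each tool’s links follow a predictable per-service URL pattern. The service names, tools, and `example.com` URLs are all hypothetical, and the output is a Markdown table, so adapt it to whatever format your wiki accepts.

```python
# Hypothetical services, tools, and URL patterns for illustration only.
SERVICES = ["checkout", "payments"]
TOOLS = ["Sentry", "Grafana", "PagerDuty"]
URL_PATTERNS = {
    "Sentry": "https://sentry.example.com/projects/{service}/",
    "Grafana": "https://grafana.example.com/d/{service}",
    "PagerDuty": "https://pagerduty.example.com/services/{service}",
}


def build_link_table(services, tools, url_patterns):
    """Return a Markdown table with services as rows and tools as columns."""
    rows = [
        "| Service | " + " | ".join(tools) + " |",
        "|" + " --- |" * (len(tools) + 1),
    ]
    for service in services:
        cells = []
        for tool in tools:
            pattern = url_patterns.get(tool)
            # Leave the cell empty if this tool has no URL pattern yet.
            cells.append(f"[{tool}]({pattern.format(service=service)})" if pattern else "")
        rows.append(f"| {service} | " + " | ".join(cells) + " |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(build_link_table(SERVICES, TOOLS, URL_PATTERNS))
```

Keeping services and URL patterns as data means adding a service or a tool is a one-line change. Real tools rarely have URL patterns this clean, so expect to hand-enter some links.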

When does the MVPlatform not work? If you have thousands - not dozens - of microservices. In this case, consider a Platform, subtype ServiceCatalog. Good options are Backstage, Cortex, Port, and OpsLevel.

What does this actually look like? Basically, like Linktree for your microservices.

Problem 2: Devs know Docker, not Kubernetes

What does the problem look like? Your application devs are great at Java and Spring Boot - and can build Docker containers like a boss - but when it comes to getting containers onto Kubernetes, they shoot themselves in the foot. They write manifests for individual Pods when they need Deployments. You get paged at 3AM because they misconfigured liveness probes. And application metrics are missing from Prometheus because they don’t understand ServiceMonitors.

What’s the MVPlatform? Write a ‘StandardMicroservice’ Helm chart, create a presentation about it, write more documentation than code, and schedule talks for application teams titled ‘Getting Docker Containers on Kubernetes in a Single Day’. To be successful, make assumptions and set strong defaults. The holy grail is for a typical application team to start using the Helm chart by setting a single value: the image. (Over time, they’ll expand beyond that.)
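A real chart would be a `values.yaml` plus Go templates. To show the idea in a few lines, here’s a minimal Python model of what “strong defaults” means: the team supplies only `image`, and everything else comes from chart defaults that they can override key by key. The default values, the `/healthz` path, and the function names are hypothetical, not recommendations for your workloads.

```python
import copy

# Hypothetical chart defaults; tune them to your own workloads.
DEFAULTS = {
    "replicas": 2,
    "port": 8080,
    "healthPath": "/healthz",  # assumption: apps expose a health endpoint here
    "resources": {
        "requests": {"cpu": "100m", "memory": "256Mi"},
        "limits": {"memory": "512Mi"},
    },
}


def merge_values(defaults, overrides):
    """Recursively overlay team-supplied values on top of the defaults."""
    result = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_values(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def http_probe(path, port):
    """Build a fresh HTTP probe dict (fresh so YAML dumps don't emit aliases)."""
    return {"httpGet": {"path": path, "port": port}}


def render_deployment(name, values):
    """Render an apps/v1 Deployment; 'image' is the only required value."""
    if "image" not in values:
        raise ValueError("values must set 'image'")
    v = merge_values(DEFAULTS, values)
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "replicas": v["replicas"],
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": v["image"],
                            "ports": [{"containerPort": v["port"]}],
                            "readinessProbe": http_probe(v["healthPath"], v["port"]),
                            "livenessProbe": http_probe(v["healthPath"], v["port"]),
                            "resources": v["resources"],
                        }
                    ]
                },
            },
        },
    }
```

Calling `render_deployment("checkout", {"image": "registry.example.com/checkout:1.4.2"})` returns a complete Deployment with replicas, probes, and resource settings already filled in. That one-line experience is what the chart should aim for.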

Common Mistakes:

Trying to “simplify things” or “abstract away Kubernetes”. This is not your goal! Your goal is to change the learning curve for Kubernetes, NOT to “abstract it away” - whatever that even means. Complexity comes from your requirements, not from Kubernetes!


Treating this as a technical challenge, with a deliverable in GitHub, rather than an educational challenge, with deliverables in Google Calendar (i.e., workshops with dev teams).

When does the MVPlatform not work? I have yet to see this fail entirely! But if you support a lot of application teams, that Helm chart will become unwieldy to maintain. It starts with innocent requests from developers like “Please add a Helm value for Deployment.spec.strategy.rollingUpdate.maxSurge”. You comply - of course - and soon your Helm chart exposes every field in the Kubernetes API. In this case, consider one of the following:

Only capture the simple case in your Helm chart. Encourage developers to move off it as their apps grow in complexity.
Use Helm library charts and give devs composable elements they can reuse without being constrained by you.
Don’t use Helm. Use something like Kustomize or YTT, so that devs can override fields you didn’t intend to expose. (If devs can only do things with your ‘platform’ that you anticipated, you haven’t built a platform. Also, you’re a bottleneck.)
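The core of the Kustomize approach is that devs patch the rendered manifest instead of waiting for you to expose a value. Kustomize itself supports strategic-merge and JSON 6902 patches; as a simpler stand-in, here’s a minimal Python implementation of RFC 7386 JSON Merge Patch, applied to the `maxSurge` request above. The `base` and `patch` objects are hypothetical.

```python
import copy


def json_merge_patch(target, patch):
    """Apply an RFC 7386 JSON Merge Patch and return a new object.

    Objects merge recursively, None (JSON null) deletes a key, and any
    other value, including a list, replaces the target value wholesale.
    """
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = json_merge_patch(result.get(key), value)
    return result


# Hypothetical (trimmed) output of the StandardMicroservice chart.
base = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "checkout"},
    "spec": {"replicas": 2},
}

# A team-owned patch for a field the chart never exposed.
patch = {"spec": {"strategy": {"type": "RollingUpdate",
                               "rollingUpdate": {"maxSurge": "50%"}}}}

patched = json_merge_patch(base, patch)
```

One caveat: in a JSON Merge Patch, lists are replaced wholesale, so patching one container would overwrite the whole `containers` list. Kubernetes’ strategic merge patch handles this case by merging such lists by key (containers by `name`), which is one reason to reach for Kustomize rather than rolling your own.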

Problem 3: DevOps turns into “Tech Support for Devs”

What does the problem look like? You have a Slack channel where application developers ask questions of DevOps engineers. Your team spends 30% of their time answering questions that devs should be able to solve themselves. These are often trivial questions like ‘why is my pod not working’ or ‘how can I find pod logs’.

What’s the MVPlatform? Before jumping to solutions, let’s diagnose the problem properly. The following is almost certainly true:

You already have the data that devs need to solve problems on their own, but it’s easier for them to ask the DevOps team.
Documentation alone has never fixed this problem for anyone! Even with docs, it will still be easier for devs to ask DevOps instead of fixing problems themselves.
This is both a cultural problem and a data-accessibility problem. You will need to address both.

First, solve the cultural problem. You can’t train every dev in your organization at once, but many DevOps teams have been successful by training the trainers. Turn one dev from each application team into that team’s DevOps expert. Create a dedicated Devs-Do-DevOps program, give participants special recognition, and set a structured curriculum with weekly meetings.

Change incentives to change the outcome. Here is the equation you need to change:

It’s faster for devs to ask DevOps than to look for themselves

This is a tractable problem, and the common solution is to bring observability data to devs, instead of bringing devs to observability data.

Example: We did this using HolmesGPT, which surfaces logs, events, and alerts inside Slack. It doesn’t "abstract Kubernetes away" - it just makes the right data available when devs need it.
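To make “bring the data to devs” concrete, here’s a minimal Python sketch that turns a Pod object (the JSON that `kubectl get pod <name> -o json` prints) into plain-English findings a bot could post in Slack. This is not how HolmesGPT works; the `HINTS` wording and the `explain_pod` function are hypothetical, and the Slack posting is left out.

```python
# Hypothetical hints keyed by container state reasons; tune to your runbooks.
HINTS = {
    "CrashLoopBackOff": "The container keeps exiting. Check the logs from its previous run.",
    "ImagePullBackOff": "The image can't be pulled. Check the image name, tag, and registry credentials.",
    "ErrImagePull": "The image can't be pulled. Check the image name, tag, and registry credentials.",
    "CreateContainerConfigError": "A referenced ConfigMap or Secret is probably missing.",
    "OOMKilled": "The container exceeded its memory limit. Raise the limit or reduce memory use.",
}


def explain_pod(pod):
    """Turn a Pod object (parsed JSON from the Kubernetes API) into plain-English findings."""
    name = pod.get("metadata", {}).get("name", "<unknown>")
    findings = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting") or {}
        last_terminated = cs.get("lastState", {}).get("terminated") or {}
        # Check the current waiting reason and why the previous run ended.
        for reason in (waiting.get("reason"), last_terminated.get("reason")):
            if reason in HINTS:
                findings.append(f"{cs['name']}: {reason}. {HINTS[reason]}")
        if cs.get("restartCount", 0) > 0:
            findings.append(f"{cs['name']}: restarted {cs['restartCount']} time(s).")
    if not findings:
        return f"Pod {name}: no known problems found in container statuses."
    return f"Pod {name}:\n" + "\n".join(f"- {item}" for item in findings)
```

The design choice that matters is where the output lands: in the channel where devs were going to ask DevOps anyway, worded so they can act on it without first learning where pod status lives.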

Common Mistakes:

Confusing a no-data problem with a data-not-accessible problem: if your devs are not looking at pod logs today, adding distributed tracing will not solve this problem! Devs won't look at tracing data either!
Building a platform which “abstracts away Kubernetes” (or whatever other tech you use). Your goal is not to hide observability data, but rather to surface it. Of course, if you’re able to draw team boundaries such that devs never need to worry about node-level issues, by all means do so! But if devs need such information, never hide it. And when deciding what data to show or hide, focus on reality today, not where you want to be tomorrow.

Problem 4: Devs Open Tickets to Provision Infrastructure

What does the problem look like? Your DevOps team is swamped with tickets from developers that read ‘Set up a new cluster for my-team’ or ‘Create namespace my-app in cluster prod-eu’. In other words, your team reads requirements in a ticket and then turns around to click a button or write Terraform/YAML. And only your team has permissions to do so.

What’s the MVPlatform? Not enough information to answer! Why can’t devs do those operations themselves? Is it for regulatory reasons? (If so, do you need approval workflows or just a documentation trail?) Is it a lack of knowledge? (Can you train them?) Is it a lack of permissions? (Are you able to give them permissions?) Or is it a desire to maintain company standards by having everyone follow certain templates?

Each of those problems has a different solution! Sometimes the solution is to build a platform, subtype self-service-provisioning, where the outcome is a button devs push to provision a cluster. In that case, the MVPlatform might be a GitHub Action, an Azure DevOps Pipeline, or a Jenkins Job.
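Here’s a minimal Python sketch of the step such a pipeline job might run for the namespace ticket above: validate the request against company standards, then render the manifests. The namespace-name check follows Kubernetes’ DNS-1123 label rules; the required labels and quota values are hypothetical placeholders for your own templates.

```python
import re

# Kubernetes namespace names must be DNS-1123 labels: lowercase
# alphanumerics or '-', starting and ending with an alphanumeric.
DNS1123_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_NAME_LENGTH = 63

# Hypothetical company standards; replace with your own templates.
REQUIRED_LABELS = ("team", "cost-center")
DEFAULT_QUOTA = {"requests.cpu": "4", "requests.memory": "8Gi", "limits.memory": "16Gi"}


def validate_request(name, labels):
    """Return a list of problems with a namespace request (empty means OK)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH or not DNS1123_LABEL.fullmatch(name):
        problems.append(f"invalid namespace name {name!r}")
    for key in REQUIRED_LABELS:
        # Only presence is checked here; label-value syntax is not validated.
        if not labels.get(key):
            problems.append(f"missing required label {key!r}")
    return problems


def render_namespace(name, labels):
    """Render a Namespace and a default ResourceQuota for a valid request."""
    problems = validate_request(name, labels)
    if problems:
        raise ValueError("; ".join(problems))
    return [
        {"apiVersion": "v1", "kind": "Namespace",
         "metadata": {"name": name, "labels": dict(labels)}},
        {"apiVersion": "v1", "kind": "ResourceQuota",
         "metadata": {"name": "default-quota", "namespace": name},
         "spec": {"hard": dict(DEFAULT_QUOTA)}},
    ]
```

A GitHub Action or Jenkins Job might collect `name` and `labels` from the job’s inputs, call `render_namespace`, and then either apply the output or open a pull request, depending on whether you need an approval trail.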

I don’t have strong recommendations for tooling here - if people let me know on LinkedIn what they use, I’ll update the post with recommendations.

Closing

If you’re building a platform, forget the buzzwords and focus on the problems you want to solve. No one has ever gone wrong by doing that.

I haven’t covered everything in this post - in particular, I’ve skipped the compliance/regulatory motivations for building platforms - but I hope what’s here is enough to start you down the right path toward your own MVPlatform.

As always, for questions and comments, you can find me on LinkedIn.

Natan Yellin, CEO

Natan has been writing software for over 15 years. He regularly posts about all things Kubernetes on LinkedIn.