Transforming Prometheus Alerts into Clear Insights for the Space Telescope Science Institute (STScI)

Company Overview

When your job is to support the safety and scientific endeavor of the Hubble Space Telescope, it is critical that you are a safe pair of hands when it comes to operating your infrastructure!

STScI knew the complexity and importance of their mission, and knew they had to change as they managed the transition to Kubernetes. We joke in the DevOps world that we cannot go into the data centre to fix our IT; STScI takes this to a whole new level!

The DevOps Platforms Team at STScI, led by James Wu, recognised the need for change to help developers build software of the highest quality without compromise. Because STScI used a traditional service model with software running directly on VMs, they decided that their move to cloud native needed a new approach to infrastructure. This was very much a developer-focused change, driven by a bottom-up motion. The decision was taken to adopt a cloud native stack and a platform engineering operating model to support this initiative. STScI evaluated and selected Kubernetes and Rancher as their underlying platform.

Recognising that operating this platform would create new challenges, on top of the effort to improve developer experience during the transition, James looked to the community for ideas on day 2 operations, found Robusta, and quickly developed his ideas around platform operations. Building, learning and operating multiple Rancher clusters, many of which have strict security requirements, brings some unique challenges, and the desire not to reinvent the wheel also figured highly on the agenda.

STScI already used Prometheus and were keen to embrace it while simplifying the observability stack as much as possible.

STScI currently operates multiple clusters with limited Internet access and provides formidable developer support with a platform team of only three engineers. To do this they embrace the built-in graphing, alerting, log collation and automation, reducing the need to keep building dashboards or writing complex queries.

From OSS to Enterprise

STScI deployed Robusta and were seeing results within a couple of hours. Enriched Slack messages with graphs, logs and more gave everyone the context they needed to do their job, significantly reducing the need to access the clusters via kubectl.

The key is that Robusta is a dedicated Kubernetes Observability and Operations platform – it supercharges Prometheus and makes Kubernetes troubleshooting accessible to anyone.

Robusta lowers the barriers to making sense of alerts on k8s (human-readable messages, a way to visualize events across the whole environment over time, and less noise in general), which eases the cognitive burden for engineers tasked with operating clusters and reduces the risk of configuration drift. Robusta monitors all the Prometheus alerts and Kubernetes errors that occur and surfaces the important ones, so that no problem goes unnoticed. The result: STScI can make quick changes to their clusters.

Day 2 operations are so much more than alerting

James and the team at STScI have made much progress, but they are not done yet. On the agenda right now is cost optimisation, where they plan to use KRR both to measure infrastructure efficiency and to help their developers optimise applications at run time. And of course security is a major concern for STScI, so James and his team are also implementing role-based access controls in the Robusta UI to ensure everyone has all of the context they need without needing privileged access to the clusters. This gives RBAC-controlled access to automate many tasks, including managing node and pod lifecycle using the embedded workflow.

As James Wu said, “With a small platform team at STScI, Robusta acts as a force multiplier and lets a lean team like ours punch above our weight when it comes to delivering capability.”

Looking ahead

Finally, the team are keen to adopt the AI capabilities in the platform, which fortunately can be run on their own private LLMs. James commented that “Robusta strives to work closely with STScI to ensure they are always delivering the most value for STScI’s needs. We value the responsiveness and cooperation, and it goes a long way toward augmenting the capabilities on our team.”

Let’s face it, Kubernetes needs to be simple for STScI to operate. That telescope is dramatic enough!

Trusted By Platform Engineering and DevOps Teams Around The Globe

I really like the stream of information you get simply by installing Robusta. As an operator, it is a no brainer to add it to my clusters. Gives really good insights without a lot of effort.

Matthias Nguyen, Managing Director, Unbasical GmbH

"It's the easiest monitoring solution there is for k8s, an excellent, feature rich product, with a team of people behind it you could have a beer with."

Andrew Riddell, IT Systems Manager, UGL

“I start mornings by checking production in Robusta. I love how Robusta is opinionated, highlights problems and significant events. After viewing details, I know enough to resolve issues.”

Keir Robinson, Engineering Manager, Navenio

"By adding Robusta to kube-prometheus-stack and enabling alert grouping, we reduced the number of Slack messages by 90% without missing a single important notification."

Yoni Golob, DevOps Engineer, Placer.ai

“I use Robusta for governance of my Kubernetes infrastructure. A major strength is the Prometheus integration (kube-prometheus-stack).”

Roberto Iannone, DevOps Engineer, RiAtlas

“We manage Kubernetes clusters for multiple clients. With Robusta, it's far easier to compare deployments across our clusters, and notice discrepancies in deployed versions.”

Asbjørn Dyhrberg Thegler, DevOps Consultant, Deranged

“One of the most satisfying features of Robusta is consolidating monitoring data from dozens of clusters across multiple regions into a unified interface.”

Silviu Iaşcu, Director Infra Operations & Cloud, Jedox

“We adopted Robusta for one of our clients in order to have enriched alerts coming from both in-cluster Kubernetes events and an out-of-cluster Alert Manager installation.”

Diego Ojeda, DevOps Consultant, BinBash

“With Robusta, I don’t need to check my cluster’s health every day. If something needs my attention, I get a message in Teams. I can escalate critical issues immediately.”

Oleg Minaev, Lead Backend Developer, Aureliym GmbH

“We're using Robusta to standardize k8s alerting. Previously, we were using kube-prometheus-stack, but the default alerts were too noisy and it was harder to configure.”

James Wu, Space Telescope Science Institute

“I told my devops team to evaluate all the observability tools they want, and to choose the best one for Kubernetes. They chose Robusta.”

Yonatan Itai, VP R&D, Cyera
