January 30, 2023

How Are Prometheus Alerts Configured on Kubernetes with prometheus-community/prometheus?

Prometheus is great for alerting, but where do those alerts come from? How is Prometheus configured? Specifically, how is it configured when running on Kubernetes?

Before we set up Prometheus, you need to know about the two types of Prometheus rules:

  • Alerting rules - written in PromQL, these evaluate one or more expressions and fire alerts based on the result.
  • Recording rules - precomputed expressions that can be queried without executing the original expression every time. Other expressions can use recording rules in their queries to run faster (see the sketch after this list).
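
For instance, an alerting rule can be built on top of a recording rule. Here is a minimal sketch of a rules file (the metric and rule names are illustrative and assume node-exporter metrics are available):

    groups:
    - name: example-recording-rules
      rules:
      # Recording rule: precompute per-instance CPU utilisation
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))
    - name: example-alerting-rules
      rules:
      # Alerting rule: query the precomputed series instead of the raw expression
      - alert: HighCpuUsage
        expr: instance:node_cpu_utilisation:rate5m > 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High CPU usage on {{ $labels.instance }}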

We will see how to configure alerts for Prometheus instances installed using the prometheus-community/prometheus Helm chart. Since we are using a Helm chart, we will configure rules using the Helm values file. We will discuss how and where to configure these rules with an example later.

Note: these instructions do not apply to Prometheus instances installed using kube-prometheus-stack, the Prometheus Operator, or the Robusta Helm chart. We will see how to edit rules for those installations in a later post.

What is prometheus.yml?

prometheus.yml is the global configuration file for Prometheus. Back in the old days, before Kubernetes, it was typically located on servers at /etc/config/prometheus.yml.

Note that prometheus.yml doesn’t define all your Prometheus rules; rather, it references other files that contain the actual rules. Traditionally, alerting rules and recording rules are split into separate files, but this is just a convention.
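
As a rough sketch, the relevant part of prometheus.yml is a rule_files stanza that lists those files (the paths below are the ones used by the Helm chart discussed next; on a traditional server they would point wherever your rule files live):

    rule_files:
    - /etc/config/recording_rules.yml
    - /etc/config/alerting_rules.yml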

Where is prometheus.yml on Kubernetes?

It is located at /etc/config/prometheus.yml in the prometheus-server pod. But you probably shouldn’t edit it there directly: the prometheus-server pod does not have persistent storage, so when the pod goes down, your updated configuration will be lost.

How to add rules when using the Prometheus Helm chart?

In the prometheus-community/prometheus Helm chart, Prometheus reads its configuration from /etc/config/prometheus.yml. A ConfigMap named “<ReleaseName>-server” manages this file and updates its contents based on your Helm values. The ConfigMap ensures your rules are not lost when the pod goes down.

Here is an example of that ConfigMap. The contents of prometheus.yml are read from the prometheus.yml key inside the ConfigMap. Everything under that is the standard Prometheus config, as explained in the Prometheus documentation.
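
A trimmed-down sketch of what that ConfigMap looks like (assuming a release named “prometheus”; the real one is generated by the chart and contains more keys):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-server          # "<ReleaseName>-server"
    data:
      alerting_rules.yml: |
        {}
      recording_rules.yml: |
        {}
      prometheus.yml: |
        global:
          scrape_interval: 1m
          evaluation_interval: 1m
        rule_files:
        - /etc/config/recording_rules.yml
        - /etc/config/alerting_rules.yml
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090
        # ... more scrape configs generated from the Helm values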

[Image: prometheus.yml configuration in Helm values file]

To modify prometheus.yml when using the prometheus-community/prometheus Helm chart, change the serverFiles.prometheus.yml Helm value.
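
A minimal sketch of where that sits in the Helm values file (this mirrors the chart's defaults; whatever you put here ends up in prometheus.yml):

    serverFiles:
      prometheus.yml:
        rule_files:
        - /etc/config/recording_rules.yml
        - /etc/config/alerting_rules.yml
        scrape_configs:
        - job_name: prometheus
          static_configs:
          - targets:
            - localhost:9090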

What is alerting_rules.yml?

By default, prometheus.yml points at another file called alerting_rules.yml, which contains the alerting rules themselves. To modify the rules on Kubernetes, change the Helm value serverFiles.alerting_rules.yml.

You can also split your alerting rules into multiple files. You should edit the Helm values for Prometheus as follows:

[Image: Example of alerting rule configuration in Helm values]
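
Here is a sketch of what that might look like, assuming each key you add under serverFiles is rendered as a file in /etc/config (the extra file name job_alerts.yml is made up for illustration, and prometheus.yml must list it under rule_files):

    serverFiles:
      alerting_rules.yml:
        groups:
        - name: general-alerts
          rules:
          - alert: HostDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
      job_alerts.yml:                  # hypothetical second rules file
        groups:
        - name: job-alerts
          rules:
          - alert: KubernetesJobFailed
            expr: kube_job_status_failed > 0
            for: 0m
            labels:
              severity: warning
      prometheus.yml:
        rule_files:
        - /etc/config/recording_rules.yml
        - /etc/config/alerting_rules.yml
        - /etc/config/job_alerts.yml   # register the extra file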

What is recording_rules.yml?

Just like alerting rules, recording rules are traditionally saved in the file “/etc/config/recording_rules.yml”. 

When using the prometheus-community/prometheus Helm chart, set the contents of this file using the serverFiles.recording_rules.yml Helm value.

[Image: Example of recording rule configuration in Helm values]
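
A minimal sketch, assuming the default prometheus.yml already lists /etc/config/recording_rules.yml under rule_files (the rule name below is illustrative):

    serverFiles:
      recording_rules.yml:
        groups:
        - name: k8s-recording-rules
          rules:
          # Precompute per-namespace CPU usage so dashboards and alerts can query it cheaply
          - record: namespace:container_cpu_usage_seconds_total:sum_rate
            expr: sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))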

Hands-on example

Let us configure an alerting rule to notify us when a Kubernetes job fails.

Prerequisites:

Before we begin, make sure you have the following:

  • A Kubernetes cluster with Prometheus installed using the prometheus-community/prometheus Helm chart
  • kubectl and helm access to that cluster
  • kube-state-metrics running in the cluster (it is bundled with the chart by default and exposes the kube_job_status_failed metric used below)

Step 1:

Add the following alerting rule to alerting_rules.yml by setting the serverFiles.alerting_rules.yml Helm value in your values file.

alerting_rules.yml:
    groups:
    - name: Jobfailed
      rules:
      - alert: KubernetesJobFailed
        annotations:
          description: |-
            Job {{ $labels.namespace }}/{{ $labels.exported_job }} failed to complete
              VALUE = {{ $value }}
              LABELS = {{ $labels }}
          summary: Kubernetes Job failed (instance {{ $labels.instance }})
        expr: kube_job_status_failed > 0
        for: 0m
        labels:
          severity: warning
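
When this goes into the values file used in the next step, it nests under serverFiles, roughly like this (helm_values.yaml is whatever file you pass with -f):

    # helm_values.yaml
    serverFiles:
      alerting_rules.yml:
        groups:
        - name: Jobfailed
          rules:
          - alert: KubernetesJobFailed
            # annotations omitted here for brevity; same as the block above
            expr: kube_job_status_failed > 0
            for: 0m
            labels:
              severity: warning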

Step 2: 

Run helm upgrade <release_name> -f <helm_values.yaml> prometheus-community/prometheus

Step 3: 

Open the Prometheus web UI and go to Status -> Rules; you will see your rules there.

[Image: Alerts in the Prometheus UI]

Step 4: 

Exec into the prometheus-server pod, change directory to /etc/config, and run head -n 15 alerting_rules.yml to see the updated alerts file.

[Image: Alerts inside the prometheus-server pod]

What it looks like

Here is the default alert notification you will receive in Slack.

[Image: Default job failed alert in Slack]

You can see why the alert occurred by forwarding it to Robusta. In the example below, the job’s logs have been attached to the alert by Robusta.

[Image: Job failed alert with context]

How to add rules for other Prometheus installs on Kubernetes?

Other Prometheus setups, like Robusta.dev, use the Prometheus Operator. When using the operator, it is best to define rules using the PrometheusRule CRD. We will cover this in a future post.
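
As a preview, a PrometheusRule object looks roughly like this (the namespace and labels below are illustrative; the Operator only loads rules whose labels match its ruleSelector):

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: kubernetes-job-failed
      namespace: monitoring            # hypothetical namespace
      labels:
        release: prometheus            # must match the Operator's ruleSelector
    spec:
      groups:
      - name: Jobfailed
        rules:
        - alert: KubernetesJobFailed
          expr: kube_job_status_failed > 0
          for: 0m
          labels:
            severity: warning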

Questions? Comments?

Yes, please. I’m on Twitter and LinkedIn. To learn more about monitoring Kubernetes with Prometheus and runbook automation, view the KubeCon talk our team gave.
