Most people are familiar with CrashLoopBackOffs, but there are actually many ways a pod can unexpectedly stop running. Here are the top four:
When OOM Kills occur: A pod uses “too much” memory. That is, more memory than its limit allows, or more than is available on the node.
How OOM Kills work: The Linux kernel kills the process, causing an OOMKill (Out of Memory Kill). No warning is given, and Kubernetes has little control over this process: container memory limits are enforced by the kernel as cgroup constraints.
Why Kubernetes does OOM Kills: You’re out of memory. But strictly speaking, Kubernetes doesn’t kill the pod; the Linux kernel does.
Deliberately reproducing OOM Kills: Run `kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/oomkills_demo/oomkill_job.yaml`
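If you’d rather not apply a remote manifest, the sketch below produces the same effect. It is a minimal illustrative example, not the contents of the linked file: the container tries to allocate about 250Mi against a 100Mi limit, so the kernel OOM-kills it.

```yaml
# Minimal OOMKill sketch (illustrative; not the linked demo file).
apiVersion: v1
kind: Pod
metadata:
  name: oomkill-demo
spec:
  restartPolicy: Never               # fail once instead of crash-looping
  containers:
  - name: memory-hog
    image: polinux/stress            # small image bundling the `stress` tool
    resources:
      limits:
        memory: "100Mi"              # the cgroup limit the kernel enforces
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "250M", "--vm-hang", "1"]  # allocate well past the limit
```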
What it looks like:
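With a pod like the sketch above, the failure shows up in `kubectl` roughly like this (output abridged and approximate). Exit code 137 means the process was killed by SIGKILL (128 + 9):

```shell
$ kubectl get pod oomkill-demo
NAME           READY   STATUS      RESTARTS   AGE
oomkill-demo   0/1     OOMKilled   0          20s

$ kubectl describe pod oomkill-demo | grep -A2 "State"
    State:       Terminated
      Reason:    OOMKilled
      Exit Code: 137
```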
When CrashLoopBackOff occurs: Every time a pod crashes, Kubernetes restarts it after a delay. The delay between restarts is called the backoff time, and it grows exponentially, capped at five minutes. Too many restarts in a row and the pod ends up in the CrashLoopBackOff state.
How CrashLoopBackOff works: The pod goes into the CrashLoopBackOff state and remains there until the backoff period ends, at which point Kubernetes tries to start it again.
Why Kubernetes does CrashLoopBackOff: Your pod is crashing repeatedly and restarts aren't helping. To avoid pointless back-to-back crashes, Kubernetes waits a little in the hope that things will improve. (E.g., if the pod is crashing because an external service is down, that service may come back up in the meantime.)
Deliberately reproducing CrashLoopBackOff: Run `kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/crashloop_backoff/create_crashloop_backoff.yaml`
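An equivalent self-contained sketch (again an illustrative example, not the linked manifest) is simply a container whose command always exits with a non-zero code:

```yaml
# Minimal CrashLoopBackOff sketch (illustrative; not the linked demo file).
apiVersion: v1
kind: Pod
metadata:
  name: crashloop-demo
spec:
  restartPolicy: Always              # the default; required for the restart loop
  containers:
  - name: crasher
    image: busybox
    command: ["sh", "-c", "echo 'about to crash'; exit 1"]  # fails on every start
```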
What it looks like:
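Roughly this (output approximate): the RESTARTS count keeps climbing, and the STATUS alternates between Error and CrashLoopBackOff. The `--previous` flag shows the logs of the last crashed run:

```shell
$ kubectl get pod crashloop-demo
NAME             READY   STATUS             RESTARTS      AGE
crashloop-demo   0/1     CrashLoopBackOff   4 (38s ago)   3m

$ kubectl logs crashloop-demo --previous   # logs from the previous, crashed container
about to crash
```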
What are Init-containers: Init-containers are used to perform preparations before your main container runs. Your main container starts only after every Init-container has exited successfully.
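For example, here is a minimal sketch of a pod whose Init-container waits for a dependency before the main container starts (the service name `my-database` is a made-up placeholder):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-demo
spec:
  initContainers:
  - name: wait-for-db                # must exit 0 before `app` can start
    image: busybox
    command: ["sh", "-c", "until nslookup my-database; do sleep 2; done"]  # my-database is hypothetical
  containers:
  - name: app
    image: nginx                     # starts only after wait-for-db succeeds
```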
When Init:CrashLoopBackOff occurs: When an Init-container has a CrashLoopBackOff, this is called – you guessed it – an Init:CrashLoopBackOff
Deliberately reproducing Init:CrashLoopBackOff: Run `kubectl apply -f https://raw.githubusercontent.com/robusta-dev/kubernetes-demos/main/init_crashloop_backoff/create_init_crashloop_backoff.yaml`
What it looks like:
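The STATUS column calls out the failing phase explicitly (pod name illustrative, output approximate):

```shell
$ kubectl get pod init-crashloop-demo
NAME                  READY   STATUS                  RESTARTS      AGE
init-crashloop-demo   0/1     Init:CrashLoopBackOff   3 (25s ago)   2m
```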
Get the logs (just like CrashLoopBackOff), but make sure you specify the Init-container you want logs for: `kubectl logs <pod-name> -c <init-container-name>`. Then try to determine whether this is an issue with the Init-container itself (an application issue) or a Kubernetes infrastructure-related issue.
When evictions occur: A node runs out of resources, so the kubelet starts terminating pods to reclaim resources for essential processes (a node-pressure eviction). Alternatively, you can use the Eviction API to terminate a pod manually; this is what `kubectl drain` does under the hood. Finally, pods can be evicted by preemption due to Pod Priorities, as illustrated below.
Why Kubernetes evicts pods: Node-pressure evictions happen to protect the node from becoming unstable. API-initiated evictions happen for whatever reason you choose.
How you can deliberately reproduce Pod Evictions:
To reproduce an eviction without causing chaos in your cluster, we’re going to pick a victim node that will be disrupted during the process; a sketch of one way to do it follows below. Don’t try this on a production cluster!
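One way to do it (a sketch under my own assumptions; the node label, names, and sizes are all illustrative) is to pin memory hogs with no memory limits onto the victim node until the kubelet comes under memory pressure and starts evicting:

```shell
# 1. Pick a non-critical victim node and label it:
kubectl get nodes
kubectl label node <victim-node> eviction-demo=true

# 2. Schedule unlimited memory hogs onto that node only:
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: memory-pressure
spec:
  replicas: 3
  selector:
    matchLabels:
      app: memory-pressure
  template:
    metadata:
      labels:
        app: memory-pressure
    spec:
      nodeSelector:
        eviction-demo: "true"        # pin the hogs to the victim node
      containers:
      - name: hog
        image: polinux/stress
        command: ["stress"]
        args: ["--vm", "1", "--vm-bytes", "1024M", "--vm-hang", "1"]  # note: no memory limit set
EOF

# 3. Watch pods on the victim node get evicted:
kubectl get pods --watch
```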
What it looks like:
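Evicted pods stay visible with a Failed status so you can inspect them afterwards (names illustrative, output abridged and approximate):

```shell
$ kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
<evicted-pod>   0/1     Evicted   0          1m

$ kubectl describe pod <evicted-pod>
...
Status:    Failed
Reason:    Evicted
Message:   The node was low on resource: memory. ...
```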
Yes, please. I’m on Twitter and LinkedIn. Also, check out Robusta.dev to get notified when problems like these occur in your cluster.