We Had 100 Dead Alerts Firing for Services That No Longer Existed. So I Built a Kubernetes Operator.

Source: DEV Community
TL;DR: I built and open sourced a Kubernetes operator that manages Grafana Cloud dashboards, alert rules, and SLOs as code — with automatic cleanup when services are decommissioned. It solves the "100 orphaned alerts" problem by coupling Grafana resource lifecycle to Kubernetes resource lifecycle.

It was a Tuesday afternoon when someone on the team noticed that Grafana was still sending alerts for a service we'd decommissioned four months ago. Not one alert. Not five. We found over 100 alert rules in Grafana Cloud that had no corresponding live service. Some went back almost a year.

No one cleaned them up — ownership was unclear after teams changed. The alerts just stayed there, quietly firing, quietly getting ignored, quietly eroding trust in the entire alerting system.

That's when I started building the Grafana Cloud Operator.

The Problem With Managing Grafana Manually

If you've worked on a platform team, this scenario is probably familiar. Grafana is great for interactive observabil
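The lifecycle coupling described in the TL;DR can be sketched as a custom resource that is owned by the workload it monitors. This is a hypothetical manifest — the API group, resource kind, and field names below are illustrative, not necessarily the operator's actual API:

```yaml
# Hypothetical example — kind and API group are placeholders.
apiVersion: grafana.example.com/v1alpha1
kind: GrafanaDashboard
metadata:
  name: checkout-service-overview
  namespace: checkout
  # Owned by the service's Deployment: when the Deployment is deleted,
  # Kubernetes garbage collection deletes this custom resource too.
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: checkout-service
      uid: "<deployment-uid>"
  # A finalizer gives the operator a hook to delete the corresponding
  # dashboard in Grafana Cloud before the custom resource disappears.
  finalizers:
    - grafana.example.com/cleanup
spec:
  folder: checkout
  dashboardJson: |
    { "title": "Checkout Service Overview" }
```

The design relies on two standard Kubernetes mechanisms: owner references, so decommissioning a service cascades to its observability resources, and finalizers, so the operator can clean up the remote Grafana objects before the in-cluster record is gone — no orphaned alerts left behind.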