Your Cluster Is Already Watching. That's the Observer Pattern.
At some point, most platform teams hit the same wall.
A service goes down. A secret expires. A node runs out of disk. And the first question in the postmortem is always the same: why didn't we know sooner?
So you add more monitoring. More alerts. More dashboards. And somewhere along the way, someone suggests: "we should build a controller that watches for this and reacts automatically."
That controller is an Observer. You probably just never called it that.
The problem
Manual reconciliation doesn't scale.
When your platform grows — more clusters, more teams, more services — the list of things someone needs to watch and react to grows with it. Certificate expiration. Secret rotation. Node pressure. Deployment drift. Config changes that need to propagate across namespaces.
If the reaction to any of these depends on a human noticing and acting, you have a fragile system. Not because your team is slow, but because humans aren't designed to watch dozens of state transitions simultaneously and react to all of them in real time.
The deeper problem is coupling. When your response logic lives in runbooks, scripts, or people's heads, it's tightly coupled to whoever wrote it. It doesn't compose. It doesn't scale. And it breaks in the exact moment you need it most — when things are moving fast and nobody has time to read the runbook.
Why it hurts
Think about secret rotation without automation. Someone sets a reminder. The reminder gets missed. The secret expires in production on a Friday at 5pm. Three people get paged. The incident takes two hours to resolve, not because the fix is hard, but because nobody had a clear picture of what was watching what.
Or think about configuration drift. A team manually edits a ConfigMap in staging. Nobody notices it diverged from what's in Git. Three weeks later, a deployment to production behaves differently than expected. The diff is one line. Finding it takes half a day.
These aren't rare edge cases. They're the steady background noise of a platform that reacts to change manually instead of automatically.
What the pattern says
The Observer pattern says: when the state of something changes, every component that cares about that change gets notified and can react — without the thing that changed needing to know who's watching.
There are two roles. The subject holds state and emits events when that state changes. The observers subscribe to those events and decide what to do with them. The subject doesn't care how many observers there are or what they do. The observers don't care how the state changed or who triggered it.
The key property is decoupling. The thing that changes and the things that react to that change don't need to know about each other. You can add new observers without touching the subject. You can change how the subject works without touching the observers.
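Stripped of the Kubernetes machinery, the pattern fits in a few lines. Here's a minimal sketch in Go; the names are mine, not from any library:

```go
package main

import "fmt"

// Observer reacts to a state change without knowing how it happened.
type Observer interface {
	Notify(state string)
}

// Subject holds state and notifies subscribers when it changes.
type Subject struct {
	state     string
	observers []Observer
}

// Subscribe registers interest; the Subject never needs to change
// when new observer types appear.
func (s *Subject) Subscribe(o Observer) {
	s.observers = append(s.observers, o)
}

// SetState updates the state and fans the change out to every observer.
func (s *Subject) SetState(state string) {
	s.state = state
	for _, o := range s.observers {
		o.Notify(state)
	}
}

// Alerter is one possible reaction; a SecretRotator or ConfigSyncer
// could subscribe alongside it without touching Subject.
type Alerter struct{}

func (Alerter) Notify(state string) { fmt.Println("alerting on:", state) }

func main() {
	s := &Subject{}
	s.Subscribe(Alerter{})
	s.SetState("certificate-expiring") // prints: alerting on: certificate-expiring
}
```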
Where you see this in practice
The Kubernetes controller loop is the Observer pattern implemented at cluster scale — and it's the most important example in this entire series.
Every controller in Kubernetes follows the same structure: watch a resource for changes, compare the current state against the desired state, and act to close the gap. The API server is the subject. Controllers are the observers. When you create a Deployment, the Deployment controller observes that event and starts reconciling — creating ReplicaSets, scheduling Pods, updating status.
You didn't wire them together explicitly. The controller registered its interest in Deployments, and the API server notifies it when something changes. That's the pattern.
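Here's a hedged sketch of that subscription step using client-go's informer machinery. Clientset setup and error handling are omitted, and the handler just logs where a real controller would enqueue the object for reconciliation:

```go
package watcher

import (
	"fmt"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// WatchDeployments registers an observer for Deployment changes.
// The API server is the subject; the handler below is the observer.
func WatchDeployments(clientset *kubernetes.Clientset, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	informer := factory.Apps().V1().Deployments().Informer()

	// Register interest. From here on, the API server notifies us;
	// we never ask it what changed.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			d := newObj.(*appsv1.Deployment)
			// A real controller would enqueue d for reconciliation here.
			fmt.Printf("observed change to %s/%s\n", d.Namespace, d.Name)
		},
	})

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}
```

The AddEventHandler call is the moment of subscription. After that, the API server pushes changes to you; you never poll.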
Operators extend this to your own domain. When you write a Kubernetes Operator — or adopt one like External Secrets Operator, Cert-Manager, or Argo Rollouts — you're writing an Observer for a specific kind of state change. External Secrets Operator watches for ExternalSecret resources and reacts by fetching secrets from Vault or AWS Secrets Manager and syncing them into Kubernetes Secrets. Cert-Manager watches for Certificate resources and reacts by requesting TLS certificates from Let's Encrypt.
Each of these is an automated response to a state change. No runbook. No human in the loop. The observer watches, the event happens, the reaction fires.
Argo Rollouts takes this further into deployment strategy. It observes metrics from Prometheus during a canary deployment and reacts automatically — promote if the error rate stays below the threshold, roll back if it spikes. The deployment strategy becomes a set of Observer reactions to real-time system state rather than a manual decision made by an engineer watching a dashboard.
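To make that concrete, here's a sketch of the shape of that decision. This is not Argo Rollouts' actual code, just an illustration of an observer querying Prometheus and deciding whether to promote; the query and threshold are made up:

```go
package canary

import (
	"context"
	"time"

	promapi "github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
	"github.com/prometheus/common/model"
)

// Healthy queries a (made-up) error-rate metric and decides whether
// the canary should be promoted. The query and threshold are illustrative.
func Healthy(ctx context.Context, promURL string, threshold float64) (bool, error) {
	client, err := promapi.NewClient(promapi.Config{Address: promURL})
	if err != nil {
		return false, err
	}
	api := promv1.NewAPI(client)

	// Hypothetical PromQL: ratio of 5xx responses on the canary, last 5m.
	query := `sum(rate(http_requests_total{track="canary",code=~"5.."}[5m]))` +
		` / sum(rate(http_requests_total{track="canary"}[5m]))`

	result, _, err := api.Query(ctx, query, time.Now())
	if err != nil {
		return false, err
	}

	vec, ok := result.(model.Vector)
	if !ok || len(vec) == 0 {
		return false, nil // no data: fail safe, don't promote
	}
	return float64(vec[0].Value) < threshold, nil
}
```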
When to write one and when not to
Knowing this pattern helps you make a decision that comes up more often than it should: should we write a custom operator for this?
The answer is usually: only if the thing you're watching and the reaction you need aren't covered by an existing operator, and only if the operational value justifies the maintenance cost.
Writing an operator is committing to a stateful, long-running Observer. It will need to handle edge cases — what happens if it misses an event? What if the reaction fails halfway? What if the resource it's watching gets deleted while it's processing? These aren't reasons not to write one, but they're reasons to reach for an existing operator first and build a custom one only when you have a clear gap.
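For a feel of what that commitment looks like, here's a hedged sketch of a reconcile function built on sigs.k8s.io/controller-runtime. The reconciler is hypothetical, and it watches plain Secrets to keep the example self-contained; note how each of those edge cases maps to a return value:

```go
package controllers

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// SecretRotationReconciler is a hypothetical operator that reacts to
// Secret changes. client.Client is embedded for API access.
type SecretRotationReconciler struct {
	client.Client
}

func (r *SecretRotationReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var secret corev1.Secret
	if err := r.Get(ctx, req.NamespacedName, &secret); err != nil {
		if apierrors.IsNotFound(err) {
			// Deleted while we were processing: nothing left to do.
			return ctrl.Result{}, nil
		}
		return ctrl.Result{}, err
	}

	// Compare desired vs. current state and close the gap here.
	// Returning an error requeues the request, which is how a reaction
	// that fails halfway eventually gets retried. Missed events are
	// covered by periodic resync, not by this function.
	return ctrl.Result{}, nil
}
```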
The pattern also helps you evaluate existing tools more clearly. When you're comparing Cert-Manager vs. manually rotating certificates via a CronJob, you're really comparing "a well-designed Observer with proper reconciliation semantics" vs. "a polling loop that approximates observation." That framing makes the tradeoff obvious.
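For contrast, here's roughly what the CronJob side of that comparison amounts to: a hypothetical polling loop with a fixed observation resolution and no reconciliation semantics.

```go
package rotation

import "time"

// Poll is what a CronJob-style rotation script amounts to: it only
// notices state at tick boundaries, and a failed run simply waits for
// the next tick. Both function parameters are hypothetical.
func Poll(expiringSoon func() bool, rotate func() error) {
	for range time.Tick(24 * time.Hour) { // observation resolution: one day
		if expiringSoon() {
			_ = rotate() // an error here silently waits for tomorrow
		}
	}
}
```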
What changes when you think this way
When you see your platform through the Observer lens, the question shifts from "who is responsible for watching this?" to "what should be watching this automatically?"
That's not just a semantic difference. It changes how you design for reliability. Instead of writing runbooks that tell humans what to watch and when to react, you build controllers that watch continuously and react consistently. The human moves from being the observer to being the designer of observers.
It also changes how you think about toil. Toil — the repetitive, manual, automatable work that plagues ops teams — is almost always a symptom of missing observers. If someone is manually rotating secrets, there's no secret observer. If someone is manually syncing configs across environments, there's no config observer. If someone is manually checking whether a canary deployment looks healthy enough to promote, there's no metrics observer.
Name the toil. Find the missing observer. Build or adopt it.
What's something your team is still watching manually that should have an observer by now? Hit reply at blog@parraletz.dev — I'd bet the pattern already exists.