$ cat post/the-blinking-cursor-/-we-documented-nothing-then-/-the-patch-is-still-live.md
the blinking cursor / we documented nothing then / the patch is still live
Title: Kubernetes Complexity Fatigue and the Long Road Home
May 18, 2020 was a weird day. It started like any other: coffee, breakfast, then the usual morning scramble to review emails and set my team’s priorities for the day. But there was this lingering feeling of something shifting. The pandemic had forced everyone into remote work, but it also seemed to be reshaping how we approached tech: how we built systems, managed infrastructure, and even thought about our roles in IT.
Today, I spent some time digging through a particularly thorny issue with our Kubernetes cluster. Our application, which uses ArgoCD for GitOps deployment, had started misbehaving in a strange way. Pods would intermittently crash, but there was no clear pattern or error message to go on. The usual first steps, kubectl describe and the pod logs, didn’t reveal much. I decided to take a more aggressive approach.
I fired up an eBPF session. The idea of using eBPF (extended Berkeley Packet Filter) had been on my mind for a while; its ability to trace kernel and network activity with small sandboxed in-kernel programs, without the overhead and risk of writing custom kernel modules, was intriguing, especially in our complex Kubernetes setup. As I set it up and started tracing, something caught my eye: there were repeated requests to a specific API endpoint that seemed suspiciously frequent.
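For the curious, the tracing step can be sketched with a bpftrace one-liner. The tracepoint name is real; the interpretation (matching the spike to our app’s process) is specific to our setup, and you need root and a recent kernel to run it:

```
# Count inbound accept4() syscalls per process name; run as root, Ctrl-C to print.
# A steady per-second spike against our app's process pointed at an
# external caller rather than anything inside the app itself.
bpftrace -e 'tracepoint:syscalls:sys_enter_accept4 { @[comm] = count(); }'
```

From there it was a short hop to looking at who was opening all those connections.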
It turned out to be the health check probes configured in our application’s deployment YAML files. These probes had been firing every second on every replica, hammering the application’s health endpoint with over 100K hits per hour in aggregate! This was clearly causing resource contention and leading to those crashes. It wasn’t a bug in our app or ArgoCD; it was a misconfiguration.
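The fix itself was a one-line change in the manifest. A minimal sketch of the kind of probe block involved; the path, port, and exact values here are illustrative, not our actual config:

```yaml
# Hypothetical excerpt from a Deployment's container spec.
# The culprit was periodSeconds: 1 -- every replica probing every second.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10     # was 1; once every 10s is plenty for liveness
  timeoutSeconds: 2
  failureThreshold: 3   # tolerate transient blips before restarting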
Debugging this brought back memories of the Kubernetes complexity fatigue everyone seemed to be talking about. The tooling is amazing, but as projects scale up, the complexity grows right along with them. There are so many moving parts and configurations that go into running a cluster, and sometimes the smallest tweak can have huge implications.
Thinking about this, I couldn’t help but feel a bit of nostalgia for simpler times—when infrastructure was more straightforward, when a single server did everything. Now, with all these layers and tools, it feels like we’re building castles in the sky. But they are necessary castles; without them, our applications can’t reach the scale or performance that users expect.
As I settled into writing a fix for this issue, I couldn’t help but wonder how things would have played out if we had started with a more thoughtful approach to automation and monitoring from the beginning. Would it have made a difference? Probably not entirely, because infrastructure is inherently complex, but certainly, better choices might have mitigated some of these issues.
And speaking of choices, today’s HN featured “Ask HN: Am I the longest-serving programmer – 57 years and counting?” It reminded me of how much has changed in tech over my career. In those early days, things were simpler—code was smaller, servers were fewer, and networking was just about getting from one machine to another. Now, it’s all about managing clusters, microservices, and distributed systems.
But the core principles haven’t really changed: good engineering practices, thorough testing, and a deep understanding of what you’re building. That’s still where the real value lies. Whether we’re talking eBPF or traditional kernel modules, at their heart, they are tools to solve problems. The key is knowing when, and how, to use them.
As I finished up my changes for today and committed them to our GitOps repo for ArgoCD to sync, a feeling of satisfaction washed over me. Despite the complexity, we managed to get this right. And maybe that’s what it comes down to: managing not just the tech, but also the complexity that comes with it.
Stay tuned for more adventures in the wild world of Kubernetes and infrastructure, where simplicity meets complexity every day. Until next time, keep your scripts clean and your logs verbose!