$ cat post/a-diff-i-once-wrote-/-i-git-bisect-to-old-code-/-i-left-a-comment.md

a diff I once wrote / I git bisect to old code / I left a comment


Title: Kubernetes Complexity Fatigue: A Developer’s Perspective


July 8, 2019. The morning starts like any other, with a cup of coffee and an RSS feed sweep through Hacker News. The usual tech tidbits roll in, but this time something catches my eye—an article about eBPF gaining traction. There's been so much noise around Kubernetes lately—so many new tools, so much complexity—that I feel the need to take a step back.

I’ve been working with Kubernetes for almost three years now, and it’s definitely grown on me. But the more we layer on top of it, the more I feel this creeping sense that it’s becoming too complex. We’re adding operator after operator, a service mesh, and all sorts of sidecars to manage state. It’s like decorating a Christmas tree with every shiny new tool that comes out.

Today, I’m at my desk trying to get through some code reviews for our internal developer portal, Backstage. We’ve been experimenting with it for a while now, and the team is starting to see real benefits in terms of documentation and ease of use for developers working across multiple projects. But the underlying infrastructure supporting it has its own set of issues.

The latest challenge we faced was getting eBPF running on our Kubernetes cluster. It’s an exciting technology that promises so much—fine-grained control over system behavior, lower overhead compared to traditional kernel modules—but deploying it correctly is a whole new can of worms. The learning curve is steep, and there’s no shortage of gotchas when you start tinkering with low-level stuff.

I’ve spent the morning debugging an issue where eBPF programs were intermittently crashing our pods. It was like trying to catch a ghost—every time I thought I had it figured out, another symptom would appear. The logs didn’t give me enough context, and the error messages were cryptic at best. Eventually, I found that a mismatch in kernel versions between nodes was causing the issue. Once I got that sorted out, things started working much more predictably.
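The kernel-skew check that finally cracked it can be sketched in a few lines. This is not from the original post: `find_kernel_skew` is a hypothetical helper, and it assumes you've already collected each node's kernel version, e.g. via `kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.kernelVersion}{"\n"}{end}'`.

```python
from collections import Counter

def find_kernel_skew(node_kernels: dict[str, str]) -> dict[str, str]:
    """Return the nodes whose kernel version differs from the
    majority version in the cluster.

    node_kernels maps node name -> kernel version, as reported in
    each Node's .status.nodeInfo.kernelVersion.
    """
    if not node_kernels:
        return {}
    # Treat the most common kernel version as the cluster baseline.
    majority, _ = Counter(node_kernels.values()).most_common(1)[0]
    return {name: ver for name, ver in node_kernels.items() if ver != majority}

# Example: one node lagging behind the rest.
nodes = {
    "worker-a": "5.4.0-42-generic",
    "worker-b": "5.4.0-42-generic",
    "worker-c": "4.19.0-9-amd64",
}
print(find_kernel_skew(nodes))  # {'worker-c': '4.19.0-9-amd64'}
```

In practice a check like this could run as a pre-deploy gate, so an eBPF program compiled against one kernel never lands on a node running another.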

But this morning’s debugging session left me thinking—maybe it’s time to take a hard look at how we’re using Kubernetes. It’s clear that Kubernetes is becoming an incredibly powerful tool for managing containerized applications, but the trade-offs in terms of complexity are high. The moment you start adding operators and sidecars, you’re layering on yet more moving parts that can go wrong.

I also started thinking about SRE roles and how they’ve been proliferating over the past few years. It makes sense—a lot of what we do as developers and platform engineers falls under the realm of “site reliability engineering.” But is it really necessary to have dedicated SREs for every project? I think there’s a balance that can be struck between developer and ops responsibilities.

As much as Kubernetes has simplified many aspects of deploying applications, it’s also pushed us into a realm where the learning curve is steeper than ever. We need tools like Backstage to help navigate this complexity, but we shouldn’t let them become crutches that hide the real issues.

The other day, I had a conversation with one of our developers who mentioned feeling overwhelmed by all the new technologies and tools. It’s not just them; it’s something I’ve been wrestling with too. There are so many shiny things to explore, but sometimes the simplest solution might be the best one.

In the end, today’s work on eBPF reminded me that while Kubernetes is a powerful tool, we shouldn’t lose sight of why we use it in the first place. We need to balance its complexity with the ease and reliability it brings. Maybe choosing “boring” technologies sometimes isn’t such a bad idea.