$ cat post/kubernetes-complexity-fatigue-and-the-search-for-simplicity.md

Kubernetes Complexity Fatigue and the Search for Simplicity


September 13, 2021. The world feels a bit different today. Just over a year ago, I was knee-deep in Kubernetes cluster management, dealing with endless YAML files, complex configurations, and the occasional misbehaving pod. Fast forward to now, and it seems like everyone is talking about platform engineering and internal developer portals. But underneath it all, the pain points remain the same.

I spent last week digging through a particularly gnarly issue with our Kubernetes setup. It wasn’t just one pod going south; several pods across multiple namespaces started crashing in quick succession. My first instinct was to look at the logs, which were cryptic and unhelpful. Then I remembered, “Oh yeah, eBPF.” Maybe it’s time to dig into that.

The Journey Back to eBPF

For those who haven’t heard of it, eBPF stands for extended Berkeley Packet Filter. It first landed in the Linux kernel back in version 3.18 and has been steadily gaining capabilities since. It lets you attach small, verified programs to hooks throughout the kernel without modifying kernel code or loading modules — though loading eBPF programs does generally require root or the right capabilities. This makes it incredibly powerful for debugging and performance analysis.

I spent hours setting up bpftrace, a tool that uses eBPF to trace system behavior. It’s like having a superpower to peek inside the kernel, with the in-kernel verifier making sure your probes can’t break anything. After an initial round of tracing, I could see patterns emerging: memory usage and network traffic were correlating with the pod crashes.
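I didn’t keep my exact scripts, but the kind of tracing I was doing looked something like this bpftrace program — save it as, say, oomwatch.bt (hypothetical name) and run it with `sudo bpftrace oomwatch.bt`. It counts page allocations per process and reports OOM victims as they’re chosen:

```
// Count kernel page allocations per process name.
tracepoint:kmem:mm_page_alloc
{
	@pages[comm] = count();
}

// Print a line whenever the kernel marks an OOM victim.
tracepoint:oom:mark_victim
{
	printf("OOM victim pid: %d\n", args->pid);
}
```

On Ctrl-C, bpftrace prints the @pages map, which makes it easy to spot which processes are allocating most aggressively in the run-up to an OOM kill.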

But here’s the thing: eBPF is still tricky. Setting it up correctly takes time, and understanding all its nuances requires a deep dive into both kernel programming and system behavior. This isn’t something you can just install a plugin for; it’s raw power and complexity rolled into one.

The Search for Simplicity

As I wrestled with eBPF, my mind kept drifting back to the days when Kubernetes seemed simpler. Back then, we used Helm charts and Tiller (remember that?). Now, everything feels like an upgrade or a rewrite, and it’s easy to feel overwhelmed by all the moving parts.

But there’s also something about the current state of Kubernetes that’s intriguing. With GitOps tools like Argo CD and Flux maturing, we’re starting to see more automation around deployments and configuration. These tools help manage complexity by versioning cluster state in Git and rolling out changes with minimal disruption.
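To make that concrete, here’s a minimal sketch of an Argo CD Application manifest — the app name, repo URL, and paths below are all hypothetical, but the shape is what keeps cluster state pinned to Git:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service          # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/deploy-configs.git  # hypothetical repo
    targetRevision: main
    path: my-service/overlays/prod                          # hypothetical path
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service
  syncPolicy:
    automated:
      prune: true      # delete resources removed from Git
      selfHeal: true   # revert manual drift back to the Git state
```

With `selfHeal` on, anything someone hand-edits in the cluster gets reconciled back to whatever Git says — which is exactly the kind of guardrail that fights complexity fatigue.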

The Framework Laptop

While I was debugging pods, something else caught my attention—Hacker News had an article about “The Framework is the most exciting laptop I’ve used.” Now, laptops are a bit outside of my immediate domain, but it got me thinking. If a new device can claim to be better than everyone’s favorite tech, what does that say about our industry’s constant quest for the latest and greatest?

It made me reflect on how much we as engineers rely on new tools and gadgets. Sure, the Framework might offer some cool features, but do they justify the price? And more importantly, are we becoming too reliant on these shiny toys instead of building robust infrastructure that stands the test of time?

The Real Work

Back to the issue at hand. After a few days of tracing and debugging, I finally nailed down the root cause: memory pressure. One of our services was eating up all available RAM, causing cascading failures in other pods. With this insight, we set memory requests and limits on the offending service — which Kubernetes enforces under the hood via cgroups — and managed to stabilize things.
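For anyone who hasn’t done this before, the fix is a small addition to the pod spec. A sketch with hypothetical names and values:

```yaml
# Fragment of a Deployment's pod template; container name, image,
# and sizes are illustrative.
containers:
  - name: my-service
    image: example/my-service:1.4.2
    resources:
      requests:
        memory: "256Mi"   # the scheduler reserves this much per pod
      limits:
        memory: "512Mi"   # past this, the cgroup OOM-kills the container
```

The important part is that the limit turns an unbounded, cluster-wide failure into a contained, per-container one: the greedy service gets restarted instead of starving its neighbors.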

Reflecting on this experience, it’s clear that while Kubernetes has its challenges, it also offers immense power for managing complex workloads. The complexity fatigue is real, but so are the opportunities to build something truly robust and scalable.

As I sit here writing this, I’m reminded of how much there is to learn and improve upon in platform engineering. But at least now, when someone asks why we use Kubernetes, I can point to it as a way to handle the complex needs of modern applications—and not just because it’s the new shiny thing.


This was my reality check for September 13, 2021. The tech world continues to evolve, and while there are moments of frustration, there’s also a lot of progress and excitement.