$ cat post/the-build-finally-passed-/-we-ran-out-of-inodes-first-/-the-repo-holds-it-all.md

the build finally passed / we ran out of inodes first / the repo holds it all


Title: Kubernetes Complexity Fatigue in a Pandemic


August 17, 2020 was just another day of waking up to the pandemic news cycle and then diving into my usual technical tasks. The headlines were filled with stories of Apple and Facebook battling over app stores and policies, but for me, it was all about Kubernetes clusters, eBPF, and trying to keep our internal developer portals running smoothly.

Today, I spent some time wrestling with a particularly stubborn issue in one of our Kubernetes clusters. It was a classic case of a deployment failing silently, which made the problem much harder to track down. Usually my first step is to check the logs, but this time they were empty, as if the cluster had decided it preferred silence over giving me any clues.
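When the logs give me nothing, the event stream is usually my next stop, since scheduling and probe failures often show up there even when a container never writes a line. Here's a minimal sketch of that step using the official Kubernetes Python client; the `payments` namespace is a hypothetical stand-in, not our real one.

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (same source kubectl uses).
config.load_kube_config()
v1 = client.CoreV1Api()

# A deployment that "fails silently" often still leaves clues as events:
# FailedScheduling, FailedCreate, ImagePullBackOff, failed probes, etc.
# NOTE: "payments" is a made-up namespace for illustration.
events = v1.list_namespaced_event(
    "payments",
    field_selector="involvedObject.kind=Pod",
)
for e in events.items:
    print(e.last_timestamp, e.type, e.reason, e.message)
```

In this particular case the events were no more talkative than the logs, which is what pushed me further down the stack.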

I started by double-checking whether anything had changed in our infrastructure scripts, but everything seemed in order. Then I remembered reading about eBPF recently and wondered whether something unusual was happening at a lower level that wasn't showing up in the logs. After some digging, I found a few articles suggesting that eBPF could be used to monitor and debug container networking issues.

I set up an eBPF program to track network traffic between the problematic pod and the others in the cluster. It was a bit of a hack, something thrown together quickly, but it worked. The output showed that packets were indeed being dropped or handled differently than I expected.
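I won't reproduce the exact program here, but the shape of it was roughly the following: a bcc sketch that hooks the kernel's `skb:kfree_skb` tracepoint and counts drops by the kernel function that triggered them. It has to run as root on a cluster node, and the ten-second window is an arbitrary choice for illustration.

```python
from bcc import BPF
import time

# Count packet drops by the kernel location that freed the skb.
# The skb:kfree_skb tracepoint fires whenever the kernel drops
# a socket buffer (consumed packets go through consume_skb instead).
prog = r"""
BPF_HASH(drops, u64, u64);

TRACEPOINT_PROBE(skb, kfree_skb) {
    u64 loc = (u64)args->location;   // kernel address of the drop site
    u64 zero = 0;
    u64 *count = drops.lookup_or_try_init(&loc, &zero);
    if (count) {
        (*count)++;
    }
    return 0;
}
"""

b = BPF(text=prog)
print("Tracing packet drops for 10s...")
time.sleep(10)

# Resolve each drop site to a kernel symbol and print the hottest ones.
for loc, count in sorted(b["drops"].items(),
                         key=lambda kv: kv[1].value, reverse=True):
    print(f"{b.ksym(loc.value).decode():<40} {count.value}")
```

With an iptables-based CNI, drops caused by a network policy typically surface in the netfilter hooks (think `nf_hook_slow`), which is exactly the kind of signal that points you at policy changes rather than application bugs.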

With this newfound insight, I could see where things had gone wrong in our deployment process. A new network policy had been added to isolate certain services for security reasons, and my application hadn't been accounted for in its allow rules. Once I adjusted the configuration, everything started working smoothly again.
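The fix amounted to an explicit allow rule. As a sketch of the shape it took (every name below is a hypothetical stand-in, not one of our real services), here's a NetworkPolicy that lets one set of pods reach another, applied with the official Python client:

```python
from kubernetes import client, config

config.load_kube_config()
netv1 = client.NetworkingV1Api()

# Hypothetical labels and namespace, for illustration only: allow
# ingress to "payments-api" pods from "checkout" pods.
policy = {
    "apiVersion": "networking.k8s.io/v1",
    "kind": "NetworkPolicy",
    "metadata": {"name": "allow-from-checkout", "namespace": "payments"},
    "spec": {
        "podSelector": {"matchLabels": {"app": "payments-api"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {"from": [{"podSelector": {"matchLabels": {"app": "checkout"}}}]}
        ],
    },
}
netv1.create_namespaced_network_policy("payments", policy)
```

The same thing expressed as YAML and applied with kubectl works just as well; the key semantic is that once any policy selects a pod for ingress isolation, every legitimate flow to it needs its own explicit allow rule.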

This experience made me reflect on the growing complexity of Kubernetes clusters as more and more teams onboard. It’s hard enough managing a cluster with just a few dozen pods; imagine what it’s like when you have hundreds or thousands! The challenge is finding tools that help us manage this complexity without overwhelming our engineers.

Speaking of which, I also spent some time setting up an internal developer portal using Backstage. As teams continue to scale and remote work becomes the norm, having a centralized hub for all our documentation, APIs, and infrastructure details has become more important than ever. The Backstage project is still maturing, but it’s showing promise in making this process easier.
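Most of the setup was wiring services into the catalog. The pattern, roughly: each repo carries a catalog-info.yaml describing the component, and you register its location with the catalog backend. Here's a hedged sketch of that registration step; the portal URL and repo path are made up, and the exact endpoint and location type can vary by Backstage version and setup.

```python
import requests

# Hypothetical portal URL and repo path, for illustration only.
BACKSTAGE_URL = "https://backstage.internal.example.com"
TARGET = (
    "https://github.com/example-org/payments-api"
    "/blob/master/catalog-info.yaml"
)

# Register the component's catalog-info.yaml with the catalog backend.
# Assumes an unauthenticated dev setup; a real deployment would need
# an auth header, and the location "type" may differ across versions.
resp = requests.post(
    f"{BACKSTAGE_URL}/api/catalog/locations",
    json={"type": "url", "target": TARGET},
)
resp.raise_for_status()
print(resp.json())
```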

One of the things that struck me most during these tasks was the contrast between the rapid changes happening outside our teams (like Apple kicking Fortnite off the App Store) and the steady, incremental progress we're making internally. It's easy to feel like we're lagging behind when the headlines are that dramatic, but I try not to get too hung up on it. What matters is that we keep pushing forward with the tasks at hand.

Reflecting on this day, I realized how far we've come as a team and as an industry in just a few short years. From the excitement of eBPF to the practical challenges of running Kubernetes clusters, it's been a journey. And while I might be feeling some Kubernetes complexity fatigue here and there, that's part of the ongoing learning process.

As always, debugging these issues and setting up new tools isn't just about solving problems; it's about sharing knowledge and improving our processes for future projects. It's a slow, steady grind, but one that's deeply rewarding when everything finally works as expected, or at least more or less as expected.