$ cat post/a-segfault-at-three-/-the-index-was-never-rebuilt-/-the-deploy-receipt.md

08JUN20

a segfault at three / the index was never rebuilt / the deploy receipt

Title: Kubernetes Complexity Fatigue: A Reality Check

June 8, 2020. It’s a typical Monday morning, and I’m sipping my coffee as I look over a to-do list that seems to grow by the hour. Today, though, something feels different. Maybe it’s the recent reports about Apple moving to its own processors, or perhaps it’s the flurry of articles on internal developer portals like Backstage. Either way, there’s a growing sense of complexity fatigue in our Kubernetes landscape.

The Weekly Doldrums

The past week has been stuck in the doldrums. I spent most of my time dealing with yet another incident where one of our services hit a resource bottleneck and set off a cascade of failures across the cluster. It’s the same old story, really: an application that was fine in dev but didn’t scale as well as we thought it would in production. Kubernetes can be a beautiful tool in the hands of someone who knows it well, but when things start breaking down, it’s like navigating a maze with no map.
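To make that failure mode concrete: one common way a service that looked fine in dev becomes a bottleneck in production is memory overcommit, where the limits of the pods on a node add up to more than the node can provide. The sketch below is an illustration under that assumption, not a reconstruction of the incident. The function names and sample numbers are hypothetical, and the parser handles only a subset of the Kubernetes quantity suffixes.

```python
from typing import Iterable

# Subset of Kubernetes memory suffixes (assumption: no exponent notation, no "m").
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,  # binary suffixes
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,      # decimal suffixes
}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a plain byte count


def overcommit_ratio(allocatable: str, limits: Iterable[str]) -> float:
    """Sum of container memory limits divided by the node's allocatable memory.

    A value above 1.0 means the pods can collectively ask for more memory
    than the node has, which is where evictions and cascades tend to start.
    """
    total_limits = sum(parse_memory(limit) for limit in limits)
    return total_limits / parse_memory(allocatable)


# Hypothetical node: 8Gi allocatable, three pods with 4Gi, 4Gi and 2Gi limits.
print(overcommit_ratio("8Gi", ["4Gi", "4Gi", "2Gi"]))  # 1.25
```

A ratio like 1.25 is not a problem on a quiet dev cluster, but under production load it means at least one pod will not get what its limit promises.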

Backstage and Internal Portals

Speaking of which, I’ve been doing some reading on internal developer portals. The idea of having a central hub where developers can easily access everything they need—docs, tools, environments—is appealing. Backstage is one such tool that seems promising. It’s not just about the technical aspects; it’s also about making life easier for everyone involved in the development process. But as I read through the documentation, I couldn’t help but feel a twinge of skepticism. Will this really be enough to simplify our workflows or will we just add another layer of complexity?

The Redis Saga

Over on Hacker News, there was a post titled “I Am Deleting the Blog.” It’s an interesting read about someone who found that their blog traffic had been hijacked by a botnet and decided to take it down. I can’t help but think back to our own internal struggles with maintaining reliable infrastructure for monitoring and logging. Redis has always seemed like such a simple solution, until something goes wrong, of course. The idea that a seemingly simple piece of infrastructure can be exploited like that makes me realize how much we rely on these tools without fully understanding their intricacies.

COVID-19 and Remote Work

The world outside is changing rapidly too. With the ongoing pandemic driving remote work adoption, our infrastructure team has had to scale up quickly. We’re dealing with increased traffic and trying to ensure that everyone can still do their job effectively from home. It’s a balancing act between making sure we have enough resources while also avoiding unnecessary costs. The Kubernetes complexity fatigue is palpable when you’re managing an ever-growing fleet of containers and services.
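One way to put numbers on that balancing act is the scaling rule described in the Kubernetes documentation for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The post doesn’t say whether the HPA is in use here, so treat that as an assumption. The function below is a standalone sketch of the arithmetic, not a client for any Kubernetes API, and its name and default bounds are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Sketch of the documented HPA rule, clamped to [min_replicas, max_replicas]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 replicas averaging 90% CPU against a 60% target: scale out.
print(desired_replicas(4, 90, 60))   # 6
# 10 replicas averaging 30% CPU against a 60% target: scale in and save money.
print(desired_replicas(10, 30, 60))  # 5
```

The `max_replicas` cap is where the cost side of the trade-off lives: it bounds how far a traffic spike can push the bill.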

SRE vs DevOps

As the days go by, I find myself thinking more and more about the role of Site Reliability Engineering (SRE) versus traditional DevOps practices. The lines between the two are blurring, but they are not interchangeable. Our team is starting to see the benefits of having dedicated SREs who can focus on reliability and performance, but at what cost? Do we risk creating a siloed environment where developers and operations teams don’t communicate as well?

eBPF: The Next Big Thing?

On that note, eBPF has been gaining traction. It lets you run sandboxed programs inside the Linux kernel to observe and extend its behavior at runtime, without changing kernel source code, loading kernel modules, or rebooting. I’ve spent some time looking into it, but so far, it feels like more of a niche solution for specific use cases. While it could be game-changing, it’s not something that’s going to solve all our Kubernetes woes overnight.

Reflections and Resolutions

Looking back over the past week, I find myself reflecting on what we’ve accomplished and where we still have room for improvement. The challenge isn’t just about technology; it’s also about culture and communication within our teams. We need to find ways to make things simpler while also ensuring that everyone has the tools and knowledge they need to do their jobs effectively.

In the coming weeks, I hope to take a step back and reassess our approach. Maybe we can start small with some of these new tools like Backstage or eBPF, but only if they genuinely help us move forward rather than adding another layer of complexity.

For now, though, it’s just another day in the life of a platform engineer dealing with Kubernetes complexities and the ever-changing tech landscape. But hey, we’ll get through it—I hope.