$ cat post/make-install-complete-/-we-never-did-fix-that-bug-/-a-segfault-in-time.md

09JUL18

make install complete / we never did fix that bug / a segfault in time

Title: July 9, 2018 - Kubernetes Dominance & the Battle for DevOps Mindshare

On a typical Monday in early summer 2018, I found myself at my desk, staring intently at a cluster of Kubernetes pods spinning up and down like a group of stressed-out circus performers. The day was unremarkable except for the fact that we had just transitioned to using Helm for our deployments, our first step into the world of application release orchestration.

Kubernetes (or K8s, as I prefer to call it) had been in our infrastructure long enough now that it felt like a natural part of the system. But under the hood, there was still a lot of hand-wringing and debate around best practices. “Should we use StatefulSets for stateful applications?” “Are rolling updates really better than rolling restarts?” These were just some of the questions we wrestled with daily.

One particularly frustrating morning, I found myself in the middle of an outage that had our production cluster on its knees. Pods were failing to start up properly, and it turned out that a recent update to Helm had inadvertently broken one of our services. The pain of this issue was palpable; we lost several hours debugging what should have been a straightforward upgrade.

This led us to question whether Helm was the right choice for us. While it provided an easy way to package and deploy applications, its complexity seemed to outweigh its benefits. We started exploring alternative approaches, like using Kustomize to patch plain manifests with overlays instead of templating them, but found that even these tools had their own quirks.

Around this time, Istio emerged as a potential savior. It promised not just service mesh capabilities but also observability and security enhancements. The idea of an all-in-one solution was appealing, but we were wary of adding another layer to our infrastructure stack. The learning curve for Istio seemed steep compared to what we already had.

In the midst of this chaos, I stumbled upon a post by Guido van Rossum, Python's creator, discussing his decision to step down as project lead. It read: “I’m basically giving myself a permanent vacation from being BDFL.” (BDFL is short for Benevolent Dictator For Life.) The honesty in that statement resonated with me. Sometimes, stepping back and reassessing your priorities is necessary, even if it means letting go of something you built.

Back at work, we continued our journey with Kubernetes, but now with more awareness of the challenges ahead. We started embracing GitOps principles, which seemed to offer a path forward for managing our infrastructure and applications in a more automated and repeatable manner.
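For anyone who hasn't run into the idea, here is a minimal sketch of what "GitOps" boils down to, at least as I understood it at the time: the desired state lives in Git, and something keeps comparing it against what is actually running and closes the gap. This is not what any real GitOps tool does internally. The `State` model (a resource name mapped to an image tag) and the `plan` and `reconcile` helpers are all hypothetical names invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified state model: resource name -> image tag.
# A real cluster's state is far richer than this.
State = Dict[str, str]


def plan(desired: State, live: State) -> List[Tuple[str, str]]:
    """Return the (action, name) steps needed to make live match desired."""
    steps: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in live:
            steps.append(("create", name))
        elif live[name] != desired[name]:
            steps.append(("update", name))
    for name in sorted(live):
        if name not in desired:
            steps.append(("delete", name))
    return steps


def reconcile(desired: State, live: State) -> State:
    """Apply the plan to a copy of the live state and return the result."""
    result = dict(live)
    for action, name in plan(desired, live):
        if action == "delete":
            del result[name]
        else:  # "create" and "update" both set the desired value
            result[name] = desired[name]
    return result


# Illustrative data only: "desired" stands in for what is committed to Git,
# "live" for what the cluster currently reports.
desired = {"api": "v2", "worker": "v1"}
live = {"api": "v1", "cron": "v1"}

print(plan(desired, live))
# [('update', 'api'), ('create', 'worker'), ('delete', 'cron')]
```

The part that appealed to us is that running the loop again after it converges produces an empty plan, which is what "repeatable" meant in practice.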

As I look back on this period, it’s clear that 2018 was a critical year for DevOps tooling. Kubernetes had established itself as the de facto standard for container orchestration, but there were still many questions around how to use it effectively. Helm provided an easy entry point into the world of application release automation, while Istio promised a future where services could be managed with more sophisticated policies.

But amidst all this technology, I found myself reflecting on simpler things too. What was our end goal? Were we getting lost in the complexity of tools or staying focused on delivering value to our users?

With these thoughts swirling around in my head, I realized that sometimes, the most important thing is not the tool but how you use it. We needed to find a balance between leveraging the latest technologies and keeping things simple enough for us to understand.

And so, as July 9th came and went, I moved on from another frustrating day into a world where Kubernetes was firmly entrenched in our infrastructure, along with a newfound appreciation for GitOps and a healthy skepticism towards over-engineered solutions.