$ cat post/stack-trace-in-the-log-/-the-interrupt-handler-failed-/-i-wrote-the-postmortem.md
stack trace in the log / the interrupt handler failed / I wrote the postmortem
Title: Kubernetes Wars: A Platform Engineer’s Take on April 2018
April 9, 2018. Kubernetes had already won the container orchestration wars. But as a platform engineer, I found myself knee-deep in debates about how best to use it for our team.
We had just migrated some of our older services from Mesos to Kubernetes, and although the migration itself went smoothly, plenty of challenges remained. Helm was gaining traction, but we weren’t sure it was the right tool for us yet. Istio was emerging as a service mesh option, with Envoy as the proxy underneath it, and both looked promising. But where did that leave us?
The Helm Question
Helm was all the rage in the community, and nearly everyone seemed to be advocating its use. “If you’re not using Helm, you’re not doing Kubernetes,” one developer said. Another argued, “We should stick with plain Kubernetes manifests for consistency.” I couldn’t help but wonder whether it would make more sense to wait for the ecosystem to stabilize before committing either way.
Platform Engineering Conversations
As these debates went on, conversations about platform engineering itself took center stage. The idea gaining traction was that a platform engineer’s job is not only to build and maintain infrastructure but also to provide robust abstractions and services that make development simpler for our teams.
In our case, this meant thinking beyond Kubernetes clusters and charts. It meant designing a system where developers could focus on writing code without worrying much about the underlying infrastructure. We started toying with the idea of a managed Kubernetes service, something that abstracted away the complexity but still gave us the flexibility we needed.
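To make the abstraction idea concrete, here is a minimal Python sketch of what such a layer might look like: a developer supplies a service name and an image, and the platform fills in the rest of a Kubernetes Deployment. It is an illustration rather than a record of any real tooling; the function name `render_deployment`, its defaults, and the example image are all hypothetical.

```python
import json
from typing import Any


def render_deployment(
    name: str, image: str, replicas: int = 2, port: int = 8080
) -> dict[str, Any]:
    """Build an apps/v1 Deployment manifest from a minimal service description.

    The defaults (2 replicas, port 8080) are placeholder platform conventions.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service and image, for illustration only.
    manifest = render_deployment("checkout", "registry.example.com/checkout:1.4.2")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML. The point of the sketch is the shape of the interface: the developer states two things, and the platform owns every other field.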
Debugging Real-World Issues
One day, I found myself debugging an issue with a custom monitoring setup we had put in place using Prometheus and Grafana. The metrics were inconsistent between our staging environment and production, which made it hard to diagnose problems when they occurred. We needed something more reliable and consistent.
After a few days of digging through logs and code, I realized the inconsistency came from differences in how the two environments were configured. That drove home how critical a standardized approach to infrastructure as code (IaC) is. Terraform 0.x was our go-to tool at the time, but we hadn’t yet used it to keep the environments in lockstep.
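The kind of drift described above is easy to check for mechanically once both configurations are loaded as data. Below is a minimal Python sketch that flattens two nested configs and reports every key whose value differs. The helper names (`flatten`, `config_drift`) and the sample values are assumptions for illustration; only the `global.scrape_interval` and `global.evaluation_interval` keys are borrowed from Prometheus’s configuration format.

```python
from typing import Any

MISSING = "<missing>"  # marker for keys present in only one environment


def flatten(config: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted keys: {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def config_drift(
    staging: dict[str, Any], production: dict[str, Any]
) -> dict[str, tuple[Any, Any]]:
    """Map each differing dotted key to (staging_value, production_value)."""
    a, b = flatten(staging), flatten(production)
    return {
        key: (a.get(key, MISSING), b.get(key, MISSING))
        for key in sorted(a.keys() | b.keys())
        if a.get(key, MISSING) != b.get(key, MISSING)
    }


if __name__ == "__main__":
    # Hypothetical values; real configs would be parsed from files.
    staging = {"global": {"scrape_interval": "15s", "evaluation_interval": "15s"}}
    production = {"global": {"scrape_interval": "60s"}}
    for key, (stg, prod) in config_drift(staging, production).items():
        print(f"{key}: staging={stg} production={prod}")
```

Running a check like this in CI against the rendered configuration for each environment would surface a mismatch before it shows up as inconsistent dashboards.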
The Zeitgeist
Walking through the office and reading Hacker News, I couldn’t help but feel that we were at an exciting crossroads. Apple’s open-sourcing of FoundationDB reminded me that sometimes it’s okay to explore new technologies, even if they’re not directly related to our current stack. Meanwhile, discussions around privacy-first DNS services like Cloudflare’s 1.1.1.1 showed that security and privacy are never far from users’ minds.
The Google-Amazon debate over AI projects and Apple’s chip plans made me think about the long-term implications of technology decisions. Was I making choices that would still serve our company in five or ten years? And what about the productivity hacks making the rounds: were they just shiny new tools, or did they have lasting value?
Lessons Learned
As April 2018 came to a close, I found myself reflecting on the past few months. While Kubernetes had undoubtedly won, there was still much work to be done in making our platform truly user-friendly and reliable. We needed to think more about how we could abstract away complexity for our developers while ensuring robust monitoring and logging.
Platform engineering isn’t just about technology; it’s about creating an environment where teams can thrive. As I stepped back from the day-to-day, I realized that the journey of improving our infrastructure wasn’t just about adopting the latest tools but also about understanding what really mattered to our users.
That’s my take on April 2018: full of debates, challenges, and a lot of thinking about the future.