$ cat post/the-prod-deploy-froze-/-the-orchestrator-chose-wrong-/-i-kept-the-old-box.md
the prod deploy froze / the orchestrator chose wrong / I kept the old box
Title: Kubernetes Conundrums: A Journey into the Container Wars
July 24, 2017, was just another day in my journey as an engineer working on platform infrastructure. The container wars were heating up, and we found ourselves squarely in the middle of them.
Kubernetes Takes Center Stage
Kubernetes had taken off with a vengeance since Google open-sourced it in 2014. By 2017, it was clear that container orchestration wasn’t just about running Docker containers; it was about managing the whole application lifecycle, from deployment to scaling to rolling updates. Our platform team had been following Kubernetes closely and was starting to experiment with it in our production environments.
The Helm of Our Journey
One day, we decided to take the plunge and start managing our deployments with Helm. For those not familiar, Helm is a package manager for Kubernetes: it bundles an application’s manifests into a versioned, parameterized package called a chart, which you can install and upgrade as a unit. It’s like apt or yum, but for Kubernetes applications. Pretty neat!
We set up a few charts and began installing them in our development environment. Everything seemed rosy at first glance. But as we delved deeper, we realized Helm was just another layer to master. The learning curve was steep, and the documentation wasn’t exactly user-friendly.
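To make the chart idea concrete, here is a minimal sketch of what a chart boils down to: a manifest template plus a set of default values that each install can override. Helm itself uses Go templates and a values.yaml file; the Python below only imitates that idea with string.Template. The names (MANIFEST_TEMPLATE, DEFAULT_VALUES, render, the example/web image) are made up for illustration and are not from our actual charts.

```python
from string import Template

# Hypothetical stand-in for a chart template. Real Helm charts use Go
# templates; string.Template is used here only to show the idea.
MANIFEST_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ${release}-${name}
spec:
  replicas: ${replicas}
  selector:
    matchLabels:
      app: ${name}
  template:
    metadata:
      labels:
        app: ${name}
    spec:
      containers:
        - name: ${name}
          image: ${image}:${tag}
""")

# Hypothetical defaults, playing the role of a chart's values file.
DEFAULT_VALUES = {
    "name": "web",
    "image": "example/web",
    "tag": "1.0.0",
    "replicas": 1,
}


def render(release, overrides=None):
    """Merge per-install overrides into the defaults and fill in the template."""
    values = {**DEFAULT_VALUES, **(overrides or {}), "release": release}
    # substitute() raises KeyError if the template needs a value we did not supply.
    return MANIFEST_TEMPLATE.substitute(values)


# Same template, two environments with different values.
dev_manifest = render("dev")
prod_manifest = render("prod", {"replicas": 3, "tag": "1.2.0"})

if __name__ == "__main__":
    print(prod_manifest)
```

The point of the sketch is the split between template and values: the manifest structure is written once, and each environment only supplies the handful of settings that differ.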
Istio: A Side Venture
While we were busy with Helm, a new player emerged: Istio. This service mesh tool promised to help us manage service-to-service communication in our microservices architecture. At first glance, it seemed like a no-brainer. But as I dove into the documentation and began experimenting, I quickly realized that this was going to be another beast to tame.
Serverless Hype: A Siren’s Call
Meanwhile, serverless and Lambda were everywhere. The promise of function-as-a-service (FaaS) was enticing—no servers to manage, just pay for what you use. But as a platform engineer, I couldn’t help but wonder if we were jumping on the hype train too quickly. Our current infrastructure was built around containers, and making the transition would require significant changes.
Terraform 0.x: Building Bridges
On the infrastructure front, we were still using Terraform 0.x for provisioning our clusters and other resources. The versioning system in place wasn’t exactly user-friendly, but it got the job done. We spent a lot of time writing and debugging Terraform configurations to ensure that every resource was correctly configured.
GitOps: A New Frontier
The idea behind what would soon be called “GitOps” was starting to gain traction as well. It seemed like everyone wanted their infrastructure to be version-controlled in Git. While I appreciated the idea, we had some reservations about how practical it would be for our team. We were already dealing with complex workflows and scripts; adding a Git-driven layer on top might just complicate things further.
Debugging Reality
One day, we ran into a particularly vexing issue: pods in our Kubernetes cluster were crashing seemingly at random. After hours of debugging, we traced the problem to how we had configured our node resources. One of our nodes was under-provisioned, so the pods scheduled onto it could not get the resources they needed under load.
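The kind of check that would have saved us those hours can be sketched in a few lines: add up what the pods on a node are expected to use at peak and compare it with what the node can actually allocate. This is an illustrative sketch, not our real tooling; the class names, field names, and numbers are all hypothetical, and in practice the inputs would come from the cluster rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class PodLoad:
    """Expected peak usage of one pod (hypothetical figures)."""
    name: str
    cpu_millicores: int
    memory_mib: int


@dataclass
class Node:
    """Resources a node can hand out to pods (hypothetical figures)."""
    name: str
    allocatable_cpu_millicores: int
    allocatable_memory_mib: int


def shortfall(node, pods):
    """Return peak demand minus allocatable capacity; positive means over capacity."""
    cpu = sum(p.cpu_millicores for p in pods) - node.allocatable_cpu_millicores
    mem = sum(p.memory_mib for p in pods) - node.allocatable_memory_mib
    return {"cpu_millicores": cpu, "memory_mib": mem}


def is_under_provisioned(node, pods):
    """True if the pods' peak demand exceeds the node on CPU or memory."""
    return any(amount > 0 for amount in shortfall(node, pods).values())


# Made-up example: three pods placed on a node that is too small for them.
small_node = Node("node-a", allocatable_cpu_millicores=2000, allocatable_memory_mib=4096)
pods = [
    PodLoad("api", cpu_millicores=1000, memory_mib=2048),
    PodLoad("worker", cpu_millicores=750, memory_mib=1536),
    PodLoad("cache", cpu_millicores=500, memory_mib=1024),
]
report = shortfall(small_node, pods)

if __name__ == "__main__":
    print(small_node.name, report, is_under_provisioned(small_node, pods))
```

With these made-up numbers the node comes up 250 millicores and 512 MiB short, which is the sort of gap that stays invisible until load arrives.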
Prometheus + Grafana: Monitoring Matters
On the monitoring front, we were in the process of transitioning from Nagios to Prometheus and Grafana. While Nagios was reliable, it simply wasn’t cutting it for dynamic, containerized applications. The shift to Prometheus and Grafana was challenging, but ultimately rewarding as we gained more visibility into our infrastructure.
Reflecting on the Month
As I look back at July 2017, it feels like we were in a constant state of transition. Kubernetes, Helm, Istio, Serverless, Terraform, GitOps—each new tool promised to revolutionize how we build and manage applications. But with each new technology came its own set of challenges.
The key takeaway for me was that while these tools can solve specific problems, they also add complexity. As platform engineers, our job is not just to adopt the latest tech but to evaluate whether it truly aligns with our goals and existing infrastructure. Sometimes, the simplest solutions are the best ones.
That’s my take on July 2017—full of excitement, challenges, and a whole lot of Kubernetes.