$ cat post/cron-job-i-forgot-/-a-certificate-expired-there-/-the-container-exited.md
cron job I forgot / a certificate expired there / the container exited
Title: February 6, 2017 – Kubernetes Reigns, Helm Emerges, and I Get Bit by a Bug
February 6, 2017. Kubernetes still felt brand new to us. The container wars were heating up as we transitioned from Mesos, Docker Swarm, and hand-rolled docker commands to this young, energetic player. I had been diving deep into Kubernetes in the weeks leading up to this date, but today felt like a milestone for me and my team.
Kubernetes was still mostly about getting stuff running inside a cluster. We were on the 1.5 release line, where things were moving so fast that the documentation barely stayed relevant from one day to the next. But I was hooked. The concept of self-healing, stateless services that could be scaled up or down without manual intervention appealed to my ops brain in ways Docker and Mesos never had.
Today, I spent a good portion of the morning trying to get Helm set up for our cluster. For those who aren’t familiar, Helm is essentially a package manager for Kubernetes. It lets you define an application as a chart: a bundle of templated Kubernetes manifests plus metadata describing how the application should be installed or upgraded. I was excited because it promised to make our lives so much easier—no more manually configuring every single pod and service.
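For context, a chart in those days was just a directory of templated manifests plus a metadata file. The layout below is a generic sketch, and the chart name is made up:

```
mychart/
  Chart.yaml          # chart metadata: name, version, description
  values.yaml         # default configuration values
  templates/
    deployment.yaml   # Kubernetes manifests, templated with Go templating
    service.yaml
```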
By mid-morning, after wrestling with the setup instructions and finally getting Helm running, I thought we were in business. But as soon as I tried to install a simple application, disaster struck. The helm install command returned an error: “Error: timed out waiting for the condition”. A timeout, sure, but the timeout itself was only a symptom; something underneath was failing.
I spent the next few hours digging through logs and tracing the path between the Helm client, Tiller (Helm’s server-side component, which runs inside the cluster), and the Kubernetes API. I even fired up Wireshark to inspect network packets for clues—anything to figure out what was going wrong.
It turned out to be a DNS issue, something simple that had eluded me in my initial troubleshooting. In our local development environment, the local resolver mapped our internal hostnames to 127.0.0.1, so everything appeared to work. On the production cluster, Tiller was resolving those same hostnames differently. Once I pointed the cluster’s DNS configuration at a proper external DNS service and made sure it could resolve our internal domain names correctly, everything fell into place.
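A mismatch like that is easy to check before it bites you. Here is an illustrative sketch (not what we actually ran that day): compare what a handful of hostnames resolve to on your machine against what you expect inside the cluster. The hostname list is a placeholder; you would substitute your own internal names.

```python
import socket

def resolve(host: str) -> str:
    """Return the first IPv4 address the local resolver gives for host."""
    return socket.gethostbyname(host)

if __name__ == "__main__":
    # "localhost" is the only name guaranteed to exist everywhere;
    # swap in your own internal hostnames when sanity-checking.
    for host in ["localhost"]:
        print(f"{host} -> {resolve(host)}")
```

Running the same check from a pod inside the cluster and diffing the output would have pointed straight at the problem hours earlier.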
This experience taught me a valuable lesson about the importance of understanding your setup thoroughly. The rush to adopt new tools can sometimes lead us down rabbit holes that ignore fundamental infrastructure details. In this case, the Helm issue was a mere symptom of a broader problem with how our DNS was configured across environments.
Around lunchtime, I took a step back and reflected on the day. Kubernetes and its surrounding ecosystem were evolving so rapidly that it felt like we were always chasing the latest version, trying to keep up with best practices before they became outdated. But despite all the hurdles, there was an undeniable excitement in building something new and more scalable.
As I headed out for a late lunch, I couldn’t help but think about how far we had come since those early days of Mesos and Docker Swarm. Kubernetes wasn’t just a better way to manage containers; it represented a shift in how teams thought about infrastructure itself. And Helm was going to play an increasingly important role in making that shift smoother for everyone.
On this February 6, 2017, I felt like we were at the cusp of something big—something that would change how applications were deployed and managed across entire organizations. The future looked bright, but the present was filled with bugs and challenges. And that’s what makes it all so exciting.