$ cat post/uptime-of-nine-years-/-the-incident-taught-us-the-most-/-the-container-exited.md
uptime of nine years / the incident taught us the most / the container exited
Title: Docker Conundrum
June 6, 2016 was a Monday, just like any other day. But in the tech world, it felt like the container wars had reached a fever pitch: Kubernetes and Docker were duking it out for supremacy, each with its own set of supporters. At work we were still using Docker, but whispers that Kubernetes was gaining traction were everywhere.
The morning itself started off quietly. I had a meeting at 10 AM, so I figured I'd spend the time before it debugging our container orchestration setup. We were using a mix of Docker Swarm and Ansible for deployment, which worked reasonably well… most of the time. Today, however, was different.
I opened my terminal to find a seemingly minor issue: one of our services wasn't coming up properly. After a few moments of frustration, I remembered that this particular service was deployed with Docker Compose. I ran `docker-compose up -d` and watched it spin up the containers as expected. But when I checked our logging system (yes, still Graylog at the time), no logs were showing up for this service.
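The usual culprit for containers that run fine but never show up in Graylog is the logging driver: by default Docker keeps logs locally in its `json-file` driver, so nothing ever reaches the GELF input. A minimal sketch of what the Compose file needs, assuming a Graylog GELF input listening on UDP 12201 (the service name, image, and Graylog hostname here are hypothetical):

```yaml
version: "2"
services:
  api:                          # hypothetical service name
    image: example/api:latest   # hypothetical image
    logging:
      driver: gelf              # ship container stdout/stderr to Graylog
      options:
        gelf-address: "udp://graylog.internal:12201"  # assumed Graylog endpoint
        tag: "api"              # makes the service easy to filter in Graylog
```

Without a `logging:` block like this, `docker-compose up -d` succeeds and the containers look healthy, but their output only exists on the local host.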
That’s when the real work began. I started digging through the logs, trying to figure out where the disconnect was. I realized that the problem might lie in how Docker Compose interacts with Docker Swarm. Docker Compose is great for local development and small-scale deployments, but it wasn’t designed for a large, distributed environment like ours.
I spent the next few hours researching both technologies and their interactions. It became clear that we needed to transition from Docker Compose to something more robust—enter Kubernetes (k8s). Kubernetes promised much: auto-scaling, self-healing, and better orchestration of containers across nodes. But as I delved deeper into k8s, the complexity seemed to outweigh its benefits for our current setup.
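For a sense of what those promises look like in practice, here is a sketch of a Kubernetes Deployment from roughly that era (Deployments lived under `extensions/v1beta1` in mid-2016); the names, image, and health-check endpoint are all hypothetical:

```yaml
apiVersion: extensions/v1beta1    # Deployment API group as of mid-2016
kind: Deployment
metadata:
  name: api                       # hypothetical service name
spec:
  replicas: 3                     # self-healing: k8s reschedules pods to keep 3 running
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: example/api:latest # hypothetical image
        livenessProbe:            # restart the container if its health check fails
          httpGet:
            path: /healthz        # assumed health endpoint
            port: 8080
```

The declarative model is appealing, but every one of those fields is a concept to learn and operate, which is exactly the complexity trade-off we were weighing.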
I remember sitting in a meeting with my team discussing whether we should migrate or stick with what was working (albeit with some issues). The conversation was intense; people were passionate about their choices. Some argued that Kubernetes would solve all our problems, while others thought it might introduce more complexity than necessary. I found myself torn.
A week later, the news broke: Microsoft announced its acquisition of LinkedIn for $26.2 billion. The deal seemed to capture a moment, a significant consolidation in tech, and reminded me that change was inevitable. Meanwhile, the serverless hype continued, and I couldn’t help but wonder how that might impact our infrastructure decisions down the line.
In the end, we decided to stick with Docker Swarm and refine our Ansible scripts to better handle the orchestration. The process wasn’t easy, but it felt right given our current environment. We learned a lot about the limitations of our existing setup and grew closer as a team in addressing these challenges.
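Refining the Ansible side mostly meant making the container tasks idempotent, so a re-run converged state instead of blindly recreating things. A rough sketch of the kind of task we converged on, using the `docker_container` module that shipped with Ansible 2.1 (the host group, image, and Graylog address are illustrative):

```yaml
# Playbook sketch; host group, image, and endpoints are hypothetical.
- hosts: swarm_workers
  tasks:
    - name: Ensure the api container is running
      docker_container:
        name: api
        image: example/api:latest
        state: started
        restart_policy: unless-stopped   # let Docker restart it on failure
        log_driver: gelf                 # keep logs flowing to Graylog
        log_options:
          gelf-address: "udp://graylog.internal:12201"
```

Because the module compares desired state against the running container, repeated runs are no-ops unless something drifted, which made our deploys far less eventful.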
Looking back, that day was just one small step in our ongoing journey to build more resilient infrastructure. But it was also a reminder of the rapid pace of change in tech, where what worked today might not work tomorrow. And so, we soldiered on, debugging, learning, and adapting as needed.