$ cat post/a-patch-long-applied-/-the-health-check-always-lied-/-i-kept-the-old-box.md

a patch long applied / the health check always lied / I kept the old box


Title: November 12, 2012: Config Wars, Chaos Engineering, and My Day Job


November 12, 2012. I remember it like yesterday. It was a Monday morning in the thick of one of the most interesting times in tech history. The term “DevOps” was a few years old and starting to catch on, with people arguing over whether Chef or Puppet should rule the land. Chaos Engineering was gaining traction at Netflix, and OpenStack, launched two years earlier, was shaking up cloud computing. Meanwhile, I was deep in a battle with config management issues of my own.

Let’s start where it all began: breakfast. I woke up to the usual buzz on Hacker News. The top stories were about voting machine hacking, consulting rates doubling for some lucky individuals, and a suite of icons. A quick scan showed that the “Config Management War” was still heating up. Chef and Puppet were duking it out like two old pros.

I went through my morning routine—grabbing my coffee, logging into Jenkins to check the CI jobs—and there they were: a few more failed builds due to some misconfigured server instances. My team had been using Chef for about a year now, but it was clear that we needed to improve our automation practices. The build failures were costing us valuable time and resources.

During our stand-up meeting, I heard another developer mention that they’d recently switched to Puppet, and their builds were much more stable. “Why don’t we just use Puppet instead?” someone suggested. It was a fair question, but changing the tooling mid-stream would be non-trivial. We had our Chef cookbooks and all the associated infrastructure set up, so we couldn’t just switch overnight.

As the day went on, I found myself deep in a code review for one of our services. The team was debating whether to use Puppet or Chef for a new service deployment. I proposed a third option: write custom scripts. “Why not?” I said. “It’s simple, it works, and we can still maintain the benefits of automation.”

My suggestion was met with some skepticism. “But Chef has all these community plugins and cookbooks,” one developer argued. “We should stick to something that has a proven track record.” Another added, “What if we run into issues? What do we do then?”

These were valid points, but I knew our current setup wasn’t ideal. We needed a more lightweight approach for smaller projects. I pointed out that custom scripts could be written and maintained with minimal overhead, and the code would stay in version control just like any other part of the application.
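To make the idea concrete, here is a minimal sketch of the kind of script this implies, assuming Python and a simple key=value config file. The function names, the file format, and the settings are all hypothetical and chosen for illustration. The point it shows is idempotence: running the script twice leaves the machine in the same state, which is the main property worth borrowing from Chef and Puppet.

```python
import os
import tempfile


def render_config(settings):
    """Render a dict as sorted key=value lines so the output is deterministic."""
    return "".join(f"{key}={settings[key]}\n" for key in sorted(settings))


def ensure_file(path, content):
    """Make sure `path` holds exactly `content`.

    Returns True if the file was created or changed, False if it was
    already correct. Safe to call repeatedly.
    """
    try:
        with open(path, "r", encoding="utf-8") as fh:
            if fh.read() == content:
                return False  # already converged, nothing to do
    except FileNotFoundError:
        pass  # file is missing: fall through and create it

    # Write to a temp file in the same directory, then rename it into
    # place so readers never see a half-written config.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)  # clean up the temp file on any failure
        raise
    return True
```

A deploy step could call `ensure_file("app.conf", render_config({"host": "localhost", "port": 8080}))` and restart the service only when it returns True. Because the script is plain code, it can be reviewed and versioned like the rest of the application.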

After a long discussion, we decided to give it a shot. The result? It worked better than expected. Our deployment times were cut in half, and our configuration errors dropped significantly. It wasn’t as “sexy” as Chef or Puppet, but it got the job done.

Later that afternoon, I caught up with a colleague who was working on some chaos engineering experiments at Netflix. He shared his experiences with Chaos Monkey, which Netflix had open-sourced only a few months earlier. The idea of intentionally breaking systems to test resilience was fascinating. We talked about how we could incorporate similar practices into our own infrastructure. It might not be as flashy as what Netflix was doing, but it would be more practical for our scale.
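For a sense of how small a starting point can be, here is a rough sketch of the core idea in Python. It is not Netflix's implementation. The group layout, the instance names, and the `terminate` callback are assumptions made up for illustration. It picks at most one random instance per group and hands it to whatever function actually does the killing, so a dry run only needs a callback that logs.

```python
import random


def pick_victim(instances, rng, probability=1.0):
    """Return one instance to terminate, or None if this group is skipped."""
    if not instances or rng.random() >= probability:
        return None
    # Sort first so a seeded rng gives a reproducible choice.
    return rng.choice(sorted(instances))


def run_experiment(groups, terminate, rng=None, probability=1.0):
    """Terminate at most one random instance in each group.

    `groups` maps a group name to a list of instance ids, and
    `terminate` is a caller-supplied function (hypothetical here)
    that performs or merely records the termination.
    Returns a list of (group_name, instance_id) pairs that were hit.
    """
    rng = rng or random.Random()
    killed = []
    for group_name in sorted(groups):
        victim = pick_victim(groups[group_name], rng, probability)
        if victim is not None:
            terminate(victim)
            killed.append((group_name, victim))
    return killed
```

Passing `terminate=print` gives a dry run that only names the instances it would have taken down, which is a sensible first step before pointing anything like this at real infrastructure.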

As I left the office that evening, my mind was still racing with thoughts about config management and chaos engineering. The industry was moving so fast; every day brought new challenges and opportunities. Config wars, while entertaining to read about on Hacker News, were ultimately a waste of time if they didn’t directly benefit the users or the system.

That night, I couldn’t help but think that 2012 was shaping up to be an incredible year for tech. It was full of exciting developments and tough decisions. The best part? It made my job as a platform engineer all the more interesting.