$ cat post/net-split-in-the-night-/-we-scaled-it-past-what-it-knew-/-it-boots-from-the-past.md
net split in the night / we scaled it past what it knew / it boots from the past
Title: June 2016 - Kubernetes is Winning, But What About My Legacy Monitoring?
June 20, 2016. The week the world seemed to pause as one of its most contentious votes was about to be cast.
I found myself reflecting on my own tech journey over a couple of cups of coffee that morning. Hacker News was buzzing about the UK’s EU referendum, just three days away, and Microsoft’s freshly announced $26B acquisition of LinkedIn; big changes felt like they were everywhere. Meanwhile, in the world of DevOps and infrastructure, Kubernetes was quietly gathering momentum, steadily overtaking Docker Swarm as the de facto standard for container orchestration.
But amidst all these headlines and buzzwords, I couldn’t shake the nagging feeling that something critical wasn’t getting enough attention: my legacy monitoring stack.
You see, a few years back, when Docker first started to gain traction in our organization, I was tasked with setting up an observability framework. Back then, we were using Nagios for alerting and Grafana for visualization. The setup worked, but it felt like Frankenstein’s monster: stitched together, and lacking the scalability and flexibility that modern DevOps practices required.
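For a sense of what “stitched together” meant in practice: every new service meant hand-writing another Nagios object definition along these lines (host name, port, and intervals here are illustrative, and this assumes the stock `check_http` command definition):

```
# nagios/services.cfg - one hand-maintained block per service.
# Host name and port are illustrative placeholders.
define service {
    use                     generic-service
    host_name               app-host-01
    service_description     HTTP on 8080
    check_command           check_http!-p 8080
    check_interval          5      ; minutes between checks
    notification_interval   30     ; minutes between re-notifications
}
```

Multiply that by every host and every port, and it becomes clear why adding a container that might live for an hour felt like fighting the tooling.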
Now, as Kubernetes started to take hold, I found myself facing a dilemma: how do you effectively monitor a cluster of containers without breaking the bank? Sure, Prometheus was becoming popular, but wiring it into our existing stack felt like a project in its own right. Plus, there were all these new tools and concepts floating around: Heapster, cAdvisor, Kubernetes-native monitoring pipelines, and more.
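The saving grace was that the pieces could coexist. A sketch of what a deliberately small first `prometheus.yml` might have looked like (job names and targets are placeholders, and the key names follow the config format as it later stabilized):

```yaml
# prometheus.yml - a first, deliberately small scrape config.
# Job names and targets are illustrative placeholders.
global:
  scrape_interval: 15s        # how often to pull metrics from targets

scrape_configs:
  # Prometheus scraping itself: a cheap smoke test
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']

  # The handful of containerized apps, statically listed for now
  - job_name: container-apps
    static_configs:
      - targets: ['app-1:8080', 'app-2:8080']
```

Grafana had supported Prometheus as a data source since late 2015, so the new metrics could land in the dashboards we already had while Nagios kept paging.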
I sat down to map out my strategy. We had a few small apps running in containers already, so I started by standing up Prometheus and running exporters as sidecar containers next to the services, since most of them couldn’t expose metrics natively. This gave us visibility into the inner workings of our services without having to change much on the application side. But then came the question: should we switch over completely from Nagios and Grafana?
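Roughly, each pod looked like this: the legacy app container untouched, a small exporter translating its status endpoint into Prometheus metrics, and annotations so a `kubernetes_sd`-based scrape config with the usual relabeling could find it. Everything here (names, images, ports) is a hypothetical sketch:

```yaml
# billing-api.yaml - legacy app plus a metrics-exporter sidecar.
# All names, images, and ports are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: billing-api
  annotations:
    prometheus.io/scrape: "true"   # picked up by a kubernetes_sd scrape config
    prometheus.io/port: "9100"
spec:
  containers:
    - name: app
      image: registry.internal/billing-api:1.4   # unchanged legacy app
      ports:
        - containerPort: 8080
    - name: exporter
      # hypothetical exporter that polls the app's status endpoint
      # and re-exposes it as Prometheus metrics on :9100
      image: registry.internal/legacy-status-exporter:0.3
      env:
        - name: TARGET_URL
          value: "http://localhost:8080/status"
      ports:
        - containerPort: 9100
```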
The thought was daunting. It wasn’t just about replacing old tools; it was about changing how my team operated. I started a series of proofs of concept, testing various integrations between Prometheus, Grafana, and the alerting we already trusted. Each day, as I dug deeper into the configurations, the weight of the decision settled heavier on me.
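The proof of concept that mattered most was reproducing our simplest Nagios check, “is this thing up?”, as a Prometheus alert. In the rule syntax Prometheus 1.x used at the time, that might have looked something like this (alert name and duration are illustrative):

```
# legacy-checks.rules - Prometheus 1.x alert rule syntax.
# Mirrors a basic Nagios host-down check: fire if a scrape
# target has been unreachable for five minutes.
ALERT InstanceDown
  IF up == 0
  FOR 5m
  LABELS { severity = "page" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} is down",
    description = "{{ $labels.job }} target {{ $labels.instance }} has been unreachable for 5 minutes."
  }
```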
Then came the weekend. I spent hours staring at logs and metrics, trying to understand where we were falling short. Network traffic was still unpredictable under the new architecture, and every time something went wrong, the investigation felt like starting from scratch.
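This was where Prometheus started to pay for itself. Assuming the kubelet’s built-in cAdvisor metrics were being scraped, a couple of ad-hoc queries replaced a lot of log spelunking (label names varied across cAdvisor versions; `pod_name` was common at the time):

```
# Per-container network receive rate over 5 minutes, in bytes/sec,
# from cAdvisor's container_network_receive_bytes_total counter.
rate(container_network_receive_bytes_total[5m])

# Top five transmitters, summed per pod.
topk(5, sum by (pod_name) (rate(container_network_transmit_bytes_total[5m])))
```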
On Monday, the team gathered for our regular stand-up. I laid out my findings and potential solutions. The room was quiet as everyone absorbed what I had just said. “Let’s do this,” I announced, trying to keep my voice steady. “We need to move towards a more modern monitoring stack, but we’ll take it step by step.”
There was a mix of nods and murmurs in response. Some were excited about the change, others worried about the complexity. We agreed that the migration would be gradual, with small wins along the way.
As I left the meeting, my mind was still racing. I knew this transition wouldn’t be easy, but it was necessary. The world of DevOps was moving fast, and if we didn’t adapt, our monitoring efforts could quickly become outdated.
Looking back, that day in June 2016 marked a turning point for me and my team. It wasn’t just about changing tools; it was about embracing change and continuously improving our approach to DevOps. And while the path forward was uncertain, I felt confident that we were taking the right steps towards a more scalable and flexible monitoring solution.
That’s how June 2016 felt in tech: a blend of big decisions and small victories, all wrapped up in an ever-evolving landscape of tools and standards.