uptime of nine years / I read the RFC again / the pod restarted

A Month in Review: When GitOps Wasn’t Quite Git-Fu

November 18, 2019 was a day of mixed emotions. On one hand, we were knee-deep in the holiday shopping season and wrapping up another year of hard work. On the other, our internal developer portal project, Backstage, was hitting some bumps in the road. About three months after launch, it had produced its share of success stories, but we had also run into a few challenges that brought us back to reality.

Backstage: The Hype Versus Reality

Backstage was supposed to be our internal developer hub, promising everything from API documentation to service dashboards. We were all excited about the idea, but as with any big project, there were growing pains. One issue stood out: managing and updating Kubernetes manifests across multiple environments.

Our hope was that Backstage could automate this process through a GitOps approach: changes are committed to source control and then applied to our clusters by a tool like ArgoCD or Flux. The theory was sound. Write the manifest once, let Git version it, and let the tool keep the cluster in sync with it. In practice? Not so much.
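To make the "sync with your cluster" idea concrete, here is a toy model of a single reconcile step. It is a sketch under stated assumptions, not how ArgoCD or Flux are implemented: it assumes the manifests from Git and the objects reported by the cluster have already been parsed into dicts keyed by (kind, namespace, name). The `plan_sync` function and the sample resources are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical identity for a resource: (kind, namespace, name).
Key = Tuple[str, str, str]
Action = Tuple[str, Key]


def plan_sync(desired: Dict[Key, dict], live: Dict[Key, dict], prune: bool = False) -> List[Action]:
    """Return the actions that would make `live` match `desired`.

    `desired` stands in for the manifests committed to Git and `live` for the
    objects the cluster reports; both are assumed to be parsed already.
    """
    actions: List[Action] = []
    for key in sorted(desired):
        if key not in live:
            actions.append(("create", key))
        elif live[key] != desired[key]:
            actions.append(("update", key))
    if prune:
        # Deleting objects that are no longer in Git is opt-in.
        for key in sorted(live):
            if key not in desired:
                actions.append(("delete", key))
    return actions


if __name__ == "__main__":
    desired = {
        ("Deployment", "shop", "web"): {"replicas": 3, "image": "web:1.4"},
        ("Service", "shop", "web"): {"port": 80},
    }
    live = {
        ("Deployment", "shop", "web"): {"replicas": 2, "image": "web:1.4"},
        ("ConfigMap", "shop", "legacy"): {"data": {}},
    }
    for action, key in plan_sync(desired, live, prune=True):
        print(action, "/".join(key))
```

With `prune=True`, the example plans an update for the Deployment, a create for the missing Service, and a delete for the leftover ConfigMap. Pruning is off by default in this sketch because deleting whatever has vanished from Git is the riskiest thing a sync can do.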

Kubernetes Complexity Fatigue

As I sat down to debug these issues, I couldn’t help feeling a twinge of frustration. We had multiple clusters with different requirements, and our manifests were getting messy fast: the YAML files sprawled and were hard to maintain. Writing manifests is not just about syntax; it’s also about understanding the state of your cluster at any given moment.
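One common way to keep per-cluster YAML from sprawling is a shared base manifest plus small per-environment overlays, the idea behind tools such as Kustomize. The sketch below assumes a plain recursive merge in Python with hypothetical `staging` and `production` overrides; it is a simplification for illustration, not Kustomize's actual merge semantics.

```python
import copy
from typing import Any, Dict


def merge_overlay(base: Dict[str, Any], overlay: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new manifest with `overlay` deep-merged onto `base`.

    Nested dicts are merged key by key; any other overlay value
    (including a list) replaces the base value outright.
    """
    result = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "labels": {"app": "web"}},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{"name": "web", "image": "web:1.4"}]}},
    },
}

# Hypothetical per-cluster overrides: only the fields that differ are written down.
overlays = {
    "staging": {"spec": {"replicas": 2}},
    "production": {"metadata": {"labels": {"tier": "critical"}}, "spec": {"replicas": 6}},
}

if __name__ == "__main__":
    for env, patch in overlays.items():
        manifest = merge_overlay(base, patch)
        print(env, manifest["spec"]["replicas"], manifest["metadata"]["labels"])
```

One deliberate simplification: lists are replaced wholesale. Kubernetes strategic merge patches instead merge lists such as `containers` by their `name` key, so in this sketch changing one container's image means restating the whole list.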

One night, I stayed late trying to get a simple service deployed correctly with ArgoCD. The process felt like magic when everything worked, but debugging a misbehaving sync was another story entirely. It’s one thing to hit a deployment failure with a clear error message, and quite another to sit through a 15-minute sync that never quite does what you expect.
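When a sync keeps reporting drift without a clear error, it helps to list exactly which fields differ between what Git declares and what the cluster reports. The Python sketch below is a hypothetical helper, not part of ArgoCD: it walks only the fields the desired manifest sets, so server-populated fields such as `status` are not flagged, and it lets you mark paths whose drift is expected. ArgoCD's own mechanism for that last part is the `ignoreDifferences` setting on an Application; the autoscaler scenario in the example is assumed for illustration.

```python
from typing import Any, Iterable, List, Tuple

Diff = Tuple[str, Any, Any]  # (dotted path, desired value, live value)


def field_diff(desired: Any, live: Any, path: str = "") -> List[Diff]:
    """Return every (path, desired, live) where `live` disagrees with `desired`.

    Only keys present in `desired` are walked, so fields the cluster adds on
    its own (status, uid, defaults) never show up as drift.
    """
    if isinstance(desired, dict) and isinstance(live, dict):
        diffs: List[Diff] = []
        for key, value in desired.items():
            child = f"{path}.{key}" if path else key
            diffs.extend(field_diff(value, live.get(key), child))
        return diffs
    # Leaves, lists, and type mismatches are compared as whole values.
    return [] if desired == live else [(path, desired, live)]


def unexplained_drift(diffs: List[Diff], ignored: Iterable[str]) -> List[Diff]:
    """Drop diffs at, or nested under, any path listed in `ignored`."""
    prefixes = list(ignored)
    return [
        d for d in diffs
        if not any(d[0] == p or d[0].startswith(p + ".") for p in prefixes)
    ]


if __name__ == "__main__":
    desired = {"metadata": {"labels": {"app": "web"}}, "spec": {"replicas": 3}}
    live = {
        "metadata": {"labels": {"app": "web"}, "uid": "1234"},
        "spec": {"replicas": 5},
        "status": {"readyReplicas": 5},
    }
    diffs = field_diff(desired, live)
    print(diffs)  # [('spec.replicas', 3, 5)]
    # Hypothetical: an autoscaler owns the replica count, so that drift is expected.
    print(unexplained_drift(diffs, ["spec.replicas"]))  # []
```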

The Slack Diatribe

To make matters worse, there were internal discussions around this time about the value of modern tools versus traditional methods. A colleague asked in passing, “Why bother with Backstage when we can just use kubectl?” I couldn’t fully dismiss the sentiment. Backstage and GitOps are powerful, but they take real discipline to maintain.

The next morning, as I scrolled through Hacker News, the Slack WYSIWYG debacle caught my eye. It resonated because we seemed to be facing the same problem: hyped tools that may not live up to their promises unless they are managed carefully. In both cases, the challenge was balancing modern convenience against practicality.

The Lessons Learned

By the end of the month, I realized that GitOps, while undoubtedly powerful, is also complex. We needed a more disciplined approach, one where every change had clear reasoning behind it. That meant investing time in better documentation and perhaps a set of best practices for our team.

In retrospect, the experience was a valuable lesson. It taught us humility when adopting new technologies and the need to keep reassessing our tools and processes. After all, in ops and infrastructure work, the simplest solution is sometimes the most effective one.

Looking Ahead

As we look ahead to 2020, I’m excited about what’s on the horizon: eBPF, more advanced GitOps tooling such as Flux 2, and the ongoing evolution of Kubernetes. These advances will bring new challenges, though, and we must remain vigilant. The key is finding the sweet spot between innovation and practicality.

So here’s to another year of hard work and learning. May our developer portals (and other projects) continue to grow stronger and more reliable.