$ cat post/the-deploy-pipeline-/-i-git-bisect-to-old-code-/-the-stack-still-traces.md

the deploy pipeline / I git bisect to old code / the stack still traces


A Sudden Leap into Remote Work

April 27th, 2020 started as just another day at my desk. The world seemed to be going about its business as usual. Little did I know that within a few hours, everything would change. That Monday morning, we got an email: “Due to the current pandemic, all employees must work remotely until further notice.”

Shifting Gears

The transition from office to home was abrupt. My first challenge? Setting up my new workspace. A flaky internet connection and the hunt for a quiet corner were bad enough; worse was when the kids decided it was playtime while I was in the middle of something important.

Then there were the tools I needed for work. GitOps became a crucial part of our platform engineering efforts, but managing Kubernetes clusters and services remotely required some adjustments. Tools like Backstage were great for creating internal developer portals, but they didn’t always integrate seamlessly with the rest of our stack. ArgoCD was a godsend, providing an easy way to sync configurations and updates across multiple clusters.

Debugging the Downtime

One of the first tasks I tackled was ensuring our infrastructure could handle the surge in demand due to remote work. We had a few services that were seeing increased load, and it was my job to figure out what was going on. Kubernetes complexity fatigue was real; how do you manage all those deployments without going insane?

I spent several sleepless nights tracking down issues with our monitoring tools. I ended up using Prometheus for metrics collection and Grafana for visualization. The learning curve was steep, but once everything clicked, it was worth it. Seeing the dashboard filled with data about our services gave me a sense of control amidst all the chaos.
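
For a flavor of what sits behind a dashboard panel, here is a minimal Python sketch of an instant query against Prometheus's HTTP API (`/api/v1/query`). It is an illustration rather than our actual setup: the host `prometheus.example`, the metric `http_requests_total`, and the `service` label are all assumptions, and the sample response is made up to match the documented response shape.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str, label: str = "service") -> dict[str, float]:
    """Map each series in an instant-vector response to its current value.

    `label` is the (assumed) label used to name each series.
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    values = {}
    for series in data["result"]:
        # Each sample is [unix_timestamp, "value-as-string"].
        _timestamp, raw_value = series["value"]
        values[series["metric"].get(label, "unknown")] = float(raw_value)
    return values


# Hypothetical query: per-service request rate over the last five minutes.
url = build_query_url(
    "http://prometheus.example:9090",
    "sum(rate(http_requests_total[5m])) by (service)",
)

# Made-up response body in the shape the API documents for instant vectors.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"service": "checkout"}, "value": [1588000000, "12.5"]},
            {"metric": {"service": "search"}, "value": [1588000000, "40"]},
        ],
    },
})
rates = parse_instant_vector(sample_body)
```

In practice you would fetch `url` with an HTTP client and pass the response body to `parse_instant_vector`; Grafana issues the same kind of query on your behalf whenever a panel refreshes.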

The SRE Perspective

SRE roles were proliferating, and my responsibilities as an engineering manager began to shift more towards ensuring reliability and availability. I had to make sure that our applications didn’t just work today, but would continue to function even when things went haywire.

One of the most challenging parts was dealing with incidents in a remote-first environment. Communication became crucial; Slack and Zoom meetings were my lifelines during those times. We implemented a new incident response playbook, which helped streamline our process for identifying and resolving issues quickly.

Reflecting on the Times

The Hacker News stories that month had their own unique flavor. The discussions around blind developers preparing for the future of tech resonated deeply with me. It’s a sobering thought to consider how technology can either enable or disable people, depending on its accessibility.

GitHub’s free plan for teams seemed like an incredible opportunity to collaborate openly and freely. But then again, we already had our own internal tools, so it felt more like a footnote than a seismic shift.

John Conway’s passing was a reminder that change is inevitable, both in technology and in life. It’s humbling to reflect on the impact his work, the Game of Life above all, had on generations of programmers.

Looking Forward

As I write this, I find myself thinking about what comes next. The tech industry has shown incredible resilience during these times, but there are still challenges ahead. eBPF continues to gain traction for its performance and flexibility benefits, while Kubernetes complexity remains a hurdle that needs to be managed carefully.

The era of remote-first infra scaling is here for the foreseeable future. We’ll need to adapt our tools and practices to thrive in this new reality. And who knows, maybe one day I’ll wear stripes again (or not).

In the meantime, I’m just trying to keep the lights on and hope that tomorrow brings a little less chaos than today.