$ cat post/the-firewall-dropped-it-/-the-abstraction-leaked-everywhere-/-the-container-exited.md

15FEB21

the firewall dropped it / the abstraction leaked everywhere / the container exited

Title: February 15, 2021 - Embracing the Cloud: A Year of Remote Infrastructure and Platform Engineering

February 15, 2021. I woke up to the usual morning coffee and my inbox. But this time, it felt different. The world outside was in chaos, with people staying home due to a global pandemic. Inside, I was on another remote call from my living room. It’s hard not to think about how much life has changed since 2019.

One of the things that hasn’t changed is the work of platform engineering. The past year felt like an acceleration of trends that were already in motion: SRE roles becoming more prominent, internal developer portals such as Backstage gaining traction, and Kubernetes complexity leading to fatigue among some teams. But despite all this, I found myself thinking a lot about cloud infrastructure.

The Great Migration

Over the past year, we had been slowly moving our applications to the cloud. We started small, but as time went on, it became clear that we needed more than just a few servers in Azure or AWS. Our internal tools team had already built out a CI/CD pipeline using Jenkins and Docker, and now it was time to really take advantage of what the cloud could offer.

We began by migrating our database services. We moved from self-managed SQL servers to managed services: Amazon Aurora for relational workloads and Azure Cosmos DB where a NoSQL model fit better. The performance gains were immediate. We also started using serverless functions for smaller tasks, which allowed us to scale without managing servers. But as we dug deeper into cloud services, I found myself grappling with the complexity.

Kubernetes Complexity Fatigue

Kubernetes was supposed to make our lives easier by abstracting away the complexities of container orchestration. Instead, it became a source of frustration. Every time we tried to update a deployment or troubleshoot an issue, I felt like we were fighting the system more than working with it. The community around Kubernetes was huge, but finding reliable information that actually solved our problems was like searching for a needle in a haystack.

One day, during a particularly difficult debugging session, I had a moment of clarity. Maybe the problem wasn’t Kubernetes itself, but how we were using it. We were overcomplicating things by trying to shoehorn everything into a single cluster. Perhaps breaking down our applications and deploying them more granularly would help. This led us to explore Argo CD and Flux for GitOps, which let us manage multiple clusters from configuration kept in Git.
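To make the GitOps idea concrete, here is a minimal sketch of the comparison at its core: what Git declares versus what a cluster is actually running. This is not how Argo CD or Flux are implemented. The function name `plan_sync`, the `Action` type, the cluster names, and the manifests are all hypothetical, made up purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    verb: str  # "create", "update", or "delete"
    name: str  # hypothetical resource name


def plan_sync(desired: dict[str, dict], live: dict[str, dict]) -> list[Action]:
    """Compare the manifests declared in Git with what a cluster reports."""
    actions = []
    for name, spec in sorted(desired.items()):
        if name not in live:
            actions.append(Action("create", name))
        elif live[name] != spec:
            actions.append(Action("update", name))
    # Anything running that Git no longer declares gets pruned.
    for name in sorted(live.keys() - desired.keys()):
        actions.append(Action("delete", name))
    return actions


# Hypothetical state: one Git repo is the source of truth for two clusters.
desired = {
    "api": {"image": "api:1.4", "replicas": 3},
    "worker": {"image": "worker:2.0", "replicas": 2},
}
clusters = {
    "us-east": {"api": {"image": "api:1.3", "replicas": 3}},
    "eu-west": {
        "api": {"image": "api:1.4", "replicas": 3},
        "worker": {"image": "worker:2.0", "replicas": 2},
        "legacy-cron": {"image": "cron:0.9", "replicas": 1},
    },
}

for cluster, live in clusters.items():
    print(cluster, [(a.verb, a.name) for a in plan_sync(desired, live)])
```

The same declared state is checked against every cluster, so drift shows up as a short list of actions per cluster instead of something you discover mid-incident.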

SRE Roles in Action

As we continued to move forward, the role of Site Reliability Engineers (SREs) became more pronounced. One of our SREs, John, took the lead on optimizing our load balancers and caching mechanisms. He helped reduce load times by 70%, a feat that would have been impossible without his deep understanding of both application performance and cloud services.

Another team member, Lisa, was instrumental in setting up monitoring and alerting systems. Her focus on proactive maintenance paid off when we faced a sudden surge in traffic due to an unexpected marketing campaign. Thanks to her hard work, we were able to spot the surge early and mitigate the impact before it became a real issue.
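I can’t reproduce our actual alerting rules here, but the basic idea behind catching a surge early can be sketched in a few lines: compare each new traffic sample against a rolling average of recent ones. Everything below is an assumption for illustration, including the `SurgeDetector` name, the window size, the 2x factor, and the traffic numbers.

```python
from collections import deque


class SurgeDetector:
    """Flag a sample that exceeds a multiple of the recent rolling average."""

    def __init__(self, window: int = 5, factor: float = 2.0):
        self.samples = deque(maxlen=window)  # most recent requests per minute
        self.factor = factor

    def observe(self, requests_per_minute: float) -> bool:
        # Only alert once the window is full, so startup noise is ignored.
        full = len(self.samples) == self.samples.maxlen
        baseline = sum(self.samples) / len(self.samples) if self.samples else 0.0
        surging = full and requests_per_minute > self.factor * baseline
        self.samples.append(requests_per_minute)
        return surging


detector = SurgeDetector(window=3, factor=2.0)
traffic = [100, 110, 90, 105, 400]  # made-up numbers
alerts = [detector.observe(rpm) for rpm in traffic]
print(alerts)  # [False, False, False, False, True]
```

A real setup would more likely express this as a rule in the monitoring system than as application code, but the comparison against a recent baseline is the same.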

The Silver Lining

Despite the challenges, there was something liberating about working remotely. It allowed us to bring in talent from all over the world without being tied to a physical location. One of our recent hires, Sarah, is an incredible backend developer who works out of her home office in Boston. She added a new dimension to our team and helped push us forward.

We also started experimenting with eBPF (extended Berkeley Packet Filter) for some of our network monitoring tasks. It was fascinating to see how we could gain insights into system performance without relying on traditional logging methods. However, the learning curve was steep, and it took a lot of trial and error before we got things right.

Looking Forward

As I sit here writing this, I can’t help but think about what lies ahead. The tech landscape is constantly evolving, and staying current with new tools and trends is crucial. But more importantly, the human side of engineering (collaboration, communication, and empathy) will continue to be at the heart of our success.

This past year has brought us closer together in unexpected ways. Whether it’s through shared struggles or the camaraderie of remote work, I’m excited to see where this journey takes us next.

It’s been a rollercoaster ride, but that’s what makes being an engineer so rewarding. There’s always something new to learn and challenges to overcome. Here’s to another year of growth and innovation.