August 2022: A Month of Infrastructure Fights and FinOps Challenges
August 2022 felt like a month when the tech world was boiling over with change. Generative AI was taking the internet by storm and fueling an AI/ML infrastructure boom. Meanwhile, platform engineering continued its march toward mainstream acceptance, and the CNCF landscape grew even more overwhelming. WebAssembly was creeping into server-side development, developer experience (DX) had become a discipline in its own right, and FinOps and cloud cost pressure were at the forefront of many discussions.
One day, I found myself wrestling with an issue that perfectly encapsulated the challenges we face every day as platform engineers. We had a mission-critical application running on AWS, and our production projects were suspended during a routine maintenance window at 1am on a Saturday. The cause turned out to be a permissions issue: a simple misconfiguration in the CloudFormation template. It sounds trivial, but it shows how easily a one-line mistake in infrastructure code can take production down.
The incident brought the DORA metrics back to mind (deployment frequency, lead time for changes, change failure rate, and time to restore service), which we had been working hard to adopt across our teams. It triggered a flurry of discussions about how we could have prevented the issue and how we could keep improving our DevOps processes. We started brainstorming ways to automate more of our CI/CD pipelines to reduce human error. It was also a stark reminder that no matter how much automation you have in place, human oversight is still critical.
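To make the DORA discussion concrete, here is a minimal sketch of how the four metrics could be computed from deployment records. The `Deployment` record shape and the `dora_metrics` function are hypothetical names I made up for illustration, not our actual tooling, and using the median for lead time and restore time is just one reasonable choice.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production incident?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics for deployments made within a window."""
    if not deployments:
        raise ValueError("need at least one deployment")
    if window_days <= 0:
        raise ValueError("window_days must be positive")

    # Lead time for changes: commit to production.
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Time to restore: failed deployment to restored service.
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at]

    return {
        "deployments_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        "median_time_to_restore": median(restores) if restores else None,
    }
```

Feeding this from a CI/CD system would mean mapping pipeline events onto the record above; how incidents get linked back to the deployment that caused them is the hard part, and this sketch simply assumes it has already been done.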
During one of these discussions, someone brought up the topic of FinOps and cloud cost pressure. Our team had been trying to get a better handle on our spend and find ways to optimize costs without compromising performance. We were exploring various tools like AWS Budgets and Cost Explorer to keep an eye on our bills. It’s funny how quickly you can go from being a developer focused solely on code to becoming a finance analyst managing budgets and optimizing resources.
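As a rough illustration of the kind of check we had in mind, the sketch below totals spend per service from a Cost Explorer `get_cost_and_usage` response and flags services above a threshold. The helper names (`fetch_costs`, `cost_by_service`, `services_over`) and the per-service threshold are assumptions of mine for the example; this is not how AWS Budgets works internally, and `fetch_costs` needs `boto3` plus credentials with Cost Explorer access.

```python
from collections import defaultdict
from decimal import Decimal


def fetch_costs(start: str, end: str) -> dict:
    """Query Cost Explorer for monthly unblended cost grouped by service.

    `start` and `end` are ISO dates (YYYY-MM-DD); `end` is exclusive.
    boto3 is imported lazily so the helpers below work without it.
    """
    import boto3

    client = boto3.client("ce")
    return client.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def cost_by_service(response: dict, metric: str = "UnblendedCost") -> dict[str, Decimal]:
    """Sum the cost per service across every period in the response."""
    totals: dict[str, Decimal] = defaultdict(Decimal)
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Amounts arrive as strings; Decimal avoids float rounding on money.
            totals[service] += Decimal(group["Metrics"][metric]["Amount"])
    return dict(totals)


def services_over(totals: dict[str, Decimal], threshold: Decimal) -> list[tuple[str, Decimal]]:
    """Return (service, cost) pairs above the threshold, most expensive first."""
    flagged = [(name, cost) for name, cost in totals.items() if cost > threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

In practice something like `services_over(cost_by_service(fetch_costs("2022-08-01", "2022-09-01")), Decimal("100"))` could run on a schedule and post the result to a team channel, with the threshold tuned per team.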
The concept of using one big server instead of many smaller ones (as discussed in the HN post “Use one big server”) also caught my attention. While it seemed like an interesting idea, we needed to balance performance with cost efficiency. In our case, running multiple services on a single machine wasn’t feasible due to the diverse workloads and dependencies. However, I couldn’t help but think about how this approach might be applicable in certain scenarios where resource utilization can be maximized.
Another day, while debugging an application with unexpected performance issues, I found myself coming back to WebAssembly (Wasm). We had been experimenting with Wasm for server-side applications and found it to be a fascinating technology. It let us run the same compiled modules in the browser or on the server, which could potentially improve performance and reduce network overhead. However, integrating Wasm into our existing infrastructure was no small feat. We had to deal with runtime compatibility, security considerations, and dependency management.
As I reflected on these experiences, it struck me that despite all the advancements and new technologies, the core challenges of platform engineering remain constant: managing complexity, optimizing performance, ensuring reliability, and keeping costs in check. The era we live in is one where we must constantly adapt and innovate while staying grounded in the fundamentals.
August 2022 was a month full of surprises and lessons, both big and small. From dealing with unexpected production issues to exploring new technologies like Wasm, it felt like there wasn’t a moment when my mind wasn’t racing with thoughts about how we could improve our platform engineering practices. It’s moments like these that remind me why I love this field: there’s always something new to discover and tackle.
Until next time, Brandon