$ cat post/the-floppy-disk-spun-/-the-thread-pool-was-too-shallow-/-the-service-persists.md

07FEB22

the floppy disk spun / the thread pool was too shallow / the service persists

Title: February Feels: A Platform Engineer’s Ramblings

February 7, 2022. Another day, another log line. But this one feels different somehow. The tech world is a whirlwind of change, and I find myself reflecting on the past year while looking ahead to what’s coming.

AI/LLM Infrastructure: The LLM Whirlpool

Since the start of 2022, large language models (LLMs) have dominated the conversation. What looked like a spark in January quickly turned into a wildfire. Suddenly, every tech blog had an article about how AI would change everything. As platform engineers, we knew that infrastructure wasn’t going away; it was just becoming more complex.

One of the biggest challenges has been figuring out how to support these models without breaking our existing infrastructure. The sheer compute and memory demands dwarf those of our traditional workloads. I spent a good chunk of January wrestling with how to scale our servers to handle both traditional workloads and LLMs concurrently. It’s not like we can just slap some GPUs on every server; there’s a lot more to it than that.

WebAssembly: Still Struggling

Speaking of scaling, WebAssembly (Wasm) has been a frustrating but fascinating project. I’ve been trying to get Wasm running in our server infrastructure for the past few months. The promise is there (the ability to run compiled code outside the browser), but getting it to work seamlessly with our existing systems has been an uphill battle.

Every time I think we’ve got everything figured out, some minor change breaks something somewhere else. Debugging Wasm issues can be a nightmare. You’re essentially compiling C++ down to a portable bytecode, and then you have to deal with all the nuances of how different runtimes and host environments execute it. It’s like trying to thread a needle while blindfolded.

FinOps: The New Reality

FinOps is a buzzword that feels more real than ever. With cloud cost pressure a constant concern, our team has been digging into cost management tools more deeply than before. We’re now tracking every dollar spent on infrastructure and trying to optimize where we can.

One of the tools we’ve adopted is AWS Budgets, which allows us to set up alerts for when costs exceed certain thresholds. It’s a bit like budgeting your personal finances, but at scale. It’s not always easy to see the bigger picture when you’re deep in the weeds of code and servers, so having these tools helps keep us accountable.
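To make the threshold idea concrete, here is a minimal sketch of the kind of check a budget alert performs. It is plain Python, not the AWS Budgets API: the `Budget` class, its field names, and the linear forecast are all assumptions made for illustration. AWS Budgets handles this as a managed service and can alert on actual or forecasted spend.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    # Hypothetical record for illustration; not an AWS data type.
    name: str
    monthly_limit: float                 # budgeted amount in dollars
    thresholds: tuple = (0.5, 0.8, 1.0)  # alert at these fractions of the limit


def crossed_thresholds(budget, spend_to_date):
    """Return the threshold fractions that spend so far has reached."""
    used = spend_to_date / budget.monthly_limit
    return [t for t in budget.thresholds if used >= t]


def forecast_month_end(spend_to_date, day_of_month, days_in_month):
    """Naive linear forecast: assume the daily run rate so far continues."""
    return spend_to_date / day_of_month * days_in_month
```

For example, with a $10,000 monthly limit and $8,500 spent, `crossed_thresholds` reports the 50% and 80% thresholds. The forecast helper is deliberately crude; real spend is rarely linear across a month.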

DORA Metrics: The New Normal

DORA (DevOps Research and Assessment) metrics have become almost mandatory for teams like ours. We track the four key metrics: Lead Time for Changes, Deployment Frequency, Mean Time to Recovery (MTTR), and Change Failure Rate (CFR). It’s not just about shipping features; it’s about measuring the health of our development processes.
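As a rough illustration of how those four numbers fall out of deployment records, here is a small sketch. The `Deployment` fields are hypothetical, and the choices to use the median for lead time and to measure recovery from deploy time to restore time are assumptions; this is not how any particular tool computes them.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape for illustration.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a production failure?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_metrics(deployments, window_days):
    """Compute the four DORA metrics over a window of deployments."""
    if not deployments:
        raise ValueError("need at least one deployment")
    lead_secs = [(d.deployed_at - d.committed_at).total_seconds() for d in deployments]
    failures = [d for d in deployments if d.failed]
    recovery_secs = [
        (d.restored_at - d.deployed_at).total_seconds()
        for d in failures
        if d.restored_at is not None
    ]
    mttr_hours = (
        sum(recovery_secs) / len(recovery_secs) / 3600 if recovery_secs else None
    )
    return {
        "deployment_frequency_per_day": len(deployments) / window_days,
        "median_lead_time_hours": median(lead_secs) / 3600,
        "change_failure_rate": len(failures) / len(deployments),
        "mean_time_to_recovery_hours": mttr_hours,
    }
```

Note that Change Failure Rate is a ratio over all deployments in the window, so it describes a period rather than any single incident.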

Recently, we had a bug that took us quite some time to fix because the code was too tightly coupled. Our CFR for that stretch was pretty high, which is something I’ve been thinking a lot about lately. We need to invest more in microservices and decoupling components so that when something does go wrong, it doesn’t cascade into other parts of the system.

Conclusion

As February turns to March, I’m left with a mixed bag of thoughts. The tech world is moving faster than ever before, but so are our responsibilities as engineers. We have to balance innovation and risk while keeping an eye on cost and maintainability. It’s a lot to handle, but that’s what makes it exciting.

Until next time,

Brandon Camenisch