$ cat post/a-merge-conflict-stays-/-i-parsed-the-pcap-for-hours-/-i-strace-the-memory.md
a merge conflict stays / I parsed the pcap for hours / I strace the memory
Title: July 31, 2023: A Day in the Life of a Platform Engineer
July 31st dawned with the peculiar feeling that tech is moving faster than ever. I woke up to more emails about Llama 2 and its impressive performance, alongside news about Kevin Mitnick’s passing. It was fitting; here we are, grappling with the new era of AI without fully understanding all the implications.
The Morning Audit
First stop: the platform audit meeting. Our team had been working on integrating WebAssembly (Wasm) into our server-side infrastructure to improve performance and flexibility. We’ve seen some promising results in certain microservices but hit a snag when we started deploying more complex functions.
One of the microservices, a custom-built financial calculator for real-time trading, was giving us fits. Despite being optimized for Wasm, it kept crashing intermittently. The stack traces pointed to weird heap allocation issues that didn’t make sense at first glance.
After some debugging and profiling, I realized we had hit an edge case in our runtime environment’s garbage collection. The trading logic was generating a large number of short-lived objects, and the resulting GC pauses eventually caused the service to hang. A tweak to the object allocation strategy fixed it, but the experience reminded me how complex these new runtimes can be.
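The exact allocation tweak isn’t spelled out here. One common way to cut down on short-lived objects is to reuse them from a small pool, and the sketch below illustrates that idea in Python. Everything in it (`Quote`, `QuotePool`, `notional`, the tick format) is a hypothetical stand-in, not the service’s real code, and how much pooling helps depends on the runtime’s collector.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical scratch object that would otherwise be allocated per tick."""
    symbol: str = ""
    price: float = 0.0
    quantity: int = 0


class QuotePool:
    """Hands out reusable Quote objects instead of creating one per calculation."""

    def __init__(self, size: int) -> None:
        self._free = [Quote() for _ in range(size)]
        self.allocations = size  # total Quote objects ever created

    def acquire(self, symbol: str, price: float, quantity: int) -> Quote:
        if self._free:
            quote = self._free.pop()
        else:
            # Pool exhausted: fall back to a fresh allocation and count it.
            quote = Quote()
            self.allocations += 1
        quote.symbol, quote.price, quote.quantity = symbol, price, quantity
        return quote

    def release(self, quote: Quote) -> None:
        # Return the object so the next acquire() can reuse it.
        self._free.append(quote)


def notional(pool: QuotePool, ticks) -> float:
    """Sum price * quantity over all ticks, reusing pooled objects."""
    total = 0.0
    for symbol, price, quantity in ticks:
        quote = pool.acquire(symbol, price, quantity)
        total += quote.price * quote.quantity
        pool.release(quote)
    return total
```

With a pool like this, processing thousands of ticks touches the same few objects over and over, so the collector has far less garbage to trace. The trade-off is that callers must remember to release what they acquire.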
Developer Experience
Post-meeting, I headed over to a brainstorming session with our developer experience (DX) team. We’ve been focusing on automating CI/CD pipelines and improving tooling for our developers. One of the big topics was whether we should switch from Jenkins to GitHub Actions. We hashed out the pros and cons, and the general consensus was that while GitHub Actions is simpler to set up, it lacks some advanced features we currently rely on.
We also discussed how to better integrate static site generators like Jekyll or Gatsby into our workflow for developer blogs and documentation. FinOps (cloud financial operations) came up repeatedly, as did making sure our DX doesn’t waste developer time on unnecessary tasks. It’s clear that as platform engineers, we need to be more deliberate about the tools we choose.
Platform Engineering in the Age of AI
Later, I attended an impromptu discussion about integrating AI/ML models into our platform. With Llama 2 now released and other large language models on the horizon, there’s a lot of interest in how they can be used to enhance our services. One of the points brought up was the challenge of deploying these models without adding latency or degrading performance elsewhere.
We’re also facing pressure from upper management to adopt DORA (DevOps Research and Assessment) metrics more fully. Our team is already pretty agile, but there’s always room for improvement. The question is: how do we ensure that our AI models are not just cutting-edge but also performant and reliable?
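For readers less familiar with DORA, the four key metrics are deployment frequency, lead time for changes, change failure rate, and time to restore service. Here is a minimal sketch of how they could be computed from deployment records. The `Deployment` fields and the `dora_summary` helper are assumptions made for illustration, not the schema of any particular tool or of our own pipeline.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; field names are illustrative only."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool                             # did the deployment cause a failure?
    restored_at: Optional[datetime] = None   # when service recovered, if it failed


def dora_summary(deployments, window_days):
    """Compute the four DORA metrics over a reporting window of window_days."""
    if not deployments or window_days <= 0:
        raise ValueError("need at least one deployment and a positive window")

    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restore_times = [
        d.restored_at - d.deployed_at
        for d in failures
        if d.restored_at is not None
    ]

    return {
        "deploys_per_day": len(deployments) / window_days,
        "median_lead_time": median(lead_times),
        "change_failure_rate": len(failures) / len(deployments),
        # None when no failed deployment has a recorded recovery time.
        "median_time_to_restore": median(restore_times) if restore_times else None,
    }
```

Medians are used here instead of means so that one unusually slow change does not dominate the picture; that is a design choice of this sketch, not something DORA mandates.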
Reflecting on the Day
As I wrapped up my day, I couldn’t help but think about how far we’ve come in platform engineering. From the days of purely backend development to today’s mix of frontends, backend services, Wasm, and AI models, everything is converging. The challenges are different, but the spirit of problem-solving remains the same.
Today was a mix of technical wins and reflections on where we stand. I’m looking forward to what July 31st will bring next year and how much more we’ll learn along the way.
That’s it for today. More to come in the days ahead.