Debugging the LLM Layer Cake


Merry Christmas, or whatever is left of the week by the time you read this. The calendar says December 25th, so everyone should be in a festive mood by now, celebrating or at least fending off the post-holiday blues. Over here on the platform engineering front, we've been dealing with the usual suspects: AI/LLM infrastructure fires, FinOps pressure, and a seemingly endless stream of new tech trends.

I remember when ChatGPT first hit the scene; it felt like everyone decided to get into AI all at once. Now models like Gemini are taking things in exciting but somewhat bewildering directions. And Adobe and Figma walking away from their merger? That has to be a sign that even big tech has its doubts.

But let's talk about something more real: the layer cake of LLMs we've been building. You know, those fancy new models that promise to do everything from writing code to drafting legal documents, all while being hosted on our platform. As an engineering manager and platform engineer, I've had a wild ride trying to figure out how to support these beasts without breaking the bank.

The AI Layer Cake

Imagine stacking cookies in a box: LLM stacks are like those oversized chocolate chip ones that threaten to crumble the moment you pick them up. Each layer has its own flavor and texture, but they all have to hold together for the whole stack to stay stable.

For us, it started with the base layer: serving model requests with serverless functions. We chose AWS Lambda for its ease of use, but as we scaled up it stopped cutting it: cold start times were too long, and costs were getting out of control. Enter WebAssembly (Wasm) on the server side, a tool that can help us break free from the cloud provider's constraints.
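
To make that base layer concrete, here's a minimal sketch of the kind of handler we started with. MODEL_URL, the request shape, and the endpoint are illustrative stand-ins, not our production code:

```typescript
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from "aws-lambda";

export const handler = async (
  event: APIGatewayProxyEvent
): Promise<APIGatewayProxyResult> => {
  // Pull the prompt out of the request body.
  const { prompt } = JSON.parse(event.body ?? "{}");

  // Forward it to the model server. On a cold start, Lambda pays for
  // runtime and bundle init before this handler ever runs, and that
  // pause is exactly what was hurting us at scale.
  const res = await fetch(process.env.MODEL_URL!, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ prompt }),
  });

  return { statusCode: res.status, body: await res.text() };
};
```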

Wasm and the New Frontier

WebAssembly is like adding a new ingredient to the cookie recipe: the same compiled module can run in the browser or on the server, which makes it incredibly versatile. We started experimenting with running parts of our LLM serving path directly in the browser as Wasm modules; those code paths never touch a serverless function at all, so the cold start disappears and a network round trip goes with it.
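
Loading a module in the browser is nearly a one-liner with the standard WebAssembly API. A minimal sketch, assuming a hypothetical tokenizer.wasm; the file name and its role are illustrative:

```typescript
// Fetch, compile, and instantiate a Wasm module in the browser.
// instantiateStreaming compiles the bytes as they arrive, so there is
// no separate download-then-compile pause.
async function loadTokenizer(): Promise<WebAssembly.Instance> {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("/wasm/tokenizer.wasm"),
    {} // import object; empty for a self-contained module
  );
  return instance;
}
```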

But here's where things got tricky: we needed a way to hand requests off from the browser back to the server without giving up the performance we had just gained. That's when I found myself deep in a debate about whether we should use something like Deno for our server-side Wasm runtime or stick with Node.js. The argument was as much about developer experience as it was about technical feasibility.
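
Part of what kept the debate from being purely technical is that the same module loads fine in Node with its built-in WebAssembly API, no extra runtime required. Another sketch, reusing the hypothetical tokenizer module from above:

```typescript
import { readFile } from "node:fs/promises";

// Load the same (hypothetical) module server-side. Node has shipped
// the standard WebAssembly API for years, so raw capability was never
// the real question in the Deno-versus-Node argument.
async function loadTokenizerOnServer(): Promise<WebAssembly.Instance> {
  const bytes = await readFile("wasm/tokenizer.wasm");
  const { instance } = await WebAssembly.instantiate(bytes, {});
  return instance;
}
```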

Developer Experience Matters

At the end of the day, it comes down to making sure developers can get their work done quickly and efficiently. We decided that keeping things consistent across teams by sticking with Node.js for now would be better than forcing everyone to learn Deno just because we wanted a cool new technology. But don’t worry; we’re definitely exploring hybrid approaches where Wasm shines.

Debugging the Stack

Debugging an LLM stack is like hunting for a needle in a haystack. One particularly frustrating day, I spent hours tracing through logs, only to realize the issue was a simple misconfiguration in our Wasm runtime. Once fixed, everything worked as expected, a reminder that sometimes the solution is right under your nose.
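
For flavor, here's the kind of knob I mean. This is an illustrative example of a Wasm memory misconfiguration, not the exact bug we hit:

```typescript
// Wasm memory is sized in 64 KiB pages, so maximum: 32 caps the
// module at 2 MiB. Once an input pushes past that, memory.grow()
// throws a RangeError deep inside the module, and the resulting logs
// look nothing like a configuration problem.
const memory = new WebAssembly.Memory({
  initial: 16, // 1 MiB allocated up front
  maximum: 32, // the too-low ceiling; raising it is the whole "fix"
});
```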

FinOps and Cost Control

And of course, we can't forget about FinOps. As more teams join us on this LLM journey, costs are starting to pile up. We've had some intense discussions about how best to manage our budgets without compromising on performance or user experience. DORA metrics (deployment frequency, lead time for changes, change failure rate, and time to restore service) have become part of our regular check-ins, helping us stay focused on continuous improvement even as we watch the bill.

Wrapping It Up

So there you have it—another day in the life of platform engineering during the LLM boom. We’ve got our cookies stacked up, but we’re not done yet. There’s always more to learn and debug, more ways to optimize, and more technologies to explore. But hey, that’s part of the fun.

As I sign off on Christmas Day, wishing you all a happy holiday season, remember: the tech world is full of challenges, but it's also full of opportunities to grow and learn. Happy coding!

Happy Holidays,
Brandon