$ cat post/first-commit-pushed-live-/-the-cluster-held-until-dawn-/-it-was-in-the-logs.md

first commit pushed live / the cluster held until dawn / it was in the logs


Title: AI Infusion: Debugging LLMs in the Era of ChatGPT


January 9, 2023, dawned like any other day. The tech world was abuzz with AI advancements post-ChatGPT. I started my morning routine, sipping coffee and scanning Hacker News. The top stories reflected a mix of personal anecdotes and technical innovations, but one stood out: “Let’s build GPT: from scratch, in code, spelled out” by Andrej Karpathy. A familiar name, yes, but this was no ordinary project—it was an invitation to dive deep into building AI models.

I sat down with my laptop and opened up a text editor. The task before me wasn’t just about coding; it was about understanding the intricacies of language models from first principles. Andrej’s video explained the basics of transformers, attention mechanisms, and tokenization in a way that resonated deeply. I had always believed in the power of building things myself, but diving into an AI model felt like stepping into uncharted territory.
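The attention mechanism the video walks through can be sketched in a few lines of plain Python. This is my own toy illustration of scaled dot-product attention with a causal mask, not code from the video:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(Q, K, V):
    """Scaled dot-product attention with a causal mask:
    position t attends only to positions 0..t.
    Q, K, V are T x d lists of vectors."""
    d = len(Q[0])
    out = []
    for t in range(len(Q)):
        # Similarity of the query at t with each visible key.
        scores = [sum(q * k for q, k in zip(Q[t], K[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        weights = softmax(scores)
        # Weighted sum of the visible values.
        out.append([sum(weights[s] * V[s][j] for s in range(t + 1))
                    for j in range(d)])
    return out

# Toy example: 3 positions, 2-dimensional vectors.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = causal_attention(Q, K, V)
# Position 0 can only see itself, so out[0] == V[0].
```

Real implementations batch this as matrix multiplies, but the causal mask and the softmax-weighted average are the whole idea.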

As I wrote my first lines of code, I encountered issues right away. The model didn’t behave as expected, spitting out gibberish instead of coherent text. This was a stark reminder that even with years of experience in platform engineering and web development, diving into the heart of AI models required an entirely different mindset.
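One classic source of gibberish is a mismatch between the encode and decode sides of the tokenizer: the model can be fine while the mapping back to text is broken. A minimal character-level tokenizer (my own sketch, in the spirit of the video's character-level setup) makes the round trip easy to check:

```python
# Hypothetical minimal character-level tokenizer; names are mine.
text = "hello world"
chars = sorted(set(text))                     # fixed vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}  # string -> int
itos = {i: ch for ch, i in stoi.items()}      # int -> string

def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]

def decode(ids):
    """Map token ids back to a string."""
    return "".join(itos[i] for i in ids)

# The round trip should be lossless; if it isn't, the model will
# appear to babble no matter how well it trained.
assert decode(encode("hello")) == "hello"
```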

I spent hours debugging, adjusting hyperparameters, and tweaking layers. The frustration mounted when the model’s performance didn’t improve despite all the tuning. It felt like every tweak was met with another roadblock. But then, one night, as I was staring at the code trying to debug a particularly stubborn part, an epiphany hit me: the issue wasn’t in the architecture or the data; it was in my assumptions.

I had assumed that certain parameters were fine without testing them thoroughly. This realization led me to rethink every assumption and test everything more rigorously. With each iteration, the model’s performance improved, and as I watched the text generation get better, a sense of satisfaction washed over me.

This experience taught me several valuable lessons:

  1. Attention to Detail: Small changes can have significant impacts.
  2. Iterative Debugging: A systematic approach is essential when dealing with complex models.
  3. Understanding Assumptions: Even in AI, assumptions can lead you astray if not rigorously tested.

On the professional front, our team had been grappling with how to integrate these new LLMs into our platform infrastructure. The CNCF landscape was overwhelming, but we settled on a few key technologies: Kubernetes for container orchestration and Istio for the service mesh. However, the real challenge lay in managing the cost pressures brought by cloud providers.

FinOps became a buzzword, and rightly so. We started tracking costs meticulously, using tools like Datadog and AWS Budgets to monitor our expenses closely. The DORA metrics were widely adopted, driving us to be more agile and responsive to change. Staff+ engineering roles had become normalized, leading some in the team to explore career paths beyond just writing code.
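Deployment frequency, one of the four DORA metrics, is simple enough to compute by hand. A back-of-the-envelope sketch with made-up timestamps, purely for illustration:

```python
from datetime import date

# Hypothetical deploy dates pulled from a CI log; not real data.
deploys = [date(2023, 1, 2), date(2023, 1, 4),
           date(2023, 1, 4), date(2023, 1, 9)]

def deploys_per_week(dates):
    """Deployments per week over the observed span."""
    span_days = (max(dates) - min(dates)).days or 1
    return len(dates) * 7 / span_days

rate = deploys_per_week(deploys)  # 4 deploys over a 7-day span
```

The other three metrics (lead time for changes, change failure rate, time to restore) fall out of similar joins over commit, deploy, and incident timestamps.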

One particularly heated discussion revolved around the use of WebAssembly on the server side. While the idea seemed promising, there was pushback from those who preferred keeping things simple with traditional languages like Go or Python. The argument centered around ease of development and maintainability versus performance gains.

In the end, we decided to prototype a few scenarios using WebAssembly and see how they performed in real-world use cases. It wasn’t a unanimous decision, but it was a step forward in exploring new technologies that could benefit our platform.

As I write this, the evening light is fading, and the day’s challenges feel like distant echoes. The journey into building GPT from scratch taught me more about AI than any theoretical paper ever could. It also highlighted the importance of staying adaptable and open to learning new things—especially in a rapidly evolving field like platform engineering.


That was my day on January 9, 2023. A mix of personal growth, technical challenges, and industry reflections. The tech world is moving fast, but so are we.