$ cat post/a-segfault-at-three-/-i-typed-it-and-watched-it-burn-/-it-failed-gracefully.md
a segfault at three / I typed it and watched it burn / it failed gracefully
Title: Dealing with DevOps Debt: A Case Study
September 26, 2022. This month feels like a strange mix of everything old and new in tech. Queen Elizabeth II’s passing is one of those things that makes you stop and reflect on the world we live in. Meanwhile, Adobe’s announced $20B acquisition of Figma is another reminder that user experience design tools are becoming just as important as the underlying infrastructure that runs our applications.
But let me ground this reflection in my recent work at a place where I’m currently leading platform engineering efforts. We’ve been dealing with what we call “devops debt” – not a fun phrase, but it’s real and it’s something we’ve had to address head-on.
The Problem
We inherited a monolithic application that was once state-of-the-art but has since become an unwieldy beast. Over the years, developers added features in a way that prioritized quick wins over long-term maintainability. The result was an architecture whose modules were tightly coupled, making it difficult to scale or replace any one of them without causing ripple effects elsewhere.
One specific issue I had to tackle was the database layer. Our PostgreSQL instance had grown so large and complex that running EXPLAIN ANALYZE on even a simple query could take several minutes, because ANALYZE actually executes the statement it is planning. Many of the queries were written in a way that took table-level locks, which blocked other sessions and made reads slow for users. On top of that, the application code talked to the database directly, with no data-access layer in between, so schema changes rippled into business logic.
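To make the abstraction point concrete, here is a minimal sketch of a data-access boundary in Rust. The `Order` type, the `OrderStore` trait, and the `customer_total_cents` function are hypothetical names chosen for illustration; the post does not describe the real schema or code. The idea is only that business logic depends on a trait, so a PostgreSQL-backed implementation can change without touching callers.

```rust
use std::collections::HashMap;

/// Hypothetical domain type; the real schema is not described in the post.
#[derive(Debug, Clone, PartialEq)]
pub struct Order {
    pub id: u64,
    pub customer: String,
    pub total_cents: i64,
}

/// The only interface business logic sees. A PostgreSQL-backed
/// implementation would live behind this trait, next to its SQL.
pub trait OrderStore {
    fn get(&self, id: u64) -> Option<Order>;
    fn save(&mut self, order: Order);
    fn for_customer(&self, customer: &str) -> Vec<Order>;
}

/// In-memory implementation, useful for tests and local development.
#[derive(Default)]
pub struct InMemoryOrderStore {
    orders: HashMap<u64, Order>,
}

impl OrderStore for InMemoryOrderStore {
    fn get(&self, id: u64) -> Option<Order> {
        self.orders.get(&id).cloned()
    }

    fn save(&mut self, order: Order) {
        // Saving an order with an existing id overwrites the old record.
        self.orders.insert(order.id, order);
    }

    fn for_customer(&self, customer: &str) -> Vec<Order> {
        let mut found: Vec<Order> = self
            .orders
            .values()
            .filter(|o| o.customer == customer)
            .cloned()
            .collect();
        // HashMap iteration order is unspecified, so sort for stable output.
        found.sort_by_key(|o| o.id);
        found
    }
}

/// Business logic depends on the trait, not on a database driver.
pub fn customer_total_cents(store: &dyn OrderStore, customer: &str) -> i64 {
    store
        .for_customer(customer)
        .iter()
        .map(|o| o.total_cents)
        .sum()
}
```

With a boundary like this, the same `customer_total_cents` call works against the in-memory store in tests and against a database-backed store in production, which is one way to loosen the coupling described above.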
The Solution
I decided to break the monolith into smaller, more manageable services, following microservices architecture principles. Ultimately that meant rewriting our entire stack and redesigning how data flows between services. It was a massive undertaking, so rather than attempting one big rewrite we approached it through small, incremental changes. We introduced Docker Compose for local development environments and used Kubernetes for deployment and management.
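One common way to make a migration like this incremental is to route traffic by path prefix, moving one prefix at a time from the monolith to a new service. The sketch below illustrates that idea only; the post does not say how routing was actually done, and `MigrationRouter`, `Backend`, and the service names used with it are hypothetical.

```rust
/// Where a request should be sent. Service names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Monolith,
    Service(&'static str),
}

/// One path prefix that has already been extracted from the monolith.
struct Route {
    prefix: &'static str,
    service: &'static str,
}

/// Routing table that grows by one entry per migration step.
pub struct MigrationRouter {
    extracted: Vec<Route>,
}

impl MigrationRouter {
    pub fn new() -> Self {
        MigrationRouter { extracted: Vec::new() }
    }

    /// Record that requests under `prefix` are now served by `service`.
    pub fn extract(&mut self, prefix: &'static str, service: &'static str) {
        self.extracted.push(Route { prefix, service });
    }

    /// The longest matching prefix wins; anything unmatched stays on the monolith.
    pub fn route(&self, path: &str) -> Backend {
        self.extracted
            .iter()
            .filter(|r| path.starts_with(r.prefix))
            .max_by_key(|r| r.prefix.len())
            .map(|r| Backend::Service(r.service))
            .unwrap_or(Backend::Monolith)
    }
}
```

Each call to `extract` represents one small, reversible step: removing the entry sends that traffic back to the monolith.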
One of the key technologies we embraced during this transformation was WebAssembly (Wasm). We saw an opportunity to move some of our business logic into serverless functions compiled to Wasm, both to improve performance and, because Wasm modules run in a sandbox, to reduce the attack surface. Wasm itself does not dictate a source language; we chose Rust, which is known for its safety and performance, and that meant the team had to learn it. It took time, but it paid off in terms of stability.
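As a rough illustration of what a piece of business logic shaped for Wasm can look like, here is a small Rust sketch. The discount rule and both function names are hypothetical, not taken from the real system. The wrapper uses only `i64` and `i32`, which map directly to Wasm value types. The export attribute is left as a comment so the snippet compiles under any Rust edition.

```rust
/// Hypothetical business rule: apply a percentage discount to a total in cents.
/// Percentages outside 0..=100 are clamped; the discount amount is rounded
/// toward zero by integer division.
pub fn apply_discount_cents(total_cents: i64, discount_percent: i32) -> i64 {
    let pct = i64::from(discount_percent.clamp(0, 100));
    total_cents - total_cents * pct / 100
}

/// Thin C-ABI wrapper around the rule.
///
/// To export it from a `cdylib` built for `wasm32-unknown-unknown`, add
/// `#[no_mangle]` above this function (written `#[unsafe(no_mangle)]` in the
/// 2024 edition). It is omitted here so the snippet builds on any edition.
pub extern "C" fn discounted_total(total_cents: i64, discount_percent: i32) -> i64 {
    apply_discount_cents(total_cents, discount_percent)
}
```

Keeping the rule in a plain function and the ABI concerns in a thin wrapper means the logic can be unit-tested natively, without a Wasm runtime.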
The Challenges
The biggest challenge was managing expectations and timelines with stakeholders who were used to seeing quick wins every sprint. We had to communicate the long-term benefits of this approach, even though they wouldn’t see immediate results. This involved a lot of conversations about the DORA (DevOps Research and Assessment) metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service. We argued that improving our release cadence would show up in those numbers and, over time, in better overall system health.
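For conversations like these it helps to show how the four numbers fall out of plain deployment records. The following is a minimal sketch under stated assumptions: the `Deployment` fields and the `summarize` function are hypothetical, and real tooling would pull this data from a CI/CD system rather than a hand-built list.

```rust
/// One production deployment. Field names are hypothetical.
#[derive(Debug, Clone)]
pub struct Deployment {
    /// Hours from commit to running in production.
    pub lead_time_hours: f64,
    /// `Some(hours)` if this deployment caused a failure, with the time it
    /// took to restore service; `None` if it went out cleanly.
    pub restore_hours: Option<f64>,
}

/// The four DORA metrics over one reporting window.
#[derive(Debug, Clone, PartialEq)]
pub struct DoraSummary {
    pub deploys_per_week: f64,
    pub median_lead_time_hours: f64,
    pub change_failure_rate: f64,
    pub mean_restore_hours: Option<f64>,
}

/// Returns `None` when there is nothing to summarize.
pub fn summarize(deploys: &[Deployment], window_days: u32) -> Option<DoraSummary> {
    if deploys.is_empty() || window_days == 0 {
        return None;
    }
    let count = deploys.len() as f64;

    // Median lead time: sort, then take the middle value (or average the two).
    let mut lead_times: Vec<f64> = deploys.iter().map(|d| d.lead_time_hours).collect();
    lead_times.sort_by(|a, b| a.total_cmp(b));
    let mid = lead_times.len() / 2;
    let median_lead_time_hours = if lead_times.len() % 2 == 0 {
        (lead_times[mid - 1] + lead_times[mid]) / 2.0
    } else {
        lead_times[mid]
    };

    // Failed deployments are the ones that needed a restore.
    let restores: Vec<f64> = deploys.iter().filter_map(|d| d.restore_hours).collect();
    let mean_restore_hours = if restores.is_empty() {
        None
    } else {
        Some(restores.iter().sum::<f64>() / restores.len() as f64)
    };

    Some(DoraSummary {
        deploys_per_week: count * 7.0 / f64::from(window_days),
        median_lead_time_hours,
        change_failure_rate: restores.len() as f64 / count,
        mean_restore_hours,
    })
}
```

Tracking the summary per window, rather than as a single snapshot, is what lets stakeholders see a trend even when an individual sprint delivers no visible feature.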
We also faced pushback from developers who were accustomed to working with monolithic architectures. They worried that breaking things into microservices would complicate their lives. We had to be patient and provide the necessary training and support, showing them how these new tools and practices could make their jobs easier in the long run.
Lessons Learned
Debugging this tangled mess of a system was one of the most challenging aspects of my career so far. It taught me that sometimes you have to burn your boats (metaphorically) and start over if something has grown too large and complex. The key is to take it step by step, celebrate small wins along the way, and keep everyone informed about the progress.
In conclusion, dealing with devops debt isn’t just about technical challenges; it’s also about changing mindsets and processes. It’s a journey that requires patience, persistence, and sometimes the courage to let go of what once worked and start anew.