memory leak found / the incident taught us the most / the port is still open


Title: The Day We Migrated Off Xen


August 18, 2003. The day felt like a small mountain on the long and winding road of our company’s infrastructure journey. I woke up to the sound of my old Windows machine groaning under the weight of various admin tools, and I knew immediately it was going to be one of those days.

We had been running Xen virtual machines for about a year now—ever since Google’s aggressive hiring campaign and the buzz around Xen as a hypervisor got us all excited. At the time, it seemed like a dream setup: lightweight, flexible, and open-source. But as with many dreams, this one started to show its cracks.

Our web servers were running on these virtual machines, serving pages for our small but growing startup. We had grown from a few hundred daily users to several thousand, and the load was starting to put real pressure on our infrastructure. The Xen VMs, as we had set them up, were lightweight but geared toward agility and ease of setup rather than high performance.

That morning, as I sat down at my desk, I noticed something strange in our monitoring dashboards: the CPU usage on several of our VMs was spiking unpredictably. It wasn’t a steady rise; it was like the VMs were playing some kind of weird CPU game with us. This had been happening for days, and no one seemed to know why.
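
The difference between a steady rise and an unpredictable spike is easy to show in a few lines of Python. This is only a minimal sketch, not what our dashboards actually ran: the function name `find_spikes`, the window size, the threshold factor, and the sample numbers are all assumptions invented for illustration. It flags a sample when it exceeds a multiple of the median of the samples just before it, so a slow climb passes quietly while a sudden jump stands out.

```python
from statistics import median


def find_spikes(samples, window=5, factor=2.0):
    """Return indices of samples that exceed `factor` times the median
    of the `window` samples immediately before them.

    `window` and `factor` are illustrative defaults, not tuned values.
    """
    spikes = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        # Skip a zero baseline so idle periods do not flag every sample.
        if baseline > 0 and samples[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Hypothetical CPU percentages, one reading per interval.
steady_rise = [10, 12, 14, 16, 18, 20, 22, 24]
erratic = [10, 11, 10, 12, 11, 85, 12, 10, 11, 90]

print(find_spikes(steady_rise))  # prints []
print(find_spikes(erratic))      # prints [5, 9]
```

Using the median rather than the mean for the baseline is a deliberate choice in this sketch: one earlier spike inside the window barely moves the median, so it does not mask the next one.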

After a few hours of digging, I realized that the issue wasn’t just in our application code or network traffic patterns—it was something deeper. The Xen hypervisor itself was misbehaving. It would periodically drop into what appeared to be a low-priority state, causing the VMs to thrash and CPU usage to spike.

I spent the next few days trying to debug it. I dove into the Xen source code, ran top more times than I could count, and even tried to set up a remote debugger for the hypervisor (which was more of an academic exercise than anything practical). But nothing gave me the insight I needed.

Around lunchtime on one of those days, I decided to take a break and grab some food. As I walked through our open-plan office, I saw our sysadmin team gathered around a whiteboard, hashing out ideas for how to handle the issue. They had come up with a few potential workarounds, such as using different Xen configurations or even switching to a more mature virtualization solution, but none of them seemed ideal.

During the break, one of my colleagues mentioned that they were running some tests on KVM (Kernel-based Virtual Machine), another open-source hypervisor. The idea lingered in my mind as I returned to my desk.

By the end of that day, we had decided to take the leap and migrate off Xen onto KVM. It wasn’t an easy decision: Xen was still relatively new compared to its competitors, and we had a lot of code that needed porting. But the potential benefits in stability and performance were too appealing to ignore, and it felt like the right move for our growing infrastructure.

The next few days were intense as we set up KVM on our servers, tuned the configurations, and migrated the VMs over one by one. There were some hiccups (of course there were), but we managed to pull it off without any major issues. The end result was a more stable, better-performing infrastructure that could handle our growing user base.
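
For anyone curious what "one by one" looks like as a procedure, here is a rough Python sketch. It is not the tooling we used: `migrate_one_by_one` and the `migrate` and `is_healthy` hooks are hypothetical names, and the hooks are plain callables you would supply yourself rather than anything provided by Xen or KVM. The idea is simply to move one VM, check it, and stop at the first failure so a problem never spreads past a single machine.

```python
def migrate_one_by_one(vm_names, migrate, is_healthy):
    """Migrate VMs sequentially, halting at the first unhealthy one.

    `migrate(name)` and `is_healthy(name)` are caller-supplied hooks
    (hypothetical; not part of any hypervisor's tooling).
    Returns (migrated, failed): the list of VMs that migrated cleanly,
    and the name of the VM that failed its check, or None.
    """
    migrated = []
    for name in vm_names:
        migrate(name)
        if not is_healthy(name):
            # Stop here so the remaining VMs stay on the old platform.
            return migrated, name
        migrated.append(name)
    return migrated, None


# Stub hooks standing in for real migration and health-check steps.
moved = []
done, failed = migrate_one_by_one(
    ["web1", "web2", "web3"],
    migrate=moved.append,
    is_healthy=lambda name: name != "web3",
)
print(done, failed)  # prints ['web1', 'web2'] web3
```

Keeping the hooks as parameters means the same loop can be dry-run with stubs, as above, before it is pointed at anything real.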

Looking back, I can’t help but feel a mix of relief and pride at how we handled the migration. It wasn’t just about swapping one hypervisor for another; it was about adapting to the changing landscape of open-source technology. In those days, the tech world moved fast, and staying agile meant being ready to pivot when necessary.

That day in August 2003 taught me that the right decision isn’t always the easiest one, and that making it often means being willing to take a chance on something new.


The migration off Xen was just one of many battles we fought and won as our infrastructure evolved. Each step forward was a lesson in resilience and adaptability—qualities that have served us well over the years.