$ cat post/debugging-the-big-red-button.md

Debugging the Big Red Button


November 7, 2005. Just another Monday in a world where open-source stacks and Web 2.0 were beginning to reshape our day-to-day reality.

Today, I’m sitting at my desk, head buried in logs, trying to find out why one of our critical systems has gone down again. Outages aren’t uncommon here, but this one feels like more than the typical kind: a big red button that should have been inaccessible somehow got pushed.

The Setup

We’re running a LAMP stack with Xen for virtualization. Our app is written in PHP and relies heavily on MySQL. Apache and Nginx are configured to balance the load across multiple servers. It’s all pretty standard, but sometimes the mundane things lead to the most complex issues.

The Outage

The system went down around 3 AM. Normally, I would have been one of the first to get a notification, but today was different: no page, no alert, nothing. I only found out when I woke up and saw a message that read “System down.” My first thought: “Did we forget to turn it on this morning?”

But something didn’t feel right. I checked our monitoring tools, which showed everything was green—no CPU spikes, no memory leaks, no network issues. The app logs looked clean too. No errors or warnings that could explain the sudden failure.

Digging In

I started digging through the server’s syslog files. There were some strange entries around the time of the crash: “Kernel panic - not syncing: Attempted to kill init!”, followed by a flurry of error messages about filesystem issues and missing shared libraries.
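A minimal sketch of that kind of log scan, in Python. Only the kernel panic text comes from the entries described above; the function name and the other two patterns are assumptions for illustration and would need adjusting to whatever the real log contains.

```python
import re
from typing import Iterable, List

# Illustrative patterns (assumptions): the kernel panic message quoted above,
# plus loose matches for shared-library and filesystem errors.
SUSPICIOUS = re.compile(r"kernel panic|shared librar|fs error", re.IGNORECASE)


def suspicious_lines(lines: Iterable[str], pattern=SUSPICIOUS) -> List[str]:
    """Return the log lines that match the pattern, in their original order."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]
```

Because it accepts any iterable of strings, it can be fed an open file handle for a copy of the syslog, and the matching lines come back in the order they were written, which makes the sequence of events around the crash easier to read.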

That didn’t make any sense. Our servers had been running smoothly for months, so why would something like this happen now? I was puzzled, but there was only one way to find out—boot into single-user mode and inspect the system manually.

The Revelation

In single-user mode, I ran ls on the key directories. That’s when I saw it: a directory that shouldn’t have been there, /etc/big_red_button. Inside it was an executable file named “emergency.sh”. How did this get here? Who put it there?

I tried to remember if anyone had access to the server recently or if something like this could be part of our backup and restore processes. But no one else in the team seemed to know about this, and our backups didn’t include such a file.

The Solution

After some debate with my colleague, we decided to investigate further without touching anything. We booted into rescue mode on another server and mounted the failed disk as read-only. From there, I started looking at the file permissions and ownership of /etc/big_red_button.
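Checking permissions and ownership on a read-only mount can be sketched in a few lines of Python. The function name and the returned fields are assumptions for illustration. It reports numeric user and group IDs on purpose, since the rescue system’s own account database may not match the one on the failed disk.

```python
import os
import stat


def describe(path: str) -> dict:
    """Summarize mode, numeric owner, and risky permission bits for one path."""
    info = os.lstat(path)  # lstat: do not follow symlinks on an untrusted disk
    return {
        "path": path,
        "mode": stat.filemode(info.st_mode),  # e.g. "-rwxr-xr-x"
        "uid": info.st_uid,
        "gid": info.st_gid,
        "world_writable": bool(info.st_mode & stat.S_IWOTH),
        "setuid": bool(info.st_mode & stat.S_ISUID),
        "modified": info.st_mtime,  # seconds since the epoch
    }
```

Called on the script’s path under wherever the failed disk is mounted, this gives the same information as a long ls listing, in a form that is easy to log or compare across servers.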

It turns out someone had accidentally pushed a button (literally) that triggered the script, and the script opened up remote root access over SSH. It had been designed for emergencies but was never properly secured or documented. Someone must have found it and used it without realizing its potential impact.

Lessons Learned

This incident taught me a few important lessons:

  1. Documentation is key: Even emergency procedures should be well-documented.
  2. Permissions matter: Access controls need to be robust, especially for critical systems.
  3. Visibility is crucial: We need better visibility into the state of our servers and how they are being accessed.

Moving Forward

We decided to take a few steps:

  • Audit all emergency scripts and procedures
  • Enhance security measures, including SSH key management
  • Improve our monitoring tools to catch such issues early
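As a starting point for the script audit, here is a rough Python sketch that walks a directory tree and flags executables with loose permissions. The function name and the two checks (setuid, or writable by group or others) are assumptions about what “risky” means; a real audit would likely add more.

```python
import os
import stat
from typing import Iterator, Tuple


def risky_executables(root: str) -> Iterator[Tuple[str, str]]:
    """Yield (path, reason) for regular executable files under root that are
    setuid or writable by group or others."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # unreadable or vanished; skip rather than abort
            # Only regular files with at least one execute bit are of interest.
            if not stat.S_ISREG(mode) or not mode & 0o111:
                continue
            if mode & stat.S_ISUID:
                yield path, "setuid"
            elif mode & (stat.S_IWGRP | stat.S_IWOTH):
                yield path, "writable by group or others"
```

Running it over /etc and any directories where operational scripts live would produce a short list to review by hand, which is a more realistic first step than trying to lock everything down at once.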

This was one of those moments where you realize just how many things can go wrong, even in the simplest systems. It made me appreciate the daily challenges of keeping a system running smoothly and the importance of having robust processes in place.

Conclusion

While this might seem like an old tale now, it’s always good to remember that the basics—secure permissions, proper documentation, clear communication—are what prevent the big red button moments from becoming disasters. And for today, at least, I feel a bit more prepared.


Feel free to hit me up in the comments if you’ve got any tips or stories of your own!