$ cat post/ps-aux-at-midnight-/-we-shipped-it-on-a-friday-night-/-it-boots-from-the-past.md

20JAN03

ps aux at midnight / we shipped it on a Friday night / it boots from the past

Title: A Day in the Life of a Sysadmin at the Dawn of Web 2.0

January 20, 2003 is another crisp winter day in the Bay Area. The sun has just risen over San Francisco, casting a golden light on the bustling streets below. I’m sitting at my desk, sipping coffee (made in a French press, because the details matter), and staring at a screen where I’m debugging some Apache config issues on our LAMP stack.

The Stack

Our tech stack is fairly standard for the time: Linux (Red Hat 7.3), Apache 1.3, MySQL 4.0, and PHP 4.x, hence the “LAMP” moniker. We use Perl scripts for some basic automation and have a handful of Python scripts sprinkled around. The Xen hypervisor is still in its early stages, so we’re running everything directly on physical servers for now.

The Problem

The issue I’m dealing with today involves one of our customer-facing applications, which had been running fine until recently. Users have started complaining about slow response times and intermittent timeouts. After poking around the logs and monitoring tools (we’re using Nagios for alerting), I notice a spike in database load during peak traffic hours.

The Debugging

I start by tailing our Apache error logs (tail -f /var/log/httpd/error_log) to get an idea of what might be going wrong. It’s clear that the application is generating a lot of errors related to MySQL connection timeouts and slow queries. I then fire up top and htop (yes, we’re using htop on our development machines) to see which processes are hogging CPU and memory.
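Watching tail -f scroll by gives a feel for the errors, but a quick tally makes the pattern easier to see. The following is a minimal sketch, not what we actually ran: the function name tally_keywords, the keywords, and the sample lines are all made up for illustration, and real error_log entries will look different.

```python
from collections import Counter


def tally_keywords(lines, keywords):
    """Count how many log lines mention each keyword (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


# Made-up sample lines for illustration only; real entries will differ.
SAMPLE_LINES = [
    "[error] example: mysql connection timed out",
    "[error] example: mysql connection timed out",
    "[notice] example: child process started",
    "[error] example: slow query took too long",
]

if __name__ == "__main__":
    counts = tally_keywords(SAMPLE_LINES, ["timed out", "slow query"])
    for keyword, count in counts.most_common():
        print(keyword, count)
```

Any iterable of lines works, so an open file handle on /var/log/httpd/error_log can be passed in place of SAMPLE_LINES.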

It turns out there’s one PHP process that’s consuming almost 90% of the server’s CPU. ps -ef | grep <PID> only shows the process’s command line, not a stack trace, so after identifying the process I trace it back through our application code and find it stuck in a loop executing a particularly slow query. The offending line is something like SELECT * FROM users WHERE last_login > NOW() - INTERVAL 24 HOUR;. It belongs to the code that updates user activity records, but it pulls every column from the table and filters on last_login, a column that has no index, so every call scans the whole table.
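Picking the CPU hog out of a process listing by eye works, but it is also easy to script. Here is a rough sketch that assumes the usual 11-column ps aux layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND). The function name busiest_process and the sample listing are invented for illustration, not captured from our server.

```python
def busiest_process(ps_aux_output):
    """Return (pid, cpu_percent, command) for the row with the highest %CPU."""
    rows = ps_aux_output.strip().splitlines()[1:]  # skip the header row
    best = None
    for row in rows:
        # ps aux prints 11 columns; the last one (COMMAND) may contain spaces,
        # so split at most 10 times to keep it in one piece.
        fields = row.split(None, 10)
        if len(fields) < 11:
            continue  # ignore lines that do not look like process rows
        pid = int(fields[1])
        cpu = float(fields[2])
        command = fields[10].rstrip()
        if best is None or cpu > best[1]:
            best = (pid, cpu, command)
    return best


# Invented sample listing for illustration only.
SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1   1368   472 ?        S    Jan19   0:04 init
apache    2101 89.7  3.2  18240  8220 ?        R    09:12  12:40 /usr/sbin/httpd
mysql     1840  6.5 11.0  60112 28300 ?        S    Jan19   3:15 /usr/libexec/mysqld --user=mysql
"""

if __name__ == "__main__":
    print(busiest_process(SAMPLE))
```

With PHP running inside Apache, the busy process shows up as httpd rather than php, which is why the sample uses that command name.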

The Fix

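I quickly whip up a Python script to add an index on last_login, then rewrite the query to fetch only the columns the application actually needs. The date arithmetic is equivalent either way; the gains come from the index and the narrower column list. The rewritten query is: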

SELECT id, username FROM users WHERE last_login >= DATE_SUB(NOW(), INTERVAL 24 HOUR);

Then I update our application code with this new query. After redeploying the changes, I monitor the server’s performance using top and htop again to ensure everything is running smoothly.
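The effect of the index can be reproduced on a laptop. The sketch below swaps in SQLite (through Python’s sqlite3 module) for MySQL so that it runs anywhere; it is an illustration of the idea, not our production setup. The index name idx_users_last_login, the sample rows, and the fixed cutoff timestamp standing in for NOW() minus 24 hours are all assumptions. On MySQL the equivalent statement would be CREATE INDEX idx_users_last_login ON users (last_login);, with EXPLAIN in front of the SELECT to check that the index is picked up.

```python
import sqlite3

QUERY = "SELECT id, username FROM users WHERE last_login >= ?"


def query_plan(conn, cutoff):
    """Return the planner's description of how QUERY would be executed."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (cutoff,)).fetchall()
    # The last column of each row holds the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


def build_demo_db():
    """Create an in-memory users table with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, last_login TEXT)"
    )
    conn.executemany(
        "INSERT INTO users (username, last_login) VALUES (?, ?)",
        [
            ("alice", "2003-01-20 08:00:00"),
            ("bob", "2003-01-18 23:30:00"),
            ("carol", "2003-01-19 21:15:00"),
        ],
    )
    return conn


conn = build_demo_db()
cutoff = "2003-01-19 09:00:00"  # stands in for NOW() minus 24 hours

plan_before = query_plan(conn, cutoff)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_users_last_login ON users (last_login)")
plan_after = query_plan(conn, cutoff)   # the planner can now use the index

recent = conn.execute(QUERY + " ORDER BY id", (cutoff,)).fetchall()

print("before:", plan_before)
print("after: ", plan_after)
print("recent:", recent)
```

The exact wording of the plan text varies between SQLite versions, so the useful check is simply whether the index name appears in the plan after it has been created.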

The Aftermath

By the end of the day, we’ve significantly improved the app’s performance and are no longer getting those pesky timeout complaints from our users. It feels good to ship a fix that not only improves the user experience but also gives me something concrete to show for my work today.

Reflections on the Day

Debugging in this era is both challenging and rewarding. The tools we have now, like htop, are much more advanced than what was available just a few years ago. Yet, the core principles of debugging — using logs, monitoring tools, and understanding the codebase — remain largely the same.

I’m also grateful for the open-source community that’s providing us with all these wonderful tools and libraries. Whether it’s MySQL for database management or Apache for web serving, we’re leveraging the collective knowledge of developers around the world to build robust systems.

As I close out my terminal window, I can’t help but think about how much has changed since 2003. From Firefox launching to Web 2.0 concepts starting to take shape, the tech landscape is in a constant state of flux. But for now, today’s work is done, and the system is stable.

That was a typical day in the life of a sysadmin back then. There were no HN stories to share, but there’s always something new to learn and fix.