$ cat post/when-scripting-was-still-a-job-title.md

When Scripting Was Still a Job Title


November 29, 2004. I remember it well: it was just before the holidays, when everyone was scrambling to finish their shopping, and I was in the thick of a project that would end up being one of my earliest forays into writing real automation. Back then, hand-written scripts were still considered a valid way to manage systems. It felt like we had just stepped out of the 1980s with our Bash scripts and Perl one-liners.

The Setup

At this point in my career, I was working at a small startup that had grown rapidly over the past year. We were using mostly Red Hat Linux, and we were starting to see the fruits of moving away from our old Windows servers. Our infrastructure team was lean but effective—just me, really. I wore multiple hats: sysadmin, developer, even sometimes designer.

We had a couple of critical services that needed constant uptime, including an internal ticketing system and a few other tools that kept our engineering team humming along. These systems were built on top of LAMP (Linux, Apache, MySQL, Perl) stacks, and we were using a mix of Bash and Perl scripts to keep them running smoothly.

The Challenge

One morning, I noticed something odd in my logs—our ticketing system was flaky. It would work fine for hours, then suddenly become unresponsive or slow. This was no bueno; we needed this thing to be rock-solid because it was our primary communication tool within the company.

I decided to take a look at the Perl scripts that were responsible for kicking off the background jobs and services that kept the system running. These were just basic scripts, but I knew they could be optimized.

Debugging and Learning

The first thing I did was dig into the logs. I put together a quick Bash one-liner to follow multiple log files at once, which was something new for me at the time. No threads involved, just tail doing what it already knows how to do: given several files, it follows all of them and labels each chunk of output with the file it came from. It worked like a charm:

tail -f /var/log/nginx/access.log /var/log/httpd/error_log

This allowed me to see both Nginx and Apache logs simultaneously, which helped pinpoint where things were going wrong. After a few hours of sifting through the logs, I noticed something strange: an excessive number of 404 errors. A 404 means the server found nothing at the requested path, so it looked like some requests were not reaching their expected destination, rather than simply timing out.
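Sifting by eye works, but a tally gets there faster. Here is a minimal sketch of how that count could be done, assuming the access log uses the common or combined log format, where the request path is the seventh whitespace-separated field and the status code is the ninth. The function name top_404_paths is made up for illustration; the log path in the usage comment is the one from the tail command above.

```shell
# Hypothetical helper: list the request paths that most often return 404.
# Assumes common/combined log format: field 7 = request path, field 9 = status.
# $1 = log file, $2 = how many paths to show (defaults to 10).
top_404_paths() {
    awk '$9 == 404 { hits[$7]++ }
         END { for (path in hits) print hits[path], path }' "$1" |
        sort -k1,1nr -k2,2 |
        head -n "${2:-10}"
}

# Usage:
#   top_404_paths /var/log/nginx/access.log 20
```

Each output line is a count followed by a path, most frequent first. If the log format differs, the field numbers in the awk program would need adjusting.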

Optimizing with Xen

To further debug, I decided to use Xen to run a small virtual machine (VM) that would mirror our production environment. This was one of my first encounters with Xen, and it felt like stepping into a new world compared to the physical test machines we had been using so far. It allowed me to set up an isolated test environment where I could experiment without affecting live services.

I wrote another Perl script to replicate the user requests that were causing issues in our production system. This was a bit of a hack, but it did the job:

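use strict;
use warnings;
use LWP::UserAgent;

# Send 100 sequential GET requests and print each response's status line.
my $ua = LWP::UserAgent->new;
for (1..100) {
    my $response = $ua->get('http://our-ticketsystem-url.com');
    print "Status: ", $response->status_line, "\n";
}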

This script sent 100 HTTP requests to our ticketing system, one after another, and printed the status line of each response. Running it against the mirrored environment in the Xen VM let me see exactly what was going on when these issues occurred.
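With a hundred status lines scrolling past, a tally is easier to read than the raw output. Here is a rough sketch of one way to summarize it, assuming the output format printed by the script above ("Status: " followed by the status line). The function name summarize_statuses and the filename replay.pl in the usage comment are hypothetical.

```shell
# Hypothetical helper: count how often each status line appears.
# Reads lines such as "Status: 404 Not Found" on stdin and ignores anything else.
# Prints "<count><TAB><status line>", most frequent first.
summarize_statuses() {
    awk 'sub(/^Status: /, "") { seen[$0]++ }
         END { for (s in seen) print seen[s] "\t" s }' |
        sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Usage, assuming the Perl script is saved as replay.pl:
#   perl replay.pl | summarize_statuses
```

A healthy run should collapse to a single line of 200 responses; anything else stands out immediately.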

The Solution

After a lot of trial and error, I finally found the issue: it was a caching mechanism in one of the Perl scripts that wasn’t handling requests as efficiently as it should. By tweaking the cache settings and optimizing the script, I managed to reduce the 404 errors significantly.

This project taught me a few things:

  1. Scripting is more than just automation: It can be used for debugging and troubleshooting too.
  2. Isolation is key: Using tools like Xen to set up isolated environments can save a lot of headaches.
  3. Automation saves time, but not without its quirks: Every now and then, you still need to dig into the nitty-gritty details.

Looking Back

That was one of my first big projects where I really dug deep into debugging with scripts and automation tools. It felt like we were in the early days of what would become a much more automated world, with tools like Ansible and Kubernetes yet to come. But it was a good learning experience, and even today, when I look back at those scripts, they still seem like a step towards a better way of doing things.


That’s how I debugged my first big system issue in 2004. It wasn’t fancy or glamorous, but it taught me the importance of digging deep and using every tool available to get to the bottom of problems.