$ cat post/ps-aux-at-midnight-/-a-webhook-fired-into-void-/-config-never-lies.md

ps aux at midnight / a webhook fired into void / config never lies


Title: On a Zoom Call, With SREs, and eBPF


April 20, 2020. A Monday afternoon. I find myself in an all-hands meeting with the team. The air feels thick with anxiety; everyone is weeks into working from home, juggling kids, dealing with Zoom fatigue, and wondering how to make it through this thing.

The meeting is about our infrastructure’s current state. We’ve been wrestling with a complex Kubernetes cluster for months now—too many services, too many secrets, and not enough automation. The ops team is tired, and the developers are frustrated. I’m in my home office, with a toddler trying to eat Play-Doh and another one who’s just learned to walk but doesn’t quite understand that we’re supposed to be working.

One of the engineers asks about SRE roles—how they might help us manage our infrastructure better. SRE (Site Reliability Engineering) is a buzzword around here, with some people pushing for it, others skeptical. I’ve been thinking about this a lot. After all, Kubernetes is becoming a bit too complex to handle.

The conversation turns to eBPF (Extended Berkeley Packet Filter). It’s the new kid on the block, and everyone’s curious. I’ve started using it in some of my projects because it lets you run small, verifier-checked programs inside the kernel and observe kernel events as they happen, which can be incredibly powerful for performance tuning. But I also know that it can be a double-edged sword. We need to tread carefully.

“Brandon,” one engineer asks, “what do you think about eBPF?”

I chuckle. “Well, it’s interesting. It lets us hook into the kernel without writing kernel modules or patching the kernel itself. That’s a big win for performance and observability.”

Another person pipes up, “But isn’t that what Kubernetes is supposed to handle for us?”

That’s a fair point. The complexity of our Kubernetes setup has been growing exponentially. We need tools that can scale better than just adding more nodes or services. I’ve been thinking about how we might use eBPF for network policy enforcement and tracing, but it’s not an easy road.

The meeting wraps up, and as the rest of my team drops off the call, I stay on with a few SREs from another team. They’re discussing how they handle on-call rotations and incident response. It’s clear that their model is working for them. Maybe we need to consider adopting some of those practices.

As I leave the call, my daughter comes over and hands me her toy car. She’s learned the concept of giving something; maybe she’s trying to teach me about generosity during these tough times. It strikes me then that this isn’t just about technology; it’s also about community and support.

More Zoom calls await. I’ll probably spend the evening working on some of our internal developer portal tools using Backstage. The transition from physical office to remote work has been challenging, but it’s given us a chance to rethink how we operate as an organization. With eBPF, SRE roles, and Kubernetes complexity fatigue all swirling around in my head, I feel like we’re at a crossroads.

How do we move forward? That’s the question that haunts me as I close the laptop and join my family for dinner. We may not have all the answers yet, but we’ll figure it out together—one line of code at a time.

Stay tuned for more from this journey.