$ cat post/the-function-returned-/-i-ssh-to-ghosts-of-boxes-/-i-strace-the-memory.md

the function returned / I ssh to ghosts of boxes / I strace the memory


Title: Kubernetes Growing Pains: A Day in the Life


November 19, 2018 was just another day at work for me as an engineer, but looking back, it felt like a pivotal moment. The team I was managing had been working with Kubernetes for a few months now, and we were still fighting fires left and right.

Kubernetes had definitely taken off in the cloud wars. Google’s GKE, AWS’s EKS, and Azure’s AKS had all added to its popularity. But as adoption grew, so did the complexity of managing our cluster. We’d gone from deploying a few simple applications to running multiple microservices backed by complex StatefulSets.

That morning, I started off troubleshooting an issue with one of our critical services. It was acting up, and no matter how many times we rolled out new versions or tried scaling it, nothing seemed to stick. The logs were filled with errors that didn’t quite make sense: some obscure failure from the container’s startup command that left me scratching my head.
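I no longer have the exact shell history, but the loop we kept running looked roughly like this (the service name, image, and label are all stand-ins):

```sh
# Roll out a fresh image and watch it land.
kubectl set image deployment/checkout-api checkout-api=registry.internal/checkout-api:v1.4.2
kubectl rollout status deployment/checkout-api

# When that didn't help, try scaling the deployment up.
kubectl scale deployment/checkout-api --replicas=6

# And tail the logs for the errors that didn't quite make sense.
kubectl logs -l app=checkout-api --tail=100
```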

After a few hours of digging, I realized the issue wasn’t just our service. It was something in the infrastructure itself. One of our pods kept crashing and restarting every five minutes. The logs showed it was failing to bind to its network port, but why? Nothing about how we had configured networking explained it.
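A pod that dies and comes back on a timer like that usually surfaces as a CrashLoopBackOff, and the standard triage looks something like this (the pod name here is invented):

```sh
# A pod that restarts on a timer usually shows up as CrashLoopBackOff.
kubectl get pods
# e.g. checkout-api-7d9c5b6f4-x2k8p   0/1   CrashLoopBackOff   37   3h

# The events at the bottom of describe often name the real failure.
kubectl describe pod checkout-api-7d9c5b6f4-x2k8p

# --previous fetches the logs of the container that just died,
# which is where a "failed to bind" message tends to hide.
kubectl logs checkout-api-7d9c5b6f4-x2k8p --previous
```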

That’s when I remembered a talk I had attended earlier that month about Istio. The speakers described how sidecars could add observability and security to your services, but we hadn’t really explored that yet. Maybe the fix wasn’t in our service at all; a sidecar proxy might at least give us the visibility to see what was actually happening on the network.

I went back to the drawing board and decided to try out an Istio sidecar for this service. It was a bit of a risky move since we were still in a learning phase, but it was time to start testing the waters with more advanced Kubernetes features.
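With Istio in that era (around 1.0), adding a sidecar mostly meant letting the injection webhook do the work. The change was something in this spirit, with a made-up namespace and deployment:

```sh
# Automatic injection: label the namespace and Istio's mutating webhook
# adds an Envoy sidecar to every pod created after that point.
kubectl label namespace payments istio-injection=enabled

# Existing pods don't pick up the sidecar until they're recreated;
# deleting them lets the Deployment bring them back injected.
kubectl delete pod -l app=checkout-api -n payments

# Or inject manually into a single manifest without touching the namespace:
istioctl kube-inject -f checkout-api.yaml | kubectl apply -f -
```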

As I rolled out the change, I couldn’t help but think about all the buzz around serverless and how companies like AWS and Google were pushing hard on that front. But serverless didn’t seem to fit our use case just yet; we still had a lot of stateful applications that required persistent storage and complex workflows.

After a few more rounds of debugging, things finally started falling into place. The service began behaving as expected, and the logs showed the Istio sidecar working its magic. I was relieved, but also curious about how many other issues were lurking in our Kubernetes setup.
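The quickest sanity check that a sidecar is actually in place (again with invented names):

```sh
# An injected pod runs two containers: the app itself and istio-proxy.
# READY flips from 1/1 to 2/2 once the Envoy sidecar is up.
kubectl get pods -n payments

# The proxy's own logs show it handling traffic on the service's behalf.
kubectl logs checkout-api-7d9c5b6f4-x2k8p -c istio-proxy -n payments
```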

That evening, I joined the weekly team meeting to share my findings. The conversation turned to the platform engineering debates that had been heating up. How could we build a more resilient and maintainable infrastructure? Could we leverage tools like Terraform for more consistent deployments? Should we move towards GitOps practices?
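We didn’t settle anything that night, but the two workflows on the table boiled down to something like this (file names and branch are illustrative):

```sh
# Terraform's plan/apply loop: desired state lives in version control,
# not in someone's head or shell history.
terraform init
terraform plan -out=tfplan
terraform apply tfplan

# GitOps pushes the idea further: commit the manifest change and let an
# operator running in the cluster reconcile the live state to match.
git add k8s/checkout-api.yaml
git commit -m "bump checkout-api to v1.4.2"
git push origin master
```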

As the discussion raged on, I couldn’t help but feel a bit of nostalgia. Kubernetes had brought us here, but it wasn’t without its growing pains. We were still figuring out how to build a robust platform that could handle our team’s needs while staying flexible enough for future changes.

Looking back, that day marked a turning point in my journey as an engineer and manager. It reinforced the importance of constantly learning new technologies and techniques. Kubernetes was just one piece of the puzzle, but it had certainly opened up many opportunities for growth and innovation.


That’s how I remember that day—full of challenges, but also full of hope for what lay ahead. The tech landscape was in flux, with serverless hype reaching new heights, but our focus remained on building a solid foundation that could adapt to whatever came next.