Hey folks, Yoshik here.

This week’s digest is going out a bit late. I’ve been curating a few extra pieces because I didn’t want to send you a rushed issue with filler. I’d rather be a little late than waste your time. Good content coming up.

Thanks for sticking with my newsletter. I really appreciate it.

Welcome to this week's Uptime Sync. This issue covers the GitHub source code leak, Cloudflare's ClickHouse bottleneck that silently slowed a billing pipeline, GitHub's critical RCE warning for the git push pipeline, and Netflix's look at the human operations layer behind live-at-scale systems. We also examine Slack's move from SSH to REST for EMR data pipelines, AWS's deep dive into the invisible engineering behind Lambda's network, Databricks' 10 trillion samples per day monitoring challenge, and Wix's zero-downtime database migration service.

On the tutorial front: zero-downtime ECS deployments with automatic PostgreSQL migrations, going back to writing code by hand after AI fatigue, building a Kubernetes debugging AI agent with Claude Code, migrating from GitHub to Forgejo for digital sovereignty, replacing GPT-4 with a local SLM for deterministic CI/CD extraction, and tracing multi-agent AI swarms with Jaeger v2.

This week's projects bring you Grafana's all-in-one OpenTelemetry backend in a Docker image, Healthchecks for cron job monitoring, Infracost for cloud cost estimates and FinOps guidance, and K8sGPT for scanning Kubernetes clusters and triaging issues in plain English.

Newsworthy Reads

Tutorials of the Week

Projects of the Week

Join 1,000+ engineers staying ahead of the curve

Every week, Uptime Sync brings you:

  • Outage postmortems from Netflix, Cloudflare, Pinterest & more

  • Hands-on DevOps & SRE tutorials

  • Production-ready tools & open-source projects

👋 Find me on Twitter | LinkedIn | Connect 1:1

Thank you for supporting this newsletter. Consider sharing this post with your friends.

Y’all are the best.
