The Uptime Engineer
👋 Hi, I am Yoshik Karnawat
You'll learn why YouTube never abandoned MySQL (they sharded it to tens of thousands of nodes using Vitess), how their CDN spans 1,300+ cities with DNS-based routing that lets them rebalance load without code deploys, and why their two-stage DNN recommendation system optimizes for watch time instead of clicks.
🔧 This Week's Command
# Measure your own video streaming performance (like YouTube does)
curl -w "@-" -o /dev/null -s "<URL>" <<'EOF'
time_namelookup: %{time_namelookup}s (DNS lookup)
time_connect: %{time_connect}s (TCP handshake)
time_appconnect: %{time_appconnect}s (TLS handshake)
time_starttransfer: %{time_starttransfer}s (TTFB - first byte)
time_total: %{time_total}s (full download)
speed_download: %{speed_download} bytes/sec
EOF
Break down exactly where latency happens: DNS lookup, TCP handshake, TLS negotiation, and time to first byte. Use this to debug slow page loads or compare CDN performance across regions. Replace the URL with any endpoint you want to profile.
🔥 Tool Spotlight
Vitess. The MySQL clustering system that powers YouTube. Handles sharding, connection pooling, query routing, and automated failover across tens of thousands of MySQL nodes. Built by YouTube, now a CNCF project.
📚 Worth Your Time
Google's Borg paper - Cluster management at scale
YouTube video delivery measurement study - DNS-based routing and multi-tier caching
TL;DR (30 seconds):
YouTube runs on Python, MySQL (via Vitess), Bigtable, and a multi-tier CDN, not microservices magic
They scaled MySQL to tens of thousands of nodes using Vitess for sharding, pooling, and query safety
Their CDN spans 1,300+ cities with DNS-based routing that lets them rebalance load by updating DNS
Two-stage DNN (candidate generation → ranking) powers recommendations, optimized for watch time, not clicks
Scale Context: 2 Billion Users, 1,300+ Cities
YouTube serves over 2 billion users using Google's Media CDN, the same infrastructure that is now available to Cloud customers.
That network spans:
206+ countries/territories and 1,300+ cities
Multi-tier cache hierarchy (primary, secondary, tertiary) discovered via large-scale measurement studies
QUIC/HTTP/3, TLS 1.3, and BBR congestion control for optimal throughput
YouTube's recommendation system is "one of the largest scale and most sophisticated industrial recommendation systems in existence" according to Google's own research.
And all of it runs on Google's Borg cluster manager - the same infrastructure powering Search, Gmail, and Ads.
The Big Insight
Most scaling stories follow the same arc: "We outgrew MySQL, so we moved to NoSQL."
YouTube's story is different.
They started with a single MySQL instance. Growth caused:
Replication lag under heavy write load (async replication, single-threaded on slaves)
Connection exhaustion (too many app connections overwhelming MySQL)
Large tables that couldn't scale vertically
So they built Vitess - a database clustering system specifically designed to scale MySQL horizontally.
From the Vitess GitHub README (maintainer-authored):
"Vitess was a core component of YouTube's database infrastructure from 2011, and grew to encompass tens of thousands of MySQL nodes."
That's not "thousands of database servers."
That's tens of thousands of MySQL instances, more database nodes than most companies have servers.
How Vitess Works
Vitess sits between your application and MySQL, providing:
VTGate (query router):
Routes queries to the correct shard automatically
Connection pooling to prevent MySQL connection exhaustion
Query safety: kills dangerous/long-running queries before they take down the database
VTTablet (per-shard agent):
Manages individual MySQL instances
Row-level caching in Memcached, invalidated by watching the MySQL replication stream
Automated failover and backups with minimal manual intervention
Sharding & resharding:
Split/merge shards with minimal downtime
Vital for growing hot shards without service disruption
The impact:
Cache locality improved (fewer cache misses)
I/O reduced (less disk thrashing)
Hardware efficiency increased
Replica lag dropped to 0
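None of this requires application changes: VTGate speaks the MySQL wire protocol, so the app connects to it as if it were a single database. A minimal sketch using the standard pymysql client; the host, port, credentials, keyspace, and table names are hypothetical, not YouTube's actual setup:

import pymysql

# Connect to VTGate as if it were one MySQL server (endpoint is made up).
conn = pymysql.connect(
    host="vtgate.internal.example",
    port=15306,
    user="app_user",
    password="app_password",
    database="videos_keyspace",   # a Vitess keyspace, not a physical database
    cursorclass=pymysql.cursors.DictCursor,
)

with conn.cursor() as cur:
    # VTGate routes this to the right shard using the sharding key (user_id),
    # pools the underlying MySQL connections, and can kill runaway queries.
    cur.execute("SELECT video_id, title FROM videos WHERE user_id = %s", (42,))
    for row in cur.fetchall():
        print(row)

conn.close()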
The Four Request Paths
YouTube's architecture splits into four distinct request shapes. Each hits different walls.
1) Watch Page (Metadata + HTML)
Flow: Client → Load balancer → Backend services → Data layer (MySQL+Vitess, Bigtable, caches) → HTML
What breaks: Fan-out RPC latency, not application code.
Tech stack (confirmed by Google):
Python for most business logic
C++/Java for performance-critical paths (video processing, low-latency serving)
Go for modern services like Vitess
How they scaled it:
Pre-generate cached HTML blocks, cache Python objects (not raw DB rows), push computed data into local memory.
The counterintuitive part:
Adding more web servers worked because Python code spent most of its time waiting on RPCs, not burning CPU.
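A minimal sketch of the "cache the computed Python object, not the raw DB rows" pattern, assuming a memcached endpoint and the pymemcache client; the key scheme and the stand-in compute step are invented for illustration:

import pickle
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))   # hypothetical memcached endpoint

def compute_watch_page(video_id: str) -> dict:
    # Stand-in for the expensive work: DB reads, ranking, HTML fragments.
    return {"video_id": video_id, "title": f"Video {video_id}", "related": []}

def get_watch_page(video_id: str) -> dict:
    key = f"watchpage:{video_id}"
    cached = cache.get(key)
    if cached is not None:
        return pickle.loads(cached)        # hit: skip the DB and the recompute
    page = compute_watch_page(video_id)    # miss: do the expensive work once
    cache.set(key, pickle.dumps(page), expire=60)
    return page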
2) Video Serving (Multi-Tier CDN + DNS Magic)
Current reality: YouTube uses Google Media CDN spanning 1,300+ cities.
But here's the clever part most people miss: DNS-based routing.
A measurement study of YouTube's delivery network found:
Flat video ID space
Multiple DNS namespaces reflecting a multi-tier logical cache hierarchy
Video IDs map to logical servers, then to physical cache locations via DNS
This means YouTube can add servers or rebalance load just by updating DNS mappings, no code deploys needed.
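A sketch of the idea with invented hostnames: hash the flat video ID space onto a fixed set of logical cache names, and let DNS decide which physical machines each name points to today.

import hashlib
import socket

def logical_cache_host(video_id: str, namespaces: int = 192) -> str:
    # Map the flat video ID space onto a fixed set of logical cache names.
    bucket = int(hashlib.sha1(video_id.encode()).hexdigest(), 16) % namespaces
    return f"r{bucket}.cache.example.net"    # hypothetical DNS namespace

def physical_endpoints(video_id: str) -> list[str]:
    # DNS, not application code, decides which physical servers back the name,
    # so rebalancing load only requires changing DNS records.
    infos = socket.getaddrinfo(logical_cache_host(video_id), 443,
                               proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

print(logical_cache_host("dQw4w9WgXcQ"))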
Protocols:
DASH/HLS for adaptive bitrate streaming
QUIC/HTTP/3 with BBR congestion control
Player constantly adjusts quality based on bandwidth and buffer health
Multi-tier serving:
Edge caches serve video segments. On cache miss, fetch from regional tier. On regional miss, fetch from origin.
Most requests never touch origin.
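In code, the fallthrough is just a chain of lookups with backfill. A toy sketch with in-memory dicts standing in for the edge and regional tiers:

# Try the edge cache, then the regional tier, and only hit origin
# (and backfill both tiers) on a full miss.
edge_cache: dict[str, bytes] = {}
regional_cache: dict[str, bytes] = {}

def fetch_from_origin(segment_id: str) -> bytes:
    return b"\x00" * 1024                      # stand-in for a video segment

def get_segment(segment_id: str) -> bytes:
    if segment_id in edge_cache:               # most requests stop here
        return edge_cache[segment_id]
    if segment_id in regional_cache:           # edge miss, regional hit
        edge_cache[segment_id] = regional_cache[segment_id]
        return edge_cache[segment_id]
    data = fetch_from_origin(segment_id)       # rare: full miss
    regional_cache[segment_id] = data
    edge_cache[segment_id] = data
    return data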
3) Thumbnails (The Small-File Problem)
Watch pages display ~60 thumbnails. That's massive request volume hitting tiny objects.
The filesystem nightmare:
Early systems hit inode cache thrashing, directory limits (ext3), and brutal warmup times.
The solution: Bigtable
Post-Google-acquisition architecture notes describe Bigtable used to:
Replicate thumbnails across data centers
Store high-volume replicated data in wide-column, key-value patterns
Why Bigtable solves this:
It clumps data together, leveraging distributed multi-level caching across colocation sites.
Avoids the "billions of tiny files" problem that kills warmup times, inodes, and operational velocity.
Modern systems continue using Bigtable for:
Video metadata at massive scale
User activity logs
Time-series data
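A hedged sketch of what a wide-column thumbnail store can look like with the google-cloud-bigtable client; the project, instance, table, column family, and row-key scheme are all hypothetical, not YouTube's actual layout:

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("thumbs-instance").table("thumbnails")

jpeg_bytes = b"<120x90 jpeg bytes>"       # placeholder payloads
hq_jpeg_bytes = b"<480x360 jpeg bytes>"

# One row per video, one column per thumbnail size: many tiny images become
# cells in a wide row instead of billions of tiny files on a filesystem.
row_key = b"video#dQw4w9WgXcQ"
row = table.direct_row(row_key)
row.set_cell("img", b"default.jpg", jpeg_bytes)
row.set_cell("img", b"hqdefault.jpg", hq_jpeg_bytes)
row.commit()

# Reading one size back is a single key lookup plus a column qualifier.
fetched = table.read_row(row_key)
thumb = fetched.cells["img"][b"default.jpg"][0].value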
4) Databases (The Vitess Deep-Dive)
This deserves its own section because it's the most documented part of YouTube's architecture.
The evolution:
Stage 1: Single MySQL + read replicas
Worked until write load caused replication lag.
Stage 2: Vertical splitting
Split tables into different databases (user metadata vs. video metadata).
Bought time, didn't solve the fundamental problem.
Stage 3: Sharding + Vitess
Shard data across multiple MySQL instances (by user ID, video ID, etc.).
Vitess handles routing, pooling, safety, and operational automation.
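Conceptually, the shard a row lives on is a pure function of its sharding key. Vitess implements this with vindexes and range-named shards; the sketch below only shows the hash-then-range idea, with a made-up hash function and shard names:

import hashlib

# Four shards, each owning a contiguous slice of the 64-bit keyspace-ID space,
# the same shape as Vitess's range-named shards ("-40", "40-80", ...).
SHARDS = [
    (0x0000000000000000, 0x4000000000000000, "shard_-40"),
    (0x4000000000000000, 0x8000000000000000, "shard_40-80"),
    (0x8000000000000000, 0xC000000000000000, "shard_80-c0"),
    (0xC000000000000000, 0x10000000000000000, "shard_c0-"),
]

def keyspace_id(user_id: int) -> int:
    # Hash the sharding key into the keyspace-ID space (Vitess's hash vindex
    # uses a different function; this only illustrates the idea).
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big")

def shard_for(user_id: int) -> str:
    kid = keyspace_id(user_id)
    for lo, hi, name in SHARDS:
        if lo <= kid < hi:
            return name
    raise AssertionError("unreachable")

print(shard_for(42))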
Why sharding won:
Cache locality: Each shard has its own cache. No cache contention across unrelated data.
I/O isolation: Hot user tables don't compete with video metadata queries for disk I/O.
Blast radius: A failing shard doesn't take down the entire database fleet.
Horizontal scaling: Add capacity by adding shards, not vertically scaling monster boxes.
The operational win:
Vitess handles resharding (split/merge shards), failover, and backups with minimal human intervention.
This is why YouTube could scale to tens of thousands of MySQL nodes without an army of DBAs.
Video Processing Pipeline (Upload → Transcode → Serve)
From system design analyses consistent with Google's infrastructure patterns:
Upload:
Large files split into 10–50 MB chunks
Chunks go to nearest Google edge server, then into Google Cloud Storage (GCS)
Uploads can resume mid-transfer (chunking enables this)
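A minimal client-side sketch of resumable chunked upload, assuming a hypothetical upload_chunk call and a chunk size in the stated range:

import os

CHUNK_SIZE = 32 * 1024 * 1024   # somewhere in the 10-50 MB range

def upload_chunk(path: str, offset: int, data: bytes) -> None:
    # Stand-in for a PUT/POST of one chunk to the nearest edge server.
    print(f"uploaded {path} bytes {offset}-{offset + len(data) - 1}")

def resumable_upload(path: str, resume_from: int = 0) -> int:
    """Upload a file in chunks; on failure, call again with the returned offset."""
    size = os.path.getsize(path)
    offset = resume_from
    with open(path, "rb") as f:
        f.seek(offset)
        while offset < size:
            data = f.read(CHUNK_SIZE)
            upload_chunk(path, offset, data)   # a retry resumes at `offset`
            offset += len(data)
    return offset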
Transcoding:
FFmpeg-based pipelines create multiple renditions (resolutions, bitrates, codecs)
Jobs handled asynchronously via message queues (Pub/Sub-style)
Keeps upload latency separate from heavy processing
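A sketch of the fan-out: once the upload lands, enqueue one transcode job per rendition and let workers run the ffmpeg commands asynchronously. The rendition ladder and the in-process queue are illustrative, not YouTube's actual pipeline:

import queue

RENDITIONS = [                  # illustrative ladder
    ("1080p", "1920x1080", "5000k"),
    ("720p",  "1280x720",  "2500k"),
    ("360p",  "640x360",   "750k"),
]

jobs: "queue.Queue[list[str]]" = queue.Queue()

def enqueue_transcodes(src: str) -> None:
    # Upload latency stays low: we only enqueue work here, workers do the rest.
    for name, res, bitrate in RENDITIONS:
        cmd = ["ffmpeg", "-i", src, "-s", res, "-b:v", bitrate,
               "-c:v", "libx264", f"{src}.{name}.mp4"]
        jobs.put(cmd)

enqueue_transcodes("upload_abc123.mp4")
print(jobs.qsize(), "transcode jobs queued")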
Storage pattern:
Video blobs → GCS/Colossus (Google's distributed file system)
Metadata → MySQL+Vitess (users, video titles, view counts)
User activity → Bigtable (watch history, analytics, time-series)
This matches Google's broader architecture: large binary objects in GCS, structured metadata in MySQL, NoSQL data in Bigtable.
Recommendation System (Two-Stage DNN)
The most authoritative source: Google's own research paper ("Deep Neural Networks for YouTube Recommendations" by Covington et al.).
Stage 1: Candidate Generation
Goal: Reduce billions of videos to a few hundred candidates quickly.
How: Extreme multiclass classification - predict the next video a user will watch.
Inputs: User watch history, search history, demographics, context. All embedded into vectors, fed into a feed-forward network.
Stage 2: Ranking
Goal: Score each candidate based on expected watch time (not clicks).
Features: Expected watch time, freshness (video age), user/video embeddings.
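A toy numpy version of the two-stage funnel: at serving time, candidate generation reduces to a nearest-neighbor search over embeddings, and ranking re-scores only the survivors. Sizes, the scoring function, and the freshness boost are invented for illustration:

import numpy as np

rng = np.random.default_rng(0)
NUM_VIDEOS, DIM = 100_000, 64             # stand-in for "billions"

video_emb = rng.standard_normal((NUM_VIDEOS, DIM)).astype(np.float32)
user_emb = rng.standard_normal(DIM).astype(np.float32)
video_age_days = rng.integers(0, 3650, NUM_VIDEOS)

# Stage 1: candidate generation -- top-K by dot product with the user embedding
# (the serving-time form of the extreme multiclass classifier).
K = 300
scores = video_emb @ user_emb
candidates = np.argpartition(-scores, K)[:K]

# Stage 2: ranking -- re-score only the candidates, optimizing a watch-time
# prediction and boosting freshness so new uploads can still surface.
watch_time_pred = video_emb[candidates] @ user_emb    # toy stand-in for the DNN
freshness = 1.0 / (1.0 + video_age_days[candidates] / 30)
final = watch_time_pred + 0.5 * freshness

top10 = candidates[np.argsort(-final)[:10]]
print(top10)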
Why watch time matters:
Clicks are gameable. Watch time measures actual engagement.
This single metric shift changed YouTube's entire content ecosystem.
The freshness problem:
How do you surface recently uploaded videos when your model is trained on historical data?
YouTube explicitly adds freshness features to ensure new videos can surface quickly, even without historical watch data.
Latency requirement:
Serve recommendations with millisecond-scale latency at global scale.
Continuously ingest new interactions and video uploads for model updates.
Cluster Management: Borg
YouTube doesn't run on dedicated hardware. It runs as one of many large workloads on Google's Borg cluster manager.
From Google's Borg paper and talks mentioning "transcoding YouTube videos" as example workloads:
Borg handles:
Hundreds of thousands of jobs across clusters with tens of thousands of machines
Both latency-sensitive services (video serving, API backends) and batch workloads (transcoding, analytics)
High utilization via bin-packing and over-commitment
Process-level isolation, admission control, and fast fault recovery
Mental model:
YouTube's microservices and batch pipelines share infrastructure with Search, Gmail, and Ads while using specialized stacks (Vitess, Bigtable, GCS, Media CDN) for its particular workload.
This is why YouTube doesn't need to worry about "Kubernetes vs. ECS."
They run on Google's internal cluster manager that predates Kubernetes (Kubernetes is the open-source version of Borg's lessons learned).
Data Center Strategy
Videos: Can come from any data center. Bandwidth matters more than latency.
Thumbnails/Images: Latency-sensitive (pages include dozens). Replicated using Bigtable. Code chooses replicas based on proximity metrics.
YouTube uses 5–6 data centers + CDN with strategic colocation for hardware customization and network negotiation.
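A tiny sketch of "choose the replica by proximity": keep a measured latency per replica and read from the closest healthy one. Replica names and numbers are invented, and real systems feed this from continuous probing rather than hard-coded values:

replica_rtt_ms = {        # measured round-trip times, in milliseconds
    "us-east": 12.0,
    "eu-west": 88.0,
    "asia-se": 140.0,
}
unhealthy = {"eu-west"}

def pick_replica() -> str:
    healthy = {r: rtt for r, rtt in replica_rtt_ms.items() if r not in unhealthy}
    return min(healthy, key=healthy.get)

print(pick_replica())     # -> "us-east"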
The One Thing to Remember
YouTube didn't scale by predicting bottlenecks.
They scaled by fixing them faster than they appeared using simple tools (Python, MySQL, Vitess) at massive scale instead of rewriting everything.
The architecture isn't elegant. It's pragmatic.
And that's why it works.
Join 1,000+ engineers learning DevOps the hard way
Every week, I share:
How I'd approach problems differently (real projects, real mistakes)
Career moves that actually work (not LinkedIn motivational posts)
Technical deep-dives that change how you think about infrastructure
No fluff. No roadmaps. Just what works when you're building real systems.

👋 Find me on Twitter | Linkedin | Connect 1:1
Thank you for supporting this newsletter.
Y’all are the best.
