The Uptime Engineer
👋 Hi, I am Yoshik Karnawat
If you touch Kubernetes or Docker, this 180-second read will change how you set limits forever.
Facts about cgroups
Containers don’t exist in the kernel. cgroups + namespaces = containers
The biggest failure mode in cgroups v1: limits look applied, but runtime enforcement doesn't match what you configured
Linux introduced cgroups v2 in 2016, but most production clusters only migrated around 2022–2023
Kubernetes only considered cgroups v2 fully supported in v1.25
Docker on modern Ubuntu silently uses cgroups v2 by default
You set --memory=512m on a Docker container.
It crashes anyway.
You configure Kubernetes resource limits. Pods get OOMKilled randomly. Your monitoring shows CPU throttling that doesn't match what you configured.
Here's the thing: container resource limits aren't magic.
They're just cgroups. And if you're still running cgroups v1, you're dealing with a fragmented mess that even experienced engineers struggle to debug.
Let me show you what's actually happening.
The layer beneath containers
Control Groups (cgroups) are a Linux kernel feature that limits and tracks resource usage for processes.
Every time you run a container with resource limits, the runtime creates cgroups to enforce them. Docker, containerd, CRI-O, Kubernetes: they all use cgroups.
Cgroups control CPU allocation, memory usage, disk I/O, network bandwidth, and process limits.
Without cgroups, containers would just be regular processes with no boundaries.
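Every process on a Linux box already lives in a cgroup, container or not. A quick way to see it (a sketch, assuming a cgroups v2 host; your paths will differ):
cat /proc/self/cgroup                            # your shell's cgroup, e.g. 0::/user.slice/...
cat /proc/1/cgroup                               # even PID 1 belongs to one
docker run --rm alpine cat /proc/self/cgroup     # inside a container it's just another cgroup path
A container is simply a process whose cgroup the runtime created and constrained for you.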
Why v1 breaks at scale
cgroups v1 shipped in Linux 2.6.24 back in 2008.
It works. But it has a fundamental flaw: multiple hierarchies.
Each controller (CPU, memory, I/O) exists in a separate tree. Your container runtime manages multiple mount points:
/sys/fs/cgroup/cpu/
/sys/fs/cgroup/memory/
/sys/fs/cgroup/blkio/
/sys/fs/cgroup/pids/
Each controller acts independently. Your CPU limits don't coordinate with your memory limits. The blkio controller can't throttle buffered writeback I/O, so disk limits apply inconsistently depending on how an application writes.
When multiple cgroups apply to the same process, behavior becomes unpredictable.
You set a memory limit, but OOM behavior still surprises you, because memory, swap, and kernel-memory accounting are split across separate files that don't coordinate.
Container orchestration becomes a nightmare. Kubernetes manages separate cgroup paths for each resource type. Debugging means checking multiple locations.
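Here's roughly what that debugging looks like on a v1 node. A sketch, assuming the cgroupfs driver and a hypothetical pod UID and PID (placeholders, not real values):
ls /sys/fs/cgroup/
# cpu/  cpuacct/  memory/  blkio/  pids/  ...   one separate tree per controller
cat /sys/fs/cgroup/memory/kubepods/burstable/pod<uid>/memory.limit_in_bytes   # the memory limit lives here
cat /sys/fs/cgroup/cpu/kubepods/burstable/pod<uid>/cpu.cfs_quota_us           # the CPU quota lives in a different tree
cat /proc/<pid>/cgroup   # on v1: one line per hierarchy for the same process; on v2: a single 0:: line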
The Fix
cgroups v2 launched in Linux 4.5 (2016) with one unified hierarchy.
All controllers live under a single mount point:
/sys/fs/cgroup/
Every resource controller uses the same tree structure. CPU, memory, and I/O limits coordinate. The kernel enforces them consistently.
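On a v2 host you can see this at a glance (a sketch of a typical systemd-managed system; your listing will vary):
ls /sys/fs/cgroup/
# cgroup.controllers  cgroup.procs  cpu.pressure  io.pressure  memory.pressure  system.slice/  user.slice/  ...
cat /sys/fs/cgroup/cgroup.controllers
# cpuset cpu io memory hugetlb pids ...   every controller, one tree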
Unified enforcement: When you set limits, all controllers see the same hierarchy. No fragmented behavior.
Consistent API: Control files follow standard naming. cpu.max, memory.max, io.max. Simple and predictable.
Better memory control: v2 added memory.min (guaranteed memory) and memory.high (throttle before OOM); both are sketched below. In v1, the soft limit was only best-effort, so in practice you relied on hard limits that OOM-killed your container.
Accurate I/O throttling: The io controller replaced blkio. You get precise read/write limits per device.
Pressure Stall Information (PSI): Real metrics on resource contention. You can see when containers are starved, not just hitting limits.
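To make those knobs concrete, here's a minimal sketch of driving the unified hierarchy by hand, assuming root on a v2 host and a made-up group called demo. Container runtimes do this kind of work for you, and on a systemd host you'd normally let systemd or the runtime own the tree:
echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control   # enable controllers for child groups
mkdir /sys/fs/cgroup/demo                                         # one directory, all controllers
echo 100M > /sys/fs/cgroup/demo/memory.min           # guaranteed memory
echo 400M > /sys/fs/cgroup/demo/memory.high          # throttle allocations above this
echo 512M > /sys/fs/cgroup/demo/memory.max           # hard limit
echo "200000 100000" > /sys/fs/cgroup/demo/cpu.max   # 2 CPUs: quota and period in microseconds
echo $$ > /sys/fs/cgroup/demo/cgroup.procs           # move the current shell in; all limits now apply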
How limits become cgroups
When you run a container with limits, here's what happens behind the scenes.
Docker example:
docker run --memory=512m --cpus=2 nginx
cgroups v1 creates:
echo 536870912 > /sys/fs/cgroup/memory/docker/<id>/memory.limit_in_bytes
echo 200000 > /sys/fs/cgroup/cpu/docker/<id>/cpu.cfs_quota_us
cgroups v2 creates:
echo 536870912 > /sys/fs/cgroup/docker/<id>/memory.max
echo "200000 100000" > /sys/fs/cgroup/docker/<id>/cpu.maxWhy this matters in production
Why this matters in production
Predictable throttling: In v1, hitting a memory limit usually meant an instant OOM kill. In v2, setting memory.high below memory.max throttles allocations gradually, so your container slows down instead of crashing.
Accurate metrics: PSI shows real resource pressure:
cat /sys/fs/cgroup/memory.pressure
some avg10=2.30 avg60=1.50 avg300=0.80 total=12340000
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
This tells you if your container is starved or just hitting configured limits.
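The same files exist per cgroup, so you can point the check at one container instead of the whole node; a sketch assuming the cgroupfs layout used above and a kernel with PSI enabled:
cat /sys/fs/cgroup/docker/<id>/memory.pressure
cat /sys/fs/cgroup/docker/<id>/cpu.pressure   # CPU starvation, distinct from configured throttling
cat /sys/fs/cgroup/docker/<id>/io.pressure    # time spent stalled on disk I/O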
Fair scheduling: The unified hierarchy enforces fairness across all resources. CPU-heavy containers don't steal I/O bandwidth from memory-heavy workloads.
Where we are today
Modern distros default to cgroups v2: Ubuntu 22.04+, RHEL 9+, Fedora 31+.
Kubernetes fully supports v2 since v1.25. Docker and containerd handle v2 automatically.
Check your system:
ls /sys/fs/cgroup
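How to read the result, plus a one-liner that asks the kernel directly (assumes a typical host, not your exact layout):
# v1: one subdirectory per controller (cpu/, memory/, blkio/, ...)
# v2: a flat set of files like cgroup.controllers and memory.pressure, plus *.slice directories
stat -fc %T /sys/fs/cgroup/
# cgroup2fs -> v2    tmpfs -> v1 (or hybrid mode)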
cgroups v1 worked for years, but its fragmented design causes unpredictable behavior at scale. cgroups v2 fixes this with a unified hierarchy, consistent APIs, and better resource control.
Check which cgroup version your hosts actually run, not just the kernel version; on most distros the mode is a boot-time setting. Migrate if you're still on v1.
