A tiny explanation of how containers work. Updated November 11, 2022.
chroot is a Linux syscall that changes the root directory of a process. It is widely believed that containers are implemented using chroot. This is wrong, but it does make sense. If you run ls inside a container, you only see files from that container. chroot is more than capable of making that happen.
I too used to think that containers use chroot.
But now I know better.
If containers were implemented with chroot, you'd expect container runtimes to call chroot in their source code.
So I searched runc's code for chroot.
Hmm, it does appear there after all. Sixth line from the top:
But a closer look reveals that chroot isn't usually called! The highlighted code runs instead. (Normally configs.NEWNS is true.)
What is in that highlighted code? A mysterious function called pivotRoot.
pivotRoot is a wrapper for the Linux syscall pivot_root. What is pivot_root then? Basically, chroot++.
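To make that branch concrete, here is a toy model of the decision, not runc's actual code. `ToyConfig` and `root_switch_for` are hypothetical names invented for this sketch, and `NEWNS` is a plain string standing in for `configs.NEWNS`. The real function may handle more cases than these two.

```python
from dataclasses import dataclass, field

# Stand-in for runc's configs.NEWNS, the flag for a new mount namespace.
NEWNS = "NEWNS"


@dataclass
class ToyConfig:
    """Hypothetical, stripped-down container config (not runc's real type)."""
    # By default the container asks for its own mount namespace.
    namespaces: set = field(default_factory=lambda: {NEWNS})


def root_switch_for(config: ToyConfig) -> str:
    """Return the name of the syscall this toy runtime would use."""
    if NEWNS in config.namespaces:
        return "pivot_root"  # the usual path: a mount namespace was requested
    return "chroot"          # fallback when there is no mount namespace


print(root_switch_for(ToyConfig()))                  # pivot_root
print(root_switch_for(ToyConfig(namespaces=set())))  # chroot
```

Since the mount namespace is normally requested, the `chroot` branch is the exception rather than the rule.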
But what's wrong with chroot? For starters, it's trivial for a rogue process that is allowed to call chroot (root, or anything with CAP_SYS_CHROOT) to undo a chroot. It just needs to call chroot again and reverse the first call. Whoops. Isolation broken.
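Why does a second chroot work? A rough way to see it is to simulate the two pieces of per-process state involved: the root directory and the current working directory. The sketch below is a toy model, not kernel code. `ToyProcess` and `escape` are made-up names, paths are plain strings, only relative paths are handled, and the `/srv/jail` location is an arbitrary example.

```python
import posixpath


class ToyProcess:
    """Toy model of the two per-process fields that chroot interacts with."""

    def __init__(self, root="/", cwd="/"):
        self.root = root  # host path that "/" resolves to for this process
        self.cwd = cwd    # current working directory, as a host path

    def chroot(self, relpath):
        # Like chroot(2): only the root changes, the cwd is left untouched.
        # (Toy simplification: relpath is always relative to the cwd.)
        self.root = posixpath.normpath(posixpath.join(self.cwd, relpath))

    def chdir_up(self):
        # ".." is blocked only when the cwd is exactly the process root.
        if self.cwd != self.root:
            self.cwd = posixpath.dirname(self.cwd)


def escape(proc, max_depth=32):
    """Replay the classic escape on the toy process."""
    proc.chroot("scratch")      # 1. chroot into a subdirectory; cwd is now outside the root
    for _ in range(max_depth):  # 2. walk up; the cwd never equals the new root, so nothing stops it
        proc.chdir_up()
    proc.chroot(".")            # 3. chroot to wherever we ended up: the host root
    return proc


jailed = ToyProcess(root="/srv/jail", cwd="/srv/jail")
jailed.chdir_up()
print(jailed.cwd)   # /srv/jail  (a plain "cd .." is blocked at the jail root)

escape(jailed)
print(jailed.root)  # /
```

On a real system the same sequence is roughly `mkdir foo; chroot foo; cd ..` repeated as needed, followed by a final chroot, which is the escape the chroot man page itself describes.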
There are actually workarounds for that - which still use chroot - but as the man page says, chroot "is not intended to be used for any kind of security purpose."
pivot_root, on the other hand, is designed for that.
With pivot_root you can jail a set of processes inside a directory properly. And that's a must for containers.
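One way to picture the difference: chroot changes where a single process thinks "/" is, while pivot_root swaps the root mount itself, after which the old root can be unmounted. The following is a toy simulation of that idea, not a real syscall wrapper. `ToyMountNamespace` is a hypothetical class, the mount table is reduced to two entries, and the paths and filesystem labels are invented for illustration.

```python
import posixpath


class ToyMountNamespace:
    """Toy mount table: maps mount points to filesystem labels."""

    def __init__(self, mounts):
        self.mounts = dict(mounts)

    def pivot_root(self, new_root, put_old):
        # Loosely mirrors pivot_root(2): new_root must be a mount point,
        # and put_old must sit underneath it. (The real call also allows
        # put_old to be new_root itself; this toy does not.)
        if new_root == "/" or new_root not in self.mounts:
            raise ValueError("new_root must be a non-root mount point")
        if not put_old.startswith(new_root + "/"):
            raise ValueError("put_old must be underneath new_root")
        old_fs = self.mounts["/"]
        new_fs = self.mounts[new_root]
        # The new filesystem becomes "/", and the old root is parked at
        # put_old, now expressed relative to the new root.
        parked_at = "/" + posixpath.relpath(put_old, new_root)
        self.mounts = {"/": new_fs, parked_at: old_fs}

    def umount(self, mount_point):
        del self.mounts[mount_point]

    def reachable(self):
        return set(self.mounts.values())


ns = ToyMountNamespace({"/": "host-fs", "/srv/jail": "container-fs"})
ns.pivot_root("/srv/jail", "/srv/jail/.old_root")
print(ns.mounts)       # {'/': 'container-fs', '/.old_root': 'host-fs'}
ns.umount("/.old_root")
print(ns.reachable())  # {'container-fs'}
```

In this model, once the old root is unmounted there is no host filesystem left in the namespace to walk back up to, which is the property a second chroot call cannot undo.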
For a deep dive on chroot vs pivot_root, see this post from tbhaxor.
A container at runtime is:
Matt Rickard recently covered cgroups and namespaces. Earthly did a deep dive on filesystem isolation, although it makes the very mistake this post is about. I've talked about chroot on LinkedIn too.
Surprisingly, to build a container you must run a container! At least traditionally. More on that in a future post.
Compressed and layered filesystems with some metadata.
I think. Haven't dealt with that area much.
Yes, please. I'm on LinkedIn and Twitter.
Also, Robusta.dev just got a major update. If you're tired of noisy and unclear Kubernetes alerts, check it out. Best for people who like Prometheus.