Containers are nothing like virtual machines!
Now that we’ve cleared that up, this post will try to shed some light on:
- How containers, as we know them, came to exist
- Major differences between containers and virtual machines
- Examples of how to build minimal containers
- Demystifying the scratch container
- Examples of how to debug running containers using other containers
- Benefits of minimal containers
- Tools that help build minimal containers
I’ll be using docker throughout this post since it’s the most widely used, but these concepts should apply to other container runtimes such as rkt, lxd or containerd.
It’s all about abstraction
When virtual machine hypervisors started their rise, they provided full virtualization or paravirtualization: fancy names for virtualizing everything, or for using special drivers on the guest to talk to the real machine (the host) more efficiently. Either way, both guest and host carried a full operating system copy, including their own kernel, libraries, tools and so on.
With containers (jails, zones, etc.), the host and the “guest” share the same kernel to achieve process isolation. Eventually, a set of nifty new Linux kernel features called cgroups(7) (CPU, memory, disk I/O, network, etc.) and namespaces(7) (mnt, pid, net, ipc, uts and user) appeared to better restrict and enforce that isolation.
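On any modern Linux box you can see these building blocks directly; every process holds a reference to each namespace it lives in:

```shell
# the kernel exposes a process's namespaces as symlinks under
# /proc/<pid>/ns; cgroup membership lives in /proc/<pid>/cgroup
ls -l /proc/self/ns
cat /proc/self/cgroup
```

Your shell is already "inside" a mnt, pid, net, ipc, uts and user namespace; containers just give processes different ones.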
Managing those kernel features by hand used to be a daunting task, so tools were created to abstract away the complexity. LXC was the first one I used and spent real time with. It wasn’t very user-friendly, but it got the job done.
Apparently, it was so cool that some folks built an abstraction layer on top of it to make it trivial for anyone. I first saw that abstraction, back in 2013, showcased in this talk by Solomon Hykes, an engineer working for a company called dotCloud, nowadays known as Docker.
And the rest is history. Docker eventually dropped the need for LXC; it now drives those kernel features directly through its own library (libcontainer) and has an entire ecosystem for container management.
But it still looks like a Virtual Machine to me
I can understand why we compare containers to virtual machines. They “feel” the same, and that’s great.
But keep in mind virtual machines need their own kernel, init system, drivers, etc., and containers just use the host’s kernel to isolate processes (preferably, one process per container).
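An easy way to see this for yourself (assuming a local docker daemon and the busybox image; the docker part is skipped gracefully when no daemon is around): compare the kernel release on the host and inside a container.

```shell
# a container reports the host's kernel release, because there is
# no other kernel in play
uname -r
if command -v docker >/dev/null 2>&1; then
  docker run --rm busybox uname -r
fi
```

Both commands print the exact same release string; the container never booted a kernel of its own.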
So, why are people shipping an entire operating system’s worth of userland and system tooling inside a container, generating massive images with stuff that will never be used?
The container runtime provides the basic filesystem and kernel features for your application to run, which means you can focus on your application and benefit from the advantages of a minimal container.
I’ve prepared a few examples to help materialize these concepts.
Busybox is a very handy binary. It performs several functions depending on how it’s called. We’ll use it as our example application:
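A minimal sketch of such an image: one directory, one binary, a couple of symlinks, all wrapped in a tar file. This assumes a statically linked busybox is installed on the host (falling back to the shell binary just to demonstrate the layout; the myapp.tar name is simply our choice):

```shell
# assemble a tiny root filesystem around a single binary
mkdir -p rootfs/bin
BB="$(command -v busybox || command -v sh)"
cp "$BB" rootfs/bin/busybox
# busybox decides which applet to run from the name it's invoked as
for applet in sh ls sleep; do
  ln -sf busybox "rootfs/bin/$applet"
done
tar -C rootfs -cf myapp.tar .
```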
And now that we have a new shiny tar file (container image) with a binary, a couple of symlinks and with no kernel or extra junk, it’s time to import it:
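Importing is a one-liner: docker import turns a plain tarball into a single-layer image. The rootfs assembly from the previous step is repeated here in abbreviated form, and the docker call is skipped when no daemon is available:

```shell
# rebuild a tiny rootfs tarball, then import it as the image "myapp"
mkdir -p rootfs/bin
cp "$(command -v busybox || command -v sh)" rootfs/bin/busybox
tar -C rootfs -cf myapp.tar .
if command -v docker >/dev/null 2>&1; then
  docker import myapp.tar myapp
fi
```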
At this time, things are starting to get interesting. Let’s try running our myapp container and do a simple ls inside it.
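That could look like the following (guarded so it only runs when docker and the imported myapp image are actually present):

```shell
# list the container's root; /proc, /sys, /dev and files such as
# /etc/resolv.conf are mounted in by the runtime, not by the image
if command -v docker >/dev/null 2>&1 \
   && docker image inspect myapp >/dev/null 2>&1; then
  docker run --rm myapp ls /
fi
```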
Where did all that stuff come from? Shouldn’t the image only contain busybox and a couple of symlinks?
There are differences between a container image and that same container at runtime. The Open Container Initiative (OCI) runtime specification explains this quite nicely.
That’s sourcery, I want my Dockerfile back
Sure, whatever floats your boat.
You probably heard about the scratch container. Let’s build our own.
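One well-known trick for creating an empty base image by hand is importing an empty tar stream (myscratch is just the name I picked here; the docker call is guarded as before):

```shell
# an empty tar archive imported as an image yields a zero-byte base,
# the same idea as the reserved "scratch" image
if command -v docker >/dev/null 2>&1; then
  tar -c --files-from /dev/null | docker import - myscratch
fi
```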
If we’re on the same page, you must be realizing that all this fuss around containers should really be a fuss around tar files, right?
I demand a shell
One could argue that a shell is mandatory for debugging. Obviously strace has to be present too, and what if I need to copy files from/to the container? Maybe run an SSH daemon?
Well, let me make this crystal clear: you don’t!
Since namespaces are one of the underlying building blocks of containers, you can use nsenter(1) to run programs inside the namespaces of other processes.
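For example, assuming a container called myapp is running, you can resolve its main process PID on the host and jump into its network namespace (root is needed, hence the sudo; everything is skipped when docker or the container is absent):

```shell
# run `ip addr` inside myapp's network namespace without the
# container shipping a single networking tool
if command -v docker >/dev/null 2>&1 \
   && docker inspect myapp >/dev/null 2>&1; then
  PID="$(docker inspect --format '{{.State.Pid}}' myapp)"
  sudo nsenter --target "$PID" --net ip addr
fi
```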
If that’s so, why don’t we use the same PID/NET namespace between containers, effectively sharing those resources?
For instance, you could build a toolkit container with all the tools one could ever need and attach it to a container that doesn’t even have a shell.
I, for one, did exactly that. And we’ll be using it in this example:
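Attaching could look like this; toolkit stands for whatever debug image you built, and myapp is the container from earlier (guarded as usual so it is a no-op without them):

```shell
# start a throwaway debug shell sharing myapp's PID and
# network namespaces
if command -v docker >/dev/null 2>&1 \
   && docker inspect myapp >/dev/null 2>&1; then
  docker run -it --rm \
    --pid=container:myapp \
    --net=container:myapp \
    toolkit bash
fi
```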
Now we’re in a bash shell inside the toolkit container, attached to the running myapp container. Let’s look around.
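Inside that shell, the shared PID namespace is easy to verify. The same commands are shown here so you know what to expect (run standalone, they simply inspect whatever PID namespace you happen to be in):

```shell
# with --pid=container:myapp, ps lists myapp's processes and PID 1 is
# its entrypoint; /proc/1/root is a window into its filesystem
command -v ps >/dev/null 2>&1 && ps -o pid,comm
ls /proc/1/root 2>/dev/null || echo "(need root to peek at /proc/1/root)"
```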
We can see the sleep process running as PID 1, but its busybox binary is nowhere on our filesystem. We only share the PID and network namespaces; the mount namespace is still the toolkit’s own, so we see myapp’s processes next to our own files (and can even reach myapp’s root through /proc/1/root).
So, do you still think someone needs a shell and all those tools inside the myapp container itself?
I can argue that there’s someone who would: if, for example, a remote code execution vulnerability were found in the application, that malicious someone would love to have a shell lying around, and maybe some useful tools like curl or wget.
With that said, let’s then strive to restrict the attack surface on our containers and, as a bonus, you’ll get:
- Less network bandwidth required to move container images around
- Less storage requirements due to image size
- Less IOPS needed due to image size
- Less software equals less vulnerabilities to scan, manage, patch, upgrade…
- Faster build times
- Faster ship times
I get it: it’s hard to manage all the dependencies of a real application and completely detach it from the operating system where it was built. But rest assured, plenty of people feel the same way, and the community is here to help.
These are some tools to make things less painful:
If you want to have complete control of what’s inside your container and not depend on prebuilt packages (rpm, deb, etc.), just use buildroot.
For more buzzwordy tools I recommend this talk from Michael Ducy.
That’s it, that’s all
Well, not quite, this is just the beginning. There are a lot of standards/implementations evolving and being adopted (OCI, CNI, CRI, etc.).
All of them improve the ecosystem around containerization allowing everyone to step in and contribute.
Containers are here to stay and understanding what makes them tick is no longer optional.