Last week I had an interesting discussion with a colleague on containers (Docker mostly), VMs, as well as a more recent development in this space called unikernels. Regular geek speak. I’ve mashed up the most interesting parts of the discussion, together with some background information.
Containerization is lightweight OS virtualization that groups and isolates certain processes and resources from the host operating system and other containers. Containers share the operating system kernel and may share binaries and libraries.
Same meat, different gravy
Containerization is nothing new. Parallels released Virtuozzo (OpenVZ being its open source sibling) in 2001, Solaris Zones was released in 2004, to name just two. Containers leverage Linux features (e.g. cgroups, namespaces) that have been part of the kernel for quite some time.
Pros and cons
The benefits of running containers are obvious: fast deployment time, portability, small footprint and consolidation.
Security, however, is a whole different beast. The Docker architecture, for instance, runs a daemon that requires root, and containers run with root privileges as well (for now anyway).
Furthermore, containers lack the hardware isolation (Intel VT-x) that VMs provide. Most hypervisors are battle-hardened in production and have good security reputations. But make no mistake, even hypervisor isolation isn’t infallible (ESXi and Xen have both had their share of vulnerabilities).
This doesn’t necessarily make containers insecure; it just requires a different security perspective. Isolation can be improved by running each container in its own lightweight VM. Rkt can leverage Intel Clear Containers, and VMware uses something it calls jeVM (just enough VM) running an optimized container operating system, Photon OS.
Needless to say, creating a VM for every container adds overhead, although the performance impact is relatively small, according to VMware.
Application centric behaviour
Know your application and know how it behaves in a container. For example, when you run your application in a Docker container, it is launched as process identifier (PID) 1. PID 1 is normally reserved for the initialization (init) system, and one of its responsibilities is to adopt orphaned child processes and reap them once they exit, so they don’t linger as zombies.
If your Docker PID 1 process does not correctly reap adopted child processes, they stay around as zombie processes that keep occupying entries in the process table indefinitely.
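Also, properly stopping your Docker application is just as important as starting it. When a container is stopped, Docker sends SIGTERM to PID 1 and follows up with SIGKILL after a grace period, so your application should handle SIGTERM and shut down cleanly.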
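To make the PID 1 story a bit more concrete, here is a minimal sketch of what a tiny init-like wrapper could look like in Python. It is my own illustration, not something Docker ships: the function names (reap_zombies, exit_code, run_as_init) and the polling interval are assumptions, and it only runs on POSIX systems.

```python
import os
import signal
import time


def reap_zombies():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append((pid, status))
    return reaped


def exit_code(status):
    """Translate a raw wait status into a shell-style exit code."""
    if os.WIFSIGNALED(status):
        return 128 + os.WTERMSIG(status)
    return os.WEXITSTATUS(status)


def run_as_init(command, poll_interval=0.1):
    """Start `command`, forward stop signals to it and reap all children.

    Hypothetical helper: returns the exit code of the main child.
    """
    child_pid = os.spawnvp(os.P_NOWAIT, command[0], command)

    def forward(signum, _frame):
        # Pass SIGTERM/SIGINT on so the application can shut down cleanly.
        try:
            os.kill(child_pid, signum)
        except ProcessLookupError:
            pass  # the child is already gone

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    while True:
        # Reaps the main child and any orphans that were re-parented to us.
        for pid, status in reap_zombies():
            if pid == child_pid:
                return exit_code(status)
        time.sleep(poll_interval)
```

As the container’s entry point you would call something along the lines of sys.exit(run_as_init(sys.argv[1:])), so the real application runs as a child and PID 1 only forwards signals and reaps.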
In June 2015, Docker, CoreOS and a broad coalition of tech companies announced the formation of the Open Container Initiative. OCI will “…contribute to building a vendor-neutral, portable and open specification and runtime that deliver on the promise of containers as a source of application portability…”.
Docker took its own libcontainer runtime (the low-level plumbing that creates a container), modified it to run independently from the rest of the Docker infrastructure and donated the codebase to OCI, creating the runC runtime. Both Docker and CoreOS rkt (currently using the open appc specification) will be moving towards this new open OCI specification.
Data in Docker containers
Docker uses a combination of read-only and read-write layers on top of a Union filesystem. Docker images share the read-only layers, and when a container modifies a file, it is first copied to the read-write layer of that container (copy-on-write), leaving the original file intact. Modified data persists in the container, even when it is stopped, until the container is deleted.
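The layering described above can be mimicked with a toy model. This is only a sketch to illustrate the copy-on-write idea, not how Docker’s storage drivers are actually implemented: the Image and Container classes and their methods are made up for this example, and files are simply strings in dictionaries.

```python
from types import MappingProxyType


class Image:
    """An ordered stack of read-only layers; later layers shadow earlier ones."""

    def __init__(self, *layers):
        # MappingProxyType gives a read-only view, like an immutable image layer.
        self.layers = tuple(MappingProxyType(dict(layer)) for layer in layers)

    def lookup(self, path):
        for layer in reversed(self.layers):
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)


class Container:
    """A private read-write layer on top of a shared image."""

    def __init__(self, image):
        self.image = image
        self.rw_layer = {}

    def read(self, path):
        if path in self.rw_layer:
            return self.rw_layer[path]
        return self.image.lookup(path)

    def append(self, path, data):
        # Copy-on-write: copy the file up into the read-write layer first,
        # then modify the private copy. The image layers are never touched.
        if path not in self.rw_layer:
            self.rw_layer[path] = self.image.lookup(path)
        self.rw_layer[path] += data


image = Image({"/etc/motd": "hello\n"}, {"/app/version": "1.0\n"})
first = Container(image)
second = Container(image)
first.append("/etc/motd", "changed in first\n")
```

After the append, first sees its modified copy of /etc/motd, while second and the image itself still see the original. Throwing away first (and its rw_layer) throws away the change, which is the reason data volumes exist.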
You can use data volumes to store persistent data independent of the container’s lifecycle. This lets you bypass the Union filesystem and store files directly on the host.
The recommended way to store and share data between containers, however, is to use data volume containers. You basically run a container with the sole purpose of hosting a volume. Other containers that need the data simply mount the volume by referring to the data container. This way, data volume containers let you abstract the physical data location on the host.
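To show the difference between the two approaches, here is a small sketch that only builds the docker command lines and does not execute anything. The -v, --name and --volumes-from flags are regular Docker CLI flags; the helper functions, the container name dbstore, the paths and the image names are assumptions I picked for the example.

```python
import shlex


def run_with_host_volume(host_path, container_path, image):
    """Bind a host directory into the container, bypassing the Union filesystem."""
    return ["docker", "run", "-d", "-v", f"{host_path}:{container_path}", image]


def create_data_container(name, volume_path, image):
    """Create (but do not start) a container whose only job is to own a volume."""
    return ["docker", "create", "-v", volume_path, "--name", name, image]


def run_with_volumes_from(data_container, image):
    """Mount all volumes of the data container into a new container."""
    return ["docker", "run", "-d", "--volumes-from", data_container, image]


def as_shell(args):
    """Render an argument list as a copy-pasteable shell command."""
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical names, paths and images, for illustration only.
commands = [
    run_with_host_volume("/srv/appdata", "/data", "my-app-image"),
    create_data_container("dbstore", "/dbdata", "busybox"),
    run_with_volumes_from("dbstore", "my-app-image"),
]
for args in commands:
    print(as_shell(args))
```

Note that the last command only refers to dbstore by name: the consuming container never needs to know where the data physically lives on the host.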
When you need to share data between Docker hosts, you could for example mount an NFS folder on multiple Docker hosts and subsequently mount this folder as a volume in your containers. A better option would be to find a suitable third-party volume plugin; Flocker is an example of an excellent data volume manager.
Unikernels, or library operating systems, got some extra exposure at the beginning of 2016 when Docker acquired Unikernel Systems. Examples of other unikernel implementations are MirageOS (whose website is itself served from a unikernel), HaLVM and OSv, to name a few.
Unikernels are specialized operating systems that are compiled from the application code and the bare minimum OS libraries needed to support the application, all in a single address space. This means unikernels are lightweight, secure (small attack surface) and blazing fast, since there is no context switching between kernel and user space.
There are plenty of disadvantages as well: debugging is poor (there is no ps, ping etc.), a unikernel runs a single language runtime (OCaml, Haskell, C, Java etc.), which limits the developer, the technology is still ‘new’, and existing applications will almost certainly require a code rewrite.
If you would like to know more about unikernels, I thought this was a pretty good presentation.
High level comparison
To close off, I’ve created an image that shows a high level overview of a VM versus container versus unikernel.
Thanks for reading, I hope I’ve taught you a thing or two on the concept of containerization.