Just got back from EMCWorld2016 this week but on the way there and back I was perusing the FAST’16 papers. One of the papers I read (see Slacker: Fast Distribution with Lazy Docker Containers, p. 181) discussed performance problems with initializing Docker container micro-services and how they could be solved using persistent, intelligent NFS storage.
It appears that Docker container initialization spends a lot of time provisioning and initializing a local file system for each container. Docker containers typically make use of an AUFS (Another Union File System) storage driver which makes use of another file system (like ext4) as its underlying storage which has beneath it either DAS or external storage.
When using persistent and intelligent NFS storage, Docker can take advantage of storage system snapshots and cloning to improve container initialization significantly. In the paper, the researchers used Tintri as the underlying persistent, enterprise class NFS storage but I believe the functionality that’s taken advantage of is available with most enterprise class NAS systems and as such, is readily available with other storage subsystems.
Docker containers in action
It appears you do three things with Docker containers: push, pull and run. I understand run as executing a container’s functionality. Push is a way to create or publish new Docker container images, and pull is a way to fetch published Docker container images from a centralized repository.
Docker containers access storage in two ways:
- Mounted host files/directories – these can contain persistent functional storage required by the container service. For example, a containerized compiler could have source code on mounted host files/directories which is compiled and generates object code output onto other (or the same) mounted host files/directories
- Docker layers – these contain the OS files, application and associated libraries that are required to build the container executable functionality. These would typically take the form of Linux distribution files and containerized application binaries which are executed by the container. In the example above, the Docker layers for the containerized compiler would contain the Linux distribution’s userland files (containers share the host’s kernel rather than shipping their own) and the compiler executables.
Diff and ApplyDiff Docker functions are used during the push and pull processes to generate images and build images for execution respectively.
Essentially a Docker container image consists of many layers of TAR files of executable functionality. Diff is used to construct these layers of functionality that compose the containerized application images. ApplyDiff takes these layers of functionality and applies them in a prescribed sequence to build up the containerized application so that it can be executed. The Docker layer storage contains all of these layers that are used by ApplyDiff to initialize a containerized application.
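To make the Diff/ApplyDiff idea concrete, here is a minimal sketch. It is a toy model of my own, not Docker's storage-driver code: a file system is just a Python dict of path to contents, a layer is a dict of changes, and the names `diff`, `apply_diff` and `DELETED` are made up for illustration.

```python
# Toy model (an assumption, not Docker's implementation):
# a file system is a dict mapping path -> contents, a layer is a dict of changes.
DELETED = None  # marks a path that a layer removes

def diff(parent, child):
    """Return the layer (set of changes) that turns `parent` into `child`."""
    layer = {p: c for p, c in child.items() if parent.get(p) != c}
    layer.update({p: DELETED for p in parent if p not in child})
    return layer

def apply_diff(fs, layer):
    """Apply one layer on top of a file system and return the new file system."""
    result = dict(fs)
    for path, contents in layer.items():
        if contents is DELETED:
            result.pop(path, None)
        else:
            result[path] = contents
    return result

# "Push" side: build the layers, oldest first.
base = {"/bin/sh": "shell v1", "/etc/motd": "hello"}
app = {"/bin/sh": "shell v1", "/usr/bin/cc": "compiler"}
layers = [diff({}, base), diff(base, app)]

# "Pull" side: apply the layers in the same order to rebuild the image.
rootfs = {}
for layer in layers:
    rootfs = apply_diff(rootfs, layer)
```

Applying the layers in order reproduces `app`, while the second layer carries only what changed relative to `base`.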
Once built, a developer pushes an application image to a central repository, and others will pull this containerized application from the repository to run it.
Benchmarking Docker pushes, pulls and runs
The research team created a HelloBench benchmark to test Docker push, pull and run container operations. HelloBench monitors container startup time and run time for simple operations. HelloBench used 57 images of various Docker applications that were available as of June’15 and pushes, pulls and/or runs them automatically. These 57 images share a number of layers, such as Linux distros. Layers are designated as roots or leaves and are counted as nodes. In the HelloBench benchmark there were 550 nodes and 19 roots.
During a typical HelloBench run, pushes take ~70 seconds, pulls ~20 seconds and runs ~6 seconds. Pushes are only done by developers, while pulls and runs are done by anyone deploying a Docker application. Given their data, a typical pull-run process takes ~26 seconds, of which ~77% is the pull processing.
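The arithmetic behind those figures is simple enough to check. The variable names here are mine, and the inputs are the rounded timings quoted above.

```python
# Rounded HelloBench timings quoted above, in seconds.
push_s, pull_s, run_s = 70, 20, 6

deploy_s = pull_s + run_s        # a deployment is a pull followed by a run
pull_share = pull_s / deploy_s   # fraction of deployment time spent pulling

print(f"deploy: {deploy_s} s, pull share: {pull_share:.0%}")
# prints "deploy: 26 s, pull share: 77%"
```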
Of course, for developers that iteratively enhance or generate new Docker containerized applications, the push process takes 70 seconds.
The researchers spent some time looking at the layering of typical Docker containers and provided distributions for number of files, number of directories, and bytes of data in files for typical Docker container mass distribution images. They concluded from their analysis that over half of the bytes of a distribution are at a depth of layer 9. It turns out that AUFS performs best when all image data is at the top layer and performs worst with image data that is deeper down in layers.
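One way to picture why depth matters is a toy model of a union lookup that probes layers from the top down until it finds the file. This is only an illustration under that assumption (real AUFS has caching and other machinery), and `union_lookup` is a made-up name.

```python
def union_lookup(layers, path):
    """Search layers top-down; return (contents, number of layers probed).

    layers[0] is the top (most recent) layer in this toy model.
    """
    for probes, layer in enumerate(layers, start=1):
        if path in layer:
            return layer[path], probes
    raise FileNotFoundError(path)

# Nine layers: the application file is on top, the shared library at the bottom.
layers = [{"/app/run": "v2"}] + [{} for _ in range(7)] + [{"/lib/libc.so": "libc"}]

top_hit = union_lookup(layers, "/app/run")       # found in the first layer probed
deep_hit = union_lookup(layers, "/lib/libc.so")  # found only after probing all nine
```

In this model the cost of finding a file grows with how far down the stack it lives, which is the intuition behind data at the top layer being cheapest to reach.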
Slacker storage for Docker containers
To implement Slacker the researchers created a storage-driver plugin that utilizes external NFS storage that was shared between all Docker daemons and registries. In Slacker with Tintri, Docker images are represented as read-only snapshots. Registries are essentially transformed into name servers that associate image metadata with corresponding snapshots. Docker pushes involve the creation of snapshots and updating image metadata to point to data snapshots rather than moving data around. Docker pulls involve inverse actions making clones of these snapshots for the local container.
It turns out that Docker containers only lazily fetch layer data for pull operations, on an as-needed basis. In Slacker, Docker AUFS uses a separate ext4 file system, which is in turn backed by NFS server files. The loop0 and loop1 facilities are loopback devices that convert ext4 operations from disk IO into NFS file calls.
Slacker makes use of “snapshot” and “clone” RESTful primitives provided by the Tintri storage system to create and utilize copy-on-write snapshots for image layers. Snapshot makes a read-only copy of an NFS file and assigns it a unique snapshot-id. Clone creates an NFS file out of a data snapshot using the snapshot-id. Snapshot and clone are used in the Diff and ApplyDiff processes for Docker push and pull operations. The loop0 and loop1 layer also contains a block cache that is aware of snapshots and clones, so that reads of unmodified data can be satisfied from snapshot contents.
Slacker also made use of “flattened” image layers. Because layers didn’t take any more space (essentially TARed snapshot-ids), they were able to flatten an image layer so that it contained all ancestral image data in one snapshot, so that they wouldn’t have to traverse multiple layers to create the executable.
For instance, at push time, Slacker creates a snapshot of the NFS files that consists of the flattened image layer and embeds the snapshot-id in a TAR file for the registry. At pull time, Slacker receives the snapshot-id from the registry which it can then use to clone a new NFS file that contains the data from the image.
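The push/pull flow can be sketched in a few lines. Everything here is a hypothetical stand-in: `ToyNfsStore`, `push` and `pull` are my own names, not Tintri's REST API or Slacker's code, and the "store" is an in-memory dict that copies block maps where a real copy-on-write system would share blocks.

```python
import itertools

class ToyNfsStore:
    """Hypothetical stand-in for storage with snapshot and clone primitives."""

    def __init__(self):
        self.files = {}      # file name -> {block number: data}
        self.snapshots = {}  # snapshot-id -> frozen block map
        self._ids = itertools.count(1)

    def snapshot(self, name):
        """Freeze the current contents of a file and return a snapshot-id."""
        snap_id = f"snap-{next(self._ids)}"
        # A real system would share blocks copy-on-write; the toy just copies.
        self.snapshots[snap_id] = dict(self.files[name])
        return snap_id

    def clone(self, snap_id, new_name):
        """Create a new writable file whose contents start from a snapshot."""
        self.files[new_name] = dict(self.snapshots[snap_id])
        return new_name

def push(store, registry, image, file_name):
    """Publish an image: only the snapshot-id goes to the registry."""
    registry[image] = store.snapshot(file_name)

def pull(store, registry, image, container_file):
    """Fetch an image: look up its snapshot-id and clone it for the container."""
    return store.clone(registry[image], container_file)

store = ToyNfsStore()
registry = {}  # image name -> snapshot-id; the registry acts as a name server
store.files["build"] = {0: "flattened rootfs"}

push(store, registry, "myapp", "build")
pull(store, registry, "myapp", "container-1")
store.files["container-1"][1] = "scratch data"  # container writes stay private
```

The point of the sketch is that neither push nor pull moves image data through the registry; only a small identifier changes hands.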
Slacker can also make use of the same snapshot for all pushes that use the same image layer, with some provisos. Slacker also does lazy cloning: it only clones a snapshot and uses it to initialize the file system when the code in that image is actually executed.
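Lazy cloning is essentially deferred initialization. A minimal sketch of the pattern follows, with a made-up `LazyClone` class and a lambda standing in for whatever actually performs the clone; it is not Slacker's code.

```python
class LazyClone:
    """Defer an expensive clone until the file system is first needed."""

    def __init__(self, clone_fn):
        self._clone_fn = clone_fn  # hypothetical callable that performs the clone
        self._fs = None
        self._cloned = False
        self.clone_count = 0       # how many times the clone actually ran

    def fs(self):
        """Return the container file system, cloning on first use only."""
        if not self._cloned:
            self._fs = self._clone_fn()
            self._cloned = True
            self.clone_count += 1
        return self._fs

# "Pull" only records how to clone; nothing is cloned yet.
container = LazyClone(lambda: {"/bin/app": "binary"})
clones_after_pull = container.clone_count

# "Run" touches the file system, which triggers the one and only clone.
binary = container.fs()["/bin/app"]
container.fs()  # later accesses reuse the same clone
clones_after_run = container.clone_count
```

This also hints at why the cost shows up at run time rather than pull time: the work is not avoided, just postponed until it is needed.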
Although the researchers managed to make all these changes within their storage-driver plugin, a few other enhancements outside their plugin would have made life even better. One change they would have liked to see for compatibility purposes is different image layer types, so that they could establish a snapshot-id layer type for the registry rather than having to bother with TAR files. A second change was some mechanism to indicate that a layer had been flattened, that is, that it contained all the data needed for the image, and as a result there was no need to fetch ancestor layers.
The results were significant. For Docker deployments (pulls & runs) the median speedup was 5.3X, and for development activities the median speedup was 20X. This was using the same Tintri storage for the normal non-Slacker storage as well as the Slacker storage, so backend storage performance was leveled across both sets of timing. As such, all we are seeing is the speedup from making use of snapshots/clones, flattened images and lazy cloning.
It turned out the pushes and pulls were significantly faster but runs were not (due to lazy cloning of snapshots seen in the chart). They then tested the performance of long-running containers by using PostgreSQL and Redis benchmarks which ran for 5 minutes of activity using Slacker and non-Slacker AUFS storage. They found that for long-running containers, once the pull process completed, actual run time was roughly equivalent between the native AUFS and AUFS with Slacker behind it.
So, with relatively minor modifications plus using standard, enterprise class NAS snapshot and clone services, it appears that we can speed up Docker deployment by over 5X and Docker development activities by 20X.
That’s great news. At one time, I was told that Docker didn’t have need for persistent storage. Now it seems that enterprise storage can be used to greatly speed up Docker development and use activities.