Volume all the (docker) things – Recursive Erudition

I’m starting to get the hang of this Docker thing. I’ve been using it _just barely_ long enough to see some change, and I’ve been reading about it for longer. From an architecture-planning perspective, I think it’s important to keep in mind that Docker is a set of very rapidly changing abstractions over a set of solid, long-term base functions. The abstractions often really suck for a while, because the Docker developers haven’t yet figured out what things will look like in the end.

So we have lots of instances of:

In the past we used a crazy workaround for a missing feature → now we have a half-baked abstraction that may or may not be better than the crazy workaround → when we get to use Docker 1.9 (or 2.5, or whatever), it will finally make sense in a much better way.

Logging is one of those areas, and it is probably fixed pretty solidly in Docker 1.9.

For networking, the new design in Docker 1.9 looks awesome, so by Docker 1.11 it should be solid.

For volumes, a real picture is only just starting to emerge.

There are currently four ways that Docker manages volumes:

Host mount (e.g. -v /home/josh/folder:/dock/app/folder): this makes complete sense and is solid. Its only problem is that clustering tools can’t manage or move the containers around, since the volume is outside Docker’s management scope. It’s a host volume, so it’s tied to the host. For our narrow Postgres use case, I think this is the option we should stick with: it makes things explicit, and we don’t need to move our containers.
Anonymous volumes (e.g. -v /dock/app/folder, or VOLUME /dock/app/folder in a Dockerfile): this creates a randomly named folder somewhere in /var that is mounted into the container. If you don’t docker rm -v the container when you’re done with it, the folder gets orphaned and you leak disk space. You can use docker inspect to find the folder, but there is no real tooling for working with the volume or managing its lifecycle. By themselves, these volumes aren’t really useful. They have some performance and reliability implications, but they are more a hole in the abstraction than an operational tool in their own right.
Volumes-from: building on anonymous volumes, we can create a container that has anonymous volumes and use that container as a name/handle for them. That container doesn’t even need to be running: it can be run once and stopped, or just created with the docker create command. Running a second container with --volumes-from makes it use the first container’s storage as the persistent location for its files. This approach is more compatible with Docker cluster managers and more portable, but it’s also awkward. You can’t create a data container with Docker Compose. You end up with IMPORTANT stopped containers lying around on your host (a lot of the scripts out there for cleaning up orphans just delete all of your stopped containers…). For a long while this has looked like the ‘docker way’ of managing persistent storage, but now it appears that things are changing…
Named volumes (e.g. -v myvolume:/dock/app/folder): Docker 1.9 added a cluster of new ‘docker volume’ commands, the ability to give an ‘anonymous volume’ a name, and volume drivers. I wouldn’t dream of relying on this functionality yet, since things still seem to be changing rapidly and there’s a pile of bugs and feature requests on GitHub on the subject, but it is a clear abstraction and a clear vision for the future of volume management. In the future, there will be some sort of separate storage server with its own management tools. You’ll be able to cluster, cache, and back up volumes through that tool (there will probably be a number of competing solutions), and you will just run your container with its volumes specified by name, connect to the volume driver, and have them automagically managed for you. If you need to move containers around, the volume storage server will make sure your data moves too. We should absolutely use this. Next year or sometime.
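For reference, here are the four styles side by side as Docker CLI commands. This is a minimal sketch assuming Docker 1.9 or later (the release that added docker volume and named volumes). The busybox image and the app-* container names are hypothetical placeholders; the paths and myvolume come from the examples above. By default the script only prints each command, so nothing touches your host until you run it with DRY_RUN=0.

```shell
#!/bin/sh
# Sketch of the four volume styles, using Docker 1.9-era CLI syntax.
# Hypothetical: the busybox image and the app-* container names.
# By default every command is only printed; set DRY_RUN=0 to execute them.
set -eu

IMAGE=busybox

# Print the command (dry run, the default) or actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Host mount: an explicit host path, so the data is tied to this host.
run docker run -d --name app-host -v /home/josh/folder:/dock/app/folder "$IMAGE" sleep 3600

# 2. Anonymous volume: Docker picks a randomly named directory on the host.
run docker run -d --name app-anon -v /dock/app/folder "$IMAGE" sleep 3600
# The inspect output shows which host directory backs the volume.
run docker inspect app-anon
# -v removes the anonymous volume together with the container;
# without it, the directory is orphaned and keeps using disk space.
run docker rm -f -v app-anon

# 3. Volumes-from: a created-but-never-started data container holds the
#    anonymous volume, and the app container borrows it.
run docker create --name app-data -v /dock/app/folder "$IMAGE"
run docker run -d --name app-main --volumes-from app-data "$IMAGE" sleep 3600

# 4. Named volume (Docker 1.9+): the volume has its own name and lifecycle.
run docker volume create --name myvolume
run docker run -d --name app-named -v myvolume:/dock/app/folder "$IMAGE" sleep 3600
run docker volume ls
run docker volume inspect myvolume
```

The dry-run wrapper is only there so the commands can be read and checked first. If you do run it for real, clean up afterwards with docker rm -v on the app-* containers and docker volume rm myvolume.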

TLDR: let’s just use host volumes.