
Docker and storage – solving the problem of data persistence

Pull on your vendor hoodie - it's a casual affair

Mon 18 Jul 2016 // 08:06 UTC

In June I was in Seattle, the home of Starbucks, Boeing and Microsoft, for DockerCon 2016. Compared to the normal events I attend, this one promised to be a more “casual” affair, so I sported polo shirts and vendor hoodies as my standard attire.

One of the more interesting problems yet to be fully solved with container technology is that of persistent storage. When containers first appeared, the immediate assumption was that they would be ephemeral in nature and move around the infrastructure at will. If a container/application needs to be moved, simply re-instantiate it elsewhere. Well, it turns out that there are two problems with this: first, people like to have a degree of persistence with their containers (more on that in a moment) and second, application data has to reside somewhere.

Pets, cattle and pets again

The big comparison made between virtual machines and containers is that of pets versus cattle. VMs are pets to be nurtured, maintained and looked after; containers are cattle that can be culled at whim, to be replaced by another in the herd. The analogy works at first glance, apart from the obvious fact that when we provision containers they have to be configured and mapped to our application. That includes configuring security and network settings, storage, and the permissions needed to access application data or other parts of the application hierarchy.

This means we are back to maintaining pets, but this time our pet is a set of configuration files that describe how to orchestrate the application, rather than an infrastructure-centric representation of that application.

The result is that people like containers to hang around longer than initially expected because maintaining deployment manifests takes effort.

Data in or accessible by the container

So when it comes to data, should we put data (resiliently) into the container, or should we have more persistent data repositories (possibly on VMs) that the containers access? In the purest container model we should be aiming for the former, but that’s a lot more work than moving the stateless part of the app (like the web server) into a container while keeping the data in a more traditional format.

Both approaches present problems. Data has inertia, and latency makes it difficult for applications to access data over distance unless the access protocols for that data are specifically latency-tolerant. We’re already seeing some solutions come to market to answer these problems, including ClusterHQ with Flocker, Portworx, Hedvig and StorageOS (which launched in beta at DockerCon).

In terms of requirements, we need to be able to move data (and the container using it) from one location to another, and ensure permissions are correct so as not to expose data to the wrong application. We need to maintain integrity while data is moving around and in transit, even as it is being accessed. Of course, we also have to back data up and ensure we can restore it, wherever the application resides in the future.
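To make the move, permission and integrity requirements a little more concrete, here is a deliberately simplified sketch. It assumes a "volume" is nothing more than a directory on disk, and the names `move_volume`, `manifest` and the `acl` mapping (volume name to the set of applications allowed to use it) are hypothetical illustrations only; none of this reflects how Flocker, Portworx, Hedvig or StorageOS actually work. The function refuses to hand a volume to an application that is not listed for it, copies the data, then compares a SHA-256 checksum of every file before and after the copy.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def move_volume(src: Path, dst: Path, app: str,
                acl: dict[str, set[str]]) -> dict[str, str]:
    """Copy a volume directory to dst (a path that must not exist yet),
    refusing unauthorised apps and verifying every file's checksum after
    the copy."""
    # Hypothetical ACL: only apps listed against this volume may receive it.
    if app not in acl.get(src.name, set()):
        raise PermissionError(f"{app} may not access volume {src.name}")
    before = manifest(src)
    shutil.copytree(src, dst)  # also creates any missing parent directories
    after = manifest(dst)
    if before != after:
        shutil.rmtree(dst)  # don't leave a corrupt copy behind
        raise OSError("checksum mismatch after copy")
    return after


if __name__ == "__main__":
    # Simulate two "hosts" as sibling directories in a temp folder.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "orders-db"
        src.mkdir()
        (src / "data.bin").write_bytes(b"hello")
        acl = {"orders-db": {"orders-app"}}
        print(move_volume(src, Path(tmp) / "node2" / "orders-db",
                          "orders-app", acl))
```

The same manifest comparison could, in principle, be used to confirm that a restored backup matches the original data. A real system would also need to handle data that changes while it is being copied, which this sketch ignores.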

The architect’s view

I’m looking forward to getting into some detail on how persistent data is being managed. Storage is probably one of the last (big) container problems to solve, and the issues are the same as they’ve always been. For the enterprise to adopt containers and Docker, operational issues around storage need to be fixed. I’m hoping we start seeing some answers.
