
One bit to rule them all? Forget it – old storage types never die

Pups' guide to running with the storage pack

Turtles all the way down

Typically, file storage is created on top of block storage, mediated by an operating system. In addition to storing "regular" files, however, file storage can store Virtual Hard Drives (VHDs). VHDs are (usually quite large) files that are made available to a computer as a form of virtual block storage.
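To make the "file pretending to be a disk" idea concrete, here is a minimal sketch in Python. It is not how any real VHD format works: the class name `FileBackedDisk`, the 512-byte block size and the flat, header-less layout are all assumptions made for illustration. Real formats add headers, sparse allocation and snapshots on top.

```python
import os
import tempfile

BLOCK_SIZE = 512  # assumed block size for this sketch


class FileBackedDisk:
    """Hypothetical toy: presents one ordinary file as an array of fixed-size blocks."""

    def __init__(self, path, num_blocks):
        self.path = path
        self.num_blocks = num_blocks
        # Create a fresh backing file, zero-filled so every block exists.
        with open(path, "wb") as f:
            f.write(bytes(num_blocks * BLOCK_SIZE))

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError(f"block {index} is outside the disk")

    def read_block(self, index):
        """Return the BLOCK_SIZE bytes stored in block `index`."""
        self._check(index)
        with open(self.path, "rb") as f:
            f.seek(index * BLOCK_SIZE)
            return f.read(BLOCK_SIZE)

    def write_block(self, index, data):
        """Overwrite block `index`; callers must supply exactly one block."""
        self._check(index)
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        with open(self.path, "r+b") as f:
            f.seek(index * BLOCK_SIZE)
            f.write(data)
```

Whatever sits above this sees only numbered blocks and has no idea there is a file system underneath, which is the whole trick.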

These VHDs can be presented to the host operating system by a hypervisor (local or networked) or to a networked server as block storage. Those systems will then typically overlay either a file system or an object storage system on these virtual hard drives.

The end result for today's virtual infrastructure is usually (at a minimum) a file system on top of virtual block storage from a VHD on top of a file system on top of the basic block storage of the disks. We'll add more layers of turtles later.

Creating a distributed file system poses a few problems. The first is that many files are huge. Do you replicate the entire 400GB movie file because someone updated a 4KB tag in the file?

Also: file systems are complex. They have security permissions and often include the concept of "locking": while one program is using a file, no other program may touch it. This matters because file systems are responsible for presenting actually usable data to the operating system, applications and end users. You can't have one application clobbering a file while another is using it.
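As a rough illustration of locking, the sketch below uses Python's `fcntl.flock` to take an exclusive lock without blocking. Two caveats, both assumptions of this example rather than anything from the article: it only runs on POSIX systems, and `flock` locks are advisory, so they only stop programs that bother to ask. The helper name `try_exclusive_lock` is made up for the example.

```python
import fcntl


def try_exclusive_lock(path):
    """Try to open `path` and lock it exclusively, without waiting.

    Returns the open file object (holding the lock) on success,
    or None if someone else already holds the lock.
    """
    f = open(path, "r+b")
    try:
        # LOCK_EX asks for an exclusive lock; LOCK_NB says "fail, don't wait".
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        f.close()
        return None
    return f
```

Closing the returned file releases the lock. A replication service that wants to read a file someone else has locked either waits its turn or finds a way around the lock, which is where the trouble starts.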

So to write a file-based replication system you need two things. The replication service has to run with a high enough level of privilege to ignore any locking (usually at least somewhat dangerous), and it has to detect changes to pieces of files rather than entire files.

This usually means some awareness of the underlying block storage. The end result is that any file system replication designed to replicate large, in-use files (usually VHDs) ends up being part file system replication, part block storage replication.
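One common way to spot which pieces of a file changed is to hash it in fixed-size chunks and compare the hashes. The sketch below assumes 4KB chunks and SHA-256, and works on bytes already in memory; the function names are invented for illustration, and a real replicator would stream from disk and would likely be far smarter about it.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size; real systems tune this


def chunk_digests(data, chunk_size=CHUNK_SIZE):
    """Return one SHA-256 digest per fixed-size chunk of `data`."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old_digests, new_data, chunk_size=CHUNK_SIZE):
    """Return the indexes of chunks in `new_data` that differ from `old_digests`.

    Chunks past the end of the old digest list count as changed.
    """
    changed = []
    for i, digest in enumerate(chunk_digests(new_data, chunk_size)):
        if i >= len(old_digests) or old_digests[i] != digest:
            changed.append(i)
    return changed
```

With this, a 4KB tag update in a huge movie file means shipping one or two chunks, not the whole file. The catch is that fixed offsets only cope with in-place edits: insert a byte near the start and every later chunk looks different.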

Object storage

Objects are a completely different way of looking at things. Instead of storing blocks, object storage stores objects. Duh! A block is a fixed size. There are a given number of them on a drive. Objects can be (almost) any size, and the number of them on a drive is determined by the space of the drive and the distribution of the objects.

If that sounds a lot like files, you're not wrong. The big difference is that files have a complicated index with a hierarchical structure, complicated permissions and so forth. Most object stores do away with most of that. I realise that's ambiguous, but just as different file systems support different features, object stores differ in their feature sets too.

Objects have to have at least a few characteristics: a Globally Unique ID (GUID) for the object being stored, the size of the object, and the physical position of the object on the drive.
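Those three characteristics are enough to build a toy object store. In the sketch below a `bytearray` stands in for the drive and objects are simply appended end to end; `ObjectRecord` and `FlatObjectStore` are hypothetical names, and nothing here handles deletion, free space or durability.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRecord:
    guid: uuid.UUID  # globally unique ID of the object
    offset: int      # position of the object's first byte on the "drive"
    size: int        # length of the object in bytes


class FlatObjectStore:
    """Hypothetical toy store: objects laid end to end, found by GUID."""

    def __init__(self):
        self._drive = bytearray()  # stands in for the raw drive
        self._records = {}         # GUID -> ObjectRecord

    def put(self, data):
        """Store `data` and return the GUID assigned to it."""
        guid = uuid.uuid4()
        self._records[guid] = ObjectRecord(guid, len(self._drive), len(data))
        self._drive.extend(data)
        return guid

    def stat(self, guid):
        """Return the record (GUID, offset, size) for an object."""
        return self._records[guid]

    def get(self, guid):
        """Return the bytes of the object with this GUID."""
        record = self._records[guid]
        return bytes(self._drive[record.offset:record.offset + record.size])
```

Note what is missing: no directories, no file names, no permissions. The store knows where the bytes are and how many there are, and nothing else.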

This is really handy for developers. Developers create their own index of what object is what inside their application. My "bachelor party funny face limo photo" is GUID deb17e15-d47c-449f-b1b0-4d553e7d143f. The developer can then categorise and index that photo any way they want and assign whatever permissions they deem fit at the application level.

All developers need to know is the GUID, and they issue developer-friendly GET or PUT requests to the object storage system. No fussing about with locks, permissions or ancient file system access code written just after mercury delay lines stopped being a thing.
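Here is roughly what that division of labour might look like. The sketch is an assumption-laden illustration: `DictObjectStore` is an in-memory stand-in for a real object storage service, and `PhotoCatalog`, its tags and its viewer lists are invented to show indexing and permissions living in the application.

```python
import uuid


class DictObjectStore:
    """Stand-in for a real object store: opaque bytes keyed by GUID."""

    def __init__(self):
        self._objects = {}

    def put(self, guid, data):
        self._objects[guid] = bytes(data)

    def get(self, guid):
        return self._objects[guid]


class PhotoCatalog:
    """Application-side index: the store knows GUIDs, the app knows meaning."""

    def __init__(self, store):
        self.store = store
        self.by_tag = {}   # tag -> set of GUIDs
        self.viewers = {}  # GUID -> set of user names allowed to view

    def add_photo(self, data, tags, viewers):
        """PUT the photo, then record its tags and who may see it."""
        guid = uuid.uuid4()
        self.store.put(guid, data)
        for tag in tags:
            self.by_tag.setdefault(tag, set()).add(guid)
        self.viewers[guid] = set(viewers)
        return guid

    def find(self, tag):
        """Return the GUIDs of every photo carrying this tag."""
        return self.by_tag.get(tag, set())

    def fetch(self, guid, user):
        """GET the photo, enforcing the application's own permissions."""
        if user not in self.viewers.get(guid, set()):
            raise PermissionError(f"{user} may not view {guid}")
        return self.store.get(guid)
```

The object store never learns that the bytes are a photo, what it is tagged with or who is allowed to look at it. All of that is the application's problem, which is exactly the point.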
