IBM parks parallel file system on Big Data's lawn

Original URL: https://www.theregister.com/2012/05/21/ibm_general_parallel_file_system_3dot5/

Mirror, mirror on the wall, who is the fattest of them all?

Posted in Storage, 21st May 2012 15:03 GMT

The IT universe is seeing a massive collision taking place as the worlds of high-performance computing, big data and warehousing intermingle. IBM is pushing its General Parallel File System (GPFS) further to broaden its footprint in this space, with the 3.5 release adding big data and async replication features as well as customer metadata and more performance.

GPFS is a large-scale file system running on Network Shared Disk (NSD) server nodes with the file data spread over a variety of storage devices and users enjoying parallel access. We got the GPFS 3.5 news from Crispin Keable, IBM's HPC architect based at Basingstoke.

The new release has Active File Management, an asynchronous version of the existing GPFS multi-cluster synchronous replication feature. It enables a central GPFS site to be mirrored with other remote sites, where users then get file access to the mirrored data at local instead of wide area network speed. The link is duplex, so updates at either side of it are propagated across.

If the link goes down, the remote site can continue operating using the effectively cached GPFS data. Any updates are cached too, and as a way of preventing old data re-writing more recent data, the update of the central site from an offline remote site coming back online can be restricted to data newer than a pre-set date and time.
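The cut-off rule described above can be pictured with a few lines of Python. This is only a sketch of the idea, not how GPFS does it: the CachedUpdate record, the updates_to_replay function and the sample paths and dates are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CachedUpdate:
    """Hypothetical record of a change made at the remote site while offline."""
    path: str
    modified: datetime


def updates_to_replay(cached, cutoff):
    """Keep only the cached updates modified strictly after the cut-off."""
    return [u for u in cached if u.modified > cutoff]


# Two updates were cached while the link was down (made-up sample data).
cached = [
    CachedUpdate("/proj/a.dat", datetime(2012, 5, 1, 9, 0)),
    CachedUpdate("/proj/b.dat", datetime(2012, 5, 20, 17, 30)),
]

# The pre-set date and time: anything older is not pushed to the central site.
cutoff = datetime(2012, 5, 10, 0, 0)
replayed = updates_to_replay(cached, cutoff)
```

Here only the update to /proj/b.dat would be sent back, since the change to /proj/a.dat predates the cut-off.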

One thing to bear in mind is that there is no in-built deduplication in GPFS. If you wanted to reduce the data flowing across such a mirrored link you'd need something like a pair of Diligent dedupe boxes either side of it, or else use some other WAN optimisation/data reduction technique.

RAID and big data

In petabyte-scale GPFS deployments there can be a thousand or more disks – and disks fail often enough for a RAID re-build to be going on somewhere in the deployment all the time. This limits GPFS performance to the performance of the device upon which the rebuild is taking place.

Keable says the NSD servers that farm out GPFS to clients have spare CPU capacity, and de-clustered RAID uses this to run software RAID. GPFS deployments can have data blocks randomly scattered across JBOD disks and this provides a stronger RAID scheme than RAID 6, says Keable. The big plus here is that it spreads the RAID re-build work across the entire disk farm rather than loading a single device, which helps GPFS performance. Keable says this feature, which is a block-level algorithm and so capable of dealing with ever-larger disk capacities, was released on POWER7.
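A toy simulation shows why scattering helps. The sketch below is not IBM's algorithm: the disk count, stripe width and function names are assumptions chosen for illustration. It places stripes on randomly chosen disks, fails one disk, and counts how many blocks each surviving disk has to read for the re-build.

```python
import random
from collections import Counter


def place_stripes(num_disks, num_stripes, stripe_width, seed=0):
    """Scatter each stripe over stripe_width distinct, randomly chosen disks."""
    rng = random.Random(seed)
    return [rng.sample(range(num_disks), stripe_width) for _ in range(num_stripes)]


def rebuild_load(stripes, failed_disk):
    """Count the blocks each surviving disk must read to rebuild the failed one."""
    load = Counter()
    for stripe in stripes:
        if failed_disk in stripe:
            for disk in stripe:
                if disk != failed_disk:
                    load[disk] += 1
    return load


# Assumed toy farm: 100 disks, 10,000 stripes, 10 blocks per stripe.
stripes = place_stripes(num_disks=100, num_stripes=10_000, stripe_width=10)
load = rebuild_load(stripes, failed_disk=0)

print("disks sharing the re-build:", len(load))
print("busiest disk reads", max(load.values()), "blocks")
```

In a conventional 10-disk RAID group the nine surviving members would carry the whole re-build. With scattered placement the same reads are shared by far more disks, so each one does only a small slice of the work.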

He said IBM expected GPFS customers to use flash storage with de-clustered RAID "to hold its specific metadata – the V-disk as it's called."

GPFS is pretty much independent of what goes on below, the physical storage.

GPFS 3.5 can also be run in a shared-nothing, Hadoop-style cluster and is POSIX-compliant, unlike Hadoop's HDFS. Keable says GPFS 3.5 is big-data capable and can deliver "big insights" from what he termed a "big insight cluster". This release of GPFS does not, however, have any HDFS import facility.

Fileset features and metadata matters

Prior to GPFS 3.5, a sysadmin could take part of a GPFS file system tree, a fileset, and put it on a specific set of disks to provide a specific quality of service, such as faster responses from a set of fast Fibre Channel drives. The filesets can be moved dynamically without taking the filesystem down, and the sysadmin can move data across disk tiers on a daily or other time-based schedule.

The fileset has an "i-node" associated with it – an i-node being a tag and a block of data – which points to the actual file data and contains metadata such as origination date, time of first access, etc. Before 3.5, GPFS stored all the fileset metadata mixed together on one system. With 3.5, the fileset metadata is separated out, and this has enabled fileset-based backup, snapshot, quotas and group quota policies to be applied. Previously backup policies were applied at the filesystem level, but now, Keable says, "We can apply separate backup policies at the fileset level. It makes the GPFS sysadmin's job easier and more flexible."

Because of this change GPFS has gained POSIX.0-compliance, which means the i-node can contain small files along with their metadata. So you don't have to do two accesses to get at such small files – for example one for the i-node pointer and then one more for the real data – as the i-node metadata and small file data are co-located.
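The saving is easy to see in a toy model. The sketch below is not GPFS code: the ToyStore class, the Inode layout and the 3,072-byte inline limit are all assumptions made up for illustration. It simply counts accesses for a small file held inside the i-node versus a larger one held in a separate block.

```python
from dataclasses import dataclass, field
from typing import Optional

INLINE_LIMIT = 3072  # hypothetical cut-off in bytes, not a GPFS figure


@dataclass
class Inode:
    name: str
    size: int
    inline_data: Optional[bytes] = None          # small files live here
    block_pointers: list = field(default_factory=list)


class ToyStore:
    def __init__(self):
        self.blocks = {}   # block id -> data held outside the i-node
        self.reads = 0     # number of accesses performed so far

    def write(self, name, data):
        if len(data) <= INLINE_LIMIT:
            return Inode(name, len(data), inline_data=data)
        block_id = len(self.blocks)
        self.blocks[block_id] = data
        return Inode(name, len(data), block_pointers=[block_id])

    def read(self, inode):
        self.reads += 1    # one access to fetch the i-node itself
        if inode.inline_data is not None:
            return inode.inline_data
        chunks = []
        for block_id in inode.block_pointers:
            self.reads += 1  # a further access for each data block
            chunks.append(self.blocks[block_id])
        return b"".join(chunks)


store = ToyStore()
small = store.write("note.txt", b"x" * 100)
large = store.write("video.bin", b"y" * 10_000)

store.read(small)
reads_for_small = store.reads
store.read(large)
reads_for_large = store.reads - reads_for_small
```

In this model the small file costs one access and the large file two, which is the difference the co-location is meant to remove.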

It gets better. A customer's own metadata can be added to the i-node as well. Keable says you could put the latitude and longitude of the file in the i-node and enable location-based activities for such files, such as might be needed in a follow-the-sun scheme. You could do this before but the process was slow as the necessary metadata wasn't in the i-node.
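As a rough picture of what location-based selection could look like, here is a sketch in Python. The FileRecord type, the "lat" and "lon" attribute names and the sample paths are invented for illustration and say nothing about how GPFS exposes custom metadata.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """Hypothetical stand-in for an i-node carrying customer-supplied metadata."""
    path: str
    custom: dict = field(default_factory=dict)


def files_in_longitude_band(records, west, east):
    """Return paths of files whose custom 'lon' tag falls within [west, east]."""
    return [
        r.path
        for r in records
        if "lon" in r.custom and west <= r.custom["lon"] <= east
    ]


records = [
    FileRecord("/data/london.log", {"lat": 51.5, "lon": -0.1}),
    FileRecord("/data/tokyo.log", {"lat": 35.7, "lon": 139.7}),
    FileRecord("/data/untagged.log"),  # no location metadata at all
]

# A follow-the-sun job might pick up the files tagged for the European band.
europe = files_in_longitude_band(records, west=-15.0, east=40.0)
```

Because the tags travel with each record, the selection needs no second lookup, which is the speed-up Keable describes.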

GPFS object storage and supercomputing

A UK GPFS customer said that this opened the way for GPFS to be used for object storage, as the customer-inserted metadata could be a hash based on the file's contents. Such hashed files could thereby be located and addressed via the hashes, effectively layering an object storage scheme on to GPFS.
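The customer's idea can be sketched in a few lines of Python. This is an illustration only: the HashIndex class is hypothetical, SHA-256 is an assumed choice of hash, and an in-memory dictionary stands in for metadata that would really sit in the i-node.

```python
import hashlib


class HashIndex:
    """Toy object layer: files are addressed by a hash of their contents."""

    def __init__(self):
        self._by_hash = {}  # content hash -> file path

    def put(self, path, content: bytes) -> str:
        """Record the file under its content hash and return that hash."""
        digest = hashlib.sha256(content).hexdigest()
        self._by_hash[digest] = path
        return digest

    def locate(self, digest):
        """Return the path stored under the hash, or None if unknown."""
        return self._by_hash.get(digest)


index = HashIndex()
key = index.put("/gpfs/objects/report.txt", b"quarterly figures")
found = index.locate(key)
```

The caller keeps only the hash and uses it as the object's address, leaving the file system underneath to resolve it to a path.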

We also hear GPFS is involved with the Daresbury supercomputer initiatives. There are broadly three systems at Daresbury: a big SMP one, a conventional x86 cluster and Blue Gene – with some 7PB of disk drive data. GPFS underpins this and fronts a massive TS3500 tape library with 15PB of capacity.

GPFS is a mature and highly capable parallel file system that is being extended and tuned to work more effectively with the increasing scale of big data systems. The worlds of scale-out file systems, massive unstructured data stores, high-performance computing data storage, data warehousing, business analytics and object storage are colliding and mingling, and that is driving an intense and competitive development effort.

IBM is pushing GPFS development hard so that the product more than holds its position in this collision – in fact it extends it. ®