Big money in Big Data: SGI debuts petabyte-juggling archiving tool
Watch out, Quantum...
The tidal wave that is unstructured file-based data is lumbering towards data centres. SGI is hoping the Big Data trend means that file access storage will become a hot property.
Meanwhile, customers love the idea of all file locations being stored in a single virtual silo, instead of multiple different silos with differing access methods and management facilities. Users accessing their files shouldn't need to know where the data is located; interfaces should be capable of varying their access methods accordingly.
So SGI has taken its Xeon-powered 4U MIS Storage Server, capable of storing 276TB in its disks and/or SSDs, and added software to turn it into the SGI InfiniteStorage Gateway. The company says: "It uses a file-based interface either through NFS or CIFS access. The underlying file system on the Gateway is SGI CXFS."
The company says that targeted customers include "multi petabyte infrastructures spanning media, life sciences, manufacturing and other data-intensive industries".
Users see a single file storage resource with the InfiniteStorage Gateway taking care of the actual file location. The data mover is SGI's DMF (Data Migration Facility).
SGI InfiniteStorage Gateway
The InfiniteStorage Gateway supports spun-down disk arrays, known as MAID (Massive Array of Idle Disks - tech which SGI bought up back in 2010), tape libraries, object storage and the cloud. The Gateway has an API interface to Scality, which provides its RING object storage, and it will also support other cloud/object interfaces (such as S3, CDMI and OpenStack) in a future release.
SGI has an OEM relationship with Scality. IT managers can determine where data is placed by policy, without impacting users. Users see all files online and seamlessly available.
We view this as SGI providing four different file archiving possibilities:
- Object storage for fast access - it's all online - and virtually unlimited scalability
- MAID for longer latency access
- Tape for low cost and longest latency access
- Cloud for low cost and long latency access.
We don't hear that SGI is offering file migration between object storage and tape - the data encoding and formatting being radically different with these two ways of storing data. On that basis we wouldn't say this is a truly tiered method of storing files, as both object and tape are alternatives to each other.
We could see two tiering routes: primary data --> SATA nearline data --> MAID --> object storage or tape storage, with cloud as a third end-point in the future. In practice we think that MAID will be an end-point, rather than a way-station for cold data to head into object storage, a tape library or the cloud. That is four archival end-points in all.
Also, it may turn out to be the case that object storage is treated as a form of nearline storage, an alternative to bulk data storage on SATA disk drive arrays, because of its access speed. SGI isn't making any strong recommendations here about which end-points to use and how to view object storage. It's going to depend upon the types of files being stored and the storing organisation's preferences, in terms of data centre space take-up, budget, energy consumption, data growth characteristics and desired data access latency.
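To make that trade-off concrete, here is a minimal Python sketch of how a placement policy might pick one of the four end-points from a latency ceiling and a cost requirement. It is purely illustrative and is not SGI's DMF policy language or API: the `EndPoint` class, the `choose_end_point` function, the latency ranking (object, then MAID, then cloud, then tape) and the treatment of object storage and MAID as not low-cost are all our assumptions, loosely based on the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndPoint:
    """A hypothetical archive end-point; not an SGI data structure."""
    name: str
    latency_rank: int  # 0 = fastest access; the ordering is an assumption
    low_cost: bool     # True only where the article describes it as low cost


# Assumed ordering: object (online) < MAID < cloud < tape (longest latency).
END_POINTS = [
    EndPoint("object", 0, False),
    EndPoint("maid", 1, False),
    EndPoint("cloud", 2, True),
    EndPoint("tape", 3, True),
]


def choose_end_point(max_latency_rank: int, need_low_cost: bool) -> str:
    """Return the coldest end-point that still satisfies the policy."""
    candidates = [
        e for e in END_POINTS
        if e.latency_rank <= max_latency_rank
        and (e.low_cost or not need_low_cost)
    ]
    if not candidates:
        raise ValueError("no end-point satisfies this policy")
    # Prefer the slowest acceptable tier, keeping faster tiers free
    # for data that actually needs them.
    return max(candidates, key=lambda e: e.latency_rank).name


if __name__ == "__main__":
    print(choose_end_point(max_latency_rank=0, need_low_cost=False))
    print(choose_end_point(max_latency_rank=3, need_low_cost=True))
```

A real deployment would weigh more than two factors, such as data centre space, energy consumption and data growth, but the shape of the decision is the same: filter the end-points by what the organisation will tolerate, then pick among what is left.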
El Reg reckons that Quantum's StorNext, which has added its own Lattus object storage capability, is a close competitor to SGI's gateway, particularly in its media and entertainment stronghold. StorNext also provides block-level access.
Providing a single universal file access facility across heterogeneous file storage media types seems a good idea. File virtualisation has failed in the past. SGI and Quantum are betting that Big Data storage needs make its resurrection and re-invention worthwhile.
There appears to be no easier way of combining primary disk storage, a nearline disk tier and a protected file archive while storing data on spun-down disk, object storage, tape or the cloud. Cloud storage gateways (such as Nasuni's) with a large local cache may provide some of this functionality.
General availability of the Gateway is scheduled for 15 June. ®