Original URL: http://www.theregister.co.uk/2012/03/27/scality_esg/
Scality opens Ring for close scrutiny
First object storage to be openly performance tested
It seems to be a world first: public and independent assessment of object storage performance compared to hard disk drive (HDD) block-based SAN arrays. The object storage supplier is Scality and the assessor is tech analyst ESG, which finds that Scality's Ring has performance comparable to, if not better than, high-performance file and SAN storage arrays.
Object storage is based on the idea of storing non-block-based data as objects whose address and integrity in the storage space are based on mathematical hash table processing of the binary contents. There is no need for a file/folder system, which object storage suppliers say can be slow and inefficient when storing billions of files. However, as ESG notes: "The disadvantage of object storage in the past has been performance, as data retrieval is generally considered slower than with a file system."
Scality's Ring is a scalable object storage system composed of x86-based nodes storing data in a self-healing way, and logically organised in a ring. An app in a server requests an object read access (get) or writes an object (put). In the get case, if the object is not on the request-receiving node, only one hop across a 10-node ring will be needed to find it. This rises to two hops for an up-to-100-node ring, and three hops for Rings up to 1,000 nodes in size.
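The three hop counts quoted above fit a base-10 logarithmic pattern. The Python sketch below only reproduces that pattern; the function name `max_hops` and its `fanout` parameter are illustrative assumptions, and the article does not describe Scality's actual routing algorithm.

```python
def max_hops(node_count: int, fanout: int = 10) -> int:
    """Smallest number of hops h such that fanout**h >= node_count.

    Assumption: each hop narrows the search by a factor of `fanout`
    (10 here, chosen only because it matches the figures quoted above).
    """
    if node_count < 1:
        raise ValueError("a ring needs at least one node")
    hops, reach = 0, 1
    while reach < node_count:
        reach *= fanout
        hops += 1
    return hops


for nodes in (10, 100, 1000):
    print(f"{nodes:>5} nodes -> {max_hops(nodes)} hop(s)")
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers of ten cannot be rounded the wrong way.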
ESG says Scality's Ring "delivers high I/O using parallel loads (for lots of small files) and high throughput for large files, but with features traditionally associated with high-end SANs - data persistence with replication and checksum, geo-redundancy, snapshots, automatic tiering, etc."
Its report looked at manageability of the Ring and its ability to recover from a lost hardware component, such as a node, as well as looking at actual performance. We only examine the performance testing here.
Scality Ring performance
First, ESG looked at object access, both gets and puts. A 3-server ring with 36 logical nodes delivered 26,274 put objects/sec and 41,573 get objects/sec.
Regarding puts, ESG states: "26,274 objects per second from a three node system is an excellent result for an object-based storage solution and a very good result compared to the I/O per second performance of industry leading block-based dual controller disk arrays."
That sounds as if it is saying that the Ring is faster than a traditional SAN array but it doesn't actually say it. Also, ESG only tested up to five server nodes, so we don't know how a 100-node or 500-node Ring would perform.
ESG's chart of Scality Ring response times
ESG looked at response time and its results are charted above, showing that a 3-node Ring needed less than 5ms for an object get, put or delete: "ESG Lab confirmed response times of no more than 7.05ms for gets, puts, and deletes — 10X the performance of traditional architectures — as well as increasing aggregate performance as nodes are added to the RING."
It said: "The excellent response times of a Scality Ring are comparable with a traditional block-based disk array and considerably faster than [other] object-based storage systems that ESG lab has tested."
Using six storage devices per server node, each logically partitioned into two devices, ESG scaled the Ring from three server nodes and 36 logical nodes to five server nodes and 60 logical nodes, and found performance scaled in a straight line, from 41,573 get objects/sec to 60,410, with 385,000 get objects/sec projected for a 24-server node ring with 288 logical nodes.
ESG said: "As each server node is added to a Scality Ring, the overall performance of the system is increased using the CPU, disk, bus, and networking resources of the new server. An object-based Scality Ring that leverages the latest Intel server CPU and SSD technologies can be used to create an object-based storage solution with performance that exceeds the capabilities of a traditional block-based disk array."
ESG also found a five-server node ring delivered 211,424 MP3 audio files simultaneously, with a 128Kb/sec bit stream rate. This equates to an output of 26.43GB/sec and ESG said this "rivals that of high-performance computing systems".
If anything that understates the case. For comparison our records show a DataDirect Networks SFA10K-X delivers 17GB/sec per rack of 4U 60-drive enclosures. A Panasas PAS 2 array does 15GB/sec per rack. Scality's Ring is in fine company and more than holding its own.
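The streaming figure can be checked with simple arithmetic. The Python sketch below multiplies the stream count by the per-stream rate and scales the result; the conversion factors (1,024 and then 1,000) are an assumption, chosen only because they reproduce the published number. The result carries the same base unit as the per-stream rate, so a rate read in bits gives a total in bits and a rate read in bytes gives a total in bytes.

```python
STREAMS = 211_424        # simultaneous MP3 streams reported by ESG
RATE_PER_STREAM_K = 128  # per-stream rate in "K" units per second


def aggregate_rate_g(streams: int, rate_k: int) -> float:
    """Aggregate rate in 'G' units per second.

    Assumed conversion: K -> M by dividing by 1,024, then M -> G by
    dividing by 1,000. The base unit (bits or bytes) is left unchanged.
    """
    total_k = streams * rate_k
    total_m = total_k / 1024
    return total_m / 1000


total_g = aggregate_rate_g(STREAMS, RATE_PER_STREAM_K)
print(f"{total_g:.2f} G per second")  # 26.43 G per second
```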
All these numbers look good. When we looked at the test configuration though, we were intrigued.
ESG's lab testing was performed with three to five Scality RING server nodes connected to each other via a single 10GbE cable. They were each configured with 24 Intel Xeon CPU cores, 24GB of RAM and six 600GB Intel SSDs. The software divided each SSD into two partitions, which created twelve I/O daemons (software nodes) per server node.
ESG's Scality Ring configuration
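The logical node counts quoted in the test follow directly from this configuration: six SSDs per server, two partitions per SSD, one I/O daemon per partition. A minimal Python sketch of that arithmetic, with constant and function names that are illustrative only:

```python
SSDS_PER_SERVER = 6      # six 600GB SSDs in each server node
PARTITIONS_PER_SSD = 2   # each SSD split into two partitions


def logical_nodes(servers: int) -> int:
    """One I/O daemon (logical node) per SSD partition."""
    return servers * SSDS_PER_SERVER * PARTITIONS_PER_SSD


for servers in (3, 5, 24):
    print(f"{servers:>2} servers -> {logical_nodes(servers)} logical nodes")
```

This gives 36, 60 and 288 logical nodes for the three-server, five-server and projected 24-server rings respectively.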
The general object storage system, like Scality's Ring [1], has a disk-drive based design. In this ESG test, however, each Ring server node was a flash-based object storage node, not a disk-based one, and its performance was "a very good result compared to the I/O per second performance of industry leading block-based dual controller disk arrays."
Why did Scality and ESG choose to compare an SSD-based Ring with HDD-based drive arrays?
Scality's CEO, Jérôme Lecat, said: "We are convinced that, contrary to standard opinion, our object-based storage is indeed faster than SAN for parallel load, but how could we prove it?
"We studied how others (SAN, NAS, Scale-out NAS) reported performance numbers, and we found that in the way they have designed their test, most of the IOPS performance comes from RAM/Cache/SSD, not from the access to HDD. All storage systems have some controller memory and some mechanism for cache and/or tiering. With some optimisation, it is easy to make a test mostly hit that portion of the system rather than the disks.
"Isilon, a scale-out NAS which is comparable to us in many ways, had the same approach when they did their famous IOPS world record with ESG (pdf) a year ago. You can read on page 6 of this report that [the] average response time was less than 3ms [which] is too short for reading data from a 10,000rpm disk."
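The disk-timing point can be illustrated with standard arithmetic: on average, a platter has to turn half a revolution before the requested sector reaches the head. The Python sketch below computes only that rotational component; seek and transfer times are deliberately left out, so real random reads take longer than these figures. The function name is illustrative.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (7_200, 10_000):
    print(f"{rpm:>6} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

At 10,000rpm the rotational wait alone averages 3ms, before any seek time is added.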
He said the industry at large assumes "our software, with its totally scalable distributed meta-data architecture, must introduce a significant delay. Once people understand our architecture, they do not challenge us on the power of parallelism, but on the latency cost for atomic operations. We decided that testing on SSD was the best way to measure the delay inherent to our software."
"The ESG Lab test successfully demonstrates that our system can write or read objects to/from SSD in less than 7ms, and that this number is very stable with load and with the addition of nodes. From this test, and from our production experience, we can extrapolate that a two-tier architecture only built with HDD (no SSD) would do an average 40ms for reads with 7,200rpm disks, and only 35ms with 10,000rpm disks, similar to those used by Isilon for their own ESG tests."
Lecat said DataDirect's Web Object Scaler (DDN WOS2) had a 40ms latency on its HDD operations. He thinks that, by using SSDs, Scality could lower the latency significantly. Also, analysis showed that most of the latency came from the Ethernet network and not from Scality's software in the server nodes.
It could take the SSD latency down further, possibly to 3ms, with Infiniband node-to-node links rather than Ethernet, but Lecat said: "The reality is that, for most file applications, 40ms of latency is totally acceptable, and for those who need a lower latency, putting in some SSD is not a concern."
Yes, SSD would boost real-life object storage performance but it isn't generally needed, as far as Lecat is concerned: "We agree that an all SSD storage would not make sense at petabyte level ... Typically, in a petabyte scale environment, having just 5 per cent of the capacity in SSD greatly improves the performance, at a very reasonable cost. This being said, we only recommend SSD when the applications requires response time faster than 40ms."
Out goes Symmetrix
He cites a customer, Time Warner Cable, that has replaced an EMC array with an all-HDD Scality Ring: "Our Ring is being deployed as primary storage at Time Warner Cable for their consumer email platform entirely on HDD. No SSDs are required for this highly interactive application where we are replacing an EMC Symmetrix with over 1PB of data."
Lecat sums up storage system performance and how Scality's Ring compares like this: "There are really three measures of performance: IOPS, throughput and latency. Very often, it is assumed that a storage system which has one has the other, but they are three totally different metrics. Our architecture has excellent IOPS and throughput due to its entirely parallel design, and we do not even need SSD or fast HDD for that, nearline-SAS is enough. Our weak point is latency. The ESG test proves that we don’t do too bad on our weakest point, actually we even do very good as far as unstructured data applications are concerned."
"For virtual machines and relational database [applications], latency is a real issue since they have many serial operations. We leave this market to the likes of Pure Storage and SolidFire."
There's more performance testing that could be done, such as a SPECsfs benchmark, and a look at much larger server node numbers, but there's enough in the ESG report to show that the Ring object storage system is on a level with traditional SAN arrays and can serve tens of thousands of small objects, such as MP3 files, faster than high-performance computing arrays. The idea that object storage is slower than traditional filer and SAN arrays can be put out to grass. ®
1. Scality's Ring is implemented with a two-tier architecture, with the first tier using replication as a data protection mechanism, and leveraging fast disks for performance, and a second tier using erasure code technology to protect most of the data. Tier 2 uses cheaper SATA or nearline SAS disks.
The performance of the tier 2 disks in Scality's Ring comes from heavy use of parallelism, and is helped by the Erasure Code technology implementation having no penalty on read. The index of data on each node of tier 2 storage resides in memory. With this architecture, 10 per cent of the data, which represents 80 per cent of the requests, resides in tier 1 and provides less than 10ms object delivery time, while 90 per cent of the data representing 20 per cent of the requests resides on the cheaper second tier and delivers objects in 40ms with nearline SAS drives.
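The tier figures above imply an overall average delivery time, which can be estimated as a request-weighted mean. The Python sketch below is a rough estimate only: it treats "less than 10ms" as exactly 10ms, so the result is an upper bound, and the names `TIERS` and `mean_latency_ms` are illustrative.

```python
# (share of requests, object delivery time in ms)
TIERS = [
    (0.80, 10.0),  # tier 1: quoted as "less than 10ms", so 10 is a ceiling
    (0.20, 40.0),  # tier 2: nearline SAS drives
]


def mean_latency_ms(tiers) -> float:
    """Request-weighted mean delivery time across tiers."""
    total_share = sum(share for share, _ in tiers)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("request shares must sum to 1")
    return sum(share * ms for share, ms in tiers)


print(f"{mean_latency_ms(TIERS):.1f} ms")  # 16.0 ms
```

On these assumptions the average request is served in no more than about 16ms, even though 90 per cent of the data sits on the slower tier.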