Feeds

Storage startup busts object location barrier

Scality's Ring cycle: One Ring to find them all

High performance access to file storage

File system trees are inefficient and slow when locating files in a filespace occupied by billions of files and folders. Storing the data as objects in a flat storage space is becoming a recommended alternative. But, as soon as you go for object storage to defeat this file system tree traverse problem, you face a fresh problem: how do you locate your objects?

Either you have a central object map or database or you have a distributed one. French startup Scality has gone for the distributed approach with its Ring technology.

The idea is to have virtual unlimited scalability, both of I/O and storage capacity, by using clustered commodity X86 servers organised as peer-to-peer nodes – conceptually occupying  a ring – with front end Accessor software nodes receiving requests from users and applications on servers.

Scality CEO Jerome Lecat says an Accessor node can access any Ring node, note the "any", and find the right node storing a requested object in one network hop with 10 nodes, two hops with 100 nodes, and three hops with 1,000 nodes.

Scality Ring technology

Holy Trinity in Scality's Ring technology: Accessors to the left, the Ring in the middle and secondary storage to the right.

A variety of Accessor node technologies are supported: native REST HTTP, NFS, BRS2 and Zimbra.

With each 10X increase in the Ring node count, the hop count goes up by one because of Scality's patented technology and its algorithm. We might call this a quite peculiar Ring cycle.

Lecat said: "There are really two 'tricks' here. [First] an algorithm delivering a maximum of Log(n) complexity – which basically gives one a 100-node network. Each node needs to know seven nodes, and a request may take seven hops. The minimum requirement from a mathematical standpoint is for each node to know a few other nodes. The number of nodes increases as Log(number of nodes), which means that when the number of nodes is x10, you need to add 1 to the number of nodes to be known, or number of hops.

"[Secondly] in practice, we allow nodes to know many more nodes, but this acts as a 'non authoritative cache', and it allows for a request to 'usually' converge in two hops, while keeping all the mathematical properties of the model (Log complexity, limited number of hops, good behaviour when a node is lost or added)."

Each node can handle 10 to 50TB of storage, with 1,000 nodes supporting up to 50PB of capacity, and accessing the right object in that 50PB with three hops on a gigabit LAN takes 20ms or less.

Distributed hash table

How does that work? Scality documentation says that the Ring nodes are organised into segments. Objects are stored with a Distributed Hash Table (DHT) algorithm, which produces a value for the object and its associated key. Key and value pairs are stored in the DHT and nodes retrieve the value associated with a particular key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes. Keys embed information about class of service, and each node is autonomous and responsible for consistency checking and rebuilding replicas automatically for its keys.

We can think in terms of Scality's Ring nodes crossing a key space. This is organised into a hierarchy such that a 10-node ring requires one node-to-node hop to find the target node, a 100-node ring needs two hops and a 1,000-node monster needs three hops.

Lecat says: "The key space is distributed among all the nodes. The key space is very large (20 bytes), and distributed nearly evenly, but never exactly evenly. The underlying algorithm is a distributed hash table. The 'segments' do not have a constant size (as everything has to be dynamic in the system to allow real elasticity).

"Two key properties of the key space are that keys have an order, and they are organised into a circle (which gives trigonometic properties)."

Let's take a 10-node Ring as an example. An Accessor sends in a object retrieval request to node 1, which doesn't have it. We're told the object can be retrieved with one hop, a jump from node 1 to the right node. Node 1 has enough information to send the request on to the right node, the one that holds the object, and so does every other node in the 10-node ring: that's how a distributed hash table works.

Scality doesn't say in detail how this works. I think it is a variation on this concept: each node has an ID and nodes are organised in a ring, a double-linked list, with each node having a reference to the previous node on the ring, its address, and the next ring node, and its address. Nodes going round the ring have successively greater node IDs until you return to the starting node.

Okay? Keep that in mind and let's move on to the request receiving node, which gets the key from the Accessor request and hashes it to generate a key of exactly the same number of bits as the node reference. The system uses this as a node ID and goes round the ring node by node, looking for a node ID that is the closest possible to the key hash while still being larger. That node should store the desired object.

A reversal of this is used to store incoming objects on the Ring and ensure they are locatable.

Lecat said: "If a node is lost, the ring rebalances itself without human intervention. [It's the] same if a ring node is added (human intervention needed to decide to add a node), the new node is automatically placed well in the key space, and rebalances only occur when necessary and automatically."

To understand any more than this requires a computer science skill set and access to the Scality Ring designers.

High performance access to file storage

More from The Register

next story
Seagate brings out 6TB HDD, did not need NO STEENKIN' SHINGLES
Or helium filling either, according to reports
European Court of Justice rips up Data Retention Directive
Rules 'interfering' measure to be 'invalid'
Dropbox defends fantastically badly timed Condoleezza Rice appointment
'Nothing is going to change with Dr. Rice's appointment,' file sharer promises
Cisco reps flog Whiptail's Invicta arrays against EMC and Pure
Storage reseller report reveals who's selling what
Bored with trading oil and gold? Why not flog some CLOUD servers?
Chicago Mercantile Exchange plans cloud spot exchange
Just what could be inside Dropbox's new 'Home For Life'?
Biz apps, messaging, photos, email, more storage – sorry, did you think there would be cake?
IT bods: How long does it take YOU to train up on new tech?
I'll leave my arrays to do the hard work, if you don't mind
prev story

Whitepapers

Securing web applications made simple and scalable
In this whitepaper learn how automated security testing can provide a simple and scalable way to protect your web applications.
Five 3D headsets to be won!
We were so impressed by the Durovis Dive headset we’ve asked the company to give some away to Reg readers.
HP ArcSight ESM solution helps Finansbank
Based on their experience using HP ArcSight Enterprise Security Manager for IT security operations, Finansbank moved to HP ArcSight ESM for fraud management.
The benefits of software based PBX
Why you should break free from your proprietary PBX and how to leverage your existing server hardware.
Mobile application security study
Download this report to see the alarming realities regarding the sheer number of applications vulnerable to attack, as well as the most common and easily addressable vulnerability errors.