Baa NooBaa black sheep, have you any storage?
Yes sir, yes sir, three bags full sir... stuffed with the ashes of Exanet
NooBaa sounds like a lamb in a child's fairy story or one of those wacky new-style web properties offering on-demand hairdressing, garden tool sharing or a cocktail recipe exchange. In fact, it's a scale-out object storage startup offering what it calls frictionless storage for unstructured data.
The company came out of stealth at VMworld with the launch of its free Community Edition software. It was founded in 2013 by CEO Yuval Dimnik and CTO Guy Margalit, and has offices in Rehovot, Israel (R&D) and Silicon Valley (marketing). It picked up $922,500 in funding this year from VCs JVP and OurCrowd plus angel investors described as prominent industry leaders. We have no details of any previous funding.
Dimnik was a project leader in the Israel Defense Forces and then worked at IBM before joining Israeli file storage startup Exanet, where he was a software engineer, then R&D team leader and ultimately support director. Exanet crashed and was bought by Dell, where Dimnik became FluidFS System Engineering Director. He left in June 2013 to co-found NooBaa.
Margalit came out of the Israeli Air Force as a software architect, became an R&D Core Team leader at Exanet and then, following the Exanet acquisition, a clustered NAS software architect at Dell.
They want to make using object storage seem like using an Amazon service.
There are three components to NooBaa's software, all running in virtual machines:
- Core – central controller
- Storage daemon – runs on each storage node
- Access node
The access nodes and storage daemons form the data plane, while the core provides a separate control plane.
Essentially, a storage node can be any server with local storage, and NooBaa says its system can use idle, unallocated storage on those servers. In effect it uses stranded, in-place storage and so, initially, does not need (or, more accurately, may not need) any new storage hardware to be bought.
Applications carry out S3-based IO on the storage nodes, with the access nodes querying the core to find where data is located and where newly written data should be sent. The core maintains overall system status, responding to requests from access nodes and tracking storage node availability, performance and status through heartbeat messages that the storage nodes send to it.
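A minimal Python sketch of the control-plane role described above, assuming a simple in-memory core: storage daemons report in via heartbeats, and access nodes ask the core where a chunk lives and where to write a new one. The class and method names (`Core`, `heartbeat`, `where_is`, `where_to_write`), the 30-second timeout and the most-free-space-first write rule are all hypothetical illustrations, not NooBaa's code or protocol.

```python
import time
from dataclasses import dataclass, field


@dataclass
class NodeStatus:
    """What the core knows about one storage node, refreshed by heartbeats."""
    node_id: str
    last_seen: float
    free_bytes: int
    disk_latency_ms: float


@dataclass
class Core:
    """Hypothetical control-plane core: tracks node health and chunk locations."""
    heartbeat_timeout: float = 30.0                 # seconds of silence before a node counts as offline
    nodes: dict = field(default_factory=dict)       # node_id -> NodeStatus
    locations: dict = field(default_factory=dict)   # chunk_id -> [node_id, ...]

    def heartbeat(self, node_id, free_bytes, disk_latency_ms, now=None):
        """Called periodically by each storage daemon; carries more than 'I am alive'."""
        now = time.time() if now is None else now
        self.nodes[node_id] = NodeStatus(node_id, now, free_bytes, disk_latency_ms)

    def online_nodes(self, now=None):
        """Nodes whose last heartbeat arrived within the timeout."""
        now = time.time() if now is None else now
        return [n for n in self.nodes.values()
                if now - n.last_seen <= self.heartbeat_timeout]

    def where_is(self, chunk_id, now=None):
        """Read path: which live nodes hold a copy of this chunk?"""
        online = {n.node_id for n in self.online_nodes(now)}
        return [nid for nid in self.locations.get(chunk_id, []) if nid in online]

    def where_to_write(self, copies=3, now=None):
        """Write path: choose `copies` distinct live nodes, most free space first."""
        candidates = sorted(self.online_nodes(now),
                            key=lambda n: n.free_bytes, reverse=True)
        if len(candidates) < copies:
            raise RuntimeError("not enough live storage nodes")
        return [n.node_id for n in candidates[:copies]]


# Example: four nodes report in, then the core picks three targets for a new chunk.
core = Core()
for i, free in enumerate([500, 800, 300, 900]):
    core.heartbeat(f"node-{i}", free_bytes=free, disk_latency_ms=5.0, now=100.0)
print(core.where_to_write(copies=3, now=110.0))  # ['node-3', 'node-1', 'node-0']
```

Keeping location lookups and liveness in the core, while the bytes themselves move only between access nodes and storage daemons, is what separates the control plane from the data plane in this sketch.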
So study this diagram for a minute or two and then carry on reading...
[Diagram: NooBaa component scheme]
There is no hardware dependency, NooBaa says, and it can use capacity found anywhere on the network.
Storage on the NooBaa storage nodes is pooled into a single resource. NooBaa says there can be thousands of storage nodes, and up to 20,000 have been tested. These nodes are not clustered.
Data is managed in buckets, which are virtual volumes each holding many files. Incoming data is divided into chunks, and three copies of each chunk (by default) are stored on different nodes for resilience. The data is also globally deduplicated using a sliding window, compressed, and encrypted.
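The write path just described (split into chunks, deduplicate globally, compress, keep three copies on different nodes) can be sketched in Python. This is an illustration under stated assumptions, not NooBaa's implementation: it uses content-defined chunking driven by a Rabin-Karp-style rolling hash over a 48-byte sliding window (one common way to do sliding-window deduplication), SHA-256 digests to spot duplicate chunks, `zlib` for compression, and a simple digest-based rule for picking three distinct nodes. The window size, chunk-size bounds and the `ChunkStore` name are hypothetical, and encryption is left out because the Python standard library has no block cipher.

```python
import hashlib
import zlib

WINDOW = 48                          # sliding-window width in bytes (assumed)
BASE, MOD = 257, (1 << 31) - 1       # rolling-hash parameters (assumed)
POW_W = pow(BASE, WINDOW, MOD)       # weight of the byte that leaves the window
MASK = (1 << 12) - 1                 # cut when the low 12 bits are zero (assumed)
MIN_CHUNK, MAX_CHUNK = 1024, 16384   # chunk-size bounds in bytes (assumed)


def split_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut points depend only on the last WINDOW bytes."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            # Remove the byte that has just slid out of the window.
            h = (h - data[i - WINDOW] * POW_W) % MOD
        length = i + 1 - start
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Hypothetical store: global dedup by digest, zlib compression, N-way replicas."""

    def __init__(self, nodes, copies=3):
        if len(nodes) < copies:
            raise ValueError("need at least as many nodes as copies")
        self.nodes = list(nodes)
        self.copies = copies
        self.chunks = {}  # digest -> (compressed bytes, replica node ids)

    def put(self, data: bytes) -> list[str]:
        """Store an object and return its manifest (ordered chunk digests)."""
        manifest = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # each unique chunk is stored only once
                # Derive `copies` distinct nodes from the digest.
                # (A real system would also encrypt after compressing.)
                first = int(digest, 16) % len(self.nodes)
                replicas = [self.nodes[(first + k) % len(self.nodes)]
                            for k in range(self.copies)]
                self.chunks[digest] = (zlib.compress(chunk), replicas)
            manifest.append(digest)
        return manifest

    def get(self, manifest: list[str]) -> bytes:
        """Reassemble an object from its manifest."""
        return b"".join(zlib.decompress(self.chunks[d][0]) for d in manifest)
```

Because cut points depend only on the bytes inside the window rather than on fixed offsets, inserting a few bytes near the start of a file typically changes only the chunks around the edit; the later chunks keep their digests and are deduplicated against what is already stored.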
The storage nodes send heartbeat messages to the NooBaa core, which uses them to build a global view of the system and drive data placement. The placement algorithm is complex: it spans on-premises and public cloud storage, is optimized for performance, and treats data locality as a factor.
The algorithm maintains a heat map of data access rates, and hot, high-performance data is mapped to faster resources. It also takes account of cost hints from customers and tends to fill up lower-cost resources first. There are N-way tiering policies, and machine-learning techniques are used to automate these activities.
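A toy version of the heat-map and cost-hint behaviour might look like the following Python sketch. The `Resource` fields, the `HOT_THRESHOLD` of 10 accesses per hour, and the rule "hot data goes to the lowest-latency resource, cold data to the cheapest resource with room" are assumptions chosen for illustration; the machine-learning side of NooBaa's algorithm is not modelled at all.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    """A capacity resource, on-premises or cloud (fields are hypothetical)."""
    name: str
    latency_ms: float    # measured latency, e.g. as reported in heartbeats
    cost_per_gb: float   # customer-supplied cost hint
    capacity_gb: float
    used_gb: float = 0.0

    @property
    def free_gb(self) -> float:
        return self.capacity_gb - self.used_gb


HOT_THRESHOLD = 10.0  # accesses per hour above which data counts as hot (assumed)


def choose_resource(resources, access_rate, size_gb):
    """Place data using its heat-map entry: hot -> fastest, cold -> cheapest with room."""
    fits = [r for r in resources if r.free_gb >= size_gb]
    if not fits:
        raise RuntimeError("no resource has room")
    if access_rate >= HOT_THRESHOLD:
        target = min(fits, key=lambda r: (r.latency_ms, r.cost_per_gb))
    else:
        target = min(fits, key=lambda r: (r.cost_per_gb, r.latency_ms))
    target.used_gb += size_gb
    return target.name


tiers = [
    Resource("local-ssd", latency_ms=0.5, cost_per_gb=0.10, capacity_gb=100),
    Resource("local-hdd", latency_ms=8.0, cost_per_gb=0.03, capacity_gb=100),
    Resource("cloud-s3", latency_ms=60.0, cost_per_gb=0.02, capacity_gb=1000),
]
print(choose_resource(tiers, access_rate=50.0, size_gb=10))  # local-ssd
print(choose_resource(tiers, access_rate=0.1, size_gb=10))   # cloud-s3
```

Since cold data always lands on the cheapest resource that still has room, the low-cost tiers fill up first, and data spills over to the next-cheapest tier only once they are full.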
CMO Mike Davis said: "We can automate DR. We'll find the DR location." He added: "We measure disk latency ... We directly measure performance, we know about network connectivity," explaining that the heartbeat messages carry more than just "I am operational" signals.
The core contains a metadata store, a key:value store that supports sharding and replication and is built for high availability (HA).
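One way to build a sharded, replicated key:value store that keeps working when a replica fails is to hash keys to shards and use majority quorums for reads and writes. The Python sketch below assumes exactly that; `ShardedKV`, SHA-1 key hashing, versioned values and the quorum rule are illustrative choices, not a description of how NooBaa's metadata store actually works.

```python
import hashlib


class ShardedKV:
    """Hypothetical metadata store: keys hashed to shards, each shard replicated."""

    def __init__(self, num_shards=4, replicas=3):
        self.num_shards = num_shards
        self.replicas_per_shard = replicas
        # shards[s][r] is the dict backing replica r of shard s; values are (version, value).
        self.shards = [[{} for _ in range(replicas)] for _ in range(num_shards)]
        self.down = set()  # (shard, replica) pairs currently unavailable

    def shard_of(self, key: str) -> int:
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def _quorum(self, s):
        """Live replicas of shard s; refuse to proceed without a majority."""
        live = [r for r in range(self.replicas_per_shard) if (s, r) not in self.down]
        if len(live) <= self.replicas_per_shard // 2:
            raise RuntimeError(f"shard {s} has no quorum")
        return live

    def put(self, key, value):
        s = self.shard_of(key)
        live = self._quorum(s)
        # Any two majorities overlap, so the quorum always sees the newest version.
        version = max(self.shards[s][r].get(key, (0, None))[0] for r in live) + 1
        for r in live:
            self.shards[s][r][key] = (version, value)

    def get(self, key):
        s = self.shard_of(key)
        live = self._quorum(s)
        found = [self.shards[s][r][key] for r in live if key in self.shards[s][r]]
        if not found:
            raise KeyError(key)
        # A replica that missed an update holds an older version; take the newest.
        return max(found, key=lambda entry: entry[0])[1]
```

Sharding spreads metadata load across shards, while replication plus majority quorums lets a shard lose one of its three replicas and still serve consistent reads and writes.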
A CloudSync feature lets you create a copy of your data in the public cloud, stored in S3 in the file's native format, which amounts to a migrate-out facility. It is a two-way street: you can also migrate into NooBaa from S3.
NooBaa says it can be deployed in 15 minutes, even from a USB key, with minimal configuration. You bootstrap the system and it reclaims existing unused storage, which can be on any hardware or even on a node in Google's cloud. Davis said, "Cloud resources are fungible with on-premises ones ... We really don't care. They are capacity resources with different properties."
The free Community Edition, which is not HA, covers up to 20TB of storage and needs a minimum of three storage nodes. Beyond 20TB you will need a pay-for-use enterprise license; the enterprise edition will have HA.
There is no POSIX file system and no NAS interface. Davis says there will be no NooBaa appliance and no hardware compatibility list: "We'll always have the lowest hardware cost. We'll beat Ceph ... we'll tell customers to get any array when they need hardware." The idea is that this helps the channel source its own hardware.
Erasure coding is on the roadmap, as is a plan to focus, over time, on the entertainment and media and life sciences markets.
We're left pondering this question: does the world need another object storage startup? The answer will depend on the excellence of NooBaa's vision and code, and on how users respond to its functionality and freemium business model. Good luck Noobers. ®