Size doesn't matter to database thrusting Clustrix

Easily pleased, quickly done

Clustrix clustered server nodes loaded with Intel SSDs chew through parallelised database queries in a flash.

At a press event in San Jose, database startup Clustrix introduced its new CEO and described its technology. CEO Robin Purohit joined Clustrix 30 days ago, coming from HP and Mercury, and Veritas before that. His arrival saw the founders Paul Mikesell and Sergei Tsarev step back from day-to-day operations at the 45-strong company.

Clustrix is a parallelised clustered database (CDS) designed for online transaction processing (OLTP). It runs on a set of servers, which gives it high performance and fault tolerance.

Let's hear it from the company's mouth:

CDS … can handle queries from simple point selects and updates to complicated SQL joins and aggregates. It is optimised for highly transactional OLTP workloads and also works for OLAP queries. The Clustrix architecture can start small and expand seamlessly with business needs to arbitrary scale. Tables can range from 0 to billions of rows in size. Workloads can range from a few to hundreds of thousands of transactions per second. It can handle simple key / value operations to full ACID-compliant transactional SQL.

Clustrix's intellectual property is mainly its Sierra database software, but it has also designed server nodes clustered using InfiniBand. These are needed to run the software, as this is not a database that runs on commercial off-the-shelf (COTS) servers, although the nodes are x86 engines.

Co-founder Paul Mikesell was the founder and director of engineering at Isilon, where he designed, architected, and developed all of Isilon’s products up to the EMC purchase. Also, chief technology officer Aaron Passey comes from Isilon. Where Isilon is clustered nodes for files (unstructured data), Clustrix uses clustered hardware node technology for structured databases.

This is the cutting edge and flash is the blade

According to Clustrix, a traditional monolithic database cannot scale "simply by bolting on an expandable storage layer. A distributed storage engine with a traditional planner and execution environment does not allow sufficient concurrency to scale a table to billions of rows and still obtain reasonable performance."

Local queries with local locking

The company needed to bridge the gap somehow between these two positions and saw that data and node locality was the way to go:

The key observation to be made is that local queries can be satisfied with local locking, local data, and local cache. A query operating on local data need not talk to other nodes. Locks on the data structures can be very short lived. Operations on different bits of data can be completely independent and operate with perfect parallelism.

The amount of total concurrency supported becomes a simple function on the number of independent data stores that contain that data. The magic then becomes the engine that ties these independent, high performance data stores into a global single-instance database.
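The idea in that quote can be illustrated with a toy. The Python below is a rough sketch only and not Clustrix's design: the `LocalStore` and `ShardedTable` names, and the hash-modulo placement, are assumptions made up for illustration. Each store has its own lock and its own data, so a point read or write touches exactly one store and never waits on the others.

```python
import threading


class LocalStore:
    """One independent data store: its own rows, guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.rows = {}

    def put(self, key, value):
        # The lock is local to this store and held only for the update.
        with self.lock:
            self.rows[key] = value

    def get(self, key):
        with self.lock:
            return self.rows.get(key)


class ShardedTable:
    """Hypothetical wrapper that presents several stores as one table."""

    def __init__(self, num_stores):
        self.stores = [LocalStore() for _ in range(num_stores)]

    def store_for(self, key):
        # Assumed placement rule: hash the key, pick a store by modulo.
        return self.stores[hash(key) % len(self.stores)]

    def put(self, key, value):
        self.store_for(key).put(key, value)

    def get(self, key):
        return self.store_for(key).get(key)


table = ShardedTable(num_stores=4)
for k in range(100):
    table.put(k, k * k)
```

Operations on keys that land in different stores take different locks, so in this toy the available concurrency grows with the number of stores. The hard part, as the quote says, is the engine that ties such stores into a single-instance database, and that is not shown here.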

There's a lot of local processing going on that has to be co-ordinated, and COTS server engines can't cut it. Hence the somewhat specialised Sierra hardware engines. These use Intel SSDs, not Fusion-io-style PCIe flash. It took three and a half years for Mikesell and Tsarev to get the core software written and the hardware nodes designed and specced.

The nodes are 2U enclosures with Intel processors, multi-level cell Intel SSDs and an Intel flash controller inside them.

Map Reduce for structured data

Purohit says: "We do Map Reduce for the structured world," bringing the query to the data, not the data to the query, so to speak. The pay-off, he says, is that CDS is faster, simpler to manage, scales more linearly, and is less expensive than an Oracle alternative.
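As a rough illustration of "query to the data", here is a minimal Python sketch. It assumes each node is just a list of rows held in memory, and the function names are hypothetical; it is not how Sierra plans or executes queries. Every node computes a small partial result over its own rows, and only those partials travel to the coordinator, which combines them.

```python
from concurrent.futures import ThreadPoolExecutor


def local_sum_and_count(rows, predicate):
    """'Map' step, run where the rows live: return (sum, count) of matches."""
    matching = [row["amount"] for row in rows if predicate(row)]
    return sum(matching), len(matching)


def distributed_average(nodes, predicate):
    """'Reduce' step: combine per-node partials into one average."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(
            pool.map(lambda rows: local_sum_and_count(rows, predicate), nodes)
        )
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Three pretend nodes, each holding a slice of one table.
nodes = [
    [{"region": "eu", "amount": 10}, {"region": "us", "amount": 30}],
    [{"region": "eu", "amount": 20}],
    [{"region": "us", "amount": 50}],
]

eu_average = distributed_average(nodes, lambda row: row["region"] == "eu")
```

The average is shipped as a (sum, count) pair per node rather than as a per-node average, because averages of averages are wrong when the nodes hold different numbers of matching rows.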

He argues that Oracle's Exadata system was not built for internet scale and is expensive: "It costs around $1m versus Clustrix's starting price of $140K." This is not really an apples-for-apples comparison, as Exadata is for data warehousing and business intelligence whereas Clustrix is for OLTP, the heart of Oracle's business.

The CDS cluster is effectively an appliance and is managed as a single database instance, according to Purohit.

Fifteen customers are evaluating Clustrix technology and "they have never lost any data." PhotoBox in Europe is one tester; an online backup company in the Mozy and Carbonite class is another. The backup company uses Clustrix to store its metadata, but not the basic backup file data.

Three-year lead?

Clustrix will launch itself next year and will then probably seek a third funding round after the current Series B funding of $12m runs out. Purohit will spend some of this building a sales and marketing team. He reckons Clustrix has a three-year lead over competitors, and his biggest issue is growing Clustrix's business capabilities and infrastructure fast enough to generate sales, and so both sustain Clustrix's lead and bring the company to profitability.

We might see future Clustrix node technology using Intel PCIe flash. The hardware is clever but not that complex; it's the Sierra database engine software that does the business. This provides a big enough barrier to the entry of competitors into this space, with the InfiniBand-connected, flash-enhanced servers making it that little bit more difficult.

A lesson we can draw: spindle-based storage is no longer fast enough for advanced data-hungry servers, which will have two-tier memories of DRAM with NAND. This is the cutting edge and flash is the blade. It's getting sharper and sharper and will surely become a permanent fixture of server design. ®
