SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Aker, who helped architect MySQL.

Michael Stonebraker's name probably doesn't jump very high in many minds outside computer science, yet it was Stonebraker's quick thinking 40 years ago that paved the way for the industry these better-knowns call home.

A faculty member at the University of California, Berkeley, Stonebraker seized on the research of IBM mathematician Ted Codd to start work in 1973 on co-developing the industry's first relational database: Ingres. Ellison wasn't around until 1977, while it took lumbering IBM, owner of the mighty DB2, another eight years before it had something.

Ingres fed into Sybase, whose founder Robert Epstein was one of those who had worked on Ingres, and from there into Microsoft's SQL Server. Through his various teaching positions, Stonebraker has also schooled CEOs, CTOs, founders and vice presidents of engineering at VMware, Sleepycat Software, Tibco, Oracle, Documentum, Alfresco and Cloudera. Along the way, Stonebraker found time to deliver Postgres, Mariposa, Aurora, C-Store and H-Store and to help found startups to sell and support them: Illustra Information Technologies, Cohera, Streambase, Vertica and VoltDB.

After 40 years, though, Stonebraker finally thinks it's no longer a "one-size-fits-all" world and that there could be more to life than just relational. His latest work, SciDB, is going post-relational to serve the needs of those working with big data - large volumes of information crunched on thousands of nodes in distributed data centers.

And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge - the NoSQL movement - as he's put the sacred cows of the Web 2.0 crowd in their place for cheaply sacrificing the benefits of the relational technology he pioneered.

As autumn kicks in, the 66-year-old MIT adjunct professor is on the cusp of releasing the first code under open source for SciDB, a collaboration with long-time colleague Dave DeWitt.

SciDB is Stonebraker's big-data analytics play in an era of Google's MapReduce, Apache Software Foundation's Hadoop and the NoSQL evangelists who seem to be setting the pace, if not hogging the limelight, on big data in massive data centers today.

The database targets boffins, number crunchers and computer scientists and will scale, it's claimed, from megabytes of data to petabytes running on tens of thousands of nodes - all on industry-standard, multi-core x86 servers with little human administration.

"The 'answer' is the current thing I'm focused on," Stonebraker told The Reg about the work on SciDB.

"The 'answer' in the 1980s was there was only one database market. In 2010, there are business processing databases with OLTP, science databases, document databases. There are genomic databases. The horizontal world of the database space has mushroomed.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

According to Stonebraker, the relational model he helped popularize doesn't work in data-intensive scientific discovery because the data is multidimensional.

Combining and sharing that data means complicated engineering work on the part of developers and database admins, and it produces bottlenecks. Scientists have been rolling their own architectures or - recently - deploying Hadoop, the open-source implementation of Google's MapReduce.

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than an RDBMS on this class of problem.
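To make the rows-versus-arrays distinction concrete, here is a minimal sketch in plain Python. It is not SciDB's API or storage format, and the function names (`make_grid`, `grid_to_table`, `window_sum_array`, `window_sum_table`) are invented for illustration. It holds the same small 2-D dataset two ways: as a native array addressed by position, and as relational-style (i, j, value) tuples where the coordinates are just ordinary column values.

```python
# Illustrative only: plain Python, not SciDB's API or storage format.
# All function names here are hypothetical.

def make_grid(rows, cols):
    """Build a dense 2-D array as nested lists; cell (i, j) holds i * cols + j."""
    return [[i * cols + j for j in range(cols)] for i in range(rows)]


def grid_to_table(grid):
    """Flatten the array into relational-style tuples (i, j, value), one per cell."""
    return [(i, j, v) for i, row in enumerate(grid) for j, v in enumerate(row)]


def window_sum_array(grid, i, j):
    """Sum the 3x3 neighbourhood around (i, j) using direct index arithmetic."""
    rows, cols = len(grid), len(grid[0])
    total = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:  # skip cells off the edge
                total += grid[r][c]
    return total


def window_sum_table(table, i, j):
    """Same sum over the tuple form: each row is tested against a coordinate predicate."""
    return sum(v for (r, c, v) in table if abs(r - i) <= 1 and abs(c - j) <= 1)


grid = make_grid(4, 5)
table = grid_to_table(grid)

# Both representations give the same answer for the same neighbourhood.
assert window_sum_array(grid, 1, 1) == window_sum_table(table, 1, 1)
```

In the array form a cell's neighbours are found by arithmetic on its position. In the tuple form this naive version has to scan every row. A real RDBMS would use an index rather than a full scan, so this sketch shows the shape of the problem and says nothing about the size of the speed-up Stonebraker claims.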

A database for all seasons

It's a world away from where Stonebraker started. In addition to being multidimensional and offering array-based scaling from megabytes to petabytes while running on tens of thousands of clustered nodes, SciDB will be write-once, read-many, allow bulk load rather than single-row insert, provide parallel computation, be designed for automatic rather than manual administration, and work with R, Matlab, IDL, C++ and Python.

SciDB's being piloted by healthcare products giant Novartis, the Large Synoptic Survey Telescope (LSST), Fermilab and an unnamed Russian astronomy project.

"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables. Postgres has arrays, but they were supported by blobs, so weren't first-class citizens," Stonebraker said.

"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store...There are unbelievable advantages to specialization."

SciDB is Stonebraker's biggest departure from the rules of the relational road. It's a journey that in the last five years has seen Stonebraker reinvent different parts of the relational stack.
