SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Aker, who helped architect MySQL.

Michael Stonebraker's name probably doesn't jump very high in many minds outside computer science, yet it was Stonebraker's quick thinking 40 years ago that paved the way for the industry these better-knowns call home.

A faculty member of the University of California in Berkeley, Stonebraker seized on the research of IBM mathematician Ted Codd to start work co-developing the industry's first relational database in 1973: Ingres. Ellison wasn't around until 1977, while it took lumbering IBM, owner of the mighty DB2, another eight years before it had something.

Ingres fed into Sybase, whose founder Robert Epstein was one of those who worked on Ingres, and from there into Microsoft's SQL Server. Through his various teaching positions, Stonebraker has also schooled CEOs, CTOs, founders and vice presidents of engineering at VMware, Sleepycat Software, Tibco, Oracle, Documentum, Alfresco and Cloudera. Along the way, Stonebraker found time to deliver Postgres, Mariposa, Aurora, C-Store and H-Store and help found startups to sell and support them: Illustra Information Technologies, Cohera, Streambase, Vertica and VoltDB.

After 40 years, though, Stonebraker finally thinks it's no longer a "one-size-fits-all" world and that there could be more to life than just relational. His latest work, SciDB, is going post-relational to serve the needs of those working with big data - large volumes of information crunched on thousands of nodes in distributed data centers.

And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge - the NoSQL movement - as he's put the sacred cows of the Web 2.0 crowd in their place for cheaply sacrificing the benefits of the relational technology he pioneered.

As autumn kicks in, the 66-year-old MIT adjunct professor is on the cusp of releasing the first code under open source for SciDB, a collaboration with long-time colleague Dave DeWitt.

SciDB is Stonebraker's big-data analytics play in an era of Google's MapReduce, Apache Software Foundation's Hadoop and the NoSQL evangelists who seem to be setting the pace, if not hogging the limelight, on big data in massive data centers today.

The database targets boffins, number crunchers and computer scientists and will scale, it's claimed, from megabytes of data to petabytes running on tens of thousands of nodes - all on industry-standard, multi-core x86 servers with little human administration.

"The 'answer' is the current thing I'm focused on," Stonebraker told The Reg about the work on SciDB.

"The 'answer' in the 1980s was there was only one database market. In 2010, there are business processing databases with OLTP, science databases, document databases. There are genomic databases. The horizontal world of the database space has mushroomed.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

According to Stonebraker, the relational model he helped popularize doesn't work in data-intensive scientific discovery because the data is multidimensional.

Combining and sharing that data means complicated engineering work on the part of developers and database admins, and it produces bottlenecks. Scientists have been rolling their own architectures or - recently - deploying Hadoop, the open-source implementation of Google's MapReduce.

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than an RDBMS on this class of problem.
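To make the rows-versus-arrays distinction concrete, here is a minimal Python sketch. It is not SciDB code and uses none of SciDB's interfaces. The grid size, the `reading` function and the helper names are all invented for illustration. It stores the same three-dimensional data once as table-style rows, with the coordinates held as ordinary columns, and once as a nested array, where the coordinates are implied by position.

```python
from itertools import product

# Hypothetical 2 x 3 x 2 grid (x, y, t); sizes are invented for illustration.
NX, NY, NT = 2, 3, 2


def reading(x, y, t):
    """Deterministic stand-in for a measured value (an assumption, not real data)."""
    return 100 * x + 10 * y + t


# Table-style: one row per cell, with the coordinates stored as columns.
rows = [
    (x, y, t, reading(x, y, t))
    for x, y, t in product(range(NX), range(NY), range(NT))
]


def lookup_rows(x, y, t):
    """Find a cell by scanning rows, as an unindexed table would."""
    for rx, ry, rt, value in rows:
        if (rx, ry, rt) == (x, y, t):
            return value
    raise KeyError((x, y, t))


# Array-style: the coordinates are the position, so no coordinate columns exist.
array = [
    [[reading(x, y, t) for t in range(NT)] for y in range(NY)]
    for x in range(NX)
]


def lookup_array(x, y, t):
    """Find a cell by direct indexing."""
    return array[x][y][t]


def slice_time(t):
    """Return the 2-D slab of all (x, y) cells at a fixed t."""
    return [[array[x][y][t] for y in range(NY)] for x in range(NX)]
```

Both lookups return the same value for any cell, but the array form reaches it by position and can cut a slab along any dimension without filtering every row. That is the general idea behind array storage, shown here at toy scale only.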

A database for all seasons

It's a world away from where Stonebraker started. In addition to being multidimensional, offering array-based scaling from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB will be write-once, read-many, allow bulk load rather than single-row insert, provide parallel computation, be designed for automatic rather than manual administration, and work with R, Matlab, IDL, C++ and Python.

SciDB's being piloted by healthcare products giant Novartis, the Large Synoptic Survey Telescope (LSST), Fermilab and an unnamed Russian astronomy project.

"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables. Postgres has arrays, but they were supported by blobs, so weren't first-class citizens," Stonebraker said.

"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store...There are unbelievable advantages to specialization."

SciDB is Stonebraker's biggest departure from the rules of the relational road. It's a journey that in the last five years has seen Stonebraker reinvent different parts of the relational stack.
