Feeds

SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Boost IT visibility and business value

Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Akers, who helped architect MySQL.

Michael Stonebraker's name probably doesn't jump very high in many minds outside computer science, yet it was Stonebraker's quick thinking 40 years ago that paved the way for the industry these better-knowns call home.

A faculty member of the University of California in Berkeley, Stonebraker seized on the research of IBM mathematician Tedd Codd to start work co-developing the industry's first relational database in 1973: Ingres. Ellison wasn't around until 1977, while it took lumbering IBM, owner of the mighty DB2, another eight years before it had something.

Ingres fed into Sybase as Sybase founder Robert Epstein was one of those who worked on Ingres, and then Microsoft's SQL Server. Through his various teaching positions, Stonebraker has also schooled CEOs, CTOs, founders and vice presidents of engineering at VMware, Sleepycat Software, Tibco, Oracle, Documentum, Alfresco and Cloudera. Along the way, Stonebraker found time to deliver Postgres, Mariposa, Aurora, C-Store and H-Store and help found startups to sell and support them: Illustra Information Technologies, Cohera, Streambase, Vertica and VoltDB.

After 40 years, though, Stonebraker finally thinks it's no longer a "one-size-fits-all" world and that there could be more to life than just relational. His latest work, SciDB, is going post-relational to serve the needs of those working with big data - large volumes of information crunched on thousands of nodes in distributed data centers.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge - the NoSQL movement - as he's put the sacred cows of the Web 2.0 crowd in their place for cheaply sacrificing the benefits of the relational technology he pioneered.

As autumn kicks in, the 66-year-old MIT adjunct professor is on the cusp of releasing the first code under open source for SciDB, a collaboration with long-time colleague Dave DeWitt.

SciDB is Stonebraker's big-data analytics play in an era of Google's MapReduce, Apache Software Foundation's Hadoop and the NoSQL evangelists who seem to be setting the pace, if not hogging the limelight, on big data in massive data centers today.

The database targets boffins, number crunchers and computer scientists and will scale, it's claimed, from megabytes of data to petabytes running on tens of thousands - all on industry standard, multi-core x86 servers with little human administration.

"The 'answer' is the current thing I'm focused on," Stonebraker told The Reg about the work on SciDB.

"The 'answer' in the 1980s was there was only one database market. In 2010, there are business processing databases with OLTP, science databases, document databases. There are genomic databases. The horizontal world of the database space has mushroomed.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

According to Stonebraker, the relational model he helped popularize doesn't work in data-intensive scientific discovery because the data is multidimensional.

Combining and sharing that data means complicated engineering work on the part of developers and database admins, and it produces bottlenecks. Scientists have been rolling their own architectures or - recently - deploying Hadoop, the open-source implementation of Google's MapReduce.

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than a RDBMS on this class of problem.

A database for all seasons

It's a world away from where Stonebraker started. In addition to being multidimensional and offering array-based scaling from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB's will be write once read many, allow bulk load rather than single road insert, provide parallel computation, be designed for automatic rather than manual administration, and work with R, Matlab, IDL, C++ and Python.

SciDB's being piloted by healthcare products giant Novartis, the Large Synoptic Survey Telescope (LSST), Fermilab and an unnamed Russian astronomy project.

"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables. Postgres has arrays, but they were supported by blobs, so weren't first-class citizens," Stonebraker said.

"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store...There are unbelievable advantages to specialization."

SciDB's is Stonebraker's biggest departure from the rules of the relational road. It's a journey that in the last five years has seen Stonebraker reinvent different parts of the relational stack.

Build a business case: developing custom apps

Next page: Battle of the rows

More from The Register

next story
PEAK LANDFILL: Why tablet gloom is good news for Windows users
Sinofsky's hybrid strategy looks dafter than ever
KDE releases ice-cream coloured Plasma 5 just in time for summer
Melty but refreshing - popular rival to Mint's Cinnamon's still a work in progress
Leaked Windows Phone 8.1 Update specs tease details of Nokia's next mobes
New screen sizes, dual SIMs, voice over LTE, and more
Fiendishly complex password app extension ships for iOS 8
Just slip it in, won't hurt a bit, 1Password makers urge devs
Mozilla keeps its Beard, hopes anti-gay marriage troubles are now over
Plenty on new CEO's todo list – starting with Firefox's slipping grasp
Apple: We'll unleash OS X Yosemite beta on the MASSES on 24 July
Starting today, regular fanbois will be guinea pigs, it tells Reg
Another day, another Firefox: Version 31 is upon us ALREADY
Web devs, Mozilla really wants you to like this one
Secure microkernel that uses maths to be 'bug free' goes open source
Hacker-repelling, drone-protecting code will soon be yours to tweak as you see fit
prev story

Whitepapers

Implementing global e-invoicing with guaranteed legal certainty
Explaining the role local tax compliance plays in successful supply chain management and e-business and how leading global brands are addressing this.
Boost IT visibility and business value
How building a great service catalog relieves pressure points and demonstrates the value of IT service management.
Why and how to choose the right cloud vendor
The benefits of cloud-based storage in your processes. Eliminate onsite, disk-based backup and archiving in favor of cloud-based data protection.
The Essential Guide to IT Transformation
ServiceNow discusses three IT transformations that can help CIO's automate IT services to transform IT and the enterprise.
Maximize storage efficiency across the enterprise
The HP StoreOnce backup solution offers highly flexible, centrally managed, and highly efficient data protection for any enterprise.