SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID

Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Aker, who helped architect MySQL.

Michael Stonebraker's name probably doesn't jump very high in many minds outside computer science, yet it was Stonebraker's quick thinking 40 years ago that paved the way for the industry these better-knowns call home.

A faculty member at the University of California, Berkeley, Stonebraker seized on the research of IBM mathematician Ted Codd to start work co-developing the industry's first relational database in 1973: Ingres. Ellison wasn't around until 1977, while it took lumbering IBM, owner of the mighty DB2, another eight years before it had something.

Ingres fed into Sybase, whose founder Robert Epstein was one of those who worked on Ingres, and from there into Microsoft's SQL Server. Through his various teaching positions, Stonebraker has also schooled CEOs, CTOs, founders and vice presidents of engineering at VMware, Sleepycat Software, Tibco, Oracle, Documentum, Alfresco and Cloudera. Along the way, Stonebraker found time to deliver Postgres, Mariposa, Aurora, C-Store and H-Store and help found startups to sell and support them: Illustra Information Technologies, Cohera, Streambase, Vertica and VoltDB.

After 40 years, though, Stonebraker finally thinks it's no longer a "one-size-fits-all" world and that there could be more to life than just relational. His latest work, SciDB, is going post-relational to serve the needs of those working with big data - large volumes of information crunched on thousands of nodes in distributed data centers.

And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge - the NoSQL movement - as he's put the sacred cows of the Web 2.0 crowd in their place for cheaply sacrificing the benefits of the relational technology he pioneered.

As autumn kicks in, the 66-year-old MIT adjunct professor is on the cusp of releasing the first code under open source for SciDB, a collaboration with long-time colleague Dave DeWitt.

SciDB is Stonebraker's big-data analytics play in an era of Google's MapReduce, Apache Software Foundation's Hadoop and the NoSQL evangelists who seem to be setting the pace, if not hogging the limelight, on big data in massive data centers today.

The database targets boffins, number crunchers and computer scientists and will scale, it's claimed, from megabytes of data to petabytes running on tens of thousands of nodes, all on industry-standard, multi-core x86 servers with little human administration.

"The 'answer' is the current thing I'm focused on," Stonebraker told The Reg about the work on SciDB.

"The 'answer' in the 1980s was there was only one database market. In 2010, there are business processing databases with OLTP, science databases, document databases. There are genomic databases. The horizontal world of the database space has mushroomed.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

According to Stonebraker, the relational model he helped popularize doesn't work in data-intensive scientific discovery because the data is multidimensional.

Combining and sharing that data means complicated engineering work on the part of developers and database admins, and it produces bottlenecks. Scientists have been rolling their own architectures or - recently - deploying Hadoop, the open-source implementation of Google's MapReduce.

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than an RDBMS on this class of problem.
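
To make the rows-versus-arrays contrast concrete, here is a minimal Python sketch. It is not SciDB's API or storage format. The names `to_rows`, `lookup_in_rows` and `DenseArray` are invented for illustration, and the table side assumes no index, which a real RDBMS would normally have. It shows only the general idea: a table stores each cell's coordinates as data, while an array derives a cell's position from its coordinates.

```python
# Illustrative only: hypothetical names, not SciDB's API or storage format.

def to_rows(grid):
    """Flatten a 2-D grid into (x, y, value) rows, as a table might hold it."""
    return [(x, y, v) for x, line in enumerate(grid) for y, v in enumerate(line)]


def lookup_in_rows(rows, x, y):
    """Find a cell by scanning rows for matching coordinates (no index assumed)."""
    for rx, ry, v in rows:
        if rx == x and ry == y:
            return v
    raise KeyError((x, y))


class DenseArray:
    """Values only, in row-major order; a cell's position implies its coordinates."""

    def __init__(self, shape, values):
        self.shape = tuple(shape)
        self.values = list(values)
        expected = 1
        for n in self.shape:
            expected *= n
        if len(self.values) != expected:
            raise ValueError("values do not match shape")

    def offset(self, coords):
        """Convert N-dimensional coordinates to a flat row-major offset."""
        if len(coords) != len(self.shape):
            raise IndexError("wrong number of dimensions")
        off = 0
        for c, n in zip(coords, self.shape):
            if not 0 <= c < n:
                raise IndexError(coords)
            off = off * n + c
        return off

    def get(self, *coords):
        return self.values[self.offset(coords)]


grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
rows = to_rows(grid)                                       # six (x, y, value) rows
arr = DenseArray((2, 3), [v for line in grid for v in line])
cube = DenseArray((2, 2, 2), range(8))                     # same class, three dimensions
```

Both `lookup_in_rows(rows, 1, 2)` and `arr.get(1, 2)` return the same cell, but the array computes the location arithmetically, and the same class handles a third dimension without any change of schema.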

A database for all seasons

It's a world away from where Stonebraker started. In addition to being multidimensional, offering array-based scaling from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB will be write-once, read-many; allow bulk load rather than single-row insert; provide parallel computation; be designed for automatic rather than manual administration; and work with R, Matlab, IDL, C++ and Python.

SciDB's being piloted by healthcare products giant Novartis, the Large Synoptic Survey Telescope (LSST), Fermilab and an unnamed Russian astronomy project.

"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables. Postgres has arrays, but they were supported by blobs, so weren't first-class citizens," Stonebraker said.

"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store...There are unbelievable advantages to specialization."

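The row store versus column store distinction in that quote can be sketched in a few lines of Python. This is a toy illustration, not the layout of Vertica, C-Store or any other product. The sample records and function names are made up. The point is only that an aggregate over one field touches every full record in a row layout, but just one list in a column layout.

```python
# Illustrative only: made-up data and names, not any product's storage layout.

records = [
    {"id": 1, "region": "east", "sales": 120.0},
    {"id": 2, "region": "west", "sales": 75.5},
    {"id": 3, "region": "east", "sales": 42.0},
]

# Row store: each record's fields are kept together.
row_store = [(r["id"], r["region"], r["sales"]) for r in records]

# Column store: each field's values are kept together.
column_store = {name: [r[name] for r in records] for name in records[0]}


def total_sales_row_store(rows):
    """Walk every full record even though only the third field is needed."""
    return sum(row[2] for row in rows)


def total_sales_column_store(columns):
    """Read only the 'sales' column; the other columns are never touched."""
    return sum(columns["sales"])
```

Both functions return the same total; the difference is how much unrelated data each has to read to get there, which is what matters for warehouse-style scans.
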
SciDB is Stonebraker's biggest departure from the rules of the relational road. It's a journey that in the last five years has seen Stonebraker reinvent different parts of the relational stack.
