
SciDB: Relational daddy answers Google, Hadoop, NoSQL

Stonebraker doesn't drop ACID


Mention "relational databases" and a few people's names might spring to mind: Oracle's Larry Ellison, thanks to his billions, or Monty Widenius, main author of the ferociously popular MySQL. Geekier types might plump for Oracle's former Dr DBA Ken Jacobs or open-sourcer Brian Aker, who helped architect MySQL.

Michael Stonebraker's name probably doesn't jump very high in many minds outside computer science, yet it was Stonebraker's quick thinking 40 years ago that paved the way for the industry these better-knowns call home.

A faculty member of the University of California in Berkeley, Stonebraker seized on the research of IBM mathematician Ted Codd to start work co-developing the industry's first relational database in 1973: Ingres. Ellison wasn't around until 1977, while it took lumbering IBM, owner of the mighty DB2, another eight years before it had something.

Ingres fed into Sybase, whose founder Robert Epstein was one of those who worked on Ingres, and from there into Microsoft's SQL Server. Through his various teaching positions, Stonebraker has also schooled CEOs, CTOs, founders and vice presidents of engineering at VMware, Sleepycat Software, Tibco, Oracle, Documentum, Alfresco and Cloudera. Along the way, Stonebraker found time to deliver Postgres, Mariposa, Aurora, C-Store and H-Store and help found startups to sell and support them: Illustra Information Technologies, Cohera, Streambase, Vertica and VoltDB.

After 40 years, though, Stonebraker finally thinks it's no longer a "one-size-fits-all" world and that there could be more to life than just relational. His latest work, SciDB, is going post-relational to serve the needs of those working with big data - large volumes of information crunched on thousands of nodes in distributed data centers.

And as this pioneer from the past keeps working, he's come into conflict with those on today's leading edge - the NoSQL movement - as he's put the sacred cows of the Web 2.0 crowd in their place for cheaply sacrificing the benefits of the relational technology he pioneered.

As autumn kicks in, the 66-year-old MIT adjunct professor is on the cusp of releasing the first code under open source for SciDB, a collaboration with long-time colleague Dave DeWitt.

SciDB is Stonebraker's big-data analytics play in an era of Google's MapReduce, Apache Software Foundation's Hadoop and the NoSQL evangelists who seem to be setting the pace, if not hogging the limelight, on big data in massive data centers today.

The database targets boffins, number crunchers and computer scientists and will scale, it's claimed, from megabytes of data to petabytes running on tens of thousands of nodes - all on industry-standard, multi-core x86 servers with little human administration.

"The 'answer' is the current thing I'm focused on," Stonebraker told The Reg about the work on SciDB.

"The 'answer' in the 1980s was there was only one database market. In 2010, there are business processing databases with OLTP, science databases, document databases. There are genomic databases. The horizontal world of the database space has mushroomed.

"In the 1980s, the 'answer' was if all you wanted to do was business data processing, then it was relational databases. Try to stretch SQL to do everything, though, and that's an unnatural act."

According to Stonebraker, the relational model he helped popularize doesn't work in data-intensive scientific discovery because the data is multidimensional.

Combining and sharing that data means complicated engineering work on the part of developers and database admins, and it produces bottlenecks. Scientists have been rolling their own architectures or - recently - deploying Hadoop, the open-source implementation of Google's MapReduce.

SciDB goes beyond the relational world Stonebraker helped pioneer by swapping rows and columns for mathematical arrays that put fewer restrictions on the data and can work in any number of dimensions. Stonebraker claimed arrays are 100 or so times faster than an RDBMS on this class of problem.
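
To make the rows-versus-arrays distinction concrete, here is a minimal sketch in plain Python. It is illustrative only: the dimension sizes, the made-up readings and the function names are all assumptions, and none of it is SciDB's API or storage format. It simply contrasts holding a three-dimensional grid as rows with coordinate columns against holding it as an array whose position encodes the coordinates.

```python
# Illustrative only: names and data are invented, and this is not SciDB's API.
from itertools import product

DIMS = (4, 3, 2)  # hypothetical sizes for the x, y and t dimensions


def reading(x, y, t):
    """Made-up measurement so the example has data to store."""
    return 100 * x + 10 * y + t


# Relational style: one row per cell, with the coordinates stored as columns.
rows = [(x, y, t, reading(x, y, t))
        for x, y, t in product(*(range(n) for n in DIMS))]


def table_lookup(x, y, t):
    """Find a cell by scanning rows and comparing coordinate columns."""
    for rx, ry, rt, value in rows:
        if (rx, ry, rt) == (x, y, t):
            return value
    raise KeyError((x, y, t))


# Array style: only the values are stored; position encodes the coordinates.
cells = [value for _, _, _, value in rows]


def array_lookup(x, y, t):
    """Compute the cell's offset directly from its coordinates (row-major)."""
    _, ny, nt = DIMS
    return cells[(x * ny + y) * nt + t]
```

Both lookups return the same value for the same coordinates. The table version has to carry and compare the coordinates on every row, while the array version reaches the cell with index arithmetic, and adding a fourth dimension only changes the offset formula. A real relational database would use an index rather than a linear scan, so this says nothing about the size of the speed-up Stonebraker claims.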

A database for all seasons

It's a world away from where Stonebraker started. In addition to being multidimensional, scaling its arrays from megabytes to petabytes and running on tens of thousands of clustered nodes, SciDB will be write-once, read-many; allow bulk loads rather than single-row inserts; provide parallel computation; be designed for automatic rather than manual administration; and work with R, Matlab, IDL, C++ and Python.

SciDB's being piloted by healthcare products giant Novartis, the Large Synoptic Survey Telescope (LSST), Fermilab and an unnamed Russian astronomy project.

"Postgres is no good at the data warehouse market because the science market wants arrays, they don't want tables. But arrays are impossibly slow on top of tables. Postgres has arrays, but they were supported by blobs, so weren't first-class citizens," Stonebraker said.

"I learned if you want to advance the data warehouse market and want to go fast, you need a column store, not a row store...There are unbelievable advantages to specialization."

SciDB is Stonebraker's biggest departure from the rules of the relational road. It's a journey that in the last five years has seen Stonebraker reinvent different parts of the relational stack.
