Original URL: http://www.theregister.co.uk/2008/11/14/large_databases/
Time to reject traditional database techniques?
'Big' data and the BI challenge
Mainstream database management system (DBMS) technology faces a challenge from new approaches that reject the relational model. The battleground is set to be the market for business intelligence based on very large databases.
Some of the main players in DBMS software are already jockeying for position, with revamped database products aimed at recapturing ground lost to newer rivals. Microsoft recently unveiled Kilimanjaro, the next massively scalable version of its SQL Server with a strong BI flavor, while database market number-one Oracle joined forces with Hewlett-Packard to launch its Exadata storage grid.
Both announcements shared a common theme: how to make huge volumes of data easily available to power business intelligence applications.
And the numbers in question are huge. We are, of course, familiar with "huge" numbers in a time of billion-dollar banking-industry bailouts. But these figures are dwarfed by the numbers of transactions pouring into some company databases and the amount of storage needed to accommodate them.
Back in January, Google reckoned it processed 20 petabytes of data a day - a number that has doubtless grown significantly since. And even lower down the scale, LGR Telecommunications is reported to be adding 13 billion records each day to a 310 terabyte data warehouse system and expects its petabyte of disks to double in the next year.
Although such huge volumes are still relatively unusual, it will not be long before even relatively small organizations think of terabytes and petabytes of data as commonplace. If they want to make practical use of the data in business intelligence applications, they will find their traditional relational DBMS technology stretched.
Cracks in the edifice
It is not only the logistics of storing and managing such enormous amounts of data that pose a big challenge to DBMS builders. There is also the problem of giving users access to the data in a form that might actually be useful. User queries have grown more complex, and the limitations of traditional access methods based on Structured Query Language (SQL) have been exposed.
The cracks in relational DBMS and the inadequacies of SQL were highlighted in a paper called The End of an Architectural Era, presented at the conference on Very Large Databases (VLDB) in September 2007. The collaborative work of several DBMS gurus - including Ingres/Postgres originator Michael Stonebraker - the paper declared the relational model obsolete and argued that alternative approaches were better suited to today's data management and access problems.
Specialized databases such as those built by Google and Yahoo, data warehouse software such as Vertica and Monet and innovative DBMS such as H-Store were all claimed to outperform relational DBMS products. Even in Online Transaction Processing (OLTP) - the traditional strong point of classic relational databases - H-Store was claimed to perform better.
At the heart of the argument against relational DBMS are, firstly, what are seen as the limitations of the old relational model and SQL and, secondly, how they may be either upgraded or replaced. Some argue in favor of new approaches such as the MapReduce technology used by Google to power its massive search engine operations. Others hold true to the transactional integrity and ACID properties built into traditional DBMS such as IBM's DB2, Oracle, and Microsoft's SQL Server.
Even the DBMS gurus can be confusing. While advocating new approaches to DBMS in the 2007 VLDB paper, Stonebraker provoked a storm in January when he co-authored a critique of Google's MapReduce, a technology widely acknowledged as just such a "new approach".
One of Stonebraker's criticisms of MapReduce was the lack of SQL-like tools. The omission was remedied in August by newcomers Aster Data and Greenplum - so it appears that there's still a need for at least some bits of relational DBMS technology.
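To see why SQL-like tools matter here, consider what a simple aggregate looks like when written by hand in the MapReduce style. The sketch below is only an illustration: it runs in a single process, the call records and function names are invented for the example, and it is not Google's implementation or API. In SQL the same job would be roughly `SELECT region, SUM(duration) FROM calls GROUP BY region`.

```python
from collections import defaultdict

# Hypothetical call records: (region, call duration in seconds).
# The data and all names below are assumptions made for illustration.
records = [
    ("north", 120),
    ("south", 30),
    ("north", 45),
    ("east", 10),
    ("south", 60),
]


def map_phase(record):
    """Emit one (key, value) pair per input record."""
    region, duration = record
    yield region, duration


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse the values collected for one key into a single result."""
    return key, sum(values)


def run_job(rows):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for row in rows for pair in map_phase(row))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


totals = run_job(records)
print(totals)
```

In a real deployment the map and reduce steps would be spread across many machines, which is where the approach earns its keep. The point of the sketch is simply that the programmer writes the grouping logic explicitly, where a SQL user states the result wanted in one line.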
But even MapReduce has limitations. Recent analysis carried out by eBay revealed some resource usage problems.
The future of DBMS technology rests on a combination of tried-and-tested techniques from the past and innovations to cope with large data volumes and more demanding users.
The recent announcements from Oracle and Microsoft embody changes that point towards a consensus of sorts on the future development of DBMS. Oracle's Exadata and Microsoft's Kilimanjaro take ideas from more modern approaches to DBMS and fold them into the tradition.
Oracle's and Microsoft's new plans also include in-memory processing, massively parallel processing, and the column-storage approach used in data warehouse products such as Sybase IQ and, more recently, Vertica and Google's BigTable.
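The appeal of column storage for BI workloads can be shown with a toy example. The sketch below is a simplification under stated assumptions: the order data and function names are hypothetical, everything sits in ordinary Python lists, and it does not reflect how Sybase IQ, Vertica, or any other product lays out data on disk.

```python
# Hypothetical order data held row by row, as a traditional DBMS would store it.
rows = [
    {"order_id": 1, "region": "north", "amount": 250.0},
    {"order_id": 2, "region": "south", "amount": 100.0},
    {"order_id": 3, "region": "north", "amount": 75.5},
]

# The same data pivoted into one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def total_from_rows(table):
    """Row layout: every whole record is visited to read a single field."""
    return sum(record["amount"] for record in table)


def total_from_columns(table):
    """Column layout: only the 'amount' values are touched."""
    return sum(table["amount"])


row_total = total_from_rows(rows)
column_total = total_from_columns(columns)
print(row_total, column_total)
```

Both functions return the same answer. The difference is in how much data has to be read to get it: an analytic query that sums one column out of dozens can skip the rest entirely when each column is stored separately.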
SQL and the relational model, it seems, are positioned to survive intact. ®