Fed up with database speed? Meet Big Blue's BLU-eyed boy

El Reg drills into 10TB/s acceleration tech

Analysis Like other system vendors with their own software stacks, IBM is trying to boost the processing speed of its database software so it can take on larger and larger data munching jobs.

The company launched its BLU Acceleration feature for several of its databases a few weeks ago as part of a broader big data blitzkrieg, saying that it could do analytics and process reports on the order of 8 to 25 times faster than plain vanilla DB2. IBM was a little short on the details about how this turbocharger for databases works at the time, but El Reg has hunted around to get the scoop.

Tim Vincent - chief architect for DB2 on the Linux, Unix and Windows platforms, an IBM Fellow, and chief technology officer for IBM's Information Management division - walked us through the BLU Acceleration details.

The feature is in tech preview and is only available for the DB2 10.5 database and the TimeSeries extensions for the Informix database. (Yup, IBM is still peddling Informix.) The database gooser is restricted to reporting and analytics jobs, but there is every reason to believe that Big Blue will use it to help goose transaction processing as well and, equally importantly, make BLU Acceleration available for the versions of DB2 for z/OS and other IBM proprietary operating systems.

Like other IT vendors, IBM wants companies to think that every bit of data that they generate or collect from their systems or buy from third parties in the course of running their business is valuable, and the reason is simple.

This sells storage arrays, and if you can make CEOs think this data is potentially valuable, then they will fork out the money to keep it inside of various kinds of data warehouses or Hadoop clusters for data at rest or in InfoSphere Streams systems for data and telemetry in motion.

There is big money in them there big data hills. Server virtualization has pulled the rug out from underneath the server business in the past decade, hindering revenue growth, but the funny thing about these big data jobs is that none of them are virtualized. And given the massive amounts of data they need to absorb every day, they keep swelling like a batch of yeast.

IBM is not yet making any promises about bringing BLU Acceleration, which can boost analytics queries while at the same time reducing storage capacity needs by a factor of ten thanks to columnar data compression, to other databases. But Vincent hinted pretty strongly.

"We do plan on extending this," Vincent said in a presentation following the BLU Acceleration launch, "and we are going to bring the technology into new products going forward."

So what exactly is BLU Acceleration? Well, it is a lot of different things.

First, BLU implements a new runtime that is embedded inside of the DB2 database and a new table type that is used by that runtime. These BLU tables coexist with the traditional row tables in DB2, and have the same schema and use storage and memory the same way.

The BLU tables orient data in columns instead of the classic row-structured table used in relational databases. This data is encoded (using what Vincent called an approximate Huffman encoding algorithm) in a manner that keeps it in order, so it can be searched even while it is compressed.
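To make the "searchable while compressed" idea concrete, here is a minimal sketch in Python. It is not IBM's algorithm: it swaps in a plain order-preserving dictionary encoding (fixed integer codes, no Huffman-style variable-length codes), and the function names and sample data are hypothetical. The point is only that if codes sort the same way the values do, a range predicate can be evaluated on the codes without decoding any row.

```python
from bisect import bisect_left, bisect_right

def build_dictionary(values):
    """Assign each distinct value an integer code that preserves sort order."""
    ordered = sorted(set(values))
    return ordered, {value: code for code, value in enumerate(ordered)}

def encode(values, code_of):
    """Replace every value with its (smaller) integer code."""
    return [code_of[v] for v in values]

def count_in_range(codes, ordered, low, high):
    """Count rows with low <= value <= high by comparing codes only."""
    lo_code = bisect_left(ordered, low)        # smallest code whose value >= low
    hi_code = bisect_right(ordered, high) - 1  # largest code whose value <= high
    return sum(1 for c in codes if lo_code <= c <= hi_code)

# Hypothetical sample column; not taken from the article.
years = [2008, 2010, 2009, 2010, 2012, 2010, 2011]
ordered, code_of = build_dictionary(years)
codes = encode(years, code_of)
matches = count_in_range(codes, ordered, 2010, 2011)
print(codes, matches)
```

The predicate bounds are translated into codes once, and the scan then touches only the encoded column.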

The BLU Acceleration feature has a memory paging architecture so that an entire database table does not have to reside in main memory to be processed, but the goal is to use the columnar format to allow the database to be compressed enough so it can reside in main memory and be much more quickly searched. But again, this is not required, as it is with some in-memory database management systems, and you can move chunks of a BLU database into main memory as you need to query it.

BLU Acceleration also knows about multiple core processors and the SIMD engines and vector coprocessors on chips, and it can take advantage of the cores and coprocessors to compress and search data. The Actionable Compression algorithm, as IBM calls it, is patented and allows for data to be used without decompressing it, which is a neat trick.

The acceleration feature also can do something called data skipping, which means it can avoid processing irrelevant data in a table to do a query.
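IBM did not spell out how data skipping is implemented, so the following Python sketch is an assumption: it uses a per-block minimum/maximum summary, a common way to skip irrelevant data in column stores. The function names, block size, and sample data are hypothetical.

```python
def build_synopsis(column, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    synopsis = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        synopsis.append((start, start + len(block), min(block), max(block)))
    return synopsis

def count_equal(column, synopsis, target):
    """Count rows equal to target, scanning only blocks that could hold it."""
    hits = 0
    blocks_scanned = 0
    for start, end, lo, hi in synopsis:
        if target < lo or target > hi:
            continue  # the summary proves this block has no match, so skip it
        blocks_scanned += 1
        hits += sum(1 for v in column[start:end] if v == target)
    return hits, blocks_scanned

# Hypothetical column in which each block happens to hold a single year.
years = [2008] * 4 + [2009] * 4 + [2010] * 4 + [2011] * 4
synopsis = build_synopsis(years, block_size=4)
hits, blocks_scanned = count_equal(years, synopsis, 2010)
print(hits, blocks_scanned)
```

Skipping of this kind pays off most when related values are stored near each other, as they are in this toy column; on randomly ordered data most blocks would still have to be scanned.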

Here's the compare and contrast between the way DB2 works now, with all of the snazzy features to improve its performance that have been added over the years, and the way the BLU Acceleration feature works:

You have to do a lot of stuff to make a relational database do a query

This system hack at El Reg is not a database expert, but that comparison is funny.

The freaky thing about BLU Acceleration is that it does not have database indexes. You don't have to do aggregates on the tables, you don't have to tune your queries or the database, and you don't have to make any changes to SQL or database schemas.

"You just load the data and query it," as Vincent put it.

The reason that you don't need a database index is that data is compressed so a BLU table can, generally speaking, reside in memory. Vincent said that 80 percent of the data warehouses in the world had 10TB of capacity, so if you can use the Actionable Compression feature of BLU and get a 10X compression ratio, then you can fit the typical data warehouse in a 1TB memory footprint.

But there are more tricks that speed up those database queries, as you can see here:

How BLU Acceleration works

An example query showing how BLU Acceleration works

Once you have compressed the data so it all fits into main memory, you take advantage of the fact that you have organized the data in columnar format instead of row format. So, in this case, you put each of ten years of data into ten different columns, for a total of a hundred columns. When you want to search only 2010 for a subset of the data, as the query above does (find the number of sale deals that the company did in 2010), you reduce that query down to 10GB of the data in the entire set.
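The benefit of a columnar layout can be sketched in a few lines of Python. This is an illustration only: the three-column table, the column names, and the helper functions are hypothetical and much simpler than the hundred-column example described here. What it shows is that once each column is stored as its own array, a query reads only the columns it names and never touches the rest.

```python
# Row layout: every record carries all of its fields together.
rows = [
    {"year": 2009, "type": "sale", "amount": 120},
    {"year": 2010, "type": "sale", "amount": 300},
    {"year": 2010, "type": "refund", "amount": 50},
    {"year": 2010, "type": "sale", "amount": 75},
]

def to_columns(rows):
    """Pivot a list of records into one list per column."""
    columns = {name: [] for name in rows[0]}
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)
    return columns

def count_sales_in_year(columns, year):
    """Count sale deals for one year, reading only the two columns needed."""
    return sum(
        1
        for y, t in zip(columns["year"], columns["type"])
        if y == year and t == "sale"
    )

columns = to_columns(rows)
sales_2010 = count_sales_in_year(columns, 2010)
print(sales_2010)
```

The "amount" column is never read by the query, which is the whole trick: in a wide table, most of the bytes belong to columns the query does not care about.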

The data skipping feature in this case knows to look for sales data, not other kinds of data, so that reduces the data set down to around 1GB. The machine you are using to run this BLU Acceleration feature not only has 1TB of main memory but 32 cores, so you parallelize the query and break it up so 32MB chunks of the data are partitioned and parceled out to each of the 32 cores and their memory segments.
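The partition-and-parcel-out step can be sketched as follows in Python, under stated assumptions: the worker count, the sample column, and the function names are hypothetical, and a thread pool stands in for the 32 cores. In CPython, threads will not actually speed up a pure-Python scan, so this shows the shape of the partitioning and the combining of partial results, not the performance.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(values, n_chunks):
    """Split values into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(values), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(values[start:end])
        start = end
    return chunks

def count_matches(chunk, target):
    """Scan one chunk and count the entries equal to target."""
    return sum(1 for v in chunk if v == target)

def parallel_count(values, target, n_workers):
    """Give each worker one chunk, then add up the partial counts."""
    chunks = split_into_chunks(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(count_matches, chunks, [target] * len(chunks)))
    return sum(partial)

# Hypothetical sample column; a real scan would run over encoded data.
column = ["sale", "refund", "sale", "sale", "quote",
          "sale", "refund", "sale", "sale", "quote"]
total = parallel_count(column, "sale", n_workers=4)
print(total)
```

Because counting is associative, each worker can scan its chunk independently and the partial counts are simply summed at the end.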

Now, use the vector processing capability in an x86 or Power processor, and you get around a factor of four speedup in scanning the data for the sales data.
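The figures quoted in this walkthrough can be checked with some back-of-the-envelope arithmetic. The Python sketch below uses decimal units and only the factors given in the article, with one assumption: the factor of ten for data skipping is not stated outright and is inferred from the drop from 10GB to "around 1GB".

```python
TB = 10**12
GB = 10**9
MB = 10**6

raw_table = 10 * TB               # typical warehouse size cited by Vincent
compressed = raw_table // 10      # 10X compression leaves 1TB
one_column = compressed // 100    # one column out of a hundred: 10GB
after_skipping = one_column // 10 # assumed 10X from data skipping: ~1GB
cores = 32
per_core_mb = after_skipping / cores / MB   # share of data handled per core
simd_speedup = 4
effective_mb = per_core_mb / simd_speedup   # scan work after the 4X vector gain

print(compressed // TB, one_column // GB, after_skipping // GB)
print(per_core_mb, effective_mb)
```

In decimal units the per-core share works out to 31.25MB, which is in line with the 32MB chunks quoted. After the vector speedup, each core does the equivalent of scanning under 8MB.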

The result is that you can query a 10TB table in a second or less. And, IBM has something to sell against Oracle's Exadata and Teradata's appliances, which both have columnar data stores and other features to goose performance. IBM really needs to get this BLU Acceleration feature running on OLTP jobs. ®
