Fed up with database speed? Meet Big Blue's BLU-eyed boy

El Reg drills into 10TB/s acceleration tech

Analysis Like other system vendors with their own software stacks, IBM is trying to boost the processing speed of its database software so it can take on larger and larger data munching jobs.

The company launched its BLU Acceleration feature for several of its databases a few weeks ago as part of a broader big data blitzkrieg, saying that it could do analytics and process reports on the order of 8 to 25 times faster than plain vanilla DB2. IBM was a little short on the details about how this turbocharger for databases works at the time, but El Reg has hunted around to get the scoop.

Tim Vincent - chief architect for DB2 on the Linux, Unix and Windows platforms, an IBM Fellow, and chief technology officer for IBM's Information Management division - walked us through the BLU Acceleration details.

The feature is in tech preview and is only available for the DB2 10.5 database and the TimeSeries extensions for the Informix database. (Yup, IBM is still peddling Informix.) And the database gooser is restricted to reporting and analytics jobs, but there is every reason to believe that Big Blue will use it to help goose transaction processing as well and, equally importantly, make BLU Acceleration available for the versions of DB2 for z/OS and other IBM proprietary operating systems.

Like other IT vendors, IBM wants companies to think that every bit of data that they generate or collect from their systems or buy from third parties in the course of running their business is valuable, and the reason is simple.

This sells storage arrays, and if you can make CEOs think this data is potentially valuable, then they will fork out the money to keep it inside of various kinds of data warehouses or Hadoop clusters for data at rest or in InfoSphere Streams systems for data and telemetry in motion.

There is big money in them there big data hills. Server virtualization has pulled the rug out from underneath the server business in the past decade, hindering revenue growth, but the funny thing about these big data jobs is that none of them are virtualized and, given the massive amounts of data they need to absorb every day, they keep swelling like a batch of yeast.

IBM is not yet making any promises about bringing BLU Acceleration, which can boost analytics queries while at the same time reducing storage capacity needs by a factor of ten thanks to columnar data compression, to other databases. But Vincent hinted pretty strongly.

"We do plan on extending this," Vincent said in a presentation following the BLU Acceleration launch, "and we are going to bring the technology into new products going forward."

So what exactly is BLU Acceleration? Well, it is a lot of different things.

First, BLU implements a new runtime that is embedded inside of the DB2 database and a new table type that is used by that runtime. These BLU tables coexist with the traditional row tables in DB2, and have the same schema and use storage and memory the same way.

The BLU tables orient data in columns instead of the classic row-structured table used in relational databases. This data is encoded (using what Vincent called an approximate Huffman encoding algorithm) in a way that keeps it in order, so it can be searched even while it is compressed.
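To make the "searched even while it is compressed" idea concrete, here is a minimal sketch of order-preserving encoding. It is not IBM's algorithm: it swaps in a plain sorted dictionary for the approximate Huffman scheme Vincent described, and every name in it is hypothetical. The only point is that if codes sort the same way the values do, a range predicate can be checked against the codes alone.

```python
from bisect import bisect_left, bisect_right

def build_dictionary(values):
    """Map each distinct value to an integer code that preserves sort order."""
    distinct = sorted(set(values))
    return distinct, {v: i for i, v in enumerate(distinct)}

def encode(values, code_of):
    """Replace every value with its (smaller) integer code."""
    return [code_of[v] for v in values]

def count_in_range(codes, distinct, low, high):
    """Count rows with low <= value <= high by comparing codes only."""
    lo_code = bisect_left(distinct, low)         # first code whose value >= low
    hi_code = bisect_right(distinct, high) - 1   # last code whose value <= high
    # No row is decoded here: the comparison runs on the encoded column.
    return sum(1 for c in codes if lo_code <= c <= hi_code)

# Hypothetical column of sale years.
years = [2008, 2010, 2009, 2010, 2012, 2010, 2011]
distinct, code_of = build_dictionary(years)
codes = encode(years, code_of)
matches = count_in_range(codes, distinct, 2010, 2011)
```

The predicate bounds are translated into codes once, and the scan never touches the original values. A real encoder would also give frequent values shorter codes, which this sketch does not attempt.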

The BLU Acceleration feature has a memory paging architecture so that an entire database table does not have to reside in main memory to be processed, but the goal is to use the columnar format to allow the database to be compressed enough so it can reside in main memory and be much more quickly searched. But again, unlike with some in-memory database management systems, this is not required, and you can move chunks of a BLU database into main memory as you need to query it.

BLU Acceleration also knows about multiple core processors and the SIMD engines and vector coprocessors on chips, and it can take advantage of the cores and coprocessors to compress and search data. The Actionable Compression algorithm, as IBM calls it, is patented and allows for data to be used without decompressing it, which is a neat trick.

The acceleration feature also can do something called data skipping, which means it can avoid processing irrelevant data in a table to do a query.
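IBM did not spell out how data skipping is implemented, so treat the following as a sketch under an assumption: that the database keeps a small min/max summary per block of a column and consults it before scanning. The function names and the block size are made up for illustration.

```python
def build_synopsis(column, block_size):
    """Record (start, end, min, max) for each fixed-size block of a column."""
    synopsis = []
    for start in range(0, len(column), block_size):
        block = column[start:start + block_size]
        synopsis.append((start, start + len(block), min(block), max(block)))
    return synopsis

def count_equal(column, synopsis, target):
    """Count matches, scanning only blocks whose min/max range could hold target."""
    matches = 0
    blocks_scanned = 0
    for start, end, lo, hi in synopsis:
        if target < lo or target > hi:
            continue  # the whole block is irrelevant, so skip it unread
        blocks_scanned += 1
        matches += sum(1 for v in column[start:end] if v == target)
    return matches, blocks_scanned

# Hypothetical column, loaded roughly in date order.
sale_year = [2008, 2008, 2009, 2009, 2010, 2010, 2010, 2011, 2012, 2012, 2012, 2012]
synopsis = build_synopsis(sale_year, block_size=4)
result = count_equal(sale_year, synopsis, 2010)
```

In this toy data only one of the three blocks is read. Skipping of this kind pays off most when data arrives roughly sorted, as dated sales records tend to.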

Here's the compare and contrast between the way DB2 works now, with all of the snazzy features to improve its performance that have been added over the years, and the way the BLU Acceleration feature works:

You have to do a lot of stuff to make a relational database do a query


This system hack at El Reg is not a database expert, but that comparison is funny.

The freaky thing about BLU Acceleration is that it does not have database indexes. You don't have to do aggregates on the tables, you don't have to tune your queries or the database, and you don't have to make any changes to SQL or database schemas.

"You just load the data and query it," as Vincent put it.

The reason that you don't need a database index is that data is compressed so a BLU table can, generally speaking, reside in memory. Vincent said that 80 percent of the data warehouses in the world had 10TB of capacity, so if you can use the Actionable Compression feature of BLU and get a 10X compression ratio, then you can fit the typical data warehouse in a 1TB memory footprint.

But there are more tricks that speed up those database queries, as you can see here:

How BLU Acceleration works

An example query showing how BLU Acceleration works

Once you have compressed the data so it all fits into main memory, you take advantage of the fact that you have organized the data in columnar format instead of row format. So, in this case, you put each of ten years of data into ten different columns each, for a total of a hundred columns. And when you want to search in 2010 only for a set of the data, as the query above - find the number of sale deals that the company did in 2010 - does, then you reduce that query down to 10GB of the data in the entire set.
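The difference between the two layouts can be sketched in a few lines. The table, column names and values below are invented for illustration, and Python lists stand in for what would really be compressed pages.

```python
# Row-organized: each record keeps all of its fields together.
rows = [
    {"deal_id": 1, "year": 2009, "region": "EMEA", "amount": 120.0},
    {"deal_id": 2, "year": 2010, "region": "APAC", "amount": 75.5},
    {"deal_id": 3, "year": 2010, "region": "EMEA", "amount": 310.0},
    {"deal_id": 4, "year": 2011, "region": "AMER", "amount": 42.0},
]

# Column-organized: one contiguous list per column.
columns = {name: [r[name] for r in rows] for name in rows[0]}

def count_deals_row_store(rows, year):
    """Every full record is visited just to look at one field."""
    return sum(1 for r in rows if r["year"] == year)

def count_deals_column_store(columns, year):
    """Only the 'year' column is read; the other columns are never touched."""
    return sum(1 for y in columns["year"] if y == year)

row_answer = count_deals_row_store(rows, 2010)
column_answer = count_deals_column_store(columns, 2010)
```

Both functions return the same count. The columnar version simply has far less data to walk through, which is where the drop from the full table to a single column's worth of data comes from.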

The data skipping feature in this case knows to look for sales data, not other kinds of data, so that reduces the data set down to around 1GB. The machine you are using to run this BLU Acceleration feature not only has 1TB of main memory but 32 cores, so you parallelize the query and break it up so 32MB chunks of the data are partitioned and parceled out to each of the 32 cores and their memory segments.
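The partition-and-parcel-out step might look something like this in miniature. This is an illustrative sketch, not DB2 internals: the helper names are hypothetical, and Python threads are used only to show the shape of the work split (they will not deliver a real 32-way speedup on CPU-bound code).

```python
from concurrent.futures import ThreadPoolExecutor

def split(column, parts):
    """Split a column into `parts` contiguous chunks of near-equal size."""
    size, extra = divmod(len(column), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(column[start:end])
        start = end
    return chunks

def count_matches(chunk, target):
    """Work done by one worker: scan its own chunk only."""
    return sum(1 for v in chunk if v == target)

def parallel_count(column, target, workers):
    """Scan each chunk on its own worker, then add up the partial counts."""
    chunks = split(column, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_matches, chunks, [target] * len(chunks))
        return sum(partials)

# Hypothetical column: 1,000 sale years cycling through 2008..2012.
sale_year = [2008 + (i % 5) for i in range(1000)]
total = parallel_count(sale_year, 2010, workers=32)
```

Because a count is just a sum of per-chunk counts, the workers never need to talk to each other until the final addition.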

Now, use the vector processing capability in an x86 or Power processor, and you get around a factor of four speedup in scanning the data for the sales data.
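Stringing the article's round numbers together shows how quickly the working set shrinks. The figures are the ones quoted above; the binary units and the "scalar equivalent" label are assumptions made here to keep the arithmetic tidy.

```python
TB = 1024 ** 4
GB = 1024 ** 3
MB = 1024 ** 2

raw = 10 * TB                        # typical warehouse, per Vincent
compressed = raw // 10               # 10X compression leaves 1 TB in memory
one_column = compressed / 100        # 1 of 100 columns: about 10 GB
after_skipping = 1 * GB              # the article's "around 1GB" after data skipping
per_core = after_skipping // 32      # spread across 32 cores
scalar_equivalent = per_core // 4    # a 4X vector speedup makes it feel like this much

print(f"compressed:        {compressed / TB:.0f} TB")
print(f"one column:        {one_column / GB:.2f} GB")
print(f"per core:          {per_core / MB:.0f} MB")
print(f"scalar equivalent: {scalar_equivalent / MB:.0f} MB")
```

Each core ends up with the equivalent of a few megabytes of scanning to do against data that is already in memory.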

The result is that you can query a 10TB table in a second or less. And IBM now has something to sell against Oracle's Exadata and Teradata's appliances, which both have columnar data stores and other features to goose performance. IBM really needs to get this BLU Acceleration feature running on OLTP jobs. ®


Biting the hand that feeds IT © 1998–2018