Teradata hitches Aster hybrid database to Hadoop
SQL-H makes elephants chatty
Hadoop World 2012 Like everyone else in the business analytics racket, Teradata has to come up with ways to integrate its products with batch-style Hadoop data munchers.
The company partnered with Hadoop distie Cloudera in September 2010 to create a pipe between Hadoop clusters and Teradata data warehouses, and now Teradata is providing a little more insight into how it will link Hadoop to its Aster Data hybrid row-column database for analytical processing.
We already knew that Teradata was working on Hadoop integration with Aster Data databases, which can store data and search it by row or column and which have their own SQL-MapReduce algorithms that overlay this massively parallel database and perform functions similar to those MapReduce performs on a Hadoop cluster, albeit a lot faster and on much smaller data sets.
Back in February of this year, Teradata announced a partnership with Hortonworks, the Hadoop distie that was spun out of the Yahoo! engineering team that actually created Hadoop (or more precisely, out of what was left of that team after some employees departed to form other big data firms, including Cloudera).
At Hadoop World 2012, Teradata lifted the veil a little bit on how it will integrate Hadoop data stores with Aster Data databases as part of a preview of its upcoming Aster Database 5.0 release. It turns out that HCatalog is the key superglue that will link Hadoop to Aster databases. HCatalog is the metadata overlay for file formats in the Hadoop Distributed File System and the different components of the Hadoop stack; it is being championed by Hortonworks and is a key component of its Data Platform 1.0 Hadoop distribution, also announced this week. The other half of the glue is a query language feature of the future Aster Database 5.0 release called SQL-H.
SQL-H is an extension of ANSI-standard SQL, Steve Wooledge, senior director of marketing at Aster Data, explains to El Reg. It will allow business analysts to use SQL-like statements that work through HCatalog to see and query data stored in HDFS, and to suck that data into memory on the Aster cluster so it can be sorted, diced, sliced, and otherwise analyzed.
SQL-H extracts data from HDFS without going through Pig, a high-level language for running MapReduce routines, or Hive, an ad hoc query language for Hadoop, both of which are Apache projects as well and are usually part of a Hadoop distribution. SQL-H requires the Aster Database, and the forthcoming 5.0 release at that, and can be thought of as a more relational-friendly way of getting at Hadoop data than Pig or Hive (at least if you are used to SQL and have no idea how to use Pig or Hive).
The other neat thing about SQL-H, says Wooledge, is that if you want to grab a chunk of data out of Hadoop and plunk it directly into the Aster database for processing later, you can extract the data through HCatalog and save it in Aster Database tables.
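To make the idea concrete, here is a toy Python sketch of the general pattern described above: a metadata catalog tells a SQL engine where a file lives and what its columns are, so an analyst can pull it into a table and query it with plain SQL. This is not Aster's implementation or the SQL-H syntax, neither of which Teradata has detailed here. The `FAKE_HDFS` dictionary, the `CATALOG` dictionary, and the `load_from_catalog` function are all hypothetical stand-ins, and SQLite plays the part of the relational database.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for files sitting in HDFS: path -> raw delimited text.
FAKE_HDFS = {
    "/logs/clicks.tsv": "alice\t/home\t3\nbob\t/cart\t5\nalice\t/cart\t2\n",
}

# Hypothetical stand-in for a metadata catalog: table name -> location and schema.
CATALOG = {
    "clicks": {
        "path": "/logs/clicks.tsv",
        "delimiter": "\t",
        "columns": [("user_name", "TEXT"), ("page", "TEXT"), ("hits", "INTEGER")],
    },
}


def load_from_catalog(conn, table):
    """Copy a catalogued file into a SQL table using only the catalog's metadata."""
    meta = CATALOG[table]
    cols = meta["columns"]
    col_defs = ", ".join(f"{name} {sqltype}" for name, sqltype in cols)
    conn.execute(f"CREATE TABLE {table} ({col_defs})")
    # Parse the raw file with the delimiter the catalog records for it.
    reader = csv.reader(
        io.StringIO(FAKE_HDFS[meta["path"]]), delimiter=meta["delimiter"]
    )
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", reader)
    return conn


# An in-memory database stands in for the analytical database.
conn = sqlite3.connect(":memory:")
load_from_catalog(conn, "clicks")

# The analyst only ever writes ordinary SQL against the loaded table.
rows = conn.execute(
    "SELECT user_name, SUM(hits) FROM clicks GROUP BY user_name ORDER BY user_name"
).fetchall()
print(rows)
```

The point of the design is that the query at the bottom never mentions paths or delimiters; everything file-specific is looked up in the catalog, which is roughly the role the article ascribes to HCatalog.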
How Aster SQL-H hooks into Hadoop HDFS
Whichever way you use the data, in memory or on persistent disk, you can integrate it with other business intelligence tools, such as the Aprimo marketing automation software now owned by Teradata or MicroStrategy dashboarding and reporting software, just to name two use cases. The point is, business analysts can work through SQL-H and not even know they are smacking against the very alien HDFS file system.
There's no word on what kind of performance this SQL-H add-on for Aster Database 5.0 has, of course, since it is not shipping until the third quarter. And while pricing has not been set yet, Wooledge says that the plan is to charge a "nominal fee" over and above the Aster Database license fees rather than a lot more, because Teradata believes that once customers start playing with SQL-H, they will want to store significant amounts of data inside Aster Database rather than culling it from HDFS repeatedly. This will, of course, drive Aster Database sales, and that is really the point of this exercise. ®