Ellison munches unstructured data with Endeca buy

A massive Oracle big data/e-commerce/analytics mashup

Only weeks after announcing that it is going to create its own Hadoop distribution running atop its own Berkeley DB NoSQL database, Oracle has snapped up Endeca Technologies, which has cooked up a data store called the MDEX Engine and some analytics and e-commerce front ends to it that Ellison & Co. want to weave into their own cohesive big data-commerce suite.

Endeca, which is apparently a bastardization of the German verb entdecken ("to discover"), was founded in 1999 at the height of the dot-com boom. The company was one of the innovators in the faceted search engines that are common on retail sites, which let you pick a subset of an online catalog by product type or brand and then search within it.
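
To make the faceted idea concrete, here is a minimal Python sketch of narrowing a catalog by facet and then searching within the result. It is purely illustrative and says nothing about how Endeca implements the feature: the catalog records, the field names, and the helpers refine, search, and facet_counts are all assumptions made up for this example.

```python
# Hypothetical catalog; the products and field names are invented for illustration.
CATALOG = [
    {"name": "Trail Runner 2", "type": "shoes", "brand": "Acme"},
    {"name": "Road Racer", "type": "shoes", "brand": "Zenith"},
    {"name": "Trail Jacket", "type": "jackets", "brand": "Acme"},
    {"name": "Rain Shell", "type": "jackets", "brand": "Zenith"},
]


def refine(items, **facets):
    """Keep only the items whose fields match every selected facet value."""
    return [it for it in items if all(it.get(k) == v for k, v in facets.items())]


def search(items, text):
    """Case-insensitive substring match on the product name."""
    needle = text.lower()
    return [it for it in items if needle in it["name"].lower()]


def facet_counts(items, field):
    """Count how many items fall under each value of one facet."""
    counts = {}
    for it in items:
        if field in it:
            counts[it[field]] = counts.get(it[field], 0) + 1
    return counts


# Pick a subset by product type, then search inside that subset.
shoes = refine(CATALOG, type="shoes")
trail_shoes = search(shoes, "trail")
```

A retail page would typically show the output of something like facet_counts(shoes, "brand") next to the results, so the shopper can see how many items each further refinement would leave.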

The Endeca toolset is a lot more sophisticated these days, and as you can see from this presentation that Oracle put out as part of the acquisition announcement, the company has a number of ways it will be integrating Endeca's wares into the Oracle stack.

Here's the gist of it.

The Oracle 11g database is where you put your structured operational data, and the MDEX Engine is where you plunk your operational semi-structured or unstructured data. So all that talking that Larry Ellison did only three weeks ago about how "we really don't want to have two separate databases", one for structured and the other for unstructured data, well, er, not so much.

The way it is going to work is this. You want to chew on big data, you use Hadoop and Berkeley DB. The output of that gets dumped into the MDEX Engine data store, which sits alongside it – perhaps even in the same Exadata cluster.

The MDEX Engine data store is a columnar database instead of having the row-based orientation of the Oracle database – and most other relational databases, for that matter.
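
For readers who have not bumped into the distinction, the sketch below shows the same handful of records laid out row-wise and column-wise in plain Python. It is a toy model of the general idea only, not of the Oracle database or the MDEX Engine; the sample data and the to_columns helper are assumptions for illustration.

```python
# Hypothetical order records; the values are invented for illustration.
rows = [
    {"id": 1, "brand": "Acme", "price": 40},
    {"id": 2, "brand": "Zenith", "price": 55},
    {"id": 3, "brand": "Acme", "price": 25},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per field."""
    columns = {}
    for rec in records:
        for field, value in rec.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# Row layout: every whole record is visited even though one field is needed.
total_from_rows = sum(rec["price"] for rec in rows)

# Column layout: the aggregate reads a single list and ignores the rest.
total_from_columns = sum(columns["price"])
```

Both totals come out the same, of course. The point is that the columnar version touches only the price values, which is why analytic queries that scan a few fields across many records tend to favor that layout.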

Oracle is using a hybrid columnar compression technique in the Exadata storage servers underlying the Exadata platform, so this is a bit of a mashup there, too. And Teradata has just added columnar support to its data warehousing database as well.

The MDEX Engine doesn't have a set schema, but rather one that changes on the fly (that's the faceted part). It also has some in-memory attributes, like the TimesTen database that Oracle just put at the heart of its Exalytics BI appliance.

The MDEX Engine is a bit funky in that it takes a column of data and stores it partially in memory and partially on disk, and sorts it two ways, one by value and one by key. A tree-structured index is cached in memory to zip through those columns looking for data.
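
A rough way to picture a column sorted two ways is sketched below in Python. This is an assumption-laden illustration, not Endeca's design: the TwoWayColumn class and its methods are hypothetical, everything lives in memory, and a binary search over sorted lists (the standard bisect module) stands in for the tree-structured index described above.

```python
from bisect import bisect_left, bisect_right


class TwoWayColumn:
    """One column kept as two sorted copies: by record key and by value."""

    def __init__(self, pairs):
        # pairs is an iterable of (record_key, value) tuples.
        self.by_key = sorted(pairs)
        self.by_value = sorted((v, k) for k, v in self.by_key)
        # Parallel lists of just the sort fields, used for binary search.
        self._keys = [k for k, _ in self.by_key]
        self._values = [v for v, _ in self.by_value]

    def value_of(self, key):
        """Look up the value for one record via the key-sorted copy."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self.by_key[i][1]
        return None

    def keys_in_range(self, low, high):
        """Find records with low <= value <= high via the value-sorted copy."""
        lo = bisect_left(self._values, low)
        hi = bisect_right(self._values, high)
        return [k for _, k in self.by_value[lo:hi]]


# Hypothetical price column: record 3 costs 25, record 1 costs 40, and so on.
price = TwoWayColumn([(3, 25), (1, 40), (2, 55)])
```

The key-sorted copy answers "what is the price of record 2?" and the value-sorted copy answers "which records cost between 20 and 45?", each without scanning the whole column. Keeping both copies costs extra space in exchange for fast lookups in either direction.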

If you look at the datasheet for the MDEX Engine, you might think that you wouldn't need the Oracle database at all. You can pump in data extracted from your production ERP applications, content management systems, and application files, as well as clickstreams and social media data from Twitter and Facebook.

The MDEX Engine runs on 64-bit Windows or Linux platforms, and presumably will be ported to Solaris now that Oracle owns it.

That's not where the mashing up ends, however.

Endeca has a set of applications that ride on top of the MDEX Engine. One is called InFront, and it is used to customize the "customer experience" on retail Web sites, delivering targeted and relevant data on Web pages as customers browse and buy. This is done by paying attention to who you are and what you do.

Another tool that uses information stored in that columnar database is called Latitude, and it's a more traditional BI analytics tool. This will be combined with the Oracle BI suite, which is based on relational OLAP and multi-dimensional OLAP databases, so Oracle can do analytics on unstructured or semi-structured data.

MDEX will also be put side-by-side with Oracle Content Server to give better search and faceted navigation capabilities to Oracle-based Web sites, and Oracle also plans to weave together its ATG Commerce e-commerce software with Endeca's InFront, the latter of which will bring guided navigation to this retailing front-end.

Oracle did not announce the terms of the acquisition for Endeca, but the company has raised $65m in venture capital in four rounds, according to CrunchBase, so presumably Ellison paid a reasonable amount of dough to get his hands on MDEX, InFront, and Latitude before rivals SAP, HP, or IBM did. Endeca has over 600 customers worldwide.

Oracle expects the deal to close before the end of 2011. ®
