Business intelligence startup tarts up Hadoop for managers

Platfora puts lipstick on the elephant

For the world's most lauded open source data platform, Hadoop is remarkably difficult to use, so Tuesday brings another company slinging a tool that entices managers and analysts into fiddling with the elephant.

This time it's analytics startup Platfora with the general release of its in-memory business intelligence layer atop Hadoop. Unlike rival BI engines, Platfora lets you interrogate your Hadoop-stored data via a graphical user interface – no need for a terminal here, folks*.

Platfora is an "exploratory BI interface in the spirit of Tableau, Spotfire [but] built natively for the [Hadoop] stack. ... the primary interface is definitely a visual way of working with data," Platfora chief and former head of products for EMC Greenplum Ben Werther told The Register.

The company's plan to make Hadoop as easy to query as possible has struck a chord with the venture capital community, which smelled money and pumped $20m into the company in November 2012.

Its GUI-heavy approach stands out from other methods of interrogating HDFS. Alternative tools designed to make the obtuse platform accessible work by layering a SQL engine on top of Hadoop (Concurrent, EMC/Greenplum's HAWQ), by making do with the worthy-but-clumsy Hive (Intel), or by pulling the data into another, more friendly analytics system, such as ParAccel.

Though these systems can be useful – and in the case of Cloudera's query layer Impala or EMC/Greenplum's HAWQ, much faster – they lack the ease-of-use features of Platfora, Werther says.

Platfora can also be accessed via SQL-like and JSON-like APIs, but this is not the priority, he said.

The technology also competes with standard BI tools such as Tableau, Qlikview, and Tibco Spotfire. "These are all fine solutions in a traditional SQL world," Werther says. "They claim they want to be Hadoop and work in a Hadoop world, but they don't have any of the architecture necessary to make this a first-class experience."

Platfora integrates directly with Hadoop, so companies do not need to suck the data into another ETL or data warehouse, he explained.

The technology has three layers – the web-based exploratory BI layer, a scale-out columnar-compressed in-memory engine, and the Hadoop data refinery which runs MapReduce jobs across HDFS data.

Platfora works by grabbing samples of data from HDFS to create a catalog that can be accessed via the web GUI. The system can handle delimited data, AVRO JSON, log records, regex-parseable data, and "other formats," Werther said. When users select the particular data they want to analyse, the system automatically plans and runs a series of MapReduce jobs that spew data into a partitioned, columnar-compressed dimensional data mart, which Platfora calls a "lens". When this is done, the resultant blocks of data are pulled into the Platfora nodes and triple-replicated across disks for redundancy. Then, when a user makes a query, the pieces are pulled into memory.
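
For readers who want a feel for what a partitioned, columnar-compressed store that loads pieces into memory on demand might look like, here is a toy sketch in Python. It is not Platfora's code, and every name in it (`dictionary_encode`, `build_lens`, `LensReader`) is hypothetical. Dictionary encoding stands in for whatever compression the product really uses, and sampling, MapReduce planning, and triple replication are left out entirely.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Compress one column: keep each distinct value once, plus integer codes."""
    dictionary = []
    index = {}
    codes = []
    for value in values:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes


def build_lens(rows, partition_key, columns):
    """Group rows by partition, then store each requested column separately."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[partition_key]].append(row)
    return {
        partition: {
            col: dictionary_encode([r[col] for r in part_rows]) for col in columns
        }
        for partition, part_rows in grouped.items()
    }


class LensReader:
    """Decodes a partition's column only when a query first touches it."""

    def __init__(self, lens):
        self._lens = lens    # stands in for the blocks held on disk
        self._memory = {}    # (partition, column) -> decoded values

    def column(self, partition, col):
        key = (partition, col)
        if key not in self._memory:
            dictionary, codes = self._lens[partition][col]
            self._memory[key] = [dictionary[c] for c in codes]
        return self._memory[key]

    def loaded(self):
        """List which (partition, column) pieces are currently in memory."""
        return sorted(self._memory)


# Hypothetical sample data, invented for this illustration.
rows = [
    {"day": "mon", "region": "eu", "sales": 10},
    {"day": "mon", "region": "us", "sales": 7},
    {"day": "mon", "region": "eu", "sales": 5},
    {"day": "tue", "region": "us", "sales": 3},
]
lens = build_lens(rows, partition_key="day", columns=["region", "sales"])
reader = LensReader(lens)
monday_total = sum(reader.column("mon", "sales"))
print(monday_total)      # 22
print(reader.loaded())   # [('mon', 'sales')]
```

The point of the layout is that a query about Monday's sales touches one column of one partition, so only that piece has to come into memory.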

Perhaps the technology most similar to Platfora is SAP HANA, with both companies having the same belief about analytics – if you can, do it from memory. However, SAP is focused on bridging SAP transactional data and keeping all of it in memory, Werther said, while Platfora is more about providing a way to interface with a massive pool of HDFS data and selectively load it into memory.

The company has no special plans for an intermediary storage layer, like flash, Werther said. Pricing is done on a per-node basis, but was not disclosed.

There's a feeling brewing among users and developers that big-data tools cost too much and do too little, probably emanating from the eye-watering salaries needed to support Hadoop-whisperers and the fact that although these people may speak HDFS, they might not be the best at designing queries for it. Platfora's strategy of making money by prettying-up Hadoop is representative of the overall big-data industry, which is waking up to the fact that if HDFS truly is becoming the all-purpose storage format for ingested data, then there's money to be made by designing tools to let more people analyse it. ®

*Bootnote

This raises the question of how easy-to-use a data analysis system needs to be – after all, nothing is more dangerous for an organization than the pointy-haired denizens of the upper floors suddenly being able to query all stored data and develop opinions about what the business should really be doing, right?
