Feeds

Business intelligence startup tarts up Hadoop for managers

Platfora puts lipstick on the elephant

Choosing a cloud hosting partner with confidence

For the world's most lauded open source data platform, Hadoop is remarkably difficult to use, so Tuesday brings another company slinging a tool that entices managers and analysts into fiddling with the elephant.

This time it's analytics startup Platfora with the general release of its in-memory business intelligence layer atop Hadoop. Unlike rival BI engines, Platfora lets you interrogate your Hadoop-stored data via a graphical user interface – no need for terminal here, folks*.

Platfora is an "exploratory BI interface in the spirit of Tableau, Spotfire [but] built natively for the [Hadoop] stack. ... the primary interface is definitely a visual way of working with data," Platfora chief and former head of products for EMC Greenplum Ben Werther told The Register.

The company's plan to make Hadoop as easy to query as possible has struck a chord with the venture capital community, who smelled money and pumped $20m into the company in November, 2012.

Its GUI-heavy approach stands out from other methods of interrogating HDFS. Alternate tools designed to make the obtuse platform accessible work either by layering a SQL engine on top of Hadoop (Concurrent, EMC/Greenplum's HAWQ), making do with the worthy-but-clumsy Hive (Intel), or by pulling the data into another more friendly analytics system, such as ParAccel.

Though these systems can be useful – and in the case of Cloudera's query layer Impala or EMC/Greenplum's Hawq, much faster – they lack the ease-of-use features of Platfora, Werther says.

Platforma can also be accessed via SQL-like and JSON-like APIs, but this is not the priority, he said.

The technology also competes with standard BI tools such as Tableau, Qlikview, and Tibco Spotfire. "These are all fine solutions in a traditional SQL world," Werther says. "They claim they want to be Hadoop and work in a Hadoop world, but they don't have any of the architecture necessary to make this a first-class experience."

Platfora integrates directly with Hadoop, so companies do not need to suck the data into another ETL or data warehouse, he explained.

The technology has three layers – the web-based exploratory BI layer, a scale-out columnar-compressed in-memory engine, and the Hadoop data refinery which runs MapReduce jobs across HDFS data.

Platfora works by grabbing samples of data from HDFS to create a catalog that can be accessed via the web GUI. The system can handle delimited data, AVRO JSON, log records, regex-parseable data, and "other formats," Werther said. When users select the particular data they want to analyse, the system will plan a series of MapReduce jobs to spew data into a partitioned, columnar-compressed dimensional data mart – Platfora calls this a "lens" – which runs automatically. When this is done, the resultant blocks of data are pulled into the Platfora nodes and triple-replicated across disks for redundancy, then when a user makes a query the pieces are pulled into memory.

Perhaps the technology most similar to Platfora is SAP HANA, with both companies having the same belief about analytics – if you can, do it from memory. However, SAP is focused on bridging SAP transactional data and keeping all of it in memory, Werther said, while Platfora is more about providing a way to interface with a massive pool of HDFS data and selectively load it into memory.

The company has no special plans for an intermediary storage layer, like flash, Werther said. Pricing is done on a per-node basis, but was not disclosed.

There's a feeling brewing among users and developers that big-data tools cost too much and do too little, probably emanating from the eye-watering salaries needed to support Hadoop-whisperers and the fact that although these people may speak HDFS, they might not be the best at designing queries for it. Platfora's strategy of making money by prettying-up Hadoop is representative of the overall big-data industry, which is waking up to the fact that if HDFS truly is becoming the all-purpose storage format for ingested data, then there's money to be made by designing tools to let more people analyse it. ®

*Bootnote

This begs the question as to how easy-to-use a data analysis system needs to be – after all, nothing is more dangerous for an organization than the pointy-haired denizens of the upper floors suddenly being able to query all stored data and develop opinions about what the business should really be doing, right?

Internet Security Threat Report 2014

More from The Register

next story
Preview redux: Microsoft ships new Windows 10 build with 7,000 changes
Latest bleeding-edge bits borrow Action Center from Windows Phone
Google opens Inbox – email for people too thick to handle email
Print this article out and give it to someone tech-y if you get stuck
Microsoft promises Windows 10 will mean two-factor auth for all
Sneak peek at security features Redmond's baking into new OS
UNIX greybeards threaten Debian fork over systemd plan
'Veteran Unix Admins' fear desktop emphasis is betraying open source
Entity Framework goes 'code first' as Microsoft pulls visual design tool
Visual Studio database diagramming's out the window
Google+ goes TITSUP. But WHO knew? How long? Anyone ... Hello ...
Wobbly Gmail, Contacts, Calendar on the other hand ...
DEATH by PowerPoint: Microsoft warns of 0-day attack hidden in slides
Might put out patch in update, might chuck it out sooner
Ubuntu 14.10 tries pulling a Steve Ballmer on cloudy offerings
Oi, Windows, centOS and openSUSE – behave, we're all friends here
prev story

Whitepapers

Choosing cloud Backup services
Demystify how you can address your data protection needs in your small- to medium-sized business and select the best online backup service to meet your needs.
Forging a new future with identity relationship management
Learn about ForgeRock's next generation IRM platform and how it is designed to empower CEOS's and enterprises to engage with consumers.
Security for virtualized datacentres
Legacy security solutions are inefficient due to the architectural differences between physical and virtual environments.
Reg Reader Research: SaaS based Email and Office Productivity Tools
Read this Reg reader report which provides advice and guidance for SMBs towards the use of SaaS based email and Office productivity tools.
Storage capacity and performance optimization at Mizuno USA
Mizuno USA turn to Tegile storage technology to solve both their SAN and backup issues.