
Elasticsearch tells us all about its weighty Big Data tool

Customers include EMC and Cisco, says firm

Use cases: Detecting, er, failed ATMs

The raw data on HDFS and other stores is cleaned, stripped of junk, and enriched; IP addresses, for example, are turned into geo codes such as post codes. It's then normalised into JSON (a developer-friendly format) and stored in multiple files, such as indices (metadata) with pointers to the actual data.
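As a rough illustration of that clean, enrich and index flow, here is a minimal Python sketch. It is not Elasticsearch's code: the `GEO_TABLE` lookup, the `enrich` and `build_index` helpers and the sample log lines are all hypothetical, and a real pipeline would use a proper GeoIP database rather than a hard-coded prefix table.

```python
import json

# Hypothetical lookup table (assumption): IP prefix -> location.
GEO_TABLE = {
    "203.0.113.": {"country": "GB", "postcode": "SW1A"},
    "198.51.100.": {"country": "US", "postcode": "94105"},
}

def enrich(raw_line):
    """Parse an 'ip message' line into a geo-enriched dict; return None for junk."""
    parts = raw_line.strip().split(" ", 1)
    if len(parts) != 2 or not parts[1].strip():
        return None  # junk line: nothing usable to keep
    ip, message = parts[0], parts[1].strip()
    geo = next(
        (loc for prefix, loc in GEO_TABLE.items() if ip.startswith(prefix)),
        None,
    )
    return {"ip": ip, "message": message, "geo": geo}

def build_index(docs):
    """Build a tiny inverted index: term -> list of document positions (pointers)."""
    index = {}
    for doc_id, doc in enumerate(docs):
        for term in set(doc["message"].lower().split()):
            index.setdefault(term, []).append(doc_id)
    return index

raw = ["203.0.113.7 ATM out of order", "", "198.51.100.20 ATM working fine"]
docs = [d for d in (enrich(line) for line in raw) if d is not None]
index = build_index(docs)

print(json.dumps(docs[0]))  # the normalised JSON form of the first record
print(index["atm"])         # pointers to every document mentioning "atm"
```

The index holds only document positions, not the documents themselves, which mirrors the "metadata with pointers to the actual data" idea in miniature.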

Elasticsearch puts data on the nodes in a cluster: "It's our distributed file system on disk." Kibana is then used to look at the search result data and visualise it. Users, who don't need to be data scientists, can check out these graphically-presented results, then ask other questions of the data to investigate what's happening and see, for example, how successful an ad campaign was, why a DBMS is running slowly, and even when an ATM broke down.

This example sounded weird. People send tweets about ATMs failing, and these tweets can be used to detect and locate a failed ATM faster than the bank's own ATM infrastructure can (and, apparently, faster than its obviously inadequate sensors and sensor-data tracking mechanism).

Banon said they have developed a machine-learning algorithm to bubble up the outliers in a region of a data set, the corner cases. You can apply this idea to fraud detection. "We have algorithms in our system that are the results of training ... We keep on pushing the boundaries into what is unsolvable - getting a structure out of unstructured data," said Banon.
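Banon did not describe the algorithm itself, so the following is only a stand-in to show what "bubbling up outliers" can mean in practice. It uses a simple median-absolute-deviation score rather than any trained model; the `find_outliers` function, its threshold and the sample transaction amounts are all assumptions made for illustration.

```python
from statistics import median

def find_outliers(values, threshold=3.5):
    """Return values whose modified z-score (based on the median absolute
    deviation) exceeds the threshold. A simple statistical stand-in, not
    the trained algorithm mentioned in the article."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical card transaction amounts; one is far outside the usual range.
amounts = [20, 25, 22, 19, 24, 21, 23, 950]
print(find_outliers(amounts))
```

Median-based scores are used here because a single extreme value barely shifts the median, whereas it would drag a mean-based score towards itself.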

One use case for Elasticsearch is sending out alerts to people, a kind of reverse search. "If a doc comes in, search it and send out alerts to registered people. The alert request is in effect a query."
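The reverse-search idea can be sketched in a few lines of Python: the queries are stored ahead of time, and each incoming document is checked against them. This is a toy model under stated assumptions, not Elasticsearch's API; the `registered` dictionary, the `alerts_for` function and the all-terms-must-match rule are hypothetical.

```python
# Hypothetical registered alerts (assumption): user -> terms that must all appear.
registered = {
    "alice": {"atm", "broken"},
    "bob": {"fraud"},
}

def alerts_for(doc_text, queries):
    """Return, in sorted order, the users whose stored query matches the document."""
    terms = set(doc_text.lower().split())
    return sorted(user for user, required in queries.items() if required <= terms)

# A new document arrives; the stored queries are run against it.
print(alerts_for("The ATM on High Street is broken", registered))
```

In an ordinary search the documents are stored and the query arrives later; here the roles are swapped, which is why the alert request is "in effect a query".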

In four years, Elasticsearch has come from nowhere to become one of the premier big data search engines, with a vast and growing set of use cases. This is Linux-type software but for Big Data, and when we say "Big" we mean BIG and getting BIGGER.

Elasticsearch says it's had 600,000 downloads a month and that number is rising. Customers include British Airways, Chevron, Comcast, Walmart, PayPal, eBay, Nielsen and Cisco.

Could it be true that the marketing hype for Big Data is not only real but underestimates its importance? ®

* What is EMC using Elasticsearch for? The company said it was a secret. Do tell (at some point).
