Every millisecond counts in large amounts
The third element is data for context. Storing data in highly scalable distributed backend databases designed for real-time queries makes it possible to deliver this extra context. It also helps to handle on-demand queries rather than truly real-time analytics. Sometimes, instead of updating on-screen financial data by the millisecond as you look for signals to buy and sell, you might be looking for answers to complex queries on demand.
“When we expand out to analysis on demand we are really trying to query a lot of data at any one time. Usually this comes down to how the data is stored,” Callaghan said. “This involves the data structures themselves and the resources available to process the request.”
He cited Google's search engine as an example. The search engine front-end can't look up all the web's data from a single database instantly to get a particular user's answers on demand, so the engine must create multiple instances and indexes of its crawled data behind the scenes to provide fast look-up times for all netizens across the planet.
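To make the indexing idea concrete, here is a minimal sketch of an inverted index, one common structure for this kind of fast look-up. It is illustrative only and is not a description of how Google's engine works; the function names and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's postings and intersect with the rest.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

# Hypothetical sample documents, purely for illustration.
docs = {
    1: "real time analytics on streaming data",
    2: "batch analytics on stored data",
    3: "streaming video from devices",
}
index = build_inverted_index(docs)
matches = search(index, "streaming analytics")
```

The index is built once, ahead of time, so each query only touches the entries for the words it contains rather than scanning every document.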
While Callaghan is discussing structured data here, unstructured data is also playing a large part in real-time analytics – data such as voice or video from devices.
Such is the growth in both structured and unstructured data that machine-learning models are beginning to be employed in real-time analytics processes.
ML digests vast amounts of data and learns how to recognize things. Its computationally intensive training algorithms – usually run on a set of GPUs – produce a statistical model that can classify new things such as images, phrases or sounds, with varying degrees of certainty.
“We use plenty of unstructured data in our machine learning models,” explained Alex Tilcock, director of digital foundations at IT services company Luxoft. The company deployed a system that analyses chats between traders in real time and provides assistance during negotiations, giving them information including current market prices and last deals done with the counterparty.
“Chats are by their very nature unstructured and the throughput is unpredictable, so it is important to choose an architecture that can adapt to the scale horizontally,” he said.
Drawing on this longer-term information when processing data in real time is what closes the loop for things like predictive real-time analytics. If you want to know whether a part in an engine will fail in the next minute, you want to know now. But you also need to distill that long-term engine time series data to make that prediction. Mixing these two kinds of data is important in many real-time analytics scenarios.
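A rough sketch of that mixing, assuming a simple statistical baseline rather than a trained model: the long-term series is distilled into a mean and standard deviation, and each live reading is checked against it as it arrives. The threshold, function names and temperature figures are all hypothetical.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Distill a long-term series into a (mean, standard deviation) pair."""
    return mean(history), stdev(history)

def is_anomalous(reading, baseline, threshold=3.0):
    """Flag a live reading that sits more than `threshold` deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return reading != mu
    return abs(reading - mu) / sigma > threshold

# Hypothetical historical engine temperatures (the long-term context).
history = [90.0, 91.0, 89.0, 90.0, 92.0, 88.0, 90.0]
baseline = build_baseline(history)

# Hypothetical live readings arriving in real time.
live = [90.5, 91.0, 104.0]
alerts = [r for r in live if is_anomalous(r, baseline)]
```

The expensive step, summarising the history, happens offline; the per-reading check is cheap enough to run on every new data point.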
Machine learning models can help provide that context by digesting vast amounts of data and using it to recognize patterns in real-time information.
One example of this is playing out on London’s roads right now. Luxoft is collecting streaming data from engine sensors on 9,000 buses in the UK capital via cellular networks, and processing that data via AWS Kinesis streams and Amazon’s IoT Analytics engine.
Small, low-powered computing devices in the buses collect the sensor data and pass it to Luxoft, which analyses it for telltale warning patterns.
Luxoft uses this data to train machine learning models that recognize potential problem signals from the bus engines. It then uses Amazon GreenGrass ML Inference, a service that lets developers run these trained machine learning models directly on IoT devices as serverless Lambda functions.
In short, if a bus feels poorly, it uses on-bus real-time analytics, powered by cloud-based machine learning training, to let someone know.
“At the moment it identifies the potential for failure and it sends that data to the cloud,” said Tilcock. “What’s entirely possible is for the bus to have some logic that detects when a valve is going to fail, and reduce the pressure.”
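The pattern Tilcock describes, scoring readings on the device and sending only the worrying ones upstream, can be sketched roughly as follows. This is not Luxoft's code and uses no AWS API; the model, the pressure limit and the callback names are hypothetical stand-ins.

```python
def run_on_device(readings, predict_failure, report, act_locally=None):
    """Score each reading locally; report only flagged ones upstream.

    Returns the number of readings that were flagged.
    """
    flagged = 0
    for reading in readings:
        if predict_failure(reading):
            report(reading)               # e.g. send to the cloud
            if act_locally is not None:
                act_locally(reading)      # optional on-device response
            flagged += 1
    return flagged

def toy_model(reading):
    """Hypothetical stand-in for a trained model: flag high valve pressure."""
    return reading["valve_pressure"] > 8.0

sent = []      # stands in for messages sent to the cloud
actions = []   # stands in for local actuator commands

readings = [
    {"valve_pressure": 6.5},
    {"valve_pressure": 8.4},
    {"valve_pressure": 7.9},
]
count = run_on_device(
    readings,
    toy_model,
    sent.append,
    lambda r: actions.append("reduce_pressure"),
)
```

Leaving `act_locally` unset corresponds to the current behaviour in the quote, where the device only reports; passing it in corresponds to the possibility of the bus responding on its own.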
Devices drive data change
IoT is growing – what’s in question is the pace of that growth, especially given that hurdles to adoption remain in place. These hurdles include concerns over security – of devices and protection of data – and over the prevailing lack of commonly agreed standards.
Despite this, the perception among those adopting IoT is that it has value. Analytics have, therefore, become synonymous with this new world, as organisations try either to understand what’s happening in the now or to analyse events as they happened for later prediction.
That’s challenging and changing existing enterprise data architectures, forcing organisations to grow from structured and batch to real-time and unstructured. Understanding that and adapting should be one way of taking IoT out of its current, relatively small, footprint. ®