It's OK, you can pick up real-time IoT analytics – it won't bite... unless you ignore this advice

Your gentle first steps to processing live information streaming from networked sensors


Comment The Internet of Things is growing, and it feels unstoppable.

Smart home appliances, energy meters, and wearable gizmos are the public face of IoT, and you can add to the list industrial control and factory equipment, warehouse pallets and packaging, delivery vans, and so on – all things that generate real-time data, which requires real-time analytics to process.

In manufacturing and warehousing, these gadgets are tracking the movement of goods and freight through a supply chain, from raw materials all the way to the customer. The data coming off these trackers and gizmos can be used to calculate the best route for a delivery to hit a given deadline, without guzzling fuel or punishing vehicles. This isn't science fiction: online retailer Ocado, for one, uses this flow of information to optimize its shipping.

Science fact: Failure is an option

Ganesh Ramamoorthy, a Gartner senior analyst, claimed in 2016 that eight out of ten IoT projects would fail before they were even launched. The reason: those failing IoT systems were solutions looking for a problem, and likely a problem that simply did not and would not exist.

It sounds obvious, but it's such a fundamental point that it can get lost in project development. If you're planning and building out an infrastructure of internet- or network-connected sensors and embedded devices for your business, ensure they serve a specific and clearly defined purpose, and that they generate data that is useful – information that can be used to fix problems immediately and optimize processes long-term.

That hosepipe of data, that telemetry, from your IoT network will likely need real-time processing and analysis to give you a condensed and informative on-the-spot report of what's happening within your organization.

You can crunch the numbers quarter by quarter or week by week later, provided you have the storage to hold all that info in the meantime.

But if you want to know where and when components are coming and going, where customers or website visitors are arriving and leaving, which servers and network links are up or down, when assembly lines are slowing down or are under-utilized, and so on, minute by minute, or hour by hour, you'll need a framework in place to collect, process, and present that information live. Something like the banks of displays in a NASA mission control room – just, preferably, in a way that can fit on a monitor or two, for your own sanity.

If you're not processing this continuous sensor data in real time, there's not really much point in collecting it live over the network as it happens. This takes us back to ensuring you have a specific, clearly defined purpose for your Internet of Things gadgets: if you're picking gear that produces real-time intelligence, have something in the backend or at the network edge that catches, processes, and formats it for you to understand. Again, it seems obvious, but you'd be surprised how many times someone's been told to tail -f /var/log/* for live feedback, rather than being provided with a real-time dashboard.

A major benefit of real-time analytics is the ability to set alarms and warnings on certain conditions: when inventory falls too low, when pressure goes too high, when demand outstrips supply, and so on. These alerts fire as soon as a condition is met, by email, pager, or phone and desktop notification, so you can act right away rather than work out what went wrong days or weeks later from archives of telemetry.
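
To make the alarm idea concrete, here is a minimal sketch in plain Python of checking each incoming reading against a set of rules. Everything named here is an assumption for illustration: the `Rule` class, the metric names `stock_units` and `pressure_kpa`, the thresholds, and the `notify` function, which just prints where a real system would send an email or page someone.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Rule:
    """One alarm condition on one metric (hypothetical structure)."""
    name: str
    metric: str
    breached: Callable[[float], bool]


# Illustrative thresholds only; real limits depend on your equipment.
RULES = [
    Rule("inventory low", "stock_units", lambda v: v < 20),
    Rule("pressure high", "pressure_kpa", lambda v: v > 850),
]


def check(reading: dict, rules: Iterable[Rule]) -> List[str]:
    """Return an alert message for every rule the reading breaches."""
    alerts = []
    for rule in rules:
        value = reading.get(rule.metric)
        # A reading that lacks the metric is skipped, not treated as a breach.
        if value is not None and rule.breached(value):
            alerts.append(f"{rule.name}: {rule.metric}={value}")
    return alerts


def notify(message: str) -> None:
    """Stand-in for email, pager, or desktop notification."""
    print("ALERT", message)


# A made-up stream of sensor readings, evaluated as each one arrives.
stream = [
    {"stock_units": 50, "pressure_kpa": 700},
    {"stock_units": 5, "pressure_kpa": 900},
]
for reading in stream:
    for alert in check(reading, RULES):
        notify(alert)
```

The point of keeping rules as data rather than burying them in if-statements is that the people who own the thresholds can change them without touching the processing loop.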


The graveyard of failed IT projects is littered with business intelligence and analytics projects that didn’t meet expectations, or proved too complicated to use. How, then, do you step into the world of real-time analytics, and avoid the costly mistakes of the past?

First, accept that there’s no one infrastructure or tool that will do the job perfectly; you need to integrate a mix of solutions. You will also have to apply quality filters to all that incoming information from your combination of sources and frameworks, because poor data quality was one of the top five reasons IoT projects collapsed, according to Cisco.

An IFS study from 2017 found that 84 per cent of manufacturers had yet to integrate the data produced by their “connected devices” with data being generated by more traditional systems such as ERP. Gartner has also suggested that “poor quality” data costs organizations on average $15m per year in losses.
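
What might a quality filter look like in practice? The following is a rough sketch, not a recipe: it drops readings that are missing an identifier or timestamp, that repeat or arrive behind a timestamp already seen for that device, or that report physically implausible values. The field names (`device_id`, `ts`, `temperature_c`, `humidity_pct`) and the ranges in `VALID_RANGES` are assumptions made up for the example.

```python
from typing import Optional

# Hypothetical plausible ranges per field; tune these to your own sensors.
VALID_RANGES = {
    "temperature_c": (-40.0, 125.0),
    "humidity_pct": (0.0, 100.0),
}


def clean(reading: dict, last_seen: dict) -> Optional[dict]:
    """Return the reading if it passes basic checks, otherwise None.

    last_seen maps device_id to the newest accepted timestamp and is
    updated in place when a reading is accepted.
    """
    device = reading.get("device_id")
    ts = reading.get("ts")
    if device is None or ts is None:
        return None  # cannot attribute or order this reading
    if ts <= last_seen.get(device, float("-inf")):
        return None  # duplicate or out-of-order delivery
    for field, (low, high) in VALID_RANGES.items():
        value = reading.get(field)
        if value is None:
            continue  # field not reported by this device
        if not isinstance(value, (int, float)) or not (low <= value <= high):
            return None  # wrong type or outside the plausible range
    last_seen[device] = ts
    return reading


last_seen = {}
incoming = [
    {"device_id": "van-7", "ts": 1, "temperature_c": 21.5},
    {"device_id": "van-7", "ts": 1, "temperature_c": 21.5},  # duplicate
    {"device_id": "van-7", "ts": 2, "temperature_c": 999},   # implausible
]
accepted = [r for r in incoming if clean(r, last_seen) is not None]
print(len(accepted), "of", len(incoming), "readings kept")
```

Whether to discard bad readings, as here, or park them somewhere for inspection is a design choice; silently dropping data can hide a failing sensor.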

Power behind data

After integration comes analysis – the real crux. This can be done on or off premises, at the network edge or in the backend – whatever fits your budget, scale, software stack, or security model. To crunch end-of-quarter or end-of-week numbers, a powerful batch processor such as Google’s MapReduce or its open-source cousin Hadoop can be fired up to turn the raw data into insights.

For more instant forms of reporting, such as alerts on web servers or medical devices, you need something that can handle streams of information with low latency and incremental processing. Here the Apache software stack, for one, is often used: that includes the Storm computation system, the Spark cluster compute framework, and Flume to aggregate and move large amounts of log data. Storage is offered in the form of a number of in-memory and non-relational databases.
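
If "incremental processing" sounds abstract, the idea is that each new reading updates a running result instead of triggering a recalculation over everything stored so far. The sketch below shows that idea with a sliding time window in plain Python; it is not the Storm, Spark, or Flume API, and the `SlidingMean` class and its parameters are invented for illustration. It assumes timestamps arrive in non-decreasing order and that the window length is positive.

```python
from collections import deque


class SlidingMean:
    """Mean of the values seen within the last `window_seconds`."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0      # running sum of the values in the window

    def add(self, ts: float, value: float) -> float:
        """Fold in one reading and return the updated mean."""
        self.items.append((ts, value))
        self.total += value
        # Evict readings that have aged out of the window.
        while self.items and self.items[0][0] <= ts - self.window:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)


# Made-up readings as (seconds, value) pairs over a 60-second window.
mean = SlidingMean(window_seconds=60)
for ts, value in [(0, 10), (30, 20), (60, 30), (200, 4)]:
    print(ts, mean.add(ts, value))
```

Each update costs only the work of adding one reading and evicting the expired ones, which is what keeps latency low as the stream grows. A production stream processor adds the parts this sketch ignores, such as late or out-of-order events, fault tolerance, and spreading the work across machines.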

So, you have choices. Ultimately, the infrastructure you implement will vary according to use cases and desired outcomes.

From sensors in delivery vans to web-server logs and CCTV camera footage, real-time data is growing. IoT is the fishing net in this real-time world, and managing and making sense of the data is the challenge. ®

Biting the hand that feeds IT © 1998–2019