What's wrong with network monitoring tools? Where do I start...

That red screen? It's just embarrassment


Opinion For as long as I can remember I've worked in an environment where there's a screen on the wall showing the status of the company's systems. Or actually, in one case, showing the status of the company's systems unless there was a test match on.

From time to time that information's been useful. Unfortunately, most of the time we've known there's a problem because half a dozen users have called to raise tickets: the screens haven't necessarily updated in time. And when they have, I've been told that port 12 on switch 3 has gone down and had to work out in my head what that actually affects.

I've seen dozens of monitoring packages, and they've all been hideously inadequate. Some have been hideously expensive alongside their hideous inadequacy. So why is this? Why does nobody write monitoring packages that actually monitor stuff and tell you what you need to know when you need to know it?

Dodgy protocols

To be fair to monitoring software vendors, they're off to a bad start because the tools available to them are simply appalling.

SNMP (the Simple Network Management Protocol – though frankly there's nothing simple about it) is unwieldy and clunky to use, but we're stuck with it because its longevity has made it ubiquitous. Let's face it, nobody with any sense is about to try to produce an alternative because the barriers to entry into the market are insurmountable.

WMI (Windows Management Instrumentation) is actually very good, but of course it's a Microsoft-only concept so you're stuck with using it only on your Windows estate. Finally you have Syslog... well, you can give a simple priority to each type of alert but the content is largely unstructured and so the usefulness is limited.
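To make the Syslog complaint concrete: the one reliably structured part of a classic Syslog line is the leading `<PRI>` number, which packs a facility and a severity together as facility × 8 + severity. Here is a minimal Python sketch of pulling that apart. The function name `parse_pri` and the short severity labels are my own choices for illustration, and everything after the number comes back as one opaque string, which is rather the point.

```python
import re

# Severity codes 0-7 in the order the Syslog RFCs define them (labels are shorthand).
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

def parse_pri(line):
    """Split a Syslog line into (facility, severity_label, free_text).

    Hypothetical helper: it only decodes the <PRI> prefix and makes no
    attempt to interpret the rest of the message.
    """
    match = re.match(r"<(\d{1,3})>(.*)", line, re.DOTALL)
    if match is None:
        raise ValueError("no <PRI> prefix found")
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity], match.group(2)

facility, severity, text = parse_pri("<34>Oct 11 22:14:15 mymachine su: 'su root' failed")
print(facility, severity, text)
```

The priority tells you how loudly to shout, but working out which service or user is affected still means picking through `text` by hand or with per-vendor pattern matching.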

Protocol-driven software

The next problem is that many monitoring engines are written by people who understand the protocols but have never really had to monitor anything in real life. So it's all oriented around comparing CPU usage with thresholds, alerting when a switch interface has gone down, and so on.

I've yet to use a monitoring tool that looks like the first step in its development was to send a bunch of analysts to interview network managers and say: “OK, what do you want to be able to do?”

Or if they have, they've gone back to the developers who've said: “Sorry guys, SNMP can't do that, we'll just have to make the dashboard prettier and hope people won't notice it's the same as before.”

So what would the analysts find?

Let's imagine, then, that I'm an infrastructure manager and one of the aforementioned analysts descends on me for a couple of hours. What would I be saying I want? Well, here are my top 10.

1. Wildlife camera feature

The camera crews that follow Sir David Attenborough around are these days blessed with cameras that are constantly recording – the last few seconds/minutes of footage are retained and overwritten in a loop. When something interesting happens they hit the “Record” button and the last few seconds/minutes are committed to storage. This means they don't have to have the trigger finger of John Wayne on speed. I want that for my core network ports: when I have a problem, the traffic I care about is what has flowed for the past five, 10, 15 minutes so I want to retain it for a sensible amount of time.
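The looping-camera idea is essentially a time-bounded ring buffer. As a rough Python sketch (the class name `PreTriggerBuffer`, its methods and the 15-minute window are all assumptions for illustration, and a real capture engine would be dealing with packets at line rate rather than Python objects):

```python
import time
from collections import deque

class PreTriggerBuffer:
    """Hold only the last `window_seconds` of traffic; hand it over on demand."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._packets = deque()  # (timestamp, payload) pairs, oldest first

    def _evict(self, now):
        # Drop anything that has fallen out of the retention window.
        cutoff = now - self.window_seconds
        while self._packets and self._packets[0][0] < cutoff:
            self._packets.popleft()

    def add(self, payload, timestamp=None):
        now = time.time() if timestamp is None else timestamp
        self._packets.append((now, payload))
        self._evict(now)

    def record(self, timestamp=None):
        """The 'Record' button: return a copy of everything still in the window."""
        now = time.time() if timestamp is None else timestamp
        self._evict(now)
        return list(self._packets)

# Keep 15 minutes of traffic; timestamps are passed explicitly to keep the demo deterministic.
buf = PreTriggerBuffer(window_seconds=15 * 60)
buf.add(b"old packet", timestamp=0)
buf.add(b"recent packet", timestamp=500)
buf.add(b"latest packet", timestamp=1000)
saved = buf.record(timestamp=1000)
```

Here `saved` holds only the packets stamped 500 and 1000, because the one stamped 0 is more than 900 seconds old by the time "Record" is pressed.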

2. Filter by device

If a switch lights up red on the monitoring screen, I want to click on it and pop up the alerts and Syslog entries that relate to it. If a port lights up I want to see that data filtered for that port.
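None of this is hard if alerts and Syslog entries are stored against the device and port they came from. A minimal Python sketch, assuming a hypothetical `Event` record and an `events_for` helper (neither belongs to any real product):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Event:
    device: str           # e.g. "switch3"
    port: Optional[int]   # None for device-wide events
    source: str           # "alert" or "syslog"
    message: str

def events_for(events: List[Event], device: str, port: Optional[int] = None) -> List[Event]:
    """Return events for one device, optionally narrowed to a single port."""
    return [
        e for e in events
        if e.device == device and (port is None or e.port == port)
    ]

# Invented sample data for illustration only.
log = [
    Event("switch3", 12, "alert", "link down"),
    Event("switch3", 7, "syslog", "duplex mismatch"),
    Event("switch3", None, "syslog", "fan warning"),
    Event("switch5", 12, "alert", "link down"),
]

clicked_switch = events_for(log, "switch3")         # everything for the red switch
clicked_port = events_for(log, "switch3", port=12)  # just the red port
```

The difficult part in practice is getting the device and port reliably into each record in the first place, which unstructured Syslog text does not make easy.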

3. Muppet detector

I want the network monitoring package to tell me that the end-to-end connection between a virtual server and the backup server is inefficient because one of the eight or 10 LAN ports the traffic is traversing hasn't got Jumbo Frames turned on.
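As a rough sketch of the check itself, assume the tool already knows the ordered list of ports on the path and each port's configured MTU (collecting that is the hard part and isn't shown). The name `find_mtu_mismatches`, the port labels and the 9000-byte jumbo threshold are all assumptions for illustration:

```python
def find_mtu_mismatches(path_ports, expected_mtu=9000):
    """Return the (port_name, mtu) pairs on the path whose MTU is below expected_mtu.

    path_ports is an ordered list of (port_name, configured_mtu) tuples.
    """
    return [(name, mtu) for name, mtu in path_ports if mtu < expected_mtu]

# Invented path between a virtual server and the backup server.
path = [
    ("vhost1:vmnic0", 9000),
    ("switch1:port4", 9000),
    ("switch1:port48", 9000),
    ("switch3:port12", 1500),  # the one somebody forgot
    ("switch3:port2", 9000),
    ("backup1:eth0", 9000),
]

for name, mtu in find_mtu_mismatches(path):
    print(f"{name} has MTU {mtu}; jumbo frames will not pass end to end")
```

An empty result means every port on the path is at or above the expected MTU.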

4. Which way?

I want to see (visually and legibly) the path used by traffic between two endpoints. That means understanding what the load balancer is doing, figuring out which of the physical nodes in a Virtual Router Redundancy Protocol group is carrying the traffic, and so on. And when you've done it, show me the step-by-step operation of the application traffic so I can see where the delays are (and do it at application level, please, so that I can see that, say, the network is fast but the app is being killed by DNS timeouts).
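Assuming the tool has somehow measured how long each step of a transaction took, spotting the culprit is the easy bit. In this Python sketch the function name `timing_breakdown`, the step names and the timings are all invented for illustration; they are not measurements:

```python
def timing_breakdown(steps):
    """Return (name, duration_ms, share_of_total) per step, slowest first.

    steps is a list of (name, duration_ms) tuples for one transaction.
    """
    total = sum(ms for _, ms in steps)
    if total == 0:
        return [(name, ms, 0.0) for name, ms in steps]
    rows = [(name, ms, ms / total) for name, ms in steps]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical transaction: the network is quick, name resolution is not.
transaction = [
    ("DNS lookup", 3000.0),
    ("TCP connect", 2.0),
    ("Application response", 998.0),
]

for name, ms, share in timing_breakdown(transaction):
    print(f"{name:22s} {ms:8.1f} ms  {share:6.1%}")
```

With these made-up numbers the DNS lookup accounts for three quarters of the elapsed time, which is the kind of answer a list of interface counters will never give you.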
