Graphite core? There are other ways to monitor your operation's heart
But why would you want to?
The BBC and NHS epitomise enterprise: the BBC has 23,000 staff while the NHS is one of the world's largest employers, with 1.4 million. Their IT estate is vast and central to the delivery of their services. The BBC's iPlayer is on the front line in a world of on-demand TV defined by Netflix, and among its layered infrastructure NHS servers run complex systems of reconciling payment for treatments.
Each organisation monitors these systems with the aid of something you can download and install for free under an open-source licensing model.
Both iPlayer and those NHS treatment servers use Graphite, written by Chris Davis, formerly of web travel site Orbtiz as a side project which became the firm's monitoring tool. Graphite stores realtime systems data and renders cool graphs. It was released under an Apache 2.0 licence in 2008. Nearly 10 years later, use of Graphite is widespread.
But is this the way we should all be going? Graphite isn't a collection agent, but orders and presents your data. It is, however, unsupported – arguably a big fail for those in working in mission-critical fields. You could also argue it's a "point solution". There are more monitoring options out there than you can shake a stick at. With that in mind, which approach should you go for?
The commercial commodity product
I've used SolarWinds Orion in several of my recent jobs, and it's a fine example of what you'd call a solid, popular, commercial monitoring tool. SolarWinds has a market capitalisation of about $4.3bn, which is interesting because it reflects just how many people have bought the product. We're not talking about something that costs tens of thousands per annum and sells a few thousand – a 250-element licence will set you back a tad over £5,000 at today's prices so it's a mass-market offering.
The SolarWinds estate has grown over the years. A while back they hoovered up Kiwi Software, which was a fairly obvious addition to the family given I was using its Syslog server and CatTools configuration manager alongside SolarWinds' monitoring engine.
Like many of its peers in the market, SolarWinds now has a product catalogue that runs almost off the bottom of the browser screen. The most comparable competitor is probably ManageEngine, which, while its OpManager suite will do your network monitoring for you, has a family of bazillions, which will do everything from Active Directory maintenance to mobile device management. Of course the list goes on – WhatsUp Gold being the next popular choice, for example.
These tools are popular primarily because they're accessible. The pricing is competitive, they're usually pretty easy to get running, they don't need vast banks of CPU, RAM and storage, and the vendor's there with support contracts if you need one.
The free ones
Graphite is one of the free ones out there. You can run it up on a fairly puny little machine with little to no knowledge, but trust me – you'll want to stick with a run-of-the-mill Linux distribution rather than, as I did initially, muck about getting it to fly on MacOS.
There's another really popular choice out there in the free monitoring market, though: Spiceworks. To say it's established itself in the market would be an understatement – I just looked back and found that my first review of the product (then release 1.7) was in June 2008, almost nine years ago. Now, although there are some paid-for options to the suite, you can genuinely monitor your world perfectly well with the free offerings, and you don't need to be a rocket scientist to use it.
One of the things you don't get with free software is someone on the end of a phone to help you. Some open-source providers make their living by selling commercial support for their free products (such as Canonical for Ubuntu and Red Hat for RHEL) but, generally speaking, it's down to online help and a support community. But you know what? I can't remember the last time I actually had to call a software vendor helpline to get assistance with a product – so maybe it doesn't really matter.
The socking big ones
Back in February 1996 I was sitting in IBM's offices in Raleigh, North Carolina, talking about ATM networking and how it was going to take over the world (in case you've not noticed, it didn't). As we sat there it was announced that IBM had just agreed an acquisition with Tivoli, the makers of network management software. Packages like Tivoli are best described as "big" and you tend to find them in the same sentence as "enterprise".
Just as millions of people buy Ford Focuses and VW Golfs, there's still a market for Range Rovers and BMW X5s – and the same applies in the monitoring software market. Now, of course, the likes of Tivoli are far more than just monitoring tools, but to be fair so are SolarWinds – at least they are these days, with the sideways growth in features and extra products that have sprouted from the system monitoring core.
Enterprise tools use big hardware and come with big price tags, and you'll generally have them supporting – and supported by – a big team. Interestingly you have to try quite hard to escape the clutches of IBM in this space too – the other obvious choice for enterprise monitoring is the excellent Netcool that was, er, bought by IBM in the 00s.
The unusual ones
Netcool brings us to our next category of monitoring applications: the ones that do something a bit unusual. And the standout example in this area is a thing called AIOps, from MoogSoft, which was founded by Phil Teem who wrote Netcool. Moogsoft's product does some very clever interpretation of the event log streams you throw at it, relating the contents to each other and deducing what's gone wrong in the way that a human would. So if it, say, sees Oracle throwing errors and the server shouting "out of space", it tells you that one is probably the cause of the other.
Packages like this are interesting but you'd employ them as well as, rather than instead of, one of the other categories. Yes, there's value in the additional information they provide but it's just that – additional.
Security Information and Event Management
We couldn't talk about monitoring without mentioning the most fashionable trend going round right now: SIEM. SIEM is all about taking event logs from around your infrastructure and bringing them together for collation and reporting. Part of the functionality overlaps to an extent with what Moogsoft does in that it takes multiple logs and attempts to give a level of visibility that's greater than the sum of the parts (e.g. higher than normal network traffic in an individual server might not be suspicious, but if a SIEM package saw it on ten different servers' event streams it might shout "DDoS").
Again, SIEM offerings supplement what one might call your "traditional" monitoring system. You still want the visibility that the latter gives you, and the functionality added by SIEM is giving you an extra layer of visibility, reporting and alerting.
The one thing you do need to be aware of with SIEM is that although there are free offerings (AlienVault being the one people generally pick) your network, server and storage setups will need to pack a punch because SIEM has lots of data to store and loads of processing to do in collating the messages it sees. Some of the commercial offerings ship as appliances for this reason, but there's nothing preventing you from adopting a software approach as long as you host it on decent kit.
The big question
Your monitoring setup is a game of two halves. At the basic level you'll have one of the first three sections: a paid-for or free mainstream monitoring solution that gives you all the basic functionality you need in order to keep the lights on and observe that the lights are staying on. If you're an enterprise, you're likely to look toward the bigger products in order to get the scalability they provide.
And then you have the option of layering on the extra functionality of the more specialist packages that give intelligent analysis and event collation. They're definitely not mandatory, but the value they can bring is tangible. As to whether to go for a commercial offering or a free one – you know what? I'd be sorely tempted to go open source, because there's really no reason not to. No proper support, and no Service Level Agreement? Who cares? You probably wouldn't use it anyway, and you'll find that once you've fought with the beast to figure out how to make it run, it'll hum along trouble-free. And in a free software world you can afford to adopt multiple packages and provide yourself with a wealth of functionality.
If you're using free software you can spend your money on the stuff that matters – the servers and network infrastructure to run it on. And although you probably don't need stuff like the ability to examine and tweak the source code – hardly anyone ever does anyway so it's no big deal – it doesn't matter if a product provides you with a feature you don’t need to exploit.
So in short: maximise the feature set, don't be frightened to use free software, and consider layering SIEM on top because it's a dead handy thing to do. ®