Silent server monitoring: A neat little cure that doesn't kill the patient
Log analytics has come a long, long way
It’s a typical day in IT. A quirky and poorly developed application meant to be on the testing server sneaks into production. Before anyone realises what's happened (something that can sadly take some organisations months), hundreds of external users are using it. Uh oh.
Suddenly, the infrastructure team finds out that this supposedly “test only” application is in use, and that people are not happy with the performance. Support tickets start piling up, asking: “Why is this slow?” and “Why are we not monitoring this?”
The infrastructure team has a problem that needs solving, and they can’t pull something from production once it has a user base.
Fortunately, a new class of monitoring tools is combining with predictive analytics to help systems administrators both resolve these situations and catch them as they arise. Hopefully before a user base forms!
Traditional monitoring tools require painstaking customisation to monitor each database connection, application service, and WMI counter. These existing tools often require expensive third-party modules to support critical application components, or have crippling limitations. For example, some will monitor VMware environments, but can’t follow a VM from one host to another!
Some contain complete blind spots: a tool might monitor the data network, for example, but not the storage network. Traditional monolithic monitoring tools that try to monitor everything are expensive, incomplete, and often end up poorly deployed.
Vendors such as CloudPhysics want to aggregate information from thousands of environments to establish new baselines, and identify "cards" of information. These cards range from the tactical (VMs with out of date tools or open snapshots) to the strategic (amount of money that could be saved by reducing overcommitment of resources).
Rather than overwhelming you with thousands of graphs, they allow you to pick and choose the cards that will address today’s concerns.
Another, ThousandEyes, provides fast and detailed information about the connections between you and your cloud providers.
In the old days, applications and clients stayed on our networks end-to-end. As applications move to the cloud, and external customers connect to internally hosted applications, isolating problems often ends up being a finger-pointing festival.
There's gold in them thar logs
Zenoss tries to take on the legacy big four monitoring vendors (BMC, IBM, CA, and Hewlett-Packard) with an open-source spin. Rather than trying to work as a zombie of bolt-on products, Zenoss seeks to leverage open-source development for extensions and plug-ins to monitor all of your devices.
Half of the battle of traditional real-time monitoring and management systems is hunting down SNMP MIBs and writing custom probes. Leveraging the bazaar and masses of other sysadmins working on plug-ins helps cut down the amount of duplicate work in making the product useful.
Log analytics has come a long way in closing the gap between simply viewing failure and predicting it. Splunk, Sumo Logic, VMware Log Insight and other products promise to gobble up millions of log entries and spin them into dashboards and proactive alerts of gold.
There are vendor-sponsored plug-ins, too, from EMC, Brocade and Cisco, as well as database and Tomcat application-focused plug-ins. These make sure you are looking at those five critical warnings, and not the 500,000 entries devoted to one improperly configured and unused switch interface.
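To make the "five critical warnings versus 500,000 noisy entries" idea concrete, here is a minimal Python sketch of that kind of triage. It is not how any of the products named above work internally. The log format (`<severity> <source> <message>`) and the function names are assumptions made purely for illustration.

```python
from collections import Counter


def summarise(lines):
    """Count identical entries, assuming a hypothetical
    '<severity> <source> <message>' log line format."""
    counts = Counter()
    for line in lines:
        parts = line.strip().split(" ", 2)
        if len(parts) < 3:
            continue  # skip lines that do not match the assumed format
        severity, source, message = parts
        counts[(severity, source, message)] += 1
    return counts


def critical_entries(counts):
    """Return the distinct CRITICAL entries, sorted by source then message."""
    return sorted(
        (key for key in counts if key[0] == "CRITICAL"),
        key=lambda k: (k[1], k[2]),
    )


def noisiest(counts, n=1):
    """Return the n most repeated entries: the likely noise to mute or fix."""
    return counts.most_common(n)
```

Collapsing repeats into a single counted entry is what lets one chatty switch port show up as one line with a large count, rather than burying the handful of entries that need a human.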
BigPanda can offer another layer of sanity by filtering your alerts to ensure you’re only getting the ones that matter.
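One simple way to picture this sort of alert filtering is a severity threshold plus suppression of repeats. The Python sketch below is an illustration only, not the vendor's actual method. The alert fields (`time`, `host`, `check`, `severity`), the severity names and the five-minute window are all assumptions.

```python
# Hypothetical severity scale, lowest to highest.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


def filter_alerts(alerts, min_severity="warning", window_seconds=300):
    """Drop alerts below min_severity and suppress repeats of the same
    (host, check) pair that arrive within window_seconds of the last
    alert that was let through."""
    threshold = SEVERITY_RANK[min_severity]
    last_sent = {}  # (host, check) -> time of the last alert kept
    kept = []
    for alert in sorted(alerts, key=lambda a: a["time"]):
        if SEVERITY_RANK[alert["severity"]] < threshold:
            continue  # not important enough to page anyone
        key = (alert["host"], alert["check"])
        previous = last_sent.get(key)
        if previous is not None and alert["time"] - previous < window_seconds:
            continue  # same problem reported again too soon
        last_sent[key] = alert["time"]
        kept.append(alert)
    return kept
```

Given a critical CPU alert for one host at 0, 60 and 400 seconds, plus an informational disk alert at 90 seconds, only the alerts at 0 and 400 seconds would get through with the default settings.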
These new tools enable admins to quickly respond to angry users. They help predict impending data centre doom and (hopefully) justify a giant 50-inch television to show off their sea of green sensors.
Monitoring applications have undergone a silent revolution these past few years. Maybe it’s time we all checked out who’s new to the field. ®