The decentralisation effect
Server architectures and their impact on operations
Workshop This week’s poll spawned some very interesting responses. We asked you where you keep your servers, and how these relate to the kinds of issues you face.
Looking at locations first of all, you told us that in the majority of your organisations the machines that form the backbone of IT operations now reside in one or a small number of centralised data centres or computer rooms. Over 40 per cent of respondents now house the majority of servers in a single place, with just a few machines found in outlying locations.
Another quarter operates the bulk of server systems in more than one centralised server room/data centre. 23 per cent state that their architecture today places servers in one or more data centres with a number of smaller server rooms being used in branch or remote offices. Almost one in ten report that they do not have a data centre as such, but instead have servers located in multiple buildings in a geographically distributed approach.
When it comes to keeping these servers running, a fairly broad range of challenges was identified, and we’ve drawn out the top four here. As can be seen in the figure, the skilled IT personnel charged with keeping servers operational are under severe pressure. As might be expected, the pressure on staff grows with the number of servers to be managed, but this is seen to be a challenge for even the smallest organisation.
The challenge highlighted next is, perhaps, something of a surprise, given that this mini poll is about servers. Or maybe not! Some 40 per cent tell us that responding to data growth is high on the list of problems to be addressed. Looking a little deeper reveals that whilst this is seen as a challenge across organisations of all sizes, it figures highest for those with between 50 and 1,000 servers, and slightly less so for those with even more machines to manage and for those in the 10 to 50 server bracket.
Looking at the figures in more detail gives us some very interesting insights, particularly around how the way that a data centre is structured has a bearing on the levels of challenges experienced. As shown in the figure below, by far the worst of both worlds seems to be experienced by the quarter of respondents who rely on a combination of central data centres and smaller server rooms. A full 60 per cent of respondents in this group felt overstretched – no doubt due to traipsing to and from sites just to fix the most mundane of issues.
This group also expressed a hike in the level of challenge around data backup and recovery. While this old chestnut is still causing concerns across the board for one organisation in four, the number increases to nearly 40 per cent for more distributed data centre environments. Still, as a core IT process and one that has been undertaken with varying degrees of success for many decades, will we ever be happy that we are getting it right?
Although we haven’t shown this as a chart for fear of chart fatigue, it is interesting to report that “mid-market” sized organisations appear to be having a slightly harder time than smaller businesses and very large enterprises. This finding is fairly representative across geographies and data centre/computer room approaches – which is a stark reminder of how this demographic is often seen as underserved when it comes to appropriately scaled management tools and approaches.
If you’re in this group and you have any feedback on this, or indeed any of the findings shown here, we’d be very interested to hear.
Missed that survey
I'm pretty good about keeping the reg up to date on what we do around here.
We have 4 active datacenters in this building housing servers. Including physical and virtual on small-iron boxes we've got about 3600 servers. We have a few other small datacenters around the country, but mostly those are for nothing more than user authentication and personal file storage, and some servers handling telephony systems; all our operational servers and business data are in a single location. We have 10 mainframes (z7 - z10) and a few 595s to go with them.
Now, as for those datacenters, they're all large, fully built out, raised floor environments, with dedicated air handling and power systems, what people typically think of as a "datacenter."
However, having systems in 4 rooms is logically no different from having them in a single room... They're all on the same backbone network, all have out-of-band connectivity to operational systems, and regardless of the room everything is supported by 2 completely independent power systems, 2 separate ISPs, and fully redundant network and SAN connectivity across separate cables and switches, and essentially is one big pool of systems. A system failure on the rack, row, datacenter, or even building level will not cause an outage.
Data growth is an issue for us, but not because of data storage; it's limitations in backup infrastructure... Once you get to the point of using large-scale Tier 1 storage systems, dedupe is inherent. Also, mainframe virtualization uses single binary images for multiple systems, so data growth there is limited. Database backup/replication is also not a challenge. The challenge is system-level and file data backup, and managing legal hold, HIPAA, and SOX required data backups and archives. We have dozens of rack rows of nothing but IBM tape chassis for TSM. Actually, getting the data to tape is not the issue, it's recovering a system... The sheer number of tapes required to restore a single system using TSM's backup methodology is ridiculous (master once, incremental forever is a BAD idea, really bad, but mastering all these systems on even a monthly rotation would nearly triple our tape load). We have plans to move tapeless (for internal recovery, resorting to tape only for archive) but it's a more than $10M deployment, and with a big Win2K killoff in process, it was not in the 2009-2010 budget...
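To illustrate the restore trade-off described in the comment above, here is a rough sketch in Python. It is a toy model, not a description of how TSM actually lays data out on tape: it assumes, as a worst case, that every backup session lands on a different tape, and the function names and file ages are hypothetical. Under "master once, incremental forever", the latest copy of each file sits in whichever session last saw it change, so a restore has to visit every such session. With periodic masters (full backups), everything older than the last master comes from that one backup.

```python
def sessions_incremental_forever(file_ages):
    """Backup sessions a restore must read under incremental-forever.

    file_ages: days since each file last changed (hypothetical data).
    Each distinct age is one session holding a file's latest copy.
    """
    return len(set(file_ages))


def sessions_periodic_full(file_ages, days_since_full):
    """Backup sessions a restore must read with periodic full backups.

    Files unchanged since the last full come from that full (one session);
    files changed afterwards come from the incrementals that followed.
    """
    recent = {age for age in file_ages if age < days_since_full}
    return 1 + len(recent)


# Hypothetical system: a few very old files, a few recently changed ones.
file_ages = [400, 400, 250, 120, 45, 30, 7, 3, 1, 1]

print(sessions_incremental_forever(file_ages))               # 8
print(sessions_periodic_full(file_ages, days_since_full=10))  # 4
```

The model ignores the other side of the trade-off mentioned in the comment: periodic masters shorten the restore chain but write far more data to tape on every rotation.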
Please mentally insert a long and witty comment here that is somehow appropriate and relevant to the situation. I've had but 4 hours of shut-eye, a file server just ate itself, and one of the ESXi boxen just blew a DIMM. Oy, and the coffee's not even made yet!
Counting the minutes until pub O'clock...