The Register® — Biting the hand that feeds IT

Belt and braces stop the network falling down

Count everything, then double it

Lately, everyone seems to have lined up to join the network failure party. In some cases, lax network security has been to blame.

In others, upgrade issues coupled with fundamental design flaws have done the damage.

An inability to cope with denial-of-service attacks by angry internet mobs has even resulted in disruptions to networks that are arguably among the most important on the planet.

The list goes on and on, and it would all be an amusing farce if the results were not sometimes so serious.

The failure of so many major corporations and government agencies to prevent network mishaps is far from an excuse for regular sysadmins to slack. Quite the opposite: these events highlight the general public's increasing unwillingness to forgive such breakdowns.

Whale of a time

Network outages lead to damning press coverage that can tarnish a brand for years, and even lead some to question the maturity of cloud computing. Network failure is so common that it even has its own whale mascot.

At a minimum, network redundancy requires two of everything: every switch, router, network card and cable on the network. Ideally, there should be three of everything. This allows you to take one set of equipment offline for scheduled maintenance while both a primary and a backup remain active.
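
To put rough numbers on "two of everything" versus "three of everything", here is a minimal Python sketch. It assumes that each copy of a component fails independently of the others and that a single unit is up 99 per cent of the time. Both are illustrative assumptions, not figures from this article, and real failures are often correlated (a bad config pushed to every switch takes them all down together).

```python
def parallel_availability(unit_availability, copies):
    """Chance that at least one of `copies` identical units is up.

    Assumes failures are independent, which is optimistic in practice.
    """
    return 1 - (1 - unit_availability) ** copies


# Hypothetical per-unit availability, chosen only for illustration.
UNIT = 0.99

for copies in (1, 2, 3):
    print(f"{copies} of everything: {parallel_availability(UNIT, copies):.6f}")

# With three sets, pulling one for maintenance still leaves a redundant pair.
during_maintenance = parallel_availability(UNIT, 3 - 1)
print(f"three sets, one offline for maintenance: {during_maintenance:.6f}")
```

Under those assumptions the third set matters less for raw uptime than for the maintenance window: with only two sets, taking one offline leaves a single point of failure for as long as the work lasts.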

Preventative maintenance touches on an element of network redundancy far more important than the choice of hardware. The single most important element of network redundancy is a philosophy.

We in IT don't get to live by the axiom "if it ain't broke, don't fix it". It is always broken. Even when it doesn’t appear to be broken, there is bound to be a security flaw in the code somewhere that you will eventually have to patch.

Go forth and multiply

This means that while there should be three of everything, there really should also be four of almost everything. The fourth set is the sandbox: a research and development environment that needs to be in place to run through configurations before deployment.

The internet is littered with stories of network admins who botched an update by forgetting a single character in a config file. If you are charged with running a mission-critical network, you really don't want that to be you.

This is complicated further if, like most sysadmins these days, you are not always on site when upgrades are taking place. Toying with network gear is uniquely risky because the network is your access point for remote administration.

Having a back door is important. It doesn't have to be a sexy one, just a functional one. If you run a massive hyperscale data centre, then you probably have multiple redundant fibre-optic links providing you with all sorts of ways into the data centre. Smaller organisations don't have this luxury.

Fax of life

The right equipment can help you here, and it is available dirt cheap or even free. Consider a branch office scenario. If the branch office has a fax line, you are in business.

A simple appliance called the Stick can take that fax line you are already paying for and enable you to dial in to an onsite bridgehead system in case of an emergency. Sure, dial-up is largely worthless, but it will pass an RDP session and it will reload that config file you screwed up.

It might seem anachronistic to go from talking about four layers' worth of network redundancy to a dial-up modem as your last line of defence, but it is all part of the same philosophy.

Create a detailed network map including every device and link on your network. Consider one by one what happens if that device fails. If you cut this cable, how much trouble are you in? If you take a gun and shoot that switch, can your network survive?
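
The "fail one thing at a time" exercise can be run against the network map itself. The sketch below is one possible way to do it in Python, not a standard tool: the map is a list of links between named devices, and the script removes each device and each link in turn to see what can no longer be reached from an entry point. The device names and the topology are hypothetical.

```python
from collections import defaultdict


def reachable(links, start, failed_device=None, failed_link=None):
    """Devices reachable from `start` with one device or one link (by index) down."""
    neighbours = defaultdict(set)
    for index, (a, b) in enumerate(links):
        if index == failed_link or failed_device in (a, b):
            continue  # this link is unusable in the failure scenario
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def find_single_points_of_failure(links, entry_point):
    """List every single failure that cuts some device off from `entry_point`."""
    devices = {d for link in links for d in link}
    findings = []
    for dev in sorted(devices - {entry_point}):
        alive = reachable(links, entry_point, failed_device=dev)
        lost = devices - {dev} - alive
        if lost:
            findings.append((f"device {dev}", sorted(lost)))
    for index, (a, b) in enumerate(links):
        lost = devices - reachable(links, entry_point, failed_link=index)
        if lost:
            findings.append((f"link {a} <-> {b}", sorted(lost)))
    return findings


# Hypothetical map: redundant core switches, but one access switch and one cable.
links = [
    ("router", "core-sw1"),
    ("router", "core-sw2"),
    ("core-sw1", "access-sw"),
    ("core-sw2", "access-sw"),
    ("access-sw", "server"),
]

findings = find_single_points_of_failure(links, "router")
for failure, lost in findings:
    print(f"Losing {failure} cuts off: {', '.join(lost)}")
```

A second cable between the same two devices is modelled by listing the link twice, because links are failed by position in the list rather than by name. The check only covers physical connectivity; it says nothing about a bad route or config taking down both paths at once.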

The hard part is not the technology. It exists, it is proven and there are whitepapers for nearly any conceivable scenario. The hard part is the people.

The challenge is convincing the people with the money to invest in redundancy. They need to know what a network outage can mean to the bottom line. The hard part is changing a culture of "good enough" into one that understands the false economy of ignoring redundancy.

So go ahead and unplug that network cable. There's a backup... right? ®

title?

"Either way, compared to the price of the kit, the price of driving down there, and the price of an hour's worth of outage, the price of a phone line and a modem might be entirely justifiable."

These sorts of things are, of course, much easier to justify to bean-counters immediately *after* a prolonged outage than when you're talking about a hypothetical outage...

Hmmph!

Good to see consideration of out of band management there: interesting use for a fax line. The emphasis on lab verification of changes is useful too. That being said, this sounds more like a wish list by someone who hasn't had to deal with either a) hard business realities or b) very complex network failures. I would hesitate to describe myself as any sort of network "guru", but I do build out and support data centres, offices, and WAN links of various sorts, including metro rings, and the biggest causes of long outages are usually poor design (usually too complex), poor software (Brocade gets a special mention here!), and carriers (no comment!): in that order. Doubling and tripling up redundancy can improve failure rates (although I should mention that most vendors won't load balance links properly with IGPs or port channels unless they can be divided by 2), but that ain't necessarily so. My worst outage involved a stray OSPF default screwing up a multi-homed site without out of band management: a single homed site would have had higher reliability over the same calendar year.

Anonymous Coward

Well, that can be arranged... by modem.

Anyhow, while it's not unreasonable to have to justify expenses, assessing the arguments requires domain knowledge that þe average olde tallyer of beanes just doesn't have. That in itself is a hidden source of mis-spending and thus costs. I say it would be interesting to find ways to fix that--being ignorance based, it won't fix itself.
