Belt and braces stop the network falling down
Count everything, then double it
Lately, everyone seems to have lined up to join the network failure party. In some cases, lax network security has been to blame.
In others, upgrade issues coupled with fundamental design flaws have done the damage.
An inability to cope with denial-of-service attacks by angry internet mobs has even resulted in disruptions to networks that are arguably among the most important on the planet.
The failure of so many major corporations and government agencies to prevent network mishaps is far from an excuse for regular sysadmins to slack. Quite the opposite: these events highlight the general public's increasing unwillingness to forgive such breakdowns.
Whale of a time
Network outages lead to damning press coverage that can tarnish a brand for years, and even lead some to question the maturity of cloud computing. Network failure is so common that it even has its own whale mascot.
At a minimum, network redundancy requires two of everything: every switch, router, network card and cable on the network. Ideally, there should be three of everything. This allows you to take one set of equipment offline for scheduled maintenance with both a primary and backup remaining active.
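The counting rule above is simple enough to check against an inventory. What follows is a minimal sketch, not a real tool: the function name `audit_redundancy`, the role names and the unit counts are all made up for illustration, and the thresholds of two and three are taken straight from the rule of thumb in the text.

```python
def audit_redundancy(inventory, minimum=2, ideal=3):
    """Classify each equipment role by how many units are installed.

    inventory maps a role name to a unit count (hypothetical data).
    minimum=2 and ideal=3 mirror the "two of everything, ideally three" rule.
    """
    verdicts = {}
    for role, count in inventory.items():
        if count < minimum:
            verdicts[role] = "no redundancy"
        elif count < ideal:
            verdicts[role] = "redundant, but maintenance removes the backup"
        else:
            verdicts[role] = "maintenance possible with primary and backup live"
    return verdicts


# Hypothetical inventory for a small site.
inventory = {"core switch": 3, "edge router": 2, "uplink cable": 1}

for role, verdict in audit_redundancy(inventory).items():
    print(f"{role}: {verdict}")
```

Anything that comes back as "no redundancy" is the first thing to take to whoever holds the budget.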
Preventative maintenance touches on an element of network redundancy far more important than the choice of hardware. The single most important element of network redundancy is a philosophy.
We in IT don't get to live by the axiom "if it ain't broke don't fix it". It is always broken. Even when it doesn’t appear to be broken, there is bound to be a security flaw in the code somewhere that you will eventually have to patch.
Go forth and multiply
This means that while there should be three of everything, there really should also be four of almost everything. The fourth set is the sandbox: a research and development environment that needs to be in place so you can run through configurations before deployment.
The internet is littered with stories of network admins who botched an update by forgetting a single character in a config file. If you are charged with running a mission-critical network, you really don't want that to be you.
This is complicated further if, like most sysadmins these days, you are not always on site when upgrades are taking place. Toying with network gear is uniquely risky because the network is your access point for remote administration.
Having a back door is important. It doesn't have to be a sexy one, just a functional one. If you run a massive hyperscale data centre, then you probably have multiple redundant fibre-optic links providing you with all sorts of ways into the data centre. Smaller organisations don't have this luxury.
Fax of life
The right equipment can help you here, and it is available dirt cheap or even free. Consider a branch office scenario. If the branch office has a fax line, you are in business.
A simple appliance called the Stick can take that fax line you are already paying for and enable you to dial in to an onsite bridgehead system in case of an emergency. Sure, dial-up is largely worthless, but it will pass an RDP session and it will reload that config file you screwed up.
It might seem anachronistic to go from talking about four layers' worth of network redundancy to a dial-up modem as your last line of defence, but it is all part of the same philosophy.
Create a detailed network map including every device and link on your network. Consider one by one what happens if that device fails. If you cut this cable, how much trouble are you in? If you take a gun and shoot that switch, can your network survive?
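The "cut this cable, shoot that switch" exercise can be run on paper, but it also lends itself to a quick script. The sketch below assumes the network map is written down as a list of device-to-device links. The device names, and the helper names `is_connected` and `single_points_of_failure`, are invented for illustration. It removes each device and each link in turn and reports the ones whose loss splits the network. It is a brute-force check, fine for a branch office map but not tuned for a hyperscale data centre.

```python
from collections import defaultdict, deque


def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return True
    adjacency = defaultdict(set)
    for a, b in links:
        # Ignore links that touch a device we have "shot".
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen == nodes


def single_points_of_failure(links):
    """Return (devices, links) whose individual loss partitions the network."""
    links = list(links)
    nodes = {device for link in links for device in link}
    critical_devices = sorted(
        device for device in nodes
        if not is_connected(nodes - {device}, links)
    )
    critical_links = [
        link for i, link in enumerate(links)
        if not is_connected(nodes, links[:i] + links[i + 1:])
    ]
    return critical_devices, critical_links


# Hypothetical map: a redundant core triangle feeding a single-homed branch.
network_map = [
    ("router", "switch-a"),
    ("router", "switch-b"),
    ("switch-a", "switch-b"),
    ("switch-b", "branch-switch"),
    ("branch-switch", "file-server"),
]

devices, cables = single_points_of_failure(network_map)
print("Devices you cannot afford to lose:", devices)
print("Cables you cannot afford to cut:", cables)
```

In this made-up map the core survives any single failure, while `switch-b`, `branch-switch` and the two cables hanging off them do not. A second cable between the same two devices can be listed as a second entry in the map, and the script will then treat that link as redundant.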
The challenge is convincing the people with the money to invest in redundancy. They need to know what a network outage can mean to the bottom line. The hard part is changing a culture of "good enough" into one that understands the false economy of ignoring redundancy.
So go ahead and unplug that network cable. There's a backup...right? ®