Perish the fault! Can your storage array take a bullet AND LIVE?
Sysadmin Trevor's gentle guide to protecting your data - and your career
Get this right and you'll be singing in the RAIN
RAIN is a redundant (or reliable) array of inexpensive nodes. For a brilliant explanation, I direct you to this video by Gene Fay of Nine Technology. The short version: RAIN copies your data across multiple individual computers, so that losing any one of them doesn't lose your data.
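To make "copies your data across multiple individual computers" concrete, here's a toy sketch. It assumes a hypothetical `RainCluster` class with in-memory nodes and uses rendezvous hashing to decide which nodes hold each object. Real systems such as HDFS or GlusterFS have their own placement schemes, so treat this as an illustration of the principle rather than of any product.

```python
import hashlib


class RainCluster:
    """Toy RAIN: every object is copied to `replicas` distinct nodes.

    The class, node names and the use of rendezvous hashing for placement
    are illustrative assumptions, not any real product's design.
    """

    def __init__(self, node_names, replicas=2):
        if replicas > len(node_names):
            raise ValueError("need at least as many nodes as replicas")
        self.replicas = replicas
        # Each node is modelled as an in-memory dict of key -> bytes.
        self.nodes = {name: {} for name in node_names}
        self.failed = set()

    def placement(self, key):
        # Rendezvous (highest-random-weight) hashing: rank every node by a
        # hash of (node, key) and keep the top `replicas` nodes.
        def score(node):
            return hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return sorted(self.nodes, key=score, reverse=True)[: self.replicas]

    def put(self, key, data):
        # Write a full copy of the object to every node in its replica set.
        for node in self.placement(key):
            self.nodes[node][key] = data

    def fail(self, node):
        # Simulate a node taking a bullet: its copies become unreachable.
        self.failed.add(node)

    def get(self, key):
        # Read from the first surviving replica.
        for node in self.placement(key):
            if node not in self.failed and key in self.nodes[node]:
                return self.nodes[node][key]
        raise KeyError(f"all replicas of {key!r} are unavailable")


cluster = RainCluster(["node-a", "node-b", "node-c", "node-d"], replicas=2)
cluster.put("invoice-2013.pdf", b"quarterly figures")
replica_set = cluster.placement("invoice-2013.pdf")
cluster.fail(replica_set[0])            # shoot the first node holding a copy
print(cluster.get("invoice-2013.pdf"))  # b'quarterly figures'
```

With two replicas you can lose any one node and still read everything; lose both nodes in an object's replica set and that object is gone, which is why production setups often keep three copies.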
Seize the RAINs, keep your servers' data protected
There are many different implementations of RAIN out there today; this is a large part of what the kerfuffle over "big data" is all about. When you have conversations about HDFS, GlusterFS or Amazon's S3, you are talking about RAIN. In general, RAIN setups don't behave like traditional file systems, although the Gluster team is building technology on top of GlusterFS that aims to change this.
With most RAIN setups, your operating system doesn't mount the storage and you don't create NFS or SMB shares on it. If you really want to do that sort of thing, you need to run virtual disks on top of the RAIN array through something like FUSE. At that point you're way out in the weeds and should probably be reassessing the whole project. Still, if you really want to, you can get bizarre and run VMware virtual machines on Gluster via an NFS server translator.
While you can throw layers of translation on top of a RAIN setup to make it pretend to be a traditional disk, RAIN is generally for object (not file) storage. It's better to think of RAIN setups as really big databases than as traditional file systems.
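To illustrate the "big database" way of thinking, here's a minimal sketch assuming a hypothetical in-memory `ObjectStore` class (not S3's API or anyone else's): objects are written and read whole, addressed by key, and a "folder" is nothing more than a key prefix.

```python
class ObjectStore:
    """Toy object store: a flat map of key -> (bytes, metadata).

    There are no directories, mounts or file handles. A key such as
    "logs/2013/01.gz" is just a string, and "listing a folder" is a
    prefix match over keys. Class and method names are hypothetical.
    """

    def __init__(self):
        self._objects = {}

    def put(self, key, data, **metadata):
        # Whole-object writes only: no seek, no partial update, no locking.
        self._objects[key] = (bytes(data), dict(metadata))

    def get(self, key):
        return self._objects[key][0]

    def head(self, key):
        # Fetch metadata without the payload, much like a database row lookup.
        return self._objects[key][1]

    def list(self, prefix=""):
        # No directory tree to walk: just filter the flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ObjectStore()
store.put("logs/2013/01.gz", b"jan", content_type="application/gzip")
store.put("logs/2013/02.gz", b"feb", content_type="application/gzip")
store.put("photos/cat.jpg", b"meow")
print(store.list("logs/"))  # ['logs/2013/01.gz', 'logs/2013/02.gz']
```

Everything a POSIX file system gives you for free (renames, appends, byte-range locks) is missing here, which is exactly why bolting NFS or SMB on top gets ugly.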
Of course, if ZFS or RAID underpins your storage layer, what happens if I shoot the storage server? RAIN would seem to be resilient to the loss of an individual system, but there's nothing native to ZFS or RAID to deal with a bullet through the CPU.
This is where clustering comes in. An ideal deployment for fault tolerance would have two servers in bit-for-bit lockstep. In the free software world, you are looking at DRBD on Linux or HAST on FreeBSD.
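Here's what lockstep buys you, as a toy model loosely based on the idea behind DRBD's fully synchronous Protocol C: the application only gets an acknowledgement once the block is on both nodes. The `Disk` and `SyncMirror` classes are hypothetical, and real replication also has to resynchronise a node when it comes back, which this sketch ignores.

```python
class Disk:
    """In-memory stand-in for one server's block device."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.alive = True

    def write(self, block_no, data):
        if not self.alive:
            raise OSError(f"{self.name} is down")
        self.blocks[block_no] = data


class SyncMirror:
    """Toy synchronous mirror: a write is acknowledged only after BOTH
    disks have stored it, so whichever node survives holds every write
    the application was told had succeeded."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        self.secondary.write(block_no, data)  # raises if the peer is down
        return "ack"  # only reached once both copies exist


node_a, node_b = Disk("node-a"), Disk("node-b")
mirror = SyncMirror(node_a, node_b)
mirror.write(0, b"boot sector")
mirror.write(1, b"payroll")
node_a.alive = False                    # bullet through node-a's CPU
print(node_b.blocks == node_a.blocks)   # True
```

The cost is latency: every write waits for the slower of the two nodes, which is the same trade the author makes below by turning write caching off.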
Assuming you have a solid hardware RAID underneath, Microsoft's Server 2012 is actually the basis for a very reliable cluster. Cluster Shared Volumes v2 is how I get my RAID 61: hardware RAID 6 on each node, mirrored. (I turn write caching off in order to ensure that I don't lose data in memory if a node dies. Slower, but safer.)
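The arithmetic behind RAID 61 is worth doing before you buy disks. A sketch, assuming identical arrays on every node and ignoring hot spares, rebuild windows and file system overhead; `raid61` is a hypothetical helper:

```python
def raid61(disks_per_node, disk_tb, nodes=2):
    """Return (raw_tb, usable_tb, guaranteed_disk_failures) for identical
    RAID 6 arrays mirrored across `nodes` nodes."""
    if disks_per_node < 4:
        raise ValueError("RAID 6 needs at least four disks")
    raw_tb = nodes * disks_per_node * disk_tb
    # RAID 6 spends two disks per array on parity; mirroring stores the
    # same data once per node, so usable space is one array's worth.
    usable_tb = (disks_per_node - 2) * disk_tb
    # Data is lost only when EVERY array has lost three disks, so any
    # (3 * nodes - 1) simultaneous disk failures are survivable.
    guaranteed = 3 * nodes - 1
    return raw_tb, usable_tb, guaranteed


print(raid61(8, 4))  # (64, 24, 5)
```

Eight 4TB disks per node on two nodes gives 64TB raw but only 24TB usable. In exchange, any five simultaneous disk failures are survivable, as is losing an entire node plus any two disks in the one left standing.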
Combine that with Server 2012's new NFS 4.1 server, its iSCSI target or SMB 3.0 (which supports multichannel, transparent failover and node fault tolerance), and I can shoot one of my Microsoft servers without the VMware cluster that uses them for storage noticing that anything has happened.
Speaking of VMware, it offers the vSphere Storage Appliance. It is a reliable technology for creating a storage cluster; however, it only scales to three physical systems per storage appliance.
It's all rather a mess right now, isn't it?
If you are starting to sense some holes in feature availability here, you aren't alone. This is why storage vendors exist as separate entities. Honest-to-$deity fault-tolerant storage built with open-source tools is an absolute pig to implement, and Microsoft needs time to get all its technology ducks in a row. (It needs triple disk redundancy with ReFS on Cluster Shared Volumes scaling to hundreds of nodes before it is a real player.) VMware has the basic technology, but it needs to scale quite a bit further before it is a real consideration.
This is why there are so many storage startups out there. It is also why the storage giants can still sell those big, expensive SANs. There is a lot to consider when planning your storage today, even if it is only for a single server. What you knew ten years ago doesn't really apply any more. What you knew five years ago is probably just enough to get you into trouble.
Of course, these technologies are for fault tolerance only. Fault tolerance is not a backup. If your data doesn't exist in at least two physical locations, then your data does not exist; on top of utilising the fault-tolerant technologies discussed above, make sure you have a proper backup plan. And remember: a fault-tolerant system (or a backup) that hasn't been tested isn't any form of protection at all. ®
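Testing a backup means restoring it and checking what comes back. A minimal sketch, assuming SHA-256 checksums are an adequate comparison and that each backup has already been restored to a local path; `verify_copies` and the site labels are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Hash a file in 1 MiB chunks so large backups don't fill RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copies(reference, copies):
    """Return (ok, sites): ok is True only if at least two distinct sites
    hold a copy that is byte-identical to `reference`.

    `copies` maps a site label to the path of the copy held there; for a
    backup, that should be a copy you have actually restored, not just a
    file the backup job claims to have written.
    """
    expected = sha256_of(reference)
    good = sorted(site for site, path in copies.items()
                  if Path(path).is_file() and sha256_of(path) == expected)
    return len(good) >= 2, good


# Demo with throwaway files standing in for three sites.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp, "live.db")
    restored = Path(tmp, "restored-from-offsite.db")
    rotten = Path(tmp, "restored-from-tape.db")
    live.write_bytes(b"customer records")
    restored.write_bytes(b"customer records")
    rotten.write_bytes(b"customer rec\x00rds")  # silent corruption
    result = verify_copies(live, {"office": live, "offsite": restored, "tape": rotten})
    missing = verify_copies(live, {"office": live, "tape": rotten,
                                   "dr-site": Path(tmp, "never-written.db")})

print(result)   # (True, ['office', 'offsite'])
print(missing)  # (False, ['office'])
```

The second call is the nightmare scenario: the backup jobs "ran", but only the live copy is actually good.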