3PAR developing united federation of clusters
Boldly going where no SAN has gone before
3PAR is actively developing technology needed to federate its InServ clustered storage arrays together and provide a single pool of resources across the federation.
3PAR believes that there is a practical limit to the scaling on the number of nodes in a cluster, due to the risk of a node failure taking the entire cluster down. However, the needs of cloud-type computing will inevitably mean total capacity has to scale beyond the limits of an InServ cluster. The way to do this, 3PAR thinks, is to federate its clusters together, but retain InServ manageability to prevent sysadmins having to get up close and personal with the federated cluster's operating controls.
Craig Nunes, 3PAR's marketing chief, said: "We want to get away from folks having to manage capacity in a box. So you need a fluid provisioning capability [with a] peer relationship between arrays in a metro area. Each understands where SATA capacity is, where Fibre Channel capacity is, etc [across the federation]. Folks get storage from a virtual pool... what we do in an array it is possible to do across arrays [clusters] in a metro area."
What sort of distances are we talking about? "Fibre Channel distances; miles apart," meaning a large campus and not long distance.
If users in such a federation were to move, then you could ensure that resources followed them.
Nunes doesn't favour the term "super-cluster": "No. We're actively investing in ways to hide and maybe eliminate the idea of a cluster for the storage guy." Storage people might be wary of the complexity implied by a super-cluster, complexity they would have to deal with: "Federation is a less scary way to think about it."
Cluster-level failover across the federated nodes would be a desirable feature.
3PAR thinks federation is entirely practical. Nunes said: "It's a strategic concept for us... and a number of components are in active development today... We're going to be making an upcoming announcement about this." Expect something before the end of November.
3PAR SNW news
At SNW in Frankfurt, 3PAR announced improvements to its InServ product. A Persistent Cache facility will redistribute the duties of a failed node's cache across the remaining nodes in a 4-node or larger InServ array, avoiding the cache degradation currently seen when a node's cache fails. There is also a new disaster recovery facility, with synchronous replication of changes from one InServ array to another local array and scheduled asynchronous replication to a remote InServ array.
If the primary site goes down then a replacement site comes into play, using the asynchronously replicated data plus the synchronously replicated data since the last asynchronous replication.
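The recovery sequence described above can be sketched as a toy model: the remote site restores the last asynchronous snapshot, then replays only the newer writes captured by the local synchronous mirror. The function and data names here are illustrative, not 3PAR's actual Remote Copy interfaces.

```python
# Toy model of cascaded DR recovery: async baseline + sync journal replay.
# Assumption: each replicated write carries a monotonic sequence number,
# so the replacement site can skip writes already in the async snapshot.

def recover(async_snapshot, sync_journal, last_async_seq):
    """Rebuild the latest volume state at the replacement site."""
    volume = dict(async_snapshot)          # state as of the last async send
    for seq, block, data in sync_journal:  # ordered writes mirrored
        if seq > last_async_seq:           # synchronously to the local array
            volume[block] = data           # replay only the newer writes
    return volume

snapshot = {0: b"AAAA", 1: b"BBBB"}        # remote async copy (seq <= 10)
journal = [(9, 1, b"old"), (11, 1, b"CCCC"), (12, 2, b"DDDD")]
state = recover(snapshot, journal, last_async_seq=10)
print(state)   # block 1 updated, block 2 added, block 0 untouched
```

The point of the cascade is that the synchronous leg loses nothing within metro distance, while the asynchronous leg bounds how much the remote site ever has to replay.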
Lastly 3PAR has announced RAID MP, a faster RAID 6 technology that is only 15 per cent slower than RAID 10 in terms of performance but equivalent to RAID 5 (3 + 1) in terms of capacity. RAID MP uses 3PAR's third generation ASIC to achieve its speed and 3PAR says RAID re-build is "four times faster than traditional rebuilds." It supports double parity and will be able to support triple and quad parity in the future. ®
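The double parity that RAID MP supports can be illustrated with generic RAID 6 style P+Q parity; this sketch is textbook GF(2^8) arithmetic, not 3PAR's proprietary RAID MP implementation, and uses one byte per "drive" for brevity.

```python
# Sketch of double parity: P (plain XOR, as in RAID 5) plus an
# independent Q parity computed in GF(2^8), so any two failed
# drives in a stripe can be reconstructed.

# Build GF(2^8) exp/log tables, generator 2, polynomial 0x11d.
EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11d
for i in range(255, 512):
    EXP[i] = EXP[i - 255]

def gf_mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def parity(data):
    """P = XOR of all blocks, Q = sum of g^i * d_i over GF(2^8)."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(EXP[i], d)
    return p, q

data = [0x12, 0x34, 0x56, 0x78, 0x9a, 0xbc]   # a 6-drive stripe
p, q = parity(data)

# Case 1: one data drive lost -> rebuild from P alone (like RAID 5).
lost = 2
rebuilt = p
for i, d in enumerate(data):
    if i != lost:
        rebuilt ^= d
assert rebuilt == data[lost]

# Case 2: a data drive AND the P drive lost -> rebuild from Q.
partial = 0
for i, d in enumerate(data):
    if i != lost:
        partial ^= gf_mul(EXP[i], d)
# d_lost = (Q ^ partial) / g^lost  (division = multiply by inverse)
inv = EXP[255 - LOG[EXP[lost]]]   # multiplicative inverse of g^lost
assert gf_mul(q ^ partial, inv) == data[lost]
print("double-parity reconstruction OK")
```

Triple or quad parity, which 3PAR says RAID MP will support in future, extends the same idea with further independent syndromes.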
*NOT* just for the checkbox
If it's an ATA drive it's absolutely required these days, period, end of story. I told both 3PAR and IBM (XIV product) not to talk to me about using ATA storage until they have it. It's not about how fast you do a rebuild, it's about how big a rebuild you have to do. It's all about UREs. I've got literally thousands of drives on the floor in the datacenter; you don't often have two drives stop spinning at once, which is really the case where rebuild speed matters. URE stands for "uncorrectable read error": the drive thinks everything is fine, you make a read request, and it is unable to fulfil it. There is nothing the drive array can do about it; it's a property of the drive.
Go to Seagate, Hitachi, etc. and look it up: most standard ATA drives are specified at one unrecoverable read error per 10^14 bits read (roughly one error per ~12TB read). So let's say I'm using big RAID 5 groups, 6+1 with 2TB drives (12TB usable). Statistically, during a rebuild you are more likely than not to have some data loss. It might be a single 512-byte sector, but that sector could contain part of an Oracle datafile, a critical bank transfer, or whitespace that you don't care about. Constant disk scrubbing minimizes this, as it should find failing sectors before the entire drive fails.
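The "more likely than not" claim follows directly from the numbers quoted: a sketch of the arithmetic, assuming the spec'd rate of one URE per 10^14 bits and a 6+1 RAID 5 group of 2TB drives, where a rebuild must read the six surviving drives in full.

```python
import math

# Back-of-envelope URE probability for the RAID 5 rebuild described above.
ure_per_bit = 1e-14          # spec'd rate: one URE per 10^14 bits read
bits_to_read = 6 * 2e12 * 8  # six surviving 2TB drives, in bits

# Probability of at least one URE somewhere in the rebuild read:
p_clean = (1 - ure_per_bit) ** bits_to_read
p_ure = 1 - p_clean

# Equivalent closed form for a rare event: 1 - exp(-rate * bits)
p_ure_approx = 1 - math.exp(-ure_per_bit * bits_to_read)

print(f"bits read during rebuild: {bits_to_read:.2e}")
print(f"P(at least one URE)     : {p_ure:.2%}")   # comes out around 62%
```

So with these drive sizes the odds of a clean rebuild are worse than a coin flip, which is exactly why a second parity (or scrubbing, or both) matters more than rebuild speed.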
I've personally had a RAID 5 drive failure plus a URE during the rebuild. Only a single sector couldn't be rebuilt, but it wreaked havoc and ultimately made ~30TB of other VTL data useless (the VTL app spread the writes around the array... very similar to 3PAR). So I'm not talking out my ass, nor am I looking for simply a checkbox (note this was on 320GB PATA drives, so it was a few years ago).
I suggest you do some reading up, as it's a very real danger.
just for the checkbox
3PAR's architecture doesn't need RAID 6. They only came up with RAID 6 so they could tick the check box on some customers' RFQs asking whether they do RAID 6. Some customers are so dumb and narrow-minded that they couldn't see past the fact that 3PAR didn't do RAID 6.
I'm sure at some point they can benefit from RAID 6, but probably not till you get to 5TB+ drives or something.
3PAR's distributed chunklet-based RAID technology for the most part eliminates the chance of data loss from a second disk failure during a rebuild.
To put things in perspective, my array, which is using about 121TB of raw storage (200 disks), has more than 80,000 individual RAID arrays on it. Disk rebuilds are upwards of 10x faster than on other systems, or more depending on the number of spindles in the array, and have near-zero performance impact while the rebuild is occurring. I simulated a drive failure in our early testing of the array by powering off a drive remotely.
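A rough model shows why distributed chunklet rebuilds finish so much faster than rebuilding onto a single hot spare; the figures below are illustrative assumptions, not 3PAR specs or measurements from this poster's array.

```python
# Rough model: single-spare rebuild vs many-to-many chunklet rebuild.
drive_capacity_gb = 2000
drive_bw_mb_s = 60          # assumed sustained throughput per SATA drive
spindles = 200              # drives able to share the rebuild work
rebuild_fraction = 0.25     # assume only 25% of each drive's bandwidth
                            # is spent rebuilding, so host I/O is spared

# Traditional: the single spare's write bandwidth is the bottleneck.
traditional_h = drive_capacity_gb * 1000 / drive_bw_mb_s / 3600

# Distributed: the failed drive's chunklets are rebuilt in parallel
# into free space spread across all remaining spindles.
distributed_h = (drive_capacity_gb * 1000
                 / (drive_bw_mb_s * rebuild_fraction * (spindles - 1))
                 / 3600)

print(f"single-spare rebuild : {traditional_h:.1f} h")
print(f"distributed rebuild  : {distributed_h:.2f} h")
```

Even while throttled to a fraction of each spindle's bandwidth, the parallel rebuild finishes in minutes rather than hours, and shortening the rebuild window is what shrinks the exposure to a second failure or a URE.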
The guy here who managed our previous array is constantly amazed that we've gone almost a year without a single disk failure (our array is 100% SATA, and the disks are *slammed* 24/7). The vibration-absorbing drive sleds probably contribute a good chunk of that increase in reliability; I read an article here earlier in the year that talked about vibration being the #1 cause of disk failures.
NetApp's RAID 4 and RAID-DP are RAID 4 technologies that put parity on dedicated disks; this causes those disks to run very 'hot' in a busy system and can cause slowdowns. The 3PAR system sounds more like a tweak of RAID 6 to work on the 3PAR 'chunklet' system (check their site for the exact details or see http://bit.ly/2YzQK5). This means that loads are still striped across all disks in the array, which boosts performance.
Source(s): 3Par user