2018-07-19

Analysis What do data durability numbers mean? Azure boasts of 12 and even 16 nines of durability, while Amazon S3, Google Cloud Platform and Backblaze tout 11 nines. What does this mean?

Data durability is a fancy way of promising you'll keep someone's data intact, and not allow it to degrade through media decay, drive loss, array loss, data centre loss, power loss, or some other corrupting influence.

There are two general ways to improve data durability. The first is to split the data into fragments and store them together with mathematically derived parity information, which lets you rebuild all of the data if some fragments are lost. Erasure coding is one such method. Checksumming complements it by detecting when a fragment has been corrupted.
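As a rough illustration of rebuilding lost data from parity, here is a minimal Python sketch using single XOR parity, the simplest form of erasure coding. It is an illustrative assumption, not any provider's actual scheme: production systems typically use codes such as Reed-Solomon that survive several simultaneous losses, and the fragment contents and function names here are made up.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: list) -> bytes:
    """Compute one parity fragment as the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def rebuild(surviving: list, parity: bytes) -> bytes:
    """Recover a single missing fragment from the survivors plus parity."""
    return reduce(xor_bytes, surviving, parity)


# Hypothetical equal-length data fragments, each on its own drive.
fragments = [b"DATA", b"DURA", b"BLE!"]
parity = make_parity(fragments)

# Simulate losing the drive that held fragment 1.
lost_index = 1
surviving = fragments[:lost_index] + fragments[lost_index + 1:]
recovered = rebuild(surviving, parity)
print(recovered)
```

With one parity fragment, only one loss can be repaired; a second simultaneous loss would be unrecoverable, which is why real schemes store more than one parity fragment.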

The second is to keep multiple copies of the data on drives within and across data centres, to withstand progressively larger disaster zones spreading out from a drive to an array to a data centre to several data centres. This is redundancy.

Given that both techniques are standard in large and hyperscale cloud data centres, how do providers calculate data durability?

Durability is usually represented as the period of time you would expect to wait before some data is lost.

For example, Amazon S3:

"Amazon S3 Standard, S3 Standard–IA, S3 One Zone-IA, and Amazon Glacier are all designed to provide 99.999999999 per cent durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001 per cent of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years."

Durability is defined here statistically, as the expected length of time we would have to wait before an object we store in S3 is lost.

Another way of saying it, according to Backblaze, is if you store 1 million objects for 10 million years, you would expect to lose 1 file. That's what 11 nines durability means.

But is it? David Friend of Wasabi told us: "If you gave Amazon or Wasabi 1 million files to store, statistically they would lose one file every 659,000 years."

Sounds good, but actually it's not. He says if you store 1PB of data, with 1.2 billion objects, with 11 nines of durability, then you would lose 0.12 files/year, meaning one file lost every eight years. "The problem is that you won't know you've lost files until you try to use them."

We see here different interpretations of 11 nines data durability. Which is right?
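To make the arithmetic concrete, the following Python sketch applies one possible reading, the per-object annual loss rate in Amazon's wording, to the object counts quoted above. The function names are illustrative, and the assumption that 11 nines means each object has a 10^-11 chance of loss per year is only one interpretation; the providers' own figures may rest on different ones.

```python
def annual_loss_fraction(nines: int) -> float:
    """Fraction of objects expected to be lost per year for N nines."""
    return 10.0 ** -nines


def years_per_lost_object(objects: float, nines: int = 11) -> float:
    """Average number of years between single-object losses."""
    expected_losses_per_year = objects * annual_loss_fraction(nines)
    return 1.0 / expected_losses_per_year


# Object counts taken from the examples quoted in the article.
scenarios = {
    "10 million objects (Amazon example)": 10_000_000,
    "1 million objects": 1_000_000,
    "1.2 billion objects (1PB example)": 1_200_000_000,
}

for label, count in scenarios.items():
    years = years_per_lost_object(count)
    print(f"{label}: one object lost every {years:,.0f} years on average")
```

Under this reading, 10 million objects gives the 10,000 years in Amazon's example, 1 million objects gives 100,000 years, and 1.2 billion objects gives roughly 83 years. The other figures quoted above differ from these, which shows how much the answer depends on the definition assumed.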

We don't know. Given that it's statistics and probabilities, how is it calculated?

Again, we don't know – which means we can't reliably compare AWS, Azure, Backblaze, GCP or Wasabi data durability numbers to see which is best or worst or the costliest.

Backblaze CTO Brian Wilson points out there is no industry standard way to measure data durability. He has revealed how Backblaze computes its B2 storage service durability number, though.

The probabilities of data loss can vary with the number of file/object fragments, drives used, failure rates and rebuild time. Drive failure rates are tricky, as drives exhibit a bathtub curve effect – having a higher likelihood of failure when they are first turned on and at the forecast end of their usable life.

Should drive failures be modelled as events arriving at a steady rate over continuous time (the Poisson distribution) or as a fixed number of independent fail-or-survive trials (the Binomial distribution)?

Wilson presents a Poisson distribution method using drives which have, for simplicity's sake, a constant failure probability over their life. He assumes the average rebuild time to achieve complete parity for any given B2 object with a failed drive is 6.5 days. Also the annualised failure rate of a drive is 0.81 per cent, which is cut to 0.41 per cent by having an outside agency, DriveSavers, recover some data from failed drives.

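Expressed as a fraction, the 0.41 per cent annualised drive failure rate is 0.0041. Backblaze can recover from three drive failures before the first drive is rebuilt, so data is at risk only if a fourth drive holding the same object fails within that rebuild window.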

The result is 11 nines.

Backblaze has made the equivalent Binomial distribution calculation available on GitHub and the net result appears to be the same.
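The shape of such a calculation can be sketched in Python. This is a reconstruction under stated assumptions, not Backblaze's published code: it uses the 0.41 per cent failure rate, 6.5-day rebuild window and three-failure tolerance given above, and additionally assumes each object is spread across 20 drives, a figure not stated in this article. All names are illustrative.

```python
import math

AFR = 0.0041          # annualised drive failure rate, as a fraction
REBUILD_DAYS = 6.5    # assumed rebuild window
DRIVES = 20           # ASSUMPTION: drives holding one object's fragments
TOLERATED = 3         # failures survivable within one rebuild window

# Chance that a given drive fails during one rebuild window.
p_window = AFR * REBUILD_DAYS / 365
windows_per_year = 365 / REBUILD_DAYS


def poisson_tail(lam: float, k_min: int, terms: int = 40) -> float:
    """P(X >= k_min) for a Poisson variable, summed directly for precision."""
    return sum(
        math.exp(-lam) * lam ** k / math.factorial(k)
        for k in range(k_min, k_min + terms)
    )


def binomial_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for a Binomial(n, p) variable."""
    return sum(
        math.comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def annual_loss(p_loss_per_window: float) -> float:
    """Chance of at least one loss across a year's worth of windows."""
    return -math.expm1(windows_per_year * math.log1p(-p_loss_per_window))


poisson_annual = annual_loss(poisson_tail(DRIVES * p_window, TOLERATED + 1))
binomial_annual = annual_loss(binomial_tail(DRIVES, p_window, TOLERATED + 1))

print(f"Poisson  annual loss probability: {poisson_annual:.3e}")
print(f"Binomial annual loss probability: {binomial_annual:.3e}")
```

Under these assumptions both models put the annual loss probability at around 10^-11, in line with the 11 nines quoted. The tail probabilities are summed directly rather than computed as one minus the head, because values this small would otherwise be swamped by floating-point rounding.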

Wilson says it is likelier that other things will happen before B2 loses data, such as an armed conflict taking out data centres. Earthquakes, floods, pests and other "Acts of God" could destroy multiple data centres. Or there could be a prolonged credit card billing problem and your account data is deleted. Whatever.

It's unlikely Amazon, Azure and Google will reveal the basis of their data durability calculations just because minnow Backblaze shook a stick at them. The moral is that we're not necessarily comparing apples with apples when looking at costs for 11 nines data durability from cloud storage providers. Sup their data with a long spoon. ®

