Mmm, yes. 11-nines data durability? Mmmm, that sounds good. Except it's virtually meaningless

No one can agree on how it's calculated

Analysis What do data durability numbers mean? Azure brags of 12 and even 16 nines of durability, while Amazon S3, Google Cloud Platform, and Backblaze tout 11 nines. But what do those nines actually promise?

Data durability is a fancy way of promising you'll keep someone's data intact, and not allow the bits and bytes to degrade through media decay, drive loss, array loss, data center loss, power loss, or some other corrupting influence. Offering 99.999999999 per cent annual durability means you expect to lose 0.000000001 per cent of stored stuff a year.
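To make that arithmetic concrete, here is a minimal Python sketch that converts a count of nines into a durability probability and the matching expected annual loss. It assumes the simple reading used above, where durability is the chance that any given object survives a year; the function names are hypothetical.

```python
def durability(nines: int) -> float:
    """Durability as a probability, e.g. 11 nines -> 0.99999999999."""
    return 1.0 - 10.0 ** -nines


def annual_loss_fraction(nines: int) -> float:
    """Expected fraction of stored objects lost per year.

    Equal to 1 - durability(nines), but computed directly so the tiny
    result does not suffer floating-point cancellation.
    """
    return 10.0 ** -nines


def annual_loss_percent(nines: int) -> float:
    """The same expected annual loss, expressed as a percentage."""
    return annual_loss_fraction(nines) * 100.0


if __name__ == "__main__":
    print(f"{durability(11) * 100:.9f} per cent durability")
    print(f"{annual_loss_percent(11):.9f} per cent lost per year")  # 0.000000001
```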

There are two general ways to improve data durability. The first is to store extra information about the data so that algorithms can detect corruption and restore files and objects if some portions are lost to bit rot. Erasure coding is one such method; Reed-Solomon coding is a widely used form of it.
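The detect-then-restore idea can be sketched in Python with the simplest possible erasure code: a single XOR parity shard, which can rebuild any one missing shard. This is deliberately not Reed-Solomon, a more capable erasure code that can survive several lost shards. Corruption is detected here with `zlib.crc32` checksums stored alongside each shard; the shard layout and the `encode`/`decode` helpers are hypothetical, for illustration only.

```python
import zlib
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[tuple[bytes, int]]:
    """Split data into k equal data shards plus one XOR parity shard.

    Each shard is stored with its CRC-32 so corruption can be detected later.
    """
    shard_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(shard_len * k, b"\0")  # zero-pad to a multiple of k
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = reduce(xor_bytes, shards)
    return [(s, zlib.crc32(s)) for s in shards + [parity]]


def decode(stored: list[tuple[bytes, int]], original_len: int) -> bytes:
    """Rebuild the data, tolerating at most one corrupted shard."""
    bad = [i for i, (s, crc) in enumerate(stored) if zlib.crc32(s) != crc]
    if len(bad) > 1:
        raise ValueError("more than one shard lost; single parity cannot recover")
    shards = [s for s, _ in stored]
    if bad:
        # XOR of all surviving shards (data + parity) equals the missing one.
        survivors = [s for i, s in enumerate(shards) if i != bad[0]]
        shards[bad[0]] = reduce(xor_bytes, survivors)
    return b"".join(shards[:-1])[:original_len]


if __name__ == "__main__":
    data = b"eleven nines, allegedly"
    stored = encode(data, k=4)
    shard, crc = stored[2]
    stored[2] = (b"\xff" + shard[1:], crc)  # simulate bit rot in one shard
    print(decode(stored, len(data)))  # b'eleven nines, allegedly'
```

A second corrupted shard makes `decode` give up, which is why real systems add more parity.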

The second way is to store multiple copies of the data across multiple locations, allowing you to overcome individual drive and array failures all the way to data centers being flooded, torched by rioters, shattered by earthquakes, or eating a nuke. This is redundancy.
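Why extra copies help so much, and where that reasoning breaks down, fits in a few lines of Python. The sketch assumes each copy is lost independently with the same probability `p` during a repair window, so all `n` copies are lost with probability `p ** n`; the probabilities used are made up for illustration. A flood or nuke that takes out every copy in one facility breaks the independence assumption, which the second function models as a single shared site.

```python
def prob_all_copies_lost(p_copy_loss: float, copies: int) -> float:
    """Probability every replica is lost, assuming independent failures."""
    return p_copy_loss ** copies


def prob_all_lost_with_shared_site(p_copy_loss: float, copies: int,
                                   p_site_loss: float) -> float:
    """Same, but all copies sit in one site that can fail as a whole.

    Data survives only if the site survives AND at least one copy survives.
    """
    survive = (1 - p_site_loss) * (1 - p_copy_loss ** copies)
    return 1 - survive


if __name__ == "__main__":
    p = 0.01  # hypothetical chance of losing one copy in a repair window
    for n in (1, 2, 3):
        print(n, "copies, independent:", prob_all_copies_lost(p, n))
    # Hypothetical 1-in-10,000 chance of losing the whole site.
    print("3 copies, one site:", prob_all_lost_with_shared_site(p, 3, 1e-4))
```

With three copies in one building, the site-loss term dominates, which is why providers spread copies across locations.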

Given that these two approaches are standard for the hyperscale cloud giants, how do these providers calculate their data durability? Good question. We at least know how the result can be expressed: as the average time you would wait before losing some data from a given number of stored objects. For example, Amazon states for its S3 cloud storage service:

Amazon S3 Standard, S3 Standard-IA, S3 One Zone-IA, and Amazon Glacier are all designed to provide 99.999999999 per cent durability of objects over a given year. This durability level corresponds to an average annual expected loss of 0.000000001 per cent of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years.
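Amazon's example is simple multiplication, which a short Python sketch can check. Expected losses per year are the object count times the annual per-object loss probability, and the average wait between losses is the reciprocal. This assumes, as Amazon's wording implies, that 11 nines means a 10^-11 chance of losing any given object in a year; the helper names are hypothetical.

```python
def expected_losses_per_year(objects: float, nines: int) -> float:
    """Object count times the annual per-object loss probability."""
    return objects * 10.0 ** -nines


def mean_years_between_losses(objects: float, nines: int) -> float:
    """Average wait between single-object losses."""
    return 1.0 / expected_losses_per_year(objects, nines)


if __name__ == "__main__":
    years = mean_years_between_losses(10_000_000, 11)
    print(f"{years:,.0f} years between losses")  # Amazon quotes 10,000 years
```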

As you can see, Amazon expresses durability statistically, as the average time you would have to wait before losing one of a given number of objects stored in S3. Another way of putting it, according to Backblaze this week, is that if you store one million objects for ten million years, you would expect to lose one file in that time. That's what 11 nines durability means, according to Backblaze.

Meanwhile, David Friend of Wasabi blogged late last year: "If you gave Amazon or Wasabi 1 million files to store, statistically they would lose one file every 659,000 years." Friend added that if you store 1PB of data in 1.2 billion objects with 11 nines of durability, you would lose 0.12 files per year, meaning one file lost every eight years. "The problem is that you won't know you've lost files until you try to use them," he noted.
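One way to see that these statements disagree is to turn each back into an implied annual loss probability per object, assuming losses scale linearly with objects and years. In this Python sketch, Amazon's example works out to exactly 11 nines, Backblaze's to 13, and Friend's 659,000-year figure to roughly 11.8. Under the same per-object, per-year reading, 1.2 billion objects at 11 nines would lose about 0.012 files a year, a tenth of the 0.12 quoted. The function name is hypothetical.

```python
import math


def implied_nines(objects: float, years_per_loss: float) -> float:
    """Convert 'lose one of N objects every T years' into a count of nines."""
    annual_loss_per_object = 1.0 / (objects * years_per_loss)
    return -math.log10(annual_loss_per_object)


# (objects stored, years between single-object losses) as quoted above
CLAIMS = {
    "Amazon (10M objects, 1 loss per 10,000 yrs)": (10_000_000, 10_000),
    "Backblaze (1M objects, 1 loss per 10M yrs)": (1_000_000, 10_000_000),
    "Wasabi/Friend (1M files, 1 loss per 659,000 yrs)": (1_000_000, 659_000),
}

if __name__ == "__main__":
    for name, (objects, years) in CLAIMS.items():
        print(f"{name}: {implied_nines(objects, years):.2f} nines")
    # Friend's 1PB example: 1.2 billion objects at an 11-nines loss rate.
    print(f"1PB example: {1.2e9 * 1e-11:.3f} files lost per year")
```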

Great, now we have different interpretations of 11 nines data durability. Which is right? And how is it really calculated? Well, here's the rub: there is no industry-standard way to calculate it. That means we can't reliably compare AWS, Azure, Backblaze, GCP, or Wasabi durability numbers to see which provider is best or worst, or whether the costliest is worth the money.

The probability of data loss depends on how many fragments each file or object is split into, how many drives are used, drive failure rates, and rebuild time. Drive failure rates are tricky to factor in, as disks follow a bathtub curve: they are more likely to fail when first turned on and again as they approach the forecast end of their usable life.
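The bathtub shape is commonly modelled by adding a falling early-life failure rate, a flat random-failure rate, and a rising wear-out rate. This Python sketch builds one from two Weibull hazard functions plus a constant; every parameter is invented purely to show the shape and is not drawn from real drive data. A constant-rate model, like the simplification in the Backblaze calculation discussed later, keeps only the flat middle term.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull distribution at age t > 0 (years)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Illustrative, made-up parameters; not fitted to any real drive population.
INFANT = dict(shape=0.5, scale=100.0)   # shape < 1: rate falls with age
RANDOM_RATE = 0.01                       # constant background failures per year
WEAR_OUT = dict(shape=5.0, scale=8.0)    # shape > 1: rate rises with age


def bathtub_hazard(t: float) -> float:
    """Sum of early-life, random, and wear-out hazards: a bathtub curve."""
    return (weibull_hazard(t, **INFANT) + RANDOM_RATE
            + weibull_hazard(t, **WEAR_OUT))


if __name__ == "__main__":
    for age in (0.1, 1, 2, 3, 4, 5, 6):
        print(f"age {age:>3} yrs: {bathtub_hazard(age):.3f} failures/yr")
```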


In explaining one way to calculate data durability, Backblaze CTO Brian Wilson presented a Poisson distribution method that assumes, for simplicity's sake, that drives have a constant failure probability over their life. He assumed that after a drive fails, it takes an average of 6.5 days to rebuild it and restore complete parity for any given Backblaze B2 object. He also took the annualized failure rate of a drive to be 0.81 per cent, cut to an effective 0.41 per cent because an outside agency, DriveSavers, can recover some data from failed drives.

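The effective annualized drive failure rate is therefore 0.41 per cent, or 0.0041 expressed as a probability. Backblaze can recover from up to three drive failures before the first failed drive is rebuilt, so data is lost only if a fourth drive fails within that window. Plugging in these figures gives 11 nines. Backblaze has also published on GitHub its method for calculating durability with a binomial distribution.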
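Using the figures above, a Poisson estimate of this kind can be sketched in Python. The article does not say how many drives share each object, so the sketch assumes a hypothetical group of 20 drives, any four of which failing within one 6.5-day rebuild window loses data. It also treats the year as back-to-back, independent rebuild windows. These modelling choices are assumptions, not Backblaze's published code, and a binomial version is included alongside for comparison.

```python
import math

# Figures quoted in the article.
AFR = 0.0041          # effective annualized drive failure rate (0.41 per cent)
REBUILD_DAYS = 6.5    # average days to restore full parity after a failure
TOLERATED = 3         # drive failures survivable before the first rebuild ends

# Assumption, not from the article: each object's fragments are spread
# over a group of 20 drives, and losing 4 of them inside one window loses data.
GROUP_DRIVES = 20


def poisson_tail(lam: float, k_min: int, terms: int = 30) -> float:
    """P(X >= k_min) for X ~ Poisson(lam), summed term by term.

    Summing the tail directly avoids computing 1 - P(X < k_min), which
    loses precision when the answer is around 1e-13.
    """
    return sum(math.exp(-lam) * lam ** k / math.factorial(k)
               for k in range(k_min, k_min + terms))


def binomial_tail(n: int, p: float, k_min: int) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


def annual_nines(p_loss_per_window: float, window_days: float) -> float:
    """Treat the year as back-to-back independent windows; count the nines."""
    windows = 365 / window_days
    # P(at least one loss in a year) = 1 - (1 - p)^windows, computed stably.
    p_year = -math.expm1(windows * math.log1p(-p_loss_per_window))
    return -math.log10(p_year)


def durability_nines(afr: float) -> tuple[float, float]:
    """Return (Poisson estimate, binomial estimate) in nines."""
    window_years = REBUILD_DAYS / 365
    k_loss = TOLERATED + 1
    # Poisson: AFR treated as a constant rate (no bathtub curve), so the
    # expected number of failures in the group during one window is lam.
    lam = GROUP_DRIVES * afr * window_years
    # Binomial: each drive independently fails in the window with p_drive.
    p_drive = -math.expm1(-afr * window_years)
    return (annual_nines(poisson_tail(lam, k_loss), REBUILD_DAYS),
            annual_nines(binomial_tail(GROUP_DRIVES, p_drive, k_loss),
                         REBUILD_DAYS))


if __name__ == "__main__":
    for afr in (AFR, 0.0081):  # with and without the DriveSavers adjustment
        poisson, binomial = durability_nines(afr)
        print(f"AFR {afr:.2%}: Poisson {poisson:.2f} nines, "
              f"binomial {binomial:.2f} nines")
```

With these assumptions the Poisson estimate lands just under 11 nines and the binomial one just over, while the unadjusted 0.81 per cent failure rate pulls the Poisson figure below 10 nines. Modest changes to the inputs can move the answer by a whole nine.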

Wilson said it is likelier that other things will happen before a cloud storage system loses its data to bit rot; for example, a data center could be blown up during armed conflict. Earthquakes, floods, asteroids, pests, and other "acts of God" could destroy one or multiple facilities. Or there could be a prolonged credit card billing problem, and your account data is deleted as a result of non-payment. Whatever you can imagine happening, it's probably more likely than losing information to bit rot.

It's unlikely Amazon, Azure, and Google will reveal the basis of their data durability calculations just because minnow Backblaze shook a stick at them this week. The moral is that when weighing the cost of 11 nines data durability from different cloud storage providers, we may well be comparing apples and oranges. Sup their data with a long spoon. ®

Biting the hand that feeds IT © 1998–2018