
How NOT to evaluate hard disk reliability: Backblaze vs world+dog

Consumer drives beat data centre versions... Yeah, let's put that to bed


HPC blog A few months ago, Brian Beach, a distinguished engineer at cloud backup joint Backblaze, published a set of study-like blog postings relating to his firm's experiences with hard drive lifespan in its 25,000+ spindle environment.

The blogs garnered quite a bit of interest due to the subject matter, and provocative titles like: How Long Do Hard Drives Last?, Enterprise Drives: Fact or Fiction? and What Hard Drive Should I Buy? The blogs raise interesting questions and put forward controversial conclusions.

One of the most contentious claims came from the first blog (El Reg's Simon Sharwood covers it here), where Beach asserts that consumer-grade hard drives are actually more reliable than their supposedly industrial strength (and definitely more pricey) enterprise drive cousins.

According to Backblaze's research, enterprise drives failed at an annual rate of 4.6 per cent vs. 4.2 per cent for the consumer versions.

The bottom line, according to Beach, is that consumer drives are a better choice (even after factoring in the longer enterprise warranty) due to their higher reliability and lower cost.

Even more contentious is the last blog, which showed Backblaze failure rates by drive manufacturer. The results were pretty stark, with an “Annual Failure Rate” chart that showed Hitachi drives at less than 2 per cent; WD spinners at around 3 per cent; and Seagate drives at an astounding 14 per cent for the 1.5TB flavour, ~9 per cent for 3TB, and a high 3.8 per cent or so for the 4TB version. Yikes! We should stay away from Seagate, then, right?

Analysis

A bit of digging into the firm's analysis reveals that the foundations underlying the Backblaze conclusions aren’t all that sturdy. Take the data centre vs consumer drive failure rate statistic, for example. To compute annual failure rates, Backblaze compares failures per "drive-years of service", which is the number of each type of drive they have multiplied by years of service – simple, eh?

The problem is that it is comparing 14,719 drive-years of service on its consumer disks vs only 368 drive-years of service on data centre-grade drives. Overall, the enterprise drives had 17 (4.6 per cent) failures while the consumer drives bricked 613 times (4.2 per cent).

This is a damned small sample on the data centre drive side of the equation. The difference between annual failure rates of 4.2 per cent and 4.6 per cent on 368 drive-years' worth of service is only about 1.5 spindles. That means that if just two more enterprise drives had survived, the analysis would have shown data centre drives to be more reliable than consumer drives.
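To see how fragile that comparison is, here is a minimal Python sketch of the failures-per-drive-year arithmetic. It uses only the figures quoted above; the function and variable names are illustrative assumptions, not anything from Backblaze's own tooling.

```python
# Figures are the ones quoted in the article; names are illustrative only.

def annual_failure_rate(failures: int, drive_years: float) -> float:
    """Failures per drive-year of service, as a fraction (0.042 = 4.2 per cent)."""
    return failures / drive_years


consumer_afr = annual_failure_rate(613, 14_719)
enterprise_afr = annual_failure_rate(17, 368)

print(f"consumer:   {consumer_afr:.1%}")
print(f"enterprise: {enterprise_afr:.1%}")

# How many enterprise drives does the 0.4-point gap actually represent?
gap_in_drives = (0.046 - 0.042) * 368
print(f"gap in drives: {gap_in_drives:.1f}")

# Recompute the enterprise rate with one and then two fewer failures.
for fewer in (1, 2):
    rate = annual_failure_rate(17 - fewer, 368)
    verdict = "below" if rate < consumer_afr else "above"
    print(f"{17 - fewer} failures: {rate:.2%} ({verdict} the consumer rate)")
```

With 16 failures the enterprise rate is still above the consumer figure; with 15 it drops below it. Two drives are all it takes to reverse the headline.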

Moreover, Backblaze has only run the enterprise drives for two years, compared to the more than four years of mileage on their consumer disks. Beach does acknowledge this fact, but doesn’t see any reason to believe that their enterprise drives will become more reliable in the next three years or to the end of their warranty period.

So what hard drive should I buy? Tell me, tell me!

This blog post (What Hard Drive Should I Buy?) is the one that really got my attention. Looking at big colourful charts tells me that I should avoid Seagate drives like email from a Nigerian bureaucrat who’s just looking for a bit of help getting some money out of his country.

But the real story is a lot more nuanced and complicated. This article from Instrumental CEO Henry Newman does a great job of digging into the guts of the Backblaze analysis and pointing out the shortcomings in their approach.

Henry Newman is a bit of an institution in the HPC and storage world. He’s not what I’d call "reserved" when it comes to sharing his opinions – not a guy who pulls his punches. But he also backs up his opinions with facts and solid research, making him one of my go-to sources.

In his analysis of the analysis, Henry points out that Backblaze’s Seagate results are hugely skewed by two drive models – the 1.5TB Barracuda and Barracuda Green SKUs. Seagate publicly disclosed problems with this drive family back in 2008, so it’s not surprising that these drives have, well, problems, right?

There are also some issues with exactly how they’re evaluating the drives, how much traffic a consumer drive should be expected to handle, and things along those lines. It’s all pretty interesting stuff and points to the need for more rigorous research and testing when it comes to drives and reliability.

One final point: at the end of the “What Hard Drive Should I Buy?” post, Beach discussed what Backblaze is buying today. Right now, their most favoured drive is the ... wait for it ... Seagate 4TB Barracuda – even though it supposedly is less reliable than the WD or Hitachi drives. Huh?

Brian Wilson, CTO and founder of Backblaze (he shares a name with the 71-year-old Beach Boys front man - though we do not think they are one and the same) explains it this way:

Double the reliability is only worth 1/10th of 1 percent cost increase. I posted this in a different forum:

Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.

The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).

Moral of the story: design for failure and buy the cheapest components you can. :-)

So the value of higher reliability – in their unique situation – isn’t nearly as much as one might think. Using Brian’s analysis above, this means that a drive that offered double the reliability of the 4TB Seagates (which currently cost around $160) is only worth an additional $0.16 (yeah, sixteen cents) to Backblaze.
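Wilson's back-of-the-envelope sums are easy to replay. The sketch below uses only the numbers from his quote and the $160 drive price mentioned above; the constant and function names are illustrative assumptions.

```python
# Inputs come from Brian Wilson's quote; names are illustrative only.
FLEET_SIZE = 30_000          # drives
MINUTES_PER_SWAP = 15        # labour per failed drive
FLEET_COST = 4_000_000       # dollars for the whole fleet
LABOUR_SAVING = 5_000        # dollars, Wilson's rough guess for two weeks' salary
SEAGATE_4TB_PRICE = 160      # dollars, approximate price cited in the article


def replacement_hours(failure_percent: int) -> float:
    """Hours of labour per year spent swapping drives at a given failure rate."""
    failed_drives = FLEET_SIZE * failure_percent / 100
    return failed_drives * MINUTES_PER_SWAP / 60


hours_at_2 = replacement_hours(2)
hours_at_1 = replacement_hours(1)
print(f"2 per cent failures: {hours_at_2:.0f} hours")
print(f"1 per cent failures: {hours_at_1:.0f} hours")

# What the labour saving is worth as a fraction of the hardware spend.
premium_fraction = LABOUR_SAVING / FLEET_COST
print(f"justified premium: {premium_fraction:.3%} of drive cost")

# Per-drive value, using the exact ratio and Wilson's rounded 1/10th of 1 per cent.
exact_per_drive = premium_fraction * SEAGATE_4TB_PRICE
rounded_per_drive = 0.001 * SEAGATE_4TB_PRICE
print(f"per drive: ${exact_per_drive:.2f} exact, ${rounded_per_drive:.2f} rounded")
```

The unrounded ratio is 0.125 per cent, or about 20 cents on a $160 drive; Wilson's rounded "1/10th of 1 percent" gives the sixteen cents quoted here. Either way, it is pocket change per spindle.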

A quick check of Western Digital and Hitachi (now owned by WD) 4TB spindles reveals that retail prices of these are roughly $30 - $50 more than the Seagate alternative.

So what can we learn from all of this? I think the most important point is that you need to carefully evaluate your information sources. While it’s easy to say “do your own testing”, it’s just not practical in most cases. You’re going to have to rely on third-party sources of information to some extent.

When looking at user experiences, reviews, case studies, etc, you have to factor in how they’re using the product and how well that lines up with your unique needs and requirements.

And remember, as always, that past performance doesn’t necessarily dictate future results and that your mileage will vary. Caveat emptor, y’all... ®
