VelociRaptor users bitten by false error bug
WD response distinctly un-raptorlike in its sluggishness
Western Digital has taken ten months to issue a firmware fix for a VelociRaptor bug, and even then has not made it generally available.
VelociRaptor drives are the very popular 2.5-inch SATA drives spinning at 10,000rpm and available with mounts to fit in 3.5-inch drive bays. A Register reader says the drive: "at least in models produced before 9 Feb 2009 (or possibly 21 August 2009), has a firmware bug which will cause it to report an error to its host after 49 days of continuous operation. If that host is a RAID card (motherboard SATA & direct O/S access seems to be more tolerant), it will probably be dropped from the RAID array. If your entire RAID volume is made of VelociRaptors, and they all powered on at the same time, they will all fail simultaneously."
No data is lost but users are inconvenienced.
WD classes the VelociRaptor as an enterprise drive with killer speed and rock-solid reliability.
Users reporting this bug to WD support had difficulty getting support staff to respond to the fault. There is no download to fix the problem on WD's VelociRaptor download web page.
There is a thread on Storagereview.net showing problems being reported from November 2008 onwards. Later posts in the thread reveal the bug and the firmware fix.
One contact received what looks like a firmware fix note from WD, part of which is reproduced here:
Dear Valued WD Customer:
As a result of product evaluation consistent with Western Digital's quality systems and our commitment to provide the highest quality products, an update is being made to the WD VelociRaptor product family.
Description of Change:
Performance enhancement: Sequential read and write
Compatibility enhancements: Double status FIS, TLER timer and counter synchronization
These changes do not affect the form or fit of the drive but do positively affect the function of the drive.
Details of Firmware Changes:
Performance improvements to sequential read and write: Firmware release includes a function to allow the drive to stay in the sequential mode if there was no other activity. This particularly benefits certain SATA RAID environments.
Double status FIS:
The drive sent two status FIS (C001h and 5001h) after COMRESET. Although the drive behavior did not violate the SATA specification it still created issues for some SATA host bus adapters. This firmware will only post 5001h when the drive is ready.
TLER:
Corrected an issue where after 49 days of continuous operation the drive could falsely report an error to the Host. This issue does not result in data loss and only occurs once every 49 days of continuous operation and only if a read/write operation is in progress when the counter rolls to zero. The new firmware resolves this issue but as an alternate workaround, the drive can be power cycled before 49 days elapses to avoid the potential error.
Field Update Utility/Binary available July 24, 2009. Manufacturing Implementation Date: August 21, 2009
TLER stands for Time-Limited Error Recovery.
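WD's note does not say how the counter is implemented. The 49-day figure does match one common pattern, though: a 32-bit counter of milliseconds wraps to zero after 2^32 ms, or about 49.7 days. The Python sketch below assumes exactly that. The counter width, the 7-second limit and all function names are illustrative and not taken from WD firmware. It shows how a deadline check that ignores the wrap could report a timeout that never happened, alongside a wrap-safe version.

```python
# Hypothetical sketch: assumes a 32-bit millisecond tick counter.
# WD has not published how the VelociRaptor timer actually works.

MASK32 = 0xFFFFFFFF               # assumed counter width: 32 bits
MS_PER_DAY = 24 * 60 * 60 * 1000  # milliseconds in one day


def rollover_days(bits: int = 32) -> float:
    """Days until a millisecond counter of the given width wraps to zero."""
    return (1 << bits) / MS_PER_DAY


def naive_timed_out(start: int, now: int, limit_ms: int) -> bool:
    """Buggy check: the deadline itself wraps, so 'now' looks past it."""
    deadline = (start + limit_ms) & MASK32
    return now > deadline


def wrap_safe_timed_out(start: int, now: int, limit_ms: int) -> bool:
    """Wrap-safe check: compare elapsed ticks modulo 2**32 with the limit."""
    elapsed = (now - start) & MASK32
    return elapsed > limit_ms


if __name__ == "__main__":
    print(f"32-bit ms counter wraps after {rollover_days():.2f} days")

    start = 0xFFFFFF00  # operation begins 256 ms before the wrap
    now = 0xFFFFFF10    # only 16 ms later, still before the wrap
    limit = 7000        # illustrative error-recovery limit in ms

    print("naive check reports timeout:", naive_timed_out(start, now, limit))
    print("wrap-safe check reports timeout:", wrap_safe_timed_out(start, now, limit))
```

Under these assumptions the false timeout appears only when an operation straddles the rollover, which fits WD's description of an error that occurs once every 49 days and only if a read/write operation is in progress.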
Our contact has an e-mail exchange with WD support staff where his main contact is surprised to find that another WD support person has sent out a firmware update to fix the problem, saying: "Where did this guy get an actual firmware update for the VR drives? I thought it was a violation of company policy to send this out even if one were available. The only utility available for the RAID issue on VR drives is the utility that you provided to us from engineering. Why do they provide you with one utility and then give customers the actual firmware?"
The apologetic response to our contact included this: "If you have one of our (VelociRaptor) WDxxxHLFS or BLFS -00 or -01 dated prior to 2/9/2009 you would want to run the FW UPD included in this email to address the issues pointed out."
If readers with VelociRaptors are having similar problems then getting hold of this firmware would seem to be a good idea. WD support says: "This issue has been corrected and does not affect any of the VelociRaptor drives manufactured after February 9 2009 or any drives that were received from RMA services after that date."
Taking ten months to come up with a fix seems somewhat slow, and it is odd that the fix is not publicly available. Our VelociRaptor customer said: "I’m not going to personally blame (the WD support people) for following orders… Or being confused about their orders, but WD should really get its act together here and publish the firmware. The other major issue is that they clearly didn’t sufficiently test the VelociRaptor before labelling it as Enterprise."
WD was contacted on Friday morning but has been unable to provide any comment so far. ®
COMMENTS
Thank you El Reg for helping me make a decision
Because of my recent upgrade to i7 hardware I have accepted that having a 10,000+ RPM system drive actually makes sense, and after negotiating with an on-line supplier I was literally a day or two away from ordering some VelociRaptors.
I had some previous very bad experiences with WD drives, (after using them exclusively for some years), and it took quite a few gulps before I made the decision. Now this!
I am going to bite the bullet and get some SCSI controllers and drives - probably Cheetahs. Hang the cost. A system drive is too important to leave to idiot "market leaders" who think company policy is more important than customers.
That reminds me of a letter I faked to my boss who was suffering with his Raptors...
Dear Western Digital and Ingram Micro Customer,
IMPORTANT INFORMATION REGARDING WESTERN DIGITAL RAPTOR HARD DRIVES PART NUMBERS WD3000HLFS AND WD1500BLFS
A batch of Western Digital Raptor branded hard drives have been recalled due to a serious hardware fault. The fault occurs when these hard drives are installed into some computer systems manufactured by HP, IBM and Dell. At first the hard drives get intermittently misreported by the BIOS, which can lead to an unbootable system. Over time the hard drives store an excessive amount of residual heat which can build up and cause thermal ratcheting, leading to a failure under constant kinetic strain of the “Platter Internal Stability System”.
Any hard drives you might have need to be returned to us as soon as possible so Western Digital can extract the Platter Internal Stability System hardware, look for errors and update the firmware to avoid Severe Heat Induced Tremors.
Please call us on 0844 800 3838 so we can give you a Consignment User Number Tag with which to return your unit(s).
Yours Sincerely,
Antione Balsup
Product Information Support Specialist
WD assumes no liability whatsoever
The firmware also came with this:
THE INFORMATION, FIRMWARE AND TOOLS, AND ALL ASSOCIATED UPDATES AND MODIFICATIONS (THE “INFORMATION”), CONTAINED HEREIN IS PROVIDED ON AN “AS-IS” BASIS AND WD ASSUMES NO LIABILITY WHATSOEVER WITH RESPECT TO SUCH INFORMATION UNDER ANY CONTRACT, NEGLIGENCE, STRICT LIABILITY OR OTHER LEGAL OR EQUITABLE THEORY, EVEN IF WD HAS BEEN ADVISED ON THE POSSIBILITY OF SUCH DAMAGES.
WD EXPRESSLY DISCLAIMS ALL EXPRESS AND IMPLIED WARRANTIES IN CONNECTION WITH THE INFORMATION CONTAINED HEREIN, INCLUDING WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
