Storage vendor bloggers - losing data or losing the plot?
So does EMC's Centera really lose data?
Comment
Store data in EMC Centera and lose it - that's the claim of NetApp blogger Val Bercovici, and he cites a Symantec support document entitled 'Archiving items in Enterprise Vault may result in an extremely rare data loss situation' to prove it.
Bercovici, who works in the office of the CTO (Chief Technology Officer) at NetApp, blogged last week: "When Symantec publicly declares that EMC Centera (and only EMC Centera) is vulnerable to data loss, the entire industry - and most importantly archiving customers need to stand up and listen."
Indeed it should - except Bercovici was wrong. He was misled by a Symantec support document (tech-note) about its Enterprise Vault product; the document has since been updated.
The original version, up to January 26, stated there would be: "Potential data loss if archiving to Centera and running any of the known affected Enterprise Vault versions outlined below (see the What is Affected section)."
The revised version says: "There is a potential for data loss with the known affected Enterprise Vault versions outlined below when configured with an EMC Centera storage partition. The data loss occurs as part of the archiving process but before storing on Centera. (See the What is Affected section)."
Symantec spokesperson Cory Edwards said: "This tech note relates to a known issue with [Symantec's] Enterprise Vault, not EMC Centera. This issue is limited to customers running Enterprise Vault and Centera together and is extremely rare. It is limited to a small percentage of archived information which is not converted into a text-based format (eg jpeg). Symantec is working with any EV/Centera customers who might have this issue.
"All customers running Enterprise Vault 8.0 or the latest service pack of older versions of Enterprise Vault will not experience any problems. The tech-note provides appropriate measures for customers to identify the issue and take corrective action."
Bercovici's blog declaring that the Centera users and supporters should consider the whistle blown sparked a vigorous, even aggressive discussion in its comments section. The back story here is that there are strong differences between NetApp and EMC concerning the two companies' approaches to storage, benchmarking and capacity usage. Both have strongly enthusiastic evangelising bloggers, with Chuck Hollis, Global Marketing CTO, being EMC's vigorous champion.
Bercovici took the stance that the Symantec tech-note was the nail Centera's coffin needed. He also mentioned previous worries over the content-addressing algorithm used, the possibility of it not generating dependably unique strings to identify data absolutely, and a silent data corruption issue. He put it like this: "Evidence emerged which raised questions about the integrity of EMC's commitment to the integrity of their customers' archives... NetApp has never knowingly sacrificed the integrity of data archived by SnapLock and consequently enjoys a reputation of responsibility in this regard."
The comments took exception to this moral high-grounding and to the pinning of data loss fault to EMC when it was in fact down to a Symantec software bug.
Hollis commented: "Val, you don't seem to be able to write anything unless you're attempting to beat up someone, usually EMC. You've gotten in trouble before with this sort of behavior. Looks like you're heading down the exact same road again."
Bercovici replied to Hollis, saying: "It seems the only trouble I get into is with you and your selectively delicate sensibilities, forcing you to autocratically censor comments from me (with condescending reminders no less) which you don't want appearing on your blog. Regardless, I consider it a compliment either way. I look forward to observing how much EMC is willing to sacrifice your relationship with Symantec here (by throwing them under the bus) in order to cover up the truth about Centera's obvious data porosity."
He confirmed his belief about the supposed data porosity in a later comment: "I can assure you we did a 360 on this Symantec Alert with many experts on all sides (including EMC customers). All of us came to the same conclusion - under certain circumstances (such as recovery from disk or node failure) the Centera will drop archived objects without notification or the ability to retrieve them. This seems to be precisely the use-case highlighted by the Symantec Alert."
(They were wrong, all of them.)
COMMENTS
The plot thickens indeed!
Vinanti (or should I call you FemmeFatale?) - thanks for chiming in here (and on my blog) with relevant objective technical detail!
This is precisely the kind of background info that explains my position against EMC's opaque stance regarding this issue. True to form, EMC's bloggers are now busy shutting down comments on their related blogs just as EMC's PR people did years ago when this Centera silent data corruption issue was first exposed - then covered up by the IT media.
Unfortunately, it's the innocent EMC Centera customers and archive software partners (like Symantec) that now have to live with this Archive Russian Roulette scenario. They'll never know what data went missing forever until they try to retrieve it.
For all those who used the default EMC Centera configurations of collision detection OFF with SIS, I strongly recommend following the "Next Steps" listed on my blog -
http://blogs.netapp.com/exposed/2009/01/emc-centera-cus.html
Wrong and Right
Chris, you're on the wrong side of this issue, but for technically innocent reasons. Val has exposed a malicious attack scenario (involving user-generated MD5 collisions) which archiving developers like me had never accounted for back in 2003/2004 when we developed our initial integration with the Centera API.
Bottom line, all of the early archiving implementations on Centera up to version 1.2 (including KVS, IXOS, iLumin, EDUCOMM, et al) are vulnerable to this data loss scenario because EMC configured collision detection OFF as the default in order to enable the popular Single Instance Storage (SIS) feature.
That means this troublesome Symantec KB article has relevance on the Centera side of the equation, not just the EV side.
See my latest update on why and how:
http://blogs.netapp.com/exposed/2009/02/its-never-the-u.html?cid=148736695#comment-148736695
The Exposure Continues
Hello Coward and other commenters,
Please do keep the comments coming! My goal is to add exposure to the key topic of compliance archive data integrity, not to win tete-a-tete battles over 3rd party knowledgebase semantics.
Transparency on this topic is very important to me, and I've decided putting up with online abuse is a small price to pay for the increased customer trust this exercise will result in once disturbing veils of secrecy around EMC Centera data integrity are finally removed.
-Val.
http://blogs.netapp.com/exposed/2009/02/its-never-the-u.html
