Zero uncertainty
Later in the comment sequence he states: "There is zero uncertainty or doubt in my claims. You see a few years ago I executed a script... and was able to repeatedly and intentionally make the Centera lose an object silently with no notification back to the script or user.
"I have spoken to dozens of EMC Centera partners and customers who have experienced the same symptoms due to various product deficiencies not necessarily related to hash algorithms."
Bercovici has updated his blog entry with score-throughs thus: "In case you're wondering, this is NOT an issue with Symantec software. The root cause is the complex EMC Centera API and brittle internal Centera architecture which Symantec Enterprise Vault and other similar applications are forced to utilize for proper archiving functionality on that platform."
He also posted a later blog entry on Whistleblowing which revisited the topic and concluded: "It's unfortunate that EMC product complexity and uncertainty is increasing the risk of the very data it is entrusted to protect. It's time to stabilize and simplify your archiving environment."
With Symantec's Enterprise Vault and Centera, complex software is writing to a complex storage product. But that is a side issue and, in my view, irrelevant. NetApp's ONTAP is also a highly complex storage product, and Bercovici is not suggesting we abandon it because of its complexity. Simply put, he claimed the Symantec tech-note indicated Centera lost archived data when it did not. He was wrong.
Storage blogging by professionals who are well-informed about their own company's products, generally knowledgeable about storage products, practices and processes, and also predisposed to criticising competing vendors' products, can be a great generator of light. But sometimes the heat and smoke flowing from a blog controversy completely obscures the light.
As far as I can ascertain EMC's Centera, like NetApp's SnapLock, does not lose stored data. ®
COMMENTS
The plot thickens indeed!
Vinanti (or should I call you FemmeFatale?) - thanks for chiming in here (and on my blog) with relevant objective technical detail!
This is precisely the kind of background info that explains my position against EMC's opaque stance regarding this issue. True to form, EMC's bloggers are now busy shutting down comments on their related blogs just as EMC's PR people did years ago when this Centera silent data corruption issue was first exposed - then covered up by the IT media.
Unfortunately, it's the innocent EMC Centera customers and archive software partners (like Symantec) that now have to live with this Archive Russian Roulette scenario. They'll never know what data went missing forever until they try to retrieve it.
For all those who used the default EMC Centera configurations of collision detection OFF with SIS, I strongly recommend following the "Next Steps" listed on my blog -
http://blogs.netapp.com/exposed/2009/01/emc-centera-cus.html
Wrong and Right
Chris, you're on the wrong side of this issue, but for technically innocent reasons. Val has exposed a malicious attack scenario (involving user-generated MD5 collisions) which archiving developers like me had never accounted for back in 2003/2004 when we developed our initial integration with the Centera API.
Bottom line: all of the early archiving implementations on Centera up to version 1.2 (including KVS, IXOS, iLumin, EDUCOMM, et al) are vulnerable to this data loss scenario because EMC configured collision detection OFF as the default in order to enable the popular Single Instance Storage (SIS) feature.
That means this troublesome Symantec KB article has relevance on the Centera side of the equation, not just the EV side.
See my latest update on why and how:
http://blogs.netapp.com/exposed/2009/02/its-never-the-u.html?cid=148736695#comment-148736695
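The scenario this comment describes, single-instance storage with collision detection switched off, can be sketched in a few lines of Python. This is a toy model and not the Centera API: ToyStore, weak_address and find_colliding_pair are hypothetical names, and the content address is deliberately truncated to one byte of MD5 so that a collision is trivial to produce without a crafted MD5 collision.

```python
import hashlib


def weak_address(data: bytes) -> str:
    """Toy content address: first byte of MD5 (assumption, so collisions are easy)."""
    return hashlib.md5(data).hexdigest()[:2]


class ToyStore:
    """Hypothetical content-addressed store with single-instance storage."""

    def __init__(self, collision_detection: bool):
        self.collision_detection = collision_detection
        self.blobs = {}

    def write(self, data: bytes) -> str:
        addr = weak_address(data)
        existing = self.blobs.get(addr)
        if existing is None:
            # First object seen at this address: store it.
            self.blobs[addr] = data
        elif self.collision_detection and existing != data:
            # Same address, different bytes: refuse instead of deduplicating.
            raise ValueError(f"collision at address {addr}")
        # Otherwise the write is treated as a duplicate and nothing is stored.
        return addr

    def read(self, addr: str) -> bytes:
        return self.blobs[addr]


def find_colliding_pair():
    """Return two different payloads that share a toy address."""
    seen = {}
    i = 0
    while True:
        data = f"document-{i}".encode()
        addr = weak_address(data)
        if addr in seen:
            return seen[addr], data
        seen[addr] = data
        i += 1


if __name__ == "__main__":
    first, second = find_colliding_pair()
    store = ToyStore(collision_detection=False)
    store.write(first)
    addr = store.write(second)          # reports success, stores nothing
    print(store.read(addr) == second)   # the second object is not retrievable
```

With collision_detection=True the second write raises an error instead, so the caller at least learns that the object was not stored. In this sketch that check costs a byte-for-byte comparison on every apparent duplicate.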
The Exposure Continues
Hello Coward and other commenters,
Please do keep the comments coming! My goal is to add exposure to the key topic of compliance archive data integrity, not to win tete-a-tete battles over 3rd party knowledgebase semantics.
Transparency on this topic is very important to me, and I've decided putting up with online abuse is a small price to pay for the increased customer trust this exercise will result in once disturbing veils of secrecy around EMC Centera data integrity are finally removed.
-Val.
http://blogs.netapp.com/exposed/2009/02/its-never-the-u.html
