Mimosa adds files to archive cocktail

Archive's dusty old barn is shaken to foundations

Mimosa, an email archiving software company, is adding file archiving to its NearPoint product, setting out on a unified archiving strategy.

Once upon a time data protection meant backup to tape. Those simple times seem a long time ago now, with tape backup's front-end restore role and back-end archive role both under sustained attack from hard-drive based products. Disk-based backup, together with virtual tape libraries (VTL), has revolutionised the former backup role of saving files in case they were inadvertently deleted. Now you can continuously protect files and restore them to any point in time.

Disk-based archiving has spread like wildfire for email. There are also products for images, particularly in healthcare with X-rays and formal PACS (Picture Archiving and Communication System) products. And there are enterprise content management (ECM) products such as Documentum, which archive files and unstructured or semi-structured data, and which have developed from enterprise document management systems.

We also have search, indexing, compliance and legal discovery products, such as Recommind, Autonomy-Zantaz, and Kazeon's Information Server. What we are seeing here is the coming together of various archive content silos and the layering across them of archive platform services to ingest data into a repository, index it, search it, extract subsets for compliance or legal discovery, and provide policy-driven retention services.

Archive stack

You can conceive of an archive stack comprising four layers:

  • Data-creation source, such as email, PowerPoint, Word, a blog, etc.
  • Archive ingest software, such as NearPoint, which captures data and puts it into a repository.
  • Platform services to generate metadata on the content, single-instance or de-dupe it, search it, obtain and apply retention policies, and manage any storage tiering.
  • Storage hardware, meaning hard drives, optical disks, and tape.

Mimosa fizzing

Public and private-sector organisations are drowning in a sea of data, and volumes are rising unstoppably because they are loath to throw anything away and virtually every content-creating activity is being digitised. Mimosa has grown furiously as its customers adopt email archiving so as to tame the Exchange elephant.

UK MD Brian Bennett says the privately-held, 200-person company now has 525 customers, up from 300 a year ago. In the first quarter of this year its revenues were more than for the whole of 2007. In the second quarter it doubled 2007 revenues, and it's on track to grow its business 300 percent this year. At a guess, its turnover must now be in the $20m-$50m range - Mimosa isn't saying.

Scott Whitney, Mimosa's product management VP, says NearPoint v3.5 will add file archiving from Windows file shares, with other sources possibly added later. A Mimosa agent will crawl the file share looking for files that meet user-set policies defining things like file size, creation date and time since last access, and zap files that meet the criteria into the archive, leaving a stub behind if so desired. Once a file is archived it stays archived: there is no 'ping-pong' in which a user accessing the archive copy triggers a restore, followed by a re-archive once the file has gone unaccessed for long enough.
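To make the policy crawl concrete, here is a minimal Python sketch of the selection step. It is not Mimosa's code: the names FileInfo, ArchivePolicy and select_for_archive are invented for illustration, and the three criteria are simply the ones mentioned above (file size, creation date, time since last access), all of which must hold.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileInfo:
    """Hypothetical record of what a crawler might learn about one file."""
    path: str
    size_bytes: int
    created: datetime
    last_accessed: datetime


@dataclass
class ArchivePolicy:
    """Hypothetical user-set policy; a file must satisfy every criterion."""
    min_size_bytes: int
    created_before: datetime
    min_idle: timedelta

    def matches(self, f: FileInfo, now: datetime) -> bool:
        return (
            f.size_bytes >= self.min_size_bytes
            and f.created < self.created_before
            and now - f.last_accessed >= self.min_idle
        )


def select_for_archive(files, policy, now):
    """Return the paths of files that the policy would send to the archive."""
    return [f.path for f in files if policy.matches(f, now)]


# Illustrative data only.
now = datetime(2008, 7, 1)
policy = ArchivePolicy(
    min_size_bytes=1_000_000,
    created_before=datetime(2008, 1, 1),
    min_idle=timedelta(days=90),
)
files = [
    FileInfo("old_deck.ppt", 5_000_000, datetime(2007, 3, 1), datetime(2007, 12, 1)),
    FileInfo("active.doc", 2_000_000, datetime(2007, 6, 1), datetime(2008, 6, 25)),
    FileInfo("tiny.txt", 500, datetime(2006, 1, 1), datetime(2006, 1, 2)),
]
candidates = select_for_archive(files, policy, now)
print(candidates)
```

Only old_deck.ppt qualifies here: active.doc was touched too recently and tiny.txt is below the size threshold. A real agent would also have to write the stub and handle open files, which this sketch ignores.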

Files are single-instanced: any file archive candidate is compared to previously archived files and email attachments. That means an old PowerPoint, already archived as an email attachment, won't take up fresh space in the repository. Instead there will just be a pointer to the existing entry.
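Single-instancing of this kind can be sketched in a few lines of Python. How NearPoint actually compares a candidate with existing content is not disclosed; this sketch assumes a SHA-256 content hash as the comparison key, and SingleInstanceStore is a made-up name.

```python
import hashlib


class SingleInstanceStore:
    """Toy repository: identical content is stored once, names are pointers."""

    def __init__(self):
        self._blobs = {}  # content digest -> stored bytes
        self._names = {}  # logical name -> content digest (the "pointer")

    def ingest(self, name: str, content: bytes) -> bool:
        """Archive content under name; return True if new space was used."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = content
        self._names[name] = digest
        return is_new

    def read(self, name: str) -> bytes:
        """Follow the pointer for name back to the stored content."""
        return self._blobs[self._names[name]]

    def stored_bytes(self) -> int:
        """Total space consumed by unique content."""
        return sum(len(b) for b in self._blobs.values())


# Illustrative data only: the same deck arrives first as an email
# attachment and later from a file share.
store = SingleInstanceStore()
deck = b"quarterly results" * 1000
first = store.ingest("mail/123/attachment.ppt", deck)
second = store.ingest("share/sales/q2.ppt", deck)
print(first, second, store.stored_bytes())
```

The second ingest adds a name but no new content, so the repository still holds a single copy of the deck.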

Mimosa is using Stellent technology, which means that the new NearPoint product will recognise 300 file types and can search them properly. The product will have a content monitoring capability to alert sysadmins if outbound material contains user-defined sensitive data, and NearPoint's existing eDiscovery capability will be applied to archived files as well.

A software development kit (SDK) will be made available to developers so that more specialised applications can have data ingested, indexed, searched, extracted or managed by the NearPoint archive.

Deduplication

What about de-duplicating an archive? Conceptually, de-duping file data in an archive is just de-duping data. But crawling through a de-duped archive to detect data you can safely throw away is something that will take a huge amount of CPU time and must needs be done with fanatical care.

Say there is a de-duped data element which is referenced by 10,000 pointers. If the element is deleted then 10,000 references to it are lost too. So the retention crawler must understand de-dupe element pointers and these must have their own unique metadata and retention criteria. Each one is a virtual archive file and must be treated as such by the retention crawler.
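The rule that a shared element can only go once every pointer to it has expired can be sketched as follows. This is an illustration of the argument, not any vendor's retention crawler; Pointer, Element and sweep are hypothetical names, and the per-pointer retention date stands in for whatever metadata a real product would carry.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Pointer:
    """A virtual archive file: a reference with its own retention date."""
    name: str
    retain_until: date


@dataclass
class Element:
    """A de-duped data element and the pointers that reference it."""
    digest: str
    pointers: list = field(default_factory=list)


def sweep(elements, today):
    """Drop expired pointers, then report elements with no live pointer left.

    An element is only safe to delete when every pointer to it has expired.
    """
    deletable = []
    for element in elements:
        element.pointers = [p for p in element.pointers if p.retain_until > today]
        if not element.pointers:
            deletable.append(element.digest)
    return deletable


# Illustrative data only.
shared = Element("a", [
    Pointer("mail/123/attachment.ppt", date(2009, 1, 1)),
    Pointer("share/legal/hold.ppt", date(2015, 1, 1)),
])
orphan = Element("b", [Pointer("share/tmp/notes.txt", date(2009, 1, 1))])
deletable = sweep([shared, orphan], today=date(2010, 1, 1))
print(deletable, len(shared.pointers))
```

Element "a" survives the sweep because one of its two pointers is still under retention; only "b" becomes a deletion candidate. With billions of objects the real cost lies in walking the pointers, which is why the crawl is so CPU-hungry.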

As archives grow in size they will hold billions of objects, the majority of which will have been de-duped down to pointers. Trawling these for deletion candidates will be a vital task.

Mimosa has deals with Plasmon and Data Domain to support their storage products. It is possible that de-duplication will move up the archive stack from storage hardware to archive platform services. Alternatively, we may see combinations of archive platform services and storage hardware emerging as single products.

The disk technology revolution sweeping through archiving is causing tectonic shifts amongst the suppliers of storage, email, backup software, data protection, compliance, legal hold, eDiscovery, enterprise content management and document management products and services. What NearPoint is doing is sure to be replicated by other archive product suppliers as they react to the forces of convergence and unification sweeping the backup-to-tape-is-everything cobwebs from archiving's dusty old barn. ®
