Mimosa adds files to archive cocktail
Archive's dusty old barn is shaken to foundations
Mimosa, an email archiving software company, is adding file archiving to its NearPoint product, striking out on a unified archiving strategy.
Once upon a time data protection meant backup to tape. Those simple times seem a long time ago now, with tape backup's front-end restore role and back-end archive role both under sustained attack from hard-drive based products. Disk-based backup, together with virtual tape libraries (VTL), has revolutionised the former backup role of saving files in case they were inadvertently deleted. Now you can continuously protect files and restore them to any point in time.
Disk-based archiving has spread like wildfire for email. There are also products for images, particularly in healthcare with X-rays and formal PACS (Picture Archiving and Communication System) products. There are enterprise content management (ECM) products such as Documentum, which archive files (unstructured or semi-structured data) and have developed from enterprise document management systems.
We also have search, indexing, compliance and legal discovery products, such as Recommind, Autonomy-Zantaz, and Kazeon's Information Server. What we are seeing here is the coming together of various archive content silos and the layering across them of archive platform services to ingest data into a repository, index it, search it, extract subsets for compliance or legal discovery, and provide policy-driven retention services.
You can conceive of an archive stack comprising four layers:
- Data-creation source such as email, PowerPoint, Word, a blog, etc.
- Archive ingest software such as NearPoint, which captures data and puts it into a repository.
- Platform services to generate metadata on the content, single-instance or de-dupe it, search it, obtain and apply retention policies, and manage any storage tiering.
- Storage hardware, meaning hard drives, optical disks, and tape.
Public and private-sector organisations are drowning in a sea of data, and its volumes are rising unstoppably as they are loath to throw anything away and virtually every content-creating activity gets digitised. Mimosa has grown furiously as its customers adopt email archiving so as to tame the Exchange elephant.
UK MD Brian Bennett says the privately-held, 200-person company now has 525 customers, up from 300 a year ago. In the first quarter of this year its revenues were more than the whole of 2007. In the second quarter it doubled 2007 revenues, and it's on track to grow its business 300 percent this year. At a guess, its turnover must now be in the $20m-$50m range - Mimosa isn't saying.
Scott Whitney, Mimosa's product management VP, says NearPoint v3.5 will add file archiving from Windows file shares, with other sources possibly added later. A Mimosa agent will crawl the file share looking for files that meet user-set policies defining things like file size, creation date and time since last access, and zap files that meet the criteria into the archive, leaving a stub behind if so desired. Once a file is archived it stays archived: there is no 'ping-pong' in which a user accessing the archive copy causes a restore, followed by a re-archive once the file has gone unaccessed for long enough.
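The policy-driven crawl can be pictured with a minimal Python sketch. It is not Mimosa's agent: the `ArchivePolicy` fields and the `archive_candidates` function are hypothetical names, and only two of the criteria mentioned (file size and time since last access) are modelled. It also assumes the file system records last-access times reliably, which is not always the case.

```python
import os
import time
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class ArchivePolicy:
    """Hypothetical user-set policy; field names are illustrative only."""
    min_size_bytes: int = 0
    min_days_since_access: float = 0.0


def archive_candidates(root: str, policy: ArchivePolicy,
                       now: Optional[float] = None) -> Iterator[str]:
    """Walk a directory tree and yield paths that satisfy the policy."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            idle_days = (now - st.st_atime) / 86400
            big_enough = st.st_size >= policy.min_size_bytes
            idle_enough = idle_days >= policy.min_days_since_access
            if big_enough and idle_enough:
                yield path
```

A real agent would then move each candidate into the repository and, if required, leave a stub in its place; that step is omitted here.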
Files are single-instanced, with any file archive candidate compared against previously archived files and email attachments. That means an old PowerPoint, already archived as an email attachment, won't take up fresh space in the repository. Instead there will just be a pointer to the existing entry.
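As a rough illustration of single-instancing, the Python sketch below keys stored content by a SHA-256 digest, so a second copy of the same bytes only adds a pointer. How NearPoint actually compares candidates is not stated; the content hash and the `SingleInstanceStore` class are assumptions made for the example.

```python
import hashlib
from typing import Dict


class SingleInstanceStore:
    """Toy repository: one stored copy per distinct content (assumed design)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}    # digest -> stored content
        self._entries: Dict[str, str] = {}    # logical name -> digest (pointer)

    def ingest(self, name: str, content: bytes) -> bool:
        """Archive an item; return True only if new content was stored."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = content
        self._entries[name] = digest  # always record the pointer
        return is_new

    def read(self, name: str) -> bytes:
        """Follow the pointer back to the single stored copy."""
        return self._blobs[self._entries[name]]

    def stored_bytes(self) -> int:
        """Total space used by distinct content."""
        return sum(len(blob) for blob in self._blobs.values())


store = SingleInstanceStore()
deck = b"quarterly results slides"
store.ingest("email-attachment/results.ppt", deck)   # stored once
store.ingest("fileshare/finance/results.ppt", deck)  # pointer only
```

After both calls the store holds two logical entries but only one copy of the slide deck.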
Mimosa is using Stellent technology, which means that the new NearPoint product will recognise 300 file types and can search them properly. The product will have a content monitoring capability to alert sysadmins if outbound material contains user-defined sensitive data, and NearPoint's existing eDiscovery capability will be applied to it as well.
A software development kit (SDK) will be made available to developers so that more specialised applications can have data ingested, indexed, searched, extracted or managed by the NearPoint archive.
What about de-duplicating an archive? Conceptually, de-duping file data in an archive is just de-duping data. But crawling through a de-duped archive to detect data you can safely throw away is something that will take a huge amount of CPU time and must needs be done with fanatical care.
Say there is a de-duped data element which is referenced by 10,000 pointers. If the element is deleted then 10,000 references to it are lost too. So the retention crawler must understand de-dupe element pointers and these must have their own unique metadata and retention criteria. Each one is a virtual archive file and must be treated as such by the retention crawler.
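The bookkeeping this implies can be sketched in Python. The names here (`Pointer`, `DedupedArchive`, `purge_expired`) are hypothetical and the model is deliberately simplified: each pointer carries its own retention expiry, and a shared element is removed only once every pointer to it has expired.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Pointer:
    """One virtual archive file referencing a shared de-duped element."""
    name: str
    expires_at: float  # retention expiry for this pointer alone


class DedupedArchive:
    def __init__(self) -> None:
        self.elements: Dict[str, bytes] = {}          # element id -> data
        self.pointers: Dict[str, List[Pointer]] = {}  # element id -> referrers

    def add(self, element_id: str, data: bytes,
            name: str, expires_at: float) -> None:
        """Store the element once and record a new pointer to it."""
        self.elements.setdefault(element_id, data)
        self.pointers.setdefault(element_id, []).append(
            Pointer(name, expires_at))

    def purge_expired(self, now: float) -> List[str]:
        """Drop expired pointers; delete an element only when none remain."""
        removed = []
        for element_id in list(self.pointers):
            live = [p for p in self.pointers[element_id]
                    if p.expires_at > now]
            if live:
                self.pointers[element_id] = live
            else:
                del self.pointers[element_id]
                del self.elements[element_id]
                removed.append(element_id)
        return removed
```

Even in this toy form the crawler touches every pointer on every pass, which hints at why the job gets expensive once an archive holds billions of objects.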
As archives grow in size they will hold billions of objects, the majority of which will have been de-duped down to pointers. Trawling these for deletion candidates will be a vital task.
Mimosa has deals with Plasmon and Data Domain to support their storage products. It is possible that de-duplication will move up the archive stack from storage hardware to archive platform services. Alternatively, we may see combinations of archive platform services and storage hardware emerging as single products.
The disk technology revolution sweeping through archiving is causing tectonic shifts amongst the suppliers of storage, email, backup software, data protection, compliance, legal hold, eDiscovery, enterprise content management and document management products and services. What NearPoint is doing is sure to be replicated by other archive product suppliers as they react to the forces of convergence and unification sweeping the backup-to-tape-is-everything cobwebs from archiving's dusty old barn. ®