Mimosa adds files to archive cocktail

Archive's dusty old barn is shaken to foundations

Mimosa, an email archiving software company, is adding file archiving to its NearPoint product, and with it setting out on a unified archiving strategy.

Once upon a time data protection meant backup to tape. Those simple times seem a long time ago now, with tape backup's front-end restore role and back-end archive role both under sustained attack from hard-drive-based products. Disk-based backup, together with virtual tape libraries (VTL), has revolutionised the former role of saving files in case they were inadvertently deleted. Now you can continuously protect files and restore them to any point in time.

Disk-based archiving has spread like wildfire for email. There are also products for images, particularly in healthcare with X-rays and formal PACS (Picture Archiving and Communication System) products. There are enterprise content management (ECM) products such as Documentum, which archive files and unstructured or semi-structured data, and which have developed from enterprise document management systems.

We also have search, indexing, compliance and legal discovery products, such as Recommind, Autonomy-Zantaz, and Kazeon's Information Server. What we are seeing here is the coming together of various archive content silos and the layering across them of archive platform services to ingest data into a repository, index it, search it, extract subsets for compliance or legal discovery, and provide policy-driven retention services.

Archive stack

You can conceive of an archive stack comprising four layers:

  • Data-creation source such as email, PowerPoint, Word, a blog, etc.
  • Archive ingest software such as NearPoint, which captures data and puts it into a repository.
  • Platform services to generate metadata on the content, single-instance or de-dupe it, search it, obtain and apply retention policies, and manage any storage tiering.
  • Storage hardware, meaning hard drives, optical disks, and tape.

Mimosa fizzing

Public and private-sector organisations are drowning in a sea of data, and its volumes are rising unstoppably as they are loath to throw anything away and virtually every content-creating activity gets digitised. Mimosa has grown furiously as its customers adopt email archiving so as to tame the Exchange elephant.

UK MD Brian Bennett says the privately-held, 200-person company now has 525 customers, up from 300 a year ago. In the first quarter of this year its revenues were more than for the whole of 2007. In the second quarter it doubled 2007 revenues, and it's on track to grow its business 300 percent this year. At a guess, its turnover must now be in the $20m-$50m range, though Mimosa isn't saying.

Scott Whitney, Mimosa's product management VP, says NearPoint v3.5 will add file archiving from Windows file shares, with other sources possibly added later. A Mimosa agent will crawl the file share looking for files that meet user-set policies defining things like file size, creation date and time since last access, and zap files that meet the criteria into the archive, leaving a stub behind if so desired. Once a file is archived it stays archived. There is no 'ping-pong' in which a user's access to the archive copy triggers a restore, followed by a re-archive once the file has gone unaccessed for long enough.
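The policy check described above can be pictured with a minimal Python sketch. It is an illustration only and not Mimosa's code: the names `FileInfo`, `ArchivePolicy` and `select_candidates` are hypothetical, and the three criteria are simply the ones the article lists (file size, creation date, time since last access).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class FileInfo:
    """Hypothetical record of what a crawler learns about one file."""
    path: str
    size_bytes: int
    created: datetime
    last_accessed: datetime


@dataclass
class ArchivePolicy:
    """Hypothetical user-set policy; every criterion set must be met."""
    min_size_bytes: int = 0
    created_before: Optional[datetime] = None
    min_idle: timedelta = timedelta(0)

    def matches(self, f: FileInfo, now: datetime) -> bool:
        # Too small to be worth archiving.
        if f.size_bytes < self.min_size_bytes:
            return False
        # Created on or after the cut-off date, so too recent.
        if self.created_before is not None and f.created >= self.created_before:
            return False
        # Archive only if the file has been idle for long enough.
        return now - f.last_accessed >= self.min_idle


def select_candidates(files, policy, now):
    """Return the paths of files the policy would send to the archive."""
    return [f.path for f in files if policy.matches(f, now)]
```

A real agent would gather the `FileInfo` values by walking the share. The sketch leaves that out so the policy logic can be read on its own.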

Files are single-instanced with any file archive candidate compared to previously archived files and email attachments. That means an old PowerPoint, already archived as an email attachment, won't take up fresh space in the repository. Instead there will just be a pointer to the existing entry.
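Mimosa has not said how NearPoint compares a candidate against what is already archived. One common approach, assumed here purely for illustration, is to key stored content by a cryptographic hash, so that identical bytes are stored once and every later copy becomes a pointer. The class and method names below are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Toy repository: one stored copy per distinct content (assumed SHA-256 keying)."""

    def __init__(self):
        self.blobs = {}    # content digest -> stored bytes
        self.entries = {}  # logical archive name -> content digest (the "pointer")

    def ingest(self, name: str, content: bytes) -> bool:
        """Archive content under name; return True only if new space was used."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self.blobs
        if is_new:
            self.blobs[digest] = content
        # Either way, the logical entry just points at the stored copy.
        self.entries[name] = digest
        return is_new

    def read(self, name: str) -> bytes:
        """Follow the pointer for name back to the stored content."""
        return self.blobs[self.entries[name]]
```

In this sketch a PowerPoint first ingested as an email attachment and later found on a file share would produce two entries but only one stored blob.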

Mimosa is using Stellent technology, which means that the new NearPoint product will know of 300 filetypes and can search them properly. The product will have a content monitoring capability to alert sysadmins if outbound material contains user-defined sensitive data, and NearPoint's existing eDiscovery capability will be applied to archived files as well.

A software development kit (SDK) will be made available to developers so that more specialised applications can have data ingested, indexed, searched, extracted or managed by the NearPoint archive.

Deduplication

What about de-duplicating an archive? Conceptually, de-duping file data in an archive is just de-duping data. But crawling through a de-duped archive to detect data you can safely throw away will take a huge amount of CPU time and must be done with fanatical care.

Say there is a de-duped data element which is referenced by 10,000 pointers. If the element is deleted then all 10,000 references to it are broken. So the retention crawler must understand de-dupe element pointers, and each pointer must have its own unique metadata and retention criteria. Each pointer is in effect a virtual archive file and must be treated as such by the retention crawler.
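A small Python sketch shows why the crawler has to reason about pointers and not just elements. It assumes, for illustration only, that each pointer carries a single `retain_until` date and that a shared element may be deleted only once no live pointer refers to it. `Pointer`, `DedupArchive` and `retention_sweep` are hypothetical names, not part of any product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Pointer:
    """A 'virtual archive file': its own name and its own retention date."""
    name: str
    element_id: str
    retain_until: date


@dataclass
class DedupArchive:
    elements: dict = field(default_factory=dict)  # element_id -> stored bytes
    pointers: dict = field(default_factory=dict)  # pointer name -> Pointer

    def add(self, name, element_id, data, retain_until):
        # Store the data only if this element is not already held.
        self.elements.setdefault(element_id, data)
        self.pointers[name] = Pointer(name, element_id, retain_until)

    def retention_sweep(self, today):
        """Drop expired pointers, then delete elements nothing refers to.

        Returns the ids of the elements that were actually deleted.
        """
        expired = [n for n, p in self.pointers.items() if p.retain_until < today]
        for n in expired:
            del self.pointers[n]
        # An element survives while at least one pointer still needs it.
        live = {p.element_id for p in self.pointers.values()}
        orphaned = [e for e in self.elements if e not in live]
        for e in orphaned:
            del self.elements[e]
        return orphaned
```

The sweep here rescans every pointer on each pass, which is the CPU cost the article warns about. A production system would more plausibly keep per-element reference counts, but that is a design guess and not something the article describes.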

As archives grow in size they will hold billions of objects, the majority of which will have been de-duped down to pointers. Trawling these for deletion candidates will be a vital task.

Mimosa has deals with Plasmon and Data Domain to support their storage products. It is possible that de-duplication will move up the archive stack from storage hardware to archive platform services. Alternatively, we may see combinations of archive platform services and storage hardware emerging as single products.

The disk technology revolution sweeping through archiving is causing tectonic shifts amongst the suppliers of storage, email, backup software, data protection, compliance, legal hold, eDiscovery, enterprise content management and document management products and services. What NearPoint is doing is sure to be replicated by other archive product suppliers as they react to the forces of convergence and unification sweeping the backup-to-tape-is-everything cobwebs from archiving's dusty old barn. ®
