There's a tide of unstructured data coming - start swimming

Or you could just work out a plan...

Whether you prefer to define the known size of our planet’s total digital universe in petabytes or even zettabytes, we can all agree the collective weight of data production is spiralling ever upwards.

While we focus on the relative merits of transactional versus analytical databases, the unstructured data that fails to fall within the general purview of either of these systems is the rising tide beneath.

We are not just talking about non-textual audio, video and graphical data here. Unstructured data must also be thought of in its textual form of Word documents, emails, social media messages and other as yet undefined data shapes.

Different stakeholders view structured and unstructured data differently. After all, in the world of video production it does not necessarily follow that all video data will be unstructured to those companies working with it.

Equally, textual information held in Word or other word processing applications may be regarded as unstructured if it does not align with the structure or access method of the database in which it is housed.

Unstructured data is defined by a combination of the data’s structure, the database or container structure holding the data, and the access method used to reach the data.

Love me, love my data

So how do we build procedures and policies for managing unstructured data? Just how swollen is the rising tide and where are the undertows that can suck us under?

How do we learn to love the new world of structured and unstructured data and live with both?

Do we need to exercise some almost chaos-theory-like aptitude for data agility to get through? Would it be wise to hold unstructured data in a structured database but access it via unstructured methods?

The fact is that context will always rank as ace high, says Rob Bamforth, principal analyst at research firm Quocirca. He argues that without some form of reference, data value plummets like a stone.

“This context has to be applied to the data as stored (in the form of metadata, tags, or anything to provide some context that can be built upon), otherwise it is applied when accessed, even by unstructured methods,” he says.

“For example, a Google search may appear as a complete open search of all the unstructured data on the internet, but in reality it is the specific product of the search and ranking algorithms used.

“Plus we need to factor in how far and fast the web spiders have trawled the data that appears to be available at any moment.”

Staying with the Google example, we need to remember that Google typically determines the value of the data rather more often than the content provider, such as the journalist writing this.
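Bamforth's distinction between context applied when data is stored and context applied when it is accessed can be made concrete with a small sketch. The Python below is illustrative only: the `StoredItem` class, the two lookup functions and the sample documents are hypothetical names and data invented for this example, and the "ranking" is a deliberately crude word count rather than anything resembling a real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class StoredItem:
    """A piece of unstructured content, plus any tags attached when it was stored."""
    body: str
    tags: set = field(default_factory=set)


def find_by_tag(items, tag):
    """Context applied at storage: return items whose stored tags include `tag`."""
    return [item for item in items if tag in item.tags]


def find_by_search(items, term):
    """Context applied at access: rank items by how often `term` occurs in the body.

    The ordering is entirely a product of this (hypothetical) scoring rule,
    not of anything the content's author supplied.
    """
    term = term.lower()
    scored = [(item.body.lower().count(term), item) for item in items]
    hits = [(score, item) for score, item in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in hits]


# Hypothetical sample content; the third item was stored with no tags at all.
items = [
    StoredItem("Invoice 1042 for flash storage arrays", {"finance", "invoice"}),
    StoredItem("Meeting notes: storage budget, storage vendors", {"meeting"}),
    StoredItem("Holiday photos from the team away day"),
]

tagged = find_by_tag(items, "invoice")
searched = find_by_search(items, "storage")
```

In this sketch the untagged item is invisible to the tag lookup, and it only surfaces in a search if the scoring rule happens to favour it. Whoever writes the tags or the scoring rule is the one deciding what the data is worth.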

Define the context

Whoever defines the context adds the value to the data – and it could also come from how different forms of data are combined.

As another example, if a government agency were to combine sufficient quantities of essentially public and shared data in such a way that its value increases dramatically and becomes secret intelligence, then once again we have brought structure to bear upon chaos.

So is the unstructured data tsunami out of control?

In a recent survey carried out by Unisphere and MarkLogic, 86 per cent of respondents said unstructured data is important to their organisation, but only 11 per cent had clear procedures and policies for managing it.

Andrew Anderson, CEO of information stream company Celaton, suggests that forward-thinking organisations are starting to use artificial intelligence and automation in making sense of unstructured data.

“Those who are still relying on human interpretation will be trying to stay afloat on the unstructured data tsunami with one hand tied behind their back,” he says.
