Building big data? Are you building a security headache too?

'I didn't mean to' isn't good enough

Companies have long understood how to classify certain data sets as sensitive or non-sensitive for privacy purposes. Information tagging, for example, has been a well-understood technique here. But they don’t always understand that seemingly non-sensitive data sets can become sensitive when combined.
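
To make the combination risk concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two record sets, the field names, and the `link_records` helper are invented for illustration. Neither list on its own ties a name to a diagnosis, but joining them on two shared, innocuous-looking fields does.

```python
# Hypothetical data: neither list alone links a name to a diagnosis.
loyalty_cards = [
    {"name": "A. Jones", "postcode": "AB1 2CD", "birth_year": 1975},
    {"name": "B. Smith", "postcode": "EF3 4GH", "birth_year": 1982},
]
clinic_visits = [
    {"postcode": "AB1 2CD", "birth_year": 1975, "diagnosis": "diabetes"},
    {"postcode": "ZZ9 9ZZ", "birth_year": 1990, "diagnosis": "asthma"},
]

# Fields that look harmless but can single a person out when taken together.
QUASI_IDENTIFIERS = ("postcode", "birth_year")


def link_records(named, anonymous, keys=QUASI_IDENTIFIERS):
    """Join two record lists on the shared quasi-identifier fields."""
    index = {}
    for row in named:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    linked = []
    for row in anonymous:
        for match in index.get(tuple(row[k] for k in keys), []):
            # The merged record now carries both a name and a diagnosis.
            linked.append({**match, **row})
    return linked


reidentified = link_records(loyalty_cards, clinic_visits)
```

With this toy data, `reidentified` holds a single record that attaches A. Jones's name to a diagnosis, which is why a classification made per data set can miss the sensitivity of the combined result.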

“The way to combat this is to understand and define the desired business outcome before you collect or process the data,” says Elmellas.

“Know the lifecycle. By doing this, you can use the same traditional classification techniques but vary them as required throughout the project.”

Understanding the context in which the information is used is crucial in extracting the information you need from the huge piles of data you have collected.

Companies know this, just as the spooks up in Maryland and at GCHQ do. Context is also important for private enterprises wanting to manipulate big data in a secure way.

These challenges are difficult enough when dealing with big data inside your own domain. What about when you are shipping it out to third parties?

Don’t think you won’t. Logistics chains are a prime example, says Clive Longbottom, founder of analyst firm Quocirca.

Information may move from the retailer to the OEM and the fulfilment company, for example. This data enables these stakeholders to deliver a product efficiently and also lets the customer track progress through a self-service portal. But companies must make sure that the information is being used sensibly at all stages.

“The information can (and should) be hashed with an identifier, rather than being stored with the personal identifiable data [PID] as it moves along the chain,” says Longbottom.

Weak links in the chain

Any PID is stored in the company’s database in hashed, encrypted form, he says, and the reference is then matched with a certificate to create a public token. That token is used if any stakeholder needs to see the PID related to the customer order – and the customer has to agree first.
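
The scheme Longbottom describes can be sketched roughly as follows. This is not his design or any product's API: the `PidVault` class, its method names, and the sample order are all assumptions made for illustration. The sketch also swaps in a keyed hash (HMAC-SHA256) where he mentions a certificate, and it skips encryption at rest to stay within the Python standard library.

```python
import hashlib
import hmac
import secrets


class PidVault:
    """Hypothetical store that keeps personal data apart from order data."""

    def __init__(self):
        self._key = secrets.token_bytes(32)  # secret held only by the data owner
        self._pid_by_token = {}              # a real system would encrypt this at rest
        self._consented = set()              # tokens whose owners agreed to disclosure

    def tokenise(self, customer_id, pid):
        """Store the PID and return an opaque token that partners can carry."""
        token = hmac.new(self._key, customer_id.encode(), hashlib.sha256).hexdigest()
        self._pid_by_token[token] = pid
        return token

    def grant_consent(self, token):
        """Record that the customer has agreed to disclosure for this token."""
        self._consented.add(token)

    def resolve(self, token):
        """Return the PID only if the customer has agreed first."""
        if token not in self._consented:
            raise PermissionError("customer has not agreed to disclosure")
        return self._pid_by_token[token]


vault = PidVault()
token = vault.tokenise("customer-1001", {"name": "A. Jones", "address": "1 High St"})

# The record that travels along the chain carries the token, not the PID.
order = {"order_ref": "ORD-42", "customer_token": token, "status": "dispatched"}
```

A partner holding `order` can update the status or return work against the token, but calling `vault.resolve(token)` raises `PermissionError` until `vault.grant_consent(token)` has been called on the customer's behalf.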

“This also makes the sending of data outside a legal jurisdiction easier,” says Longbottom.

“India can work against the data to their hearts’ content, but they do not see anything that has PID in it. It is only the work packages that get returned and then, through use of the hashed security, add value to the data stored.”

So, you understand the subtle interplay of data classification, business process and risk? Good for you. But sometimes, a mere software bug can send things awry.

One of the biggest data sets of all is Facebook’s. It just ran foul of privacy rules after it accidentally divulged the personal information of six million loyal users.

When a user uses the Download Your Information feature in Facebook, the social network spits out all of that user’s data, including the phone numbers and email addresses for any contacts they have uploaded to its address book.

The bug added any new address book information for those contacts uploaded by other users, enabling, say, abusive ex-partners to access a person’s new telephone number and email address.
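
A toy model shows how small this class of bug can be. The code below is not Facebook's; the `UPLOADS` data, the user names, and both export functions are invented for illustration. The only difference between the leaky export and the safe one is whether results are filtered by who uploaded each detail.

```python
# Hypothetical address-book entries: (uploader, contact_name, detail)
UPLOADS = [
    ("alice", "Carol", "carol@old.example"),
    ("bob", "Carol", "carol@new.example"),
    ("bob", "Carol", "+44 7700 900123"),
]


def export_merged(user, uploads=UPLOADS):
    """Leaky: returns every detail known for any of the user's contacts."""
    my_contacts = {name for who, name, _ in uploads if who == user}
    return sorted(detail for _, name, detail in uploads if name in my_contacts)


def export_own(user, uploads=UPLOADS):
    """Safer: returns only the details that this user uploaded."""
    return sorted(detail for who, _, detail in uploads if who == user)
```

Here `export_own("alice")` returns only the old address Alice supplied, while `export_merged("alice")` also hands her the new email address and phone number that Bob uploaded for Carol.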

Researchers found that uploading one public email address for an individual could harvest a dozen extra pieces of data about that person. The individual doesn’t even have to be a Facebook user.

Clearly, big data security leaks can come from many places. Misclassification of data is one, as is the ability to combine information from multiple sources. Simple software bugs are a third.

Companies collecting vast buckets of data about individuals may not intend to use it maliciously – but that doesn’t mean that others won’t.

By all means, build massive data sets and use them to find answers to questions you don’t even understand yet. But make sure you get your legal and computational ducks in a row before you start down this road. ®
