
Building big data? Are you building a security headache too?

“I didn’t mean to” isn’t good enough


The world and its dog has been shocked by the PRISM news story. Early in June, we found out that the US National Security Agency (NSA) had developed a secret data-gathering mechanism to steal all our data and store it in a large data warehouse.

We are outraged that it is being mined, searched and otherwise prodded. But do we really think that big data security problems stop at Google, Facebook, Microsoft and Fort Meade?

The private sector has been collecting data on all of us for ages. It is stored in massive data sets, often spread between multiple sources. What makes us think this is any more secure? At least the NSA is well trained in keeping it all under lock and key.

Social trend

What does “big data” mean, anyway? Some describe it – wrongly – as simply a lot of data in a relational database. But if that were the case, then the security challenges would be the same as for conventional databases. And they aren’t.

Others view it as data sets so large that they cannot be handled by traditional relational tools. But we have had that kind of thing for years, in the form of data warehouses.

One difference is that modern large data sets often consist of far more varied data, including unstructured stuff such as tweets. Big data is inherently social, meaning that much of it is personal.

Big data is also supposed to perform better. Really large data sets can be tuned to look for “weak signals” – emerging trends that a traditional data warehouse-based business intelligence system may not have spotted.

The goal is also for these systems to work quickly, so that companies can predict and react to market trends efficiently. No more three-day turnarounds for specific reports here.

So big data is more complex, more flexible and faster. It is powerful stuff, but with power also comes risk. Big data carries unforeseen security consequences, warns Tony Lock, programme director at analyst firm Freeform Dynamics.

“Customers give you data to use for certain purposes, but they may not have allowed you to start crunching it to answer all kinds of questions,” he says.

Many companies have not considered those issues, he adds.

In practice, says PA Consulting IT specialist James Mucklow, this means you must have a clear policy explaining what you are going to use customers’ data for.

Big data can provide deep-dive profiles of individuals by using sources that we are not always aware of. Take loyalty cards, for example.

The credit-card industry spends millions developing and enforcing data security and privacy guidelines for the storage of personal financial information. Anyone dealing with currency transactions of any sort is heavily regulated. But loyalty points are not currency and don’t face the same kinds of rules.

We know where you live

Yet loyalty card customers provide mounds of personal information, both directly and indirectly. They may hand over names and addresses, gender, phone number, birthday, and email addresses. Sometimes, they even reveal their income.

Even basic postal code information can enable companies to infer more information about you, based on the demographic data for your area.
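The inference itself needs nothing more than a join on the postcode. A minimal sketch, assuming UK-style postcodes and a small table of area statistics; the customer records, the district figures and the function names are all invented for illustration:

```python
"""Hypothetical sketch: enriching loyalty-card records with area-level demographics."""

# Hypothetical loyalty-card sign-ups: the customer supplied only a postcode.
customers = [
    {"card_id": "C001", "postcode": "SW1A 1AA"},
    {"card_id": "C002", "postcode": "M1 1AE"},
]

# Hypothetical area statistics keyed by postcode district (invented values).
district_stats = {
    "SW1A": {"income_band": "high", "typical_household": "single professional"},
    "M1": {"income_band": "middle", "typical_household": "student"},
}


def postcode_district(postcode: str) -> str:
    """Return the outward code of a UK-style postcode, e.g. 'SW1A 1AA' -> 'SW1A'."""
    return postcode.strip().split()[0].upper()


def enrich(customer: dict) -> dict:
    """Attach inferred attributes the customer never actually provided."""
    inferred = district_stats.get(postcode_district(customer["postcode"]), {})
    return {**customer, **{f"inferred_{k}": v for k, v in inferred.items()}}


enriched = [enrich(c) for c in customers]
for row in enriched:
    print(row)
```

Each record now carries an income band and household type that came from the area, not from the person, which is exactly the kind of quietly acquired detail the article describes.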

Every purchasing decision can be tracked and sucked into a wider data set. Suddenly, data has gotten much bigger – and much more personal.

“You have to be sure that you are seen to be using the data in a responsible way,” says Mucklow. He outlines the story of US retailer Target, which figured out that a girl was pregnant before she had told her parents and let the cat out of the bag by sending her leaflets with advertising.

One of the biggest challenges for companies holding big data sets is that they are like the pan-dimensional, hyper-intelligent beings that built Douglas Adams’s computer, Deep Thought.

They asked the computer the meaning of Life, the Universe, and Everything. After 7.5 million years, it told them that the answer was 42 and it transpired that they didn’t really understand the question.

Mystery questions

Big data sets are massive pools of data, designed to answer questions that people don’t even know they want the answer to. It is tricky defining privacy policies that provide enough flexibility to make proper use of the data and enough privacy to ensure that customers are happy.

Ideally, all of this data would be rendered anonymous, but this can provide a false sense of security, warns Jamal Elmellas, technical director for Auriga, a security consulting firm.

“The mechanism you use to anonymise that data must be sufficiently robust to not breach the Data Protection Act but also leave the data in a state that is useful for what you want to achieve,” he says. “It is a very fine line.”
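One common yardstick for that fine line is k-anonymity: every released record should be indistinguishable from at least k-1 others on its quasi-identifiers, meaning attributes such as postcode, age and gender that are not names but can still single someone out. This is our illustration rather than a method Elmellas prescribes, and the records below are invented. Coarsening the data raises k (safer) while discarding detail (less useful):

```python
from collections import Counter

# Hypothetical records containing quasi-identifiers only.
records = [
    {"postcode": "SW1A 1AA", "age": 34, "gender": "F"},
    {"postcode": "SW1A 2BB", "age": 37, "gender": "F"},
    {"postcode": "SW1A 1AA", "age": 52, "gender": "M"},
    {"postcode": "SW1A 3CC", "age": 58, "gender": "M"},
]


def raw_key(rec: dict) -> tuple:
    """Quasi-identifiers exactly as collected."""
    return (rec["postcode"], rec["age"], rec["gender"])


def generalise(rec: dict) -> tuple:
    """Coarsen quasi-identifiers: postcode district only, and a 10-year age band."""
    district = rec["postcode"].split()[0]
    band_start = (rec["age"] // 10) * 10
    return (district, f"{band_start}-{band_start + 9}", rec["gender"])


def k_anonymity(keys) -> int:
    """k is the size of the smallest group of records sharing identical quasi-identifiers."""
    return min(Counter(keys).values())


print(k_anonymity(raw_key(r) for r in records))     # every record is unique, so k = 1
print(k_anonymity(generalise(r) for r in records))  # records now pair up, so k = 2
```

Pushing k higher means coarser bands or dropping columns altogether, and at some point the data stops answering the questions it was collected for.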

Unfortunately, companies often get it wrong. Data ends up being “pseudo-anonymised”, he warns, making it relatively easy to reassemble into information that can help to identify individuals.
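A classic example of pseudo-anonymisation is replacing an email address with an unsalted hash. The identifier looks scrambled, but anyone holding a list of candidate addresses can hash each one and look it up. A minimal sketch with invented addresses and purchases; `pseudonymise` is a hypothetical helper, not any particular product's method:

```python
import hashlib


def pseudonymise(email: str) -> str:
    """Naive pseudonymisation: unsalted SHA-256 of the normalised email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# A "pseudo-anonymised" extract: hashed identifier plus purchase history (invented data).
released = [
    {"user": pseudonymise("alice@example.com"), "purchases": ["vitamins", "unscented lotion"]},
    {"user": pseudonymise("bob@example.com"), "purchases": ["beer", "crisps"]},
]

# An attacker with a list of known addresses (e.g. a bought mailing list) hashes each one.
candidates = ["carol@example.com", "alice@example.com", "bob@example.com"]
lookup = {pseudonymise(e): e for e in candidates}

# Any hash that appears in the lookup table is re-identified.
reidentified = {
    lookup[row["user"]]: row["purchases"] for row in released if row["user"] in lookup
}
print(reidentified)
```

A keyed hash (for example HMAC with a secret key stored separately) stops outsiders running this lookup, but whoever holds the key can still reverse it, so the data remains pseudonymous rather than anonymous.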

We have seen this before. Researchers used re-identification techniques to find user identities in an anonymised data set published by Netflix in 2006. They matched that data against IMDb, a third-party site where individuals post movie reviews under their own profiles.
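The core of such a linkage attack is matching overlapping records across the two sources. The sketch below is a deliberately simplified version, not the researchers' actual algorithm: it scores each anonymous user by how many public reviews agree on title and star rating within a few days. All records, names and the `max_days` tolerance are invented for illustration:

```python
from datetime import date

# Hypothetical "anonymised" ratings: subscriber names replaced by opaque IDs.
anon_ratings = {
    "user_4821": [("Film A", 5, date(2005, 3, 1)), ("Film B", 1, date(2005, 3, 4)),
                  ("Film C", 4, date(2005, 6, 10))],
    "user_9377": [("Film A", 3, date(2005, 1, 20)), ("Film D", 5, date(2005, 2, 2))],
}

# Hypothetical public reviews posted under a real name on a third-party site.
public_reviews = {
    "Jane Doe": [("Film A", 5, date(2005, 3, 2)), ("Film B", 1, date(2005, 3, 5))],
}


def overlap_score(anon, public, max_days=3):
    """Count public reviews matching an anonymous rating on title and stars within max_days."""
    score = 0
    for title, stars, when in public:
        if any(t == title and s == stars and abs((d - when).days) <= max_days
               for t, s, d in anon):
            score += 1
    return score


def best_match(public, anon_db):
    """Return (anonymous ID, score) for the history that best matches the public reviews."""
    return max(((uid, overlap_score(hist, public)) for uid, hist in anon_db.items()),
               key=lambda pair: pair[1])


print(best_match(public_reviews["Jane Doe"], anon_ratings))
```

Two matching reviews are enough to pin "Jane Doe" to `user_4821` here, and with it every other film that ID rated, including ones she never chose to make public.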

This shows how big data’s biggest strength, its ability to combine data from different sources, is also its biggest security weakness.


“Different consumer organisations collect information with one-dimensional views of a consumer,” says Hunter Albright, CEO of Beyond Analysis, a consulting firm that specialises in big data.

“It has limited value and risk because of that. Increased exposure happens for an individual when multiple data sources are brought together.”
