DARPA calls Big Data boffins: Help us lock up everyone's privates
US gov reckons public datasets are threat to the State
The American military is looking for number-crunching wizards able to tackle the national security threat posed by, erm... publicly available data.
The Defense Advanced Research Projects Agency (DARPA) is searching for boffins to "measure the national security impact of public data and to defend against the malicious use of public data against national interests".
DARPA is apparently worried that enemy agents could use publicly available data to build up a map of their targets, using the information to prepare an attack aimed right at the unprotected soft bits of a public – or private – organisation.
So, DARPA wants data scientists to get in touch and propose new methods of protecting data from the bad guys.
The secret squirrel design agency wants to work out the best methods for "anonymization and de-anonymization of data sources", and to develop tools and frameworks for gauging and countering that threat.
DARPA said: "Could a modestly funded group deliver nation-state type effects using only public data? The threat of active data spills and breaches of corporate and government information systems are being addressed by many private, commercial, and government organizations. The purpose of this research is to investigate data sources that are readily available for any individual to purchase, mine, and exploit."
It continued: "Does the availability of data for purchase or for free... provide a determined adversary with the tools necessary to inflict nation-state level damage?"
It has long been known that the pen is mightier than the sword, but DARPA seems to be saying that numbers in spreadsheets could be as damaging as nukes.
DARPA cited the 2009 Netflix scandal as an example of how vulnerable targets are once their data is released into the wild. Netflix published supposedly anonymous information relating to the viewing habits of 480,000 customers as part of a $1m competition to improve its recommendation system.
But by joining a few digital dots, it proved possible to identify customers by name from the supposedly anonymous information. That led to a lawsuit from a closeted lesbian who claimed the world might guess her sexual orientation from her rental of Brokeback Mountain, and that this could harm her professional life.
"An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data," DARPA added. "This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase."
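"Joining the dots" here means a linkage attack: take the few ratings someone has also posted somewhere public and look for the pseudonymous record that contains them. The Python sketch below is a heavily simplified illustration, not the method DARPA or the original researchers used. The toy records, film names and the `link_records` helper are all hypothetical, and it demands exact matches where a real attack would likely have to tolerate fuzzy dates and ratings.

```python
from typing import Dict, List, Tuple

# A rating is (film title, stars, date watched). All data below is made up.
Rating = Tuple[str, int, str]

# Hypothetical "anonymised" release: real names swapped for opaque IDs.
anonymised: Dict[str, List[Rating]] = {
    "user_0412": [("Film A", 5, "2005-12-20"), ("Film B", 3, "2005-11-02"), ("Film C", 4, "2005-10-11")],
    "user_1138": [("Film A", 5, "2005-12-21"), ("Film D", 2, "2005-09-30")],
    "user_2077": [("Film B", 3, "2005-11-02"), ("Film E", 1, "2005-08-14")],
}

# Hypothetical auxiliary data: a few ratings the same person posted publicly under their own name.
public_profile: List[Rating] = [("Film A", 5, "2005-12-20"), ("Film B", 3, "2005-11-02")]


def link_records(anon: Dict[str, List[Rating]], public: List[Rating], min_overlap: int = 2) -> List[str]:
    """Return IDs whose records share at least min_overlap exact ratings with the public profile."""
    known = set(public)
    matches = []
    for user_id, ratings in anon.items():
        # Count ratings that agree on film, stars and date simultaneously.
        if len(known & set(ratings)) >= min_overlap:
            matches.append(user_id)
    return matches


if __name__ == "__main__":
    # A single surviving candidate means the "anonymous" record has been re-identified.
    print(link_records(anonymised, public_profile))
```

With just two public ratings, only one pseudonymous record survives the filter, which shows why "very little additional data" can be enough.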
Boffins whose application is successful will first be asked to investigate what data is currently available and which sets are the most vulnerable. They will then be asked to design a proof-of-concept device for sampling data from multiple sources and then providing automated feedback on how risky these numbers are.
Finally, DARPA wants a real-world tool that can monitor open source data sets in real time, measure vulnerabilities and then provide defensive countermeasures. This will then be used as the template for "a series of capabilities relevant to both government and commercial organizations to defend against threats due to the proliferation of purchasable or public data sets".
Of course, some of us might say the NSA already has a handle on how to use big data, seeing as the PRISM surveillance programme managed to collect the details of millions of people every day.
Still, if you're not bothered by the apparent lack of joined-up thinking among the world's most secretive government agencies, you can join in the race to become the world's first spreadsheet superhero by getting your application in to DARPA by 25 September. ®