Public genome databases can leak identity

Anonymity only goes so far

Public genome data poses a significant privacy risk to individuals, according to research led by Yaniv Erlich, a geneticist at the Whitehead Institute for Biomedical Research.

The team that Erlich led was able to de-anonymise genome data using only public information and careful Internet searches. A little chillingly, individuals could be associated with patrilineal genetic characteristics, even if they weren’t in the databases. A family member’s presence in the database can be enough, if they’re related in the male line and carry the same surname.

Working with data published in two public genomic databases, Ysearch and SMGF, Erlich demonstrated the privacy risk by matching chromosome data with 50 individuals, in a paper published in Science (the abstract is public; the full paper is available free with registration).

Among the genome data recorded in the databases are genetic markers called “short tandem repeats” (for which genetic science hasn’t yet identified a specific purpose), which are passed down the male line.

As the paper notes, it had been assumed that listing surnames in the databases didn’t place individual identity at risk, since surnames “could match thousands of individuals”. However, the genome data has become a genealogy tool as well, in databases such as YBase.

DNA sequencing pioneer Dr Craig Venter volunteered as a test subject in the research. With only the relevant DNA sequence, Dr Venter’s age, and the US state where he lives, Erlich was able to retrieve just two possible records – one of which was Dr Venter.

With a known surname, the searches become even more accurate: “Combining the recovered surname with additional demographic data can narrow down the identity of the sample originator to just a few individuals,” Erlich states in the paper.

“Surname inference from personal genomes puts the privacy of current de-identified public data sets at risk”, it continues.
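The two-step narrowing described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the paper: the marker names (M1 to M4), repeat counts, surnames, public records, and the distance threshold are all invented toy values, and the function names are hypothetical.

```python
# Toy sketch of surname inference followed by demographic narrowing.
# All data below is invented for illustration; it is NOT taken from the
# paper or from any real genealogy database.

# Hypothetical genealogy entries: a surname plus repeat counts per marker.
GENEALOGY_DB = [
    {"surname": "Smith", "markers": {"M1": 13, "M2": 24, "M3": 14, "M4": 11}},
    {"surname": "Jones", "markers": {"M1": 14, "M2": 22, "M3": 15, "M4": 10}},
    {"surname": "Smith", "markers": {"M1": 13, "M2": 24, "M3": 14, "M4": 12}},
]

# Hypothetical public records of the kind a careful Internet search might find.
PUBLIC_RECORDS = [
    {"name": "A. Smith", "surname": "Smith", "birth_year": 1946, "state": "CA"},
    {"name": "B. Smith", "surname": "Smith", "birth_year": 1946, "state": "CA"},
    {"name": "C. Smith", "surname": "Smith", "birth_year": 1970, "state": "CA"},
    {"name": "D. Jones", "surname": "Jones", "birth_year": 1946, "state": "CA"},
    {"name": "E. Smith", "surname": "Smith", "birth_year": 1946, "state": "NY"},
]


def marker_distance(a, b):
    """Sum of absolute repeat-count differences over the markers both share.

    Returns None when the two profiles have no marker in common.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(abs(a[m] - b[m]) for m in shared)


def infer_surnames(query, db, max_distance=1):
    """Collect surnames whose marker profile lies close to the query profile.

    max_distance is an arbitrary threshold chosen for this toy example.
    """
    hits = set()
    for entry in db:
        d = marker_distance(query, entry["markers"])
        if d is not None and d <= max_distance:
            hits.add(entry["surname"])
    return hits


def narrow_candidates(surnames, records, birth_year, state):
    """Keep only records matching an inferred surname, birth year and state."""
    return [
        r for r in records
        if r["surname"] in surnames
        and r["birth_year"] == birth_year
        and r["state"] == state
    ]


# An "anonymous" sample: marker counts only, no name attached.
query_profile = {"M1": 13, "M2": 24, "M3": 14, "M4": 11}

surnames = infer_surnames(query_profile, GENEALOGY_DB)
candidates = narrow_candidates(surnames, PUBLIC_RECORDS, birth_year=1946, state="CA")
```

With this toy data the marker match yields a single surname, and adding birth year and state leaves two candidate records. The point of the sketch is that neither step needs the sample’s owner to appear in the genealogy database; a male-line relative with the same surname is enough.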

“In five surname recovery cases, we fully identified the CEU* individuals and their entire families with very high probabilities … data release, even of a few markers, from one person can spread through deep genealogical ties and lead to the identification of another person who might have no acquaintance with the person who released his genetic data”. ®

*CEU refers to a particular genetic dataset: “multigenerational families of northern and western European ancestry in Utah who had originally had their samples collected by CEPH (Centre d’Etude du Polymorphisme Humain)”. ®
