Public genome databases can leak identity

Anonymity only goes so far

Public genome data poses a significant privacy risk to individuals, according to research led by Yaniv Erlich, a geneticist at the Whitehead Institute for Biomedical Research.

The team that Erlich led was able to de-anonymise genome data using only public information and careful Internet searches. A little chillingly, individuals could be associated with patrilineal genetic characteristics, even if they weren’t in the databases. A family member’s presence in the database can be enough, if they’re related in the male line and carry the same surname.

Working with data published in two public genomic databases, Ysearch and SMGF, Erlich demonstrated the privacy risk by matching Y-chromosome data to 50 individuals, in a paper published in Science (the full paper is available free with registration).

Among the genome data recorded in the databases are genetic markers called “short tandem repeats” (for which genetic science hasn’t yet identified a specific purpose), which are passed down the male line.

As the paper notes, it had been assumed that listing surnames in the databases didn’t place individual identity at risk, since surnames “could match thousands of individuals”. However, the genome data has become a genealogy tool as well, in databases such as YBase.

DNA sequencing pioneer Dr Craig Venter volunteered as a test subject in the research. With only the relevant DNA sequence, Dr Venter’s age, and the US state where he lives, Erlich was able to retrieve just two possible records – one of which was Dr Venter.

With a known surname, the searches become even more accurate: “Combining the recovered surname with additional demographic data can narrow down the identity of the sample originator to just a few individuals,” Erlich states in the paper.
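To make the two-step narrowing concrete, here is a minimal Python sketch of the general idea, not the method used in the paper. It assumes a toy genealogy table keyed by short tandem repeat counts and a toy list of public records. The names GenealogyEntry, PublicRecord, infer_surnames and narrow_candidates are hypothetical, and every surname, repeat count, birth year and state in it is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class GenealogyEntry:
    """Hypothetical genealogy-database row: a surname plus Y-STR repeat counts."""
    surname: str
    markers: dict  # marker name -> repeat count


@dataclass
class PublicRecord:
    """Hypothetical public demographic record."""
    name: str
    surname: str
    birth_year: int
    state: str


def infer_surnames(sample, entries, max_mismatches=0):
    """Return surnames whose entries agree with the sample on shared markers."""
    surnames = set()
    for entry in entries:
        shared = sample.keys() & entry.markers.keys()
        if not shared:
            continue  # nothing to compare, so no evidence either way
        mismatches = sum(1 for m in shared if sample[m] != entry.markers[m])
        if mismatches <= max_mismatches:
            surnames.add(entry.surname)
    return surnames


def narrow_candidates(records, surnames, birth_year, state):
    """Keep only records matching an inferred surname, birth year and state."""
    return [
        r for r in records
        if r.surname in surnames and r.birth_year == birth_year and r.state == state
    ]


# Invented toy data: none of these values come from a real database.
genealogy_entries = [
    GenealogyEntry("Exampleson", {"DYS19": 14, "DYS390": 24, "DYS391": 10}),
    GenealogyEntry("Placeholder", {"DYS19": 15, "DYS390": 23, "DYS391": 11}),
]
public_records = [
    PublicRecord("A. Exampleson", "Exampleson", 1950, "CA"),
    PublicRecord("B. Exampleson", "Exampleson", 1982, "CA"),
    PublicRecord("C. Exampleson", "Exampleson", 1950, "NY"),
    PublicRecord("D. Placeholder", "Placeholder", 1950, "CA"),
]
sample_markers = {"DYS19": 14, "DYS390": 24, "DYS391": 10}

surnames = infer_surnames(sample_markers, genealogy_entries)
candidates = narrow_candidates(public_records, surnames, 1950, "CA")
```

In this toy run the marker match leaves a single surname, and the age and state filters then cut the records sharing that surname down to one candidate. Real marker data is noisier, which is why the sketch exposes a max_mismatches tolerance.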

“Surname inference from personal genomes puts the privacy of current de-identified public data sets at risk”, it continues.

“In five surname recovery cases, we fully identified the CEU* individuals and their entire families with very high probabilities … data release, even of a few markers, from one person can spread through deep genealogical ties and lead to the identification of another person who might have no acquaintance with the person who released his genetic data”. ®

*CEU refers to a particular genetic dataset: “multigenerational families of northern and western European ancestry in Utah who had originally had their samples collected by CEPH (Centre d’Etude du Polymorphisme Humain)”. ®
