Public genome databases can leak identity

Anonymity only goes so far

Public genome data poses a significant privacy risk to individuals, according to research led by Yaniv Erlich, a geneticist at the Whitehead Institute for Biomedical Research.

Erlich's team was able to de-anonymise genome data using only public information and careful Internet searches. Somewhat chillingly, people could be linked to patrilineal genetic characteristics even if they weren't in the databases themselves. It can be enough for a relative in the male line who carries the same surname to be listed.

Working with data published in two public genomic databases, Ysearch and SMGF, Erlich demonstrated the privacy risk by matching Y-chromosome data to 50 individuals, in a paper published in Science (the abstract is freely available; the full paper is free with registration).

Among the genome data recorded in the databases are genetic markers on the Y chromosome called “short tandem repeats”, for which genetic science hasn't yet identified a specific purpose. Because the Y chromosome is inherited only from father to son, these markers are passed down the male line.

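As the paper notes, it had been assumed that listing surnames in the databases didn't put individual identity at risk, since a surname “could match thousands of individuals”. However, the genome data has also become a genealogy tool. Because surnames, like these markers, are usually passed from father to son, databases such as YBase pair the two, so a person's markers can be matched against them to recover a likely surname.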
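The surname-matching idea can be illustrated with a deliberately simplified sketch. It is not the method used in the Science paper, which is more elaborate. It assumes a hypothetical in-memory list of (surname, marker profile) records with invented repeat counts (the marker names DYS19, DYS390, DYS391 and DYS392 are real Y-chromosome short tandem repeat loci), treats any record that differs at no more than `max_mismatches` shared markers as a match, and ranks surnames by how many matching records support them. The names `GENEALOGY_DB`, `mismatches` and `infer_surnames` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Profile = Dict[str, int]  # marker name -> repeat count

# Hypothetical toy database. Marker names are real Y-STR loci, but every
# repeat count and surname pairing here is invented for illustration.
GENEALOGY_DB: List[Tuple[str, Profile]] = [
    ("Smith",  {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS392": 13}),
    ("Smith",  {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 13}),
    ("Jones",  {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS392": 11}),
    ("Taylor", {"DYS19": 13, "DYS390": 25, "DYS391": 11, "DYS392": 14}),
]


def mismatches(a: Profile, b: Profile) -> Optional[int]:
    """Count markers typed in both profiles whose repeat counts differ.

    Returns None when the profiles share no markers (nothing to compare).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return sum(a[m] != b[m] for m in shared)


def infer_surnames(query: Profile, db: List[Tuple[str, Profile]],
                   max_mismatches: int = 1) -> List[Tuple[str, int]]:
    """Rank surnames by how many close-matching records support them."""
    votes: Counter = Counter()
    for surname, profile in db:
        d = mismatches(query, profile)
        if d is not None and d <= max_mismatches:
            votes[surname] += 1
    return votes.most_common()


query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS392": 13}
print(infer_surnames(query, GENEALOGY_DB))  # [('Smith', 2)]
```

Real genealogy tests type many more markers than four, and a profile this short would match far too many men to be useful. The point of the sketch is only that the surname is recovered from the male-line relatives in the database, not from the person being identified.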

DNA sequencing pioneer Dr Craig Venter volunteered as a test subject in the research. With only the relevant DNA sequence, Dr Venter's age, and the US state where he lives, Erlich was able to narrow the search to just two possible records, one of which was Dr Venter's.
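The narrowing step in the Venter example amounts to filtering a pool of people on a few attributes at once. The sketch below is hypothetical throughout: the `PersonRecord` type, the `narrow_candidates` function and the sample records are invented, and stand in for the kind of public demographic records (name, birth year, state) an investigator might search. It keeps only people with the inferred surname, the given state, and a birth year consistent with the stated age.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PersonRecord:
    """Hypothetical entry from a public demographic source."""
    full_name: str
    surname: str
    birth_year: int
    state: str


def narrow_candidates(records: List[PersonRecord], surname: str, age: int,
                      state: str, reference_year: int) -> List[PersonRecord]:
    """Keep records matching the inferred surname, the US state and the age.

    Someone who is `age` at some point in `reference_year` was born either in
    reference_year - age or the year before, depending on their birthday.
    """
    latest = reference_year - age
    return [
        r for r in records
        if r.surname == surname
        and r.state == state
        and latest - 1 <= r.birth_year <= latest
    ]


# Invented sample data; none of these people exist.
RECORDS = [
    PersonRecord("Alex Smith", "Smith", 1960, "CA"),
    PersonRecord("Sam Smith", "Smith", 1959, "CA"),
    PersonRecord("Jo Smith", "Smith", 1960, "TX"),
    PersonRecord("Lee Smith", "Smith", 1985, "CA"),
    PersonRecord("Kim Jones", "Jones", 1960, "CA"),
]

matches = narrow_candidates(RECORDS, "Smith", age=52, state="CA", reference_year=2012)
print([r.full_name for r in matches])  # ['Alex Smith', 'Sam Smith']
```

Each extra attribute removes most of the remaining pool, which is roughly why a surname plus two demographic facts can leave only a couple of candidates.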

With a known surname, the searches become even more accurate: “Combining the recovered surname with additional demographic data can narrow down the identity of the sample originator to just a few individuals,” Erlich states in the paper.

“Surname inference from personal genomes puts the privacy of current de-identified public data sets at risk”, it continues.

“In five surname recovery cases, we fully identified the CEU* individuals and their entire families with very high probabilities … data release, even of a few markers, from one person can spread through deep genealogical ties and lead to the identification of another person who might have no acquaintance with the person who released his genetic data”. ®

*CEU refers to a particular genetic dataset: “multigenerational families of northern and western European ancestry in Utah who had originally had their samples collected by CEPH (Centre d’Etude du Polymorphisme Humain)”. ®
