Google flu-finding service diagnosed with 'big data hubris'

Bad data contagion overwhelms prediction service

A paper in Science claims that Google Flu Trends, unveiled back in 2008 to become a poster-child of Big Data, has one teeny, tiny, fatal flaw: it's almost always wrong.

The paper – abstract here – finds that not only did Flu Trends completely miss the 2009 swine flu, but for 100 of the 108 weeks since 2011, Google's predictions of influenza outbreaks are simply wrong, and the reason is simple: nearly everybody thinks the slightest sniffle means they have influenza.

That's quite at odds with how Google described the project when it was first unveiled. As noted by Johns Hopkins professor Steven Salzberg here, the Chocolate Factory originally claimed that “we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day”.

As Salzberg notes, for the most recent week in which the Centers for Disease Control and Prevention (CDC) has published data, only 8.8 per cent of specimens sent for testing returned a positive for influenza.

Talking to the Science podcast, here, researcher David Lazer of Northeastern University in Boston (and lead author of the paper) suggests one problem is that people – including highly-trained Oompa Loompas – love pattern matching, and that Google “overfit” the data.

“They ... overfit the data. They had fifty million search terms, and they found some that happened to fit the frequency of the 'flu' over the preceding decade or so, but really they were getting idiosyncratic terms that were peaking in the winter at the time the 'flu' peaks … but wasn't driven by the fact that people were actually sick with the 'flu',” he tells the podcast.
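
As a rough illustration of the overfitting Lazer describes (a toy sketch on synthetic noise, not Google's actual method or data), the Python below generates a random "flu" series and a couple of thousand equally random "search term" series, picks the term that correlates best with the flu series over a training window, then checks that same term on held-out weeks. The function names, the term count and the window sizes are all assumptions made up for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def best_spurious_term(n_terms=2000, n_train=20, n_test=20, seed=0):
    """Pick the noise series that best fits a noise target in training.

    Returns (training correlation, held-out correlation) for the winner.
    All series are independent Gaussian noise, so any fit is pure chance.
    """
    rng = random.Random(seed)
    total = n_train + n_test
    # Hypothetical weekly "flu" signal: pure noise in this toy setup.
    flu = [rng.gauss(0, 1) for _ in range(total)]

    best_r, best_term = -1.0, None
    for _ in range(n_terms):
        # Hypothetical "search term" frequency: also pure noise.
        term = [rng.gauss(0, 1) for _ in range(total)]
        r = pearson(term[:n_train], flu[:n_train])
        if r > best_r:
            best_r, best_term = r, term

    # Score the selected term on weeks it was not selected on.
    held_out_r = pearson(best_term[n_train:], flu[n_train:])
    return best_r, held_out_r


if __name__ == "__main__":
    train_r, test_r = best_spurious_term()
    print(f"training correlation: {train_r:.2f}")
    print(f"held-out correlation: {test_r:.2f}")
```

With enough candidates, the winning term tends to show a strong training correlation purely by chance, and there is no reason for that correlation to survive on the held-out weeks.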

Having noticed these errors, Google applied a fix to Flu Trends, but since 2011 the system has been overestimating the number of 'flu' cases.

Calling this "big data hubris", Lazer says there were "certain assumptions baked into the analysis that doomed it in the long run". For example, he says, Google Flu Trends assumed a stable relationship between search terms and the incidence of influenza, which hasn't been the case.

Google's own search algorithms, which route someone from a 'flu' search to a suitable product, also play a part, Lazer says. Google search creates a kind of feedback loop which Google Flu Trends mistakenly interprets as an outbreak. ®
