Google flu-finding service diagnosed with 'big data hubris'

Bad data contagion overwhelms prediction service

A paper in Science claims that Google Flu Trends, which was unveiled in 2008 and became a poster child for Big Data, has one teeny, tiny, fatal flaw: it's almost always wrong.

The paper finds that not only did Flu Trends completely miss the 2009 swine flu pandemic, but that in 100 of the 108 weeks since 2011, Google's predictions of influenza outbreaks have been wrong. The reason is simple: nearly everybody thinks the slightest sniffle means they have influenza.

That's quite at odds with how Google described Flu Trends when it first unveiled the project. As noted by Johns Hopkins professor Steven Salzberg, the Chocolate Factory originally claimed that “we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day”.

As Salzberg notes, for the most recent week for which the Centers for Disease Control and Prevention (CDC) has published data, only 8.8 per cent of specimens sent for testing came back positive for influenza.

Speaking to the Science podcast, David Lazer of Northeastern University in Boston, the paper's lead author, suggests that one problem is that people, including highly trained Oompa Loompas, love pattern matching, and that Google “overfit” the data.

“They ... overfit the data. They had fifty million search terms, and they found some that happened to fit the frequency of the 'flu' over the preceding decade or so, but really they were getting idiosyncratic terms that were peaking in the winter at the time the 'flu' peaks … but wasn't driven by the fact that people were actually sick with the 'flu',” he tells the podcast.
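
Lazer's point about sifting fifty million search terms can be illustrated with a toy simulation. The sketch below is not Google's method: it assumes a made-up weekly "flu" series and a few thousand hypothetical random "search terms", none of which has any real connection to the flu. It picks the term that best matches the training weeks and then checks that same term against later weeks. The function names and parameters (`best_spurious_fit`, `n_terms`, `train_weeks`, `test_weeks`) are illustrative only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def best_spurious_fit(n_terms=2000, train_weeks=30, test_weeks=200, seed=1):
    """Select the hypothetical term that best tracks the target during
    training, then measure how well it tracks the target afterwards.

    Returns (training correlation, test correlation) for the chosen term.
    """
    rng = random.Random(seed)
    weeks = train_weeks + test_weeks
    # Hypothetical 'flu' signal: pure noise, so no term can truly predict it.
    target = [rng.gauss(0, 1) for _ in range(weeks)]
    best_r, best_term = -2.0, None
    for _ in range(n_terms):
        # Each candidate 'search term' is independent noise as well.
        term = [rng.gauss(0, 1) for _ in range(weeks)]
        r = pearson(term[:train_weeks], target[:train_weeks])
        if r > best_r:
            best_r, best_term = r, term
    # Evaluate the winning term only on weeks it was not selected on.
    test_r = pearson(best_term[train_weeks:], target[train_weeks:])
    return best_r, test_r


if __name__ == "__main__":
    train_r, test_r = best_spurious_fit()
    print(f"training correlation {train_r:.2f}, later correlation {test_r:.2f}")
```

Because the winner is chosen from thousands of candidates on only 30 weeks, its training correlation is inflated by selection alone, while its correlation on the unseen weeks hovers near zero. This exaggerates the real situation (actual flu data is seasonal, and some search terms genuinely relate to illness), but it shows why a good historical fit found by brute-force search is weak evidence of predictive power.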

After noticing that this caused errors, Google applied a fix to Flu Trends, but since 2011 the system has been overestimating the number of 'flu' cases.

Calling this "big data hubris", Lazer says there were "certain assumptions baked into the analysis that doomed it in the long run". For example, he says, Google Flu Trends assumed a stable relationship between search terms and the incidence of influenza, which has not held.

Google's own search algorithms, which route someone from a 'flu' search to a suitable product, also play a part, Lazer says: Google search creates a kind of feedback loop that Google Flu Trends mistakenly interprets as an outbreak. ®