Google flu-finding service diagnosed with 'big data hubris'
Bad data contagion overwhelms prediction service
A paper in Science claims that Google Flu Trends, unveiled back in 2008 and since held up as a poster child for Big Data, has one teeny, tiny, fatal flaw: it's almost always wrong.
The paper finds that Flu Trends not only completely missed the 2009 swine flu pandemic, but also got its influenza estimates wrong in 100 of the 108 weeks since 2011. The reason is simple: nearly everybody thinks the slightest sniffle means they have influenza.
That's quite at odds with how Google described Flu Trends when it first unveiled it. As Johns Hopkins professor Steven Salzberg points out, the Chocolate Factory originally claimed that “we can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day”.
As Salzberg notes, for the most recent week for which the US Centers for Disease Control and Prevention (CDC) has published data, only 8.8 per cent of specimens sent for testing came back positive for influenza.
Speaking on the Science podcast, David Lazer of Northeastern University in Boston, the paper's lead author, suggests one problem is that people (including highly trained Oompa Loompas) love pattern matching, and that Google “overfit” the data.
“They ... overfit the data. They had fifty million search terms, and they found some that happened to fit the frequency of the 'flu' over the preceding decade or so, but really they were getting idiosyncratic terms that were peaking in the winter at the time the 'flu' peaks … but wasn't driven by the fact that people were actually sick with the 'flu',” he tells the podcast.
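Lazer's overfitting point is essentially a multiple-comparisons problem: screen enough candidate series and some will match the target purely by chance. The toy Python simulation below is a minimal sketch of that selection effect, not Google's actual method. The "flu" series and the 5,000 candidate "search terms" are hypothetical random noise, the 52-week training and test windows are arbitrary assumptions, and the seed is fixed only so that runs repeat.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_spurious_term(n_terms=5000, n_train=52, n_test=52, seed=2008):
    """Pick the candidate that best fits the training weeks, then score it on unseen weeks.

    Every series here is hypothetical Gaussian noise, so no candidate can
    genuinely predict the target; any in-sample fit is pure chance.
    """
    rng = random.Random(seed)
    n = n_train + n_test
    flu = [rng.gauss(0, 1) for _ in range(n)]  # hypothetical weekly "flu" signal
    best_r, best_term = -1.0, None
    for _ in range(n_terms):
        # Each candidate "search term" is independent of the flu series.
        term = [rng.gauss(0, 1) for _ in range(n)]
        r = pearson(flu[:n_train], term[:n_train])
        if r > best_r:
            best_r, best_term = r, term
    # Evaluate the selected term only on the held-out weeks.
    test_r = pearson(flu[n_train:], best_term[n_train:])
    return best_r, test_r


train_r, test_r = best_spurious_term()
print(f"in-sample r = {train_r:.2f}, out-of-sample r = {test_r:.2f}")
```

Because the winning term was chosen solely for how well it fitted the training window, its correlation looks respectable in-sample and typically falls towards zero on the unseen weeks. With fifty million real terms to choose from, the pool of chance matches would be far larger.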
Google noticed these errors and updated Flu Trends, but since 2011 the system has consistently overestimated the number of 'flu' cases.
Calling this "big data hubris", Lazer says there were "certain assumptions baked into the analysis that doomed it in the long run". For example, he says, Google Flu Trends assumed a stable relationship between search terms and the incidence of influenza, which has not held.
Google's own search algorithms, which route someone from a 'flu' search to a suitable product, also play a part, Lazer says. Google search creates a kind of feedback loop that Google Flu Trends mistakenly interprets as an outbreak. ®