Was this quake AI a little too artificial? Nature-published research accused of boosting accuracy by mixing training, testing data

Academics, journal deny making a boo boo

An academic paper published in Nature has been criticized by a data scientist – who found a glaring schoolboy error in the study when he tried to reproduce the machine-learning research.

The paper in question, published in August last year, describes how neural networks can be trained to predict the location of aftershocks following an earthquake. At first glance, it looks pretty respectable. The authors are from Harvard University, the University of Connecticut, and Google, in the US, and the study was published in Nature, a leading science journal, following peer review, after all.

When Rajiv Shah, a data scientist at Boston software house DataRobot, leafed through the paper, however, he became “deeply suspicious” after eyeing up the results: the neural network’s accuracy was unexpectedly high.

He attempted to reproduce the research, and found a major flaw: there was some overlap in the data used to both train and test the model. That means the software was given an unfair advantage. It's kinda like being told one of the answers to an exam, then sitting the exam and acing that particular question. It's a big no-no in deep learning. The problem, known as data leakage, means the results are pretty much moot.

“Data leakage is bad, because the goal of a predictive model is to generalize to new examples,” Shah explained to The Register. “For a model to generalize, it should be tested on data that resembles the ‘actual world’."

"Typically this is done with a random sample of your data – the test set – which you never expose to the model," he added. "This ensures your model has not learned from this data and provides a strong measure to ascertain generalizability. With data leakage, the test set is not really independent and any metrics, therefore, cannot generalize to performance in the ‘actual world’.”

Essentially, it means the model is heavily overfitting to the training data. Its performance, therefore, looks promising because it’s being tested using the same data that it was trained on. Its accuracy is artificially high.
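For the curious, here's a minimal sketch of that point in Python with scikit-learn. The data is entirely made up for illustration – nothing here comes from the paper – but it shows the standard protocol Shah describes: hold out a random test set the model never trains on, and score on that, not on rows the model has already seen.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score

# Made-up stand-in data: one row of features per grid cell, plus a
# binary "did an aftershock happen here" label. Purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))
y = (X[:, 0] + rng.normal(scale=1.0, size=5000) > 0).astype(int)

# Proper protocol: sample a random test set and never show it to the
# model during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = MLPClassifier(hidden_layer_sizes=(50,), max_iter=300, random_state=0)
model.fit(X_train, y_train)

# Scoring on held-out data measures how well the model generalizes...
print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
# ...whereas scoring on rows it trained on (the leaky version) typically
# flatters the model and inflates the headline number.
print("leaky accuracy:   ", accuracy_score(y_train, model.predict(X_train)))
```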

The researchers' feedforward neural network was trained by inspecting 131,000 seismic wave patterns from pairs of main earthquake shocks and their aftershocks. The data is split into grid cells that each describe a fixed volume, and the model predicts whether an aftershock will occur at the center of each grid cell based on the impact of the seismic waves caused by the main earthquake.
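That kind of model can be sketched in a few lines of Keras. To be clear, the layer sizes, feature count, and random data below are placeholders for illustration, not the authors' actual configuration or dataset: each input row stands for the stress-change features of one grid cell, and the output is the probability that the cell hosts an aftershock.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: random per-grid-cell features and binary aftershock
# labels -- stand-ins only, not the paper's 131,000 real samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 12)).astype("float32")
y = rng.integers(0, 2, size=10_000).astype("float32")

# A small fully connected (feedforward) binary classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(12,)),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(50, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # aftershock probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=1024, verbose=0)
```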

If the same seismic wave patterns are used to train and test the neural network then it's unsurprising that it'll be able to predict the aftershocks accurately. Feed it data from a new earthquake not seen in the training data, however, and it probably won't be able to guess as well. The claim that neural networks perform better than more traditional geological methods, such as the Coulomb failure stress change, is simply not true, Shah argued.
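The fix Shah is describing amounts to splitting the data by earthquake rather than by row. Here's a hedged sketch, again with scikit-learn and invented earthquake IDs: group every grid cell by its mainshock, so a given quake lands wholly in either the training set or the test set, never both.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Invented example: each row (grid cell) belongs to one mainshock.
rng = np.random.default_rng(1)
n_samples = 10_000
quake_ids = rng.integers(0, 200, size=n_samples)  # 200 hypothetical mainshocks
X = rng.normal(size=(n_samples, 12))
y = rng.integers(0, 2, size=n_samples)

# Split by earthquake, not by row: no quake contributes cells to both
# the training and the test set, so there is no overlap to leak.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=quake_ids))

assert set(quake_ids[train_idx]).isdisjoint(set(quake_ids[test_idx]))
```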

Time to write a letter

He decided to email the boffins themselves, but was dismayed by the lack of response. So he confronted Nature’s editors. In a letter, he wrote: “These errors should be highlighted, as data science is still an emerging field that hasn’t yet matured to the rigor of other fields. Additionally, not correcting the published results will stymie research in the area, as it will not be possible for others to match or improve upon the results.”


He hoped that his criticisms would be published in Nature’s Matters Arising section, a place where comments can be made after the peer review process. But Nature appears to have rejected Shah’s submission after a stern response from the researchers themselves. Shah has made his letter to Nature, the researchers’ rebuttal, and Nature’s subsequent response public via his GitHub account.

Phoebe DeVries, a postdoctoral fellow at Harvard University, and Brendan Meade, a professor of earth and planetary sciences also at Harvard, who worked on the original research, admitted that their model was trained and tested on a subset of the same data, but downplayed its effect on the results.

“The network is mapping modeled stress changes to aftershocks, and this mapping will be entirely different for the example in the training data set and the example in the testing data sets, although they overlap geographically," the pair said.

"There’s no information in the training data set that would help the network before well on the testing data set - instead, the network is being asked in the testing data set to explain the same aftershocks that it has seen in the training data set, but with a different mainshocks. If anything, this would hurt [the] performance on the testing data set,” DeVries and Meade, wrote back to Shah.

“These comments were made without any scientific context. We are earthquake scientists and our goal was to use a machine learning approach to gain some insight into aftershock location patterns. We accomplished this goal. The authors of these comments do not - we will be disappointed if Nature publishes them,” they concluded.

A Nature referee decided not to include Shah’s comments in the Matters Arising section. “I do not feel that the central results of the study are compromised in any way, and I am not convinced that the commentary is of interest to [an] audience of non-specialists (that is, non machine learning practitioners),” according to a response received by Shah.

When The Register pressed Nature about the problem of data leakage, a spokesperson told us it couldn’t discuss anything further based on “confidentiality reasons.”

“For confidentiality reasons, we cannot discuss the specific history or review process of any Nature paper with anyone other than the authors. We treat all correspondence as confidential and do not confirm or deny any reports of submissions that may or may not have been made to us,” the spokesperson told us.

“When critiques are made about papers we have published, we look into them carefully following an established process. If the critique appears substantial, it is important that all material is scrutinised by experts in the field — the process of peer review — before any conclusions are drawn and any comments made.

"We recognize the importance of post-publication commentary on published research as necessary to advancing scientific discourse. Exceptionally interesting and timely scientific comments and clarifications on original research papers published in Nature may, after peer review, be published online as Matters Arising– sometimes alongside a Reply from the original authors."

DeVries declined to comment when approached by El Reg.

Shah said the paper highlights the “perceived unlevel playing field between the tech companies - [in this case] Google and their influence over academic research,” and hoped this would dispel the hype in deep learning. ®
