World Cup stats fever - have you got the balls to win?
How to model your way to a result
No host nation has ever lost the opening game of the World Cup.
Now combine that nugget with knowing that South Africa is the lowest-ranked football team, by some distance, to have ever hosted the competition – the squad is 83rd in FIFA's rankings.
What do you think is going to happen when they face up to Mexico (ranked 17th) at 3pm (GMT+1) this afternoon in Johannesburg?
If you knew for sure you'd keep it to yourself and put some money on it at the bookies, preferably with a healthy wedge on the correct-score market. The starting line-up of each team will be key, as will other factors. The quest to forecast results of this kind is part of man's long-held desire, and ongoing research project, to use history to quantify risk and basically predict the future. [For a look at some individual players, go here]
Out of the 32 teams in the tournament, betting-wise South Africa is 18th favourite (using the market of the betting exchange Betfair). There is only one team with a worse FIFA ranking in these finals, and that is North Korea, ranked 105th by FIFA and in last place by the punters.
So there are 13 teams whose historical performances would suggest they are better than South Africa (including Greece, Australia and Nigeria, FIFA ranked 13th, 20th and 21st respectively), but which are all less fancied to hold the World Cup trophy aloft by people who are putting their money on the result. That's a big home turf boost (but let's face it, compared with favourites Spain and Brazil, none of these bottom-half teams is fancied much at all).
Because there's money to be made – in terms of gambling, but also in team management - there are businesses taking a crack at football forecasting, interpreting data, refining models and analysing which information is most relevant to act on.
Bettorlogic is a leading player in this market, and provides data to 350 bookmakers, football clubs, and betting syndicates. It employs statisticians and developers to work on its models and even Arsenal manager Arsene Wenger uses its player analysis tool to examine the impact of a player or combination of players on team performance (see part two to view how key World Cup players have performed for their clubs).
“He [Wenger] is looking at how individual players perform in different scenarios, against different kinds of teams, in different phases of the game, and whether Arsenal are winning, drawing, losing at that point in the game,” says CEO Mike Falconer.
There's no end to the number of data points you can grab in any football game – half-time and full-time score, assists, shots on target, completed passes, team possession, ground covered by a player – and certain fans love this level of number crunching, as they do in cricket and baseball.
Bettorlogic's experience is that it works better if you cut through all this. At present you'd have to grab all this data manually, which would be impossible at scale and speed. The company's model is about goals, when in a game they're scored, and the rank of the respective teams – eg top five, bottom third etc. Player analysis compares the average number of points per game (PPG) gained by a team when a player features for his club with the PPG when he does not play.
This analysis revealed Chelsea's Florent Malouda to be their key player in the 2008-09 season, when many considered him a weak link. His contribution was recognised the following season when he was named the Premier League's Player of the Month in March 2010.
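The with-and-without comparison can be sketched in a few lines of Python. This is only an illustration of the PPG idea described above, not Bettorlogic's own code: the `Match` structure, the function names and the five results are all invented for the example, and the usual three points for a win and one for a draw is assumed.

```python
from dataclasses import dataclass


@dataclass
class Match:
    goals_for: int
    goals_against: int
    lineup: frozenset  # names of the players who featured (hypothetical field)


def points(match):
    """League points for one match: 3 for a win, 1 for a draw, 0 for a loss."""
    if match.goals_for > match.goals_against:
        return 3
    if match.goals_for == match.goals_against:
        return 1
    return 0


def ppg(matches):
    """Average points per game, or None if there are no matches to average."""
    matches = list(matches)
    if not matches:
        return None
    return sum(points(m) for m in matches) / len(matches)


def player_impact(matches, player):
    """Return (PPG when the player featured, PPG when he did not)."""
    with_player = [m for m in matches if player in m.lineup]
    without_player = [m for m in matches if player not in m.lineup]
    return ppg(with_player), ppg(without_player)


# Invented results for illustration only, not real club data.
season = [
    Match(2, 0, frozenset({"Player A", "Player B"})),  # win
    Match(1, 1, frozenset({"Player A"})),              # draw
    Match(0, 1, frozenset({"Player B"})),              # loss
    Match(3, 1, frozenset({"Player A", "Player B"})),  # win
    Match(0, 0, frozenset({"Player B"})),              # draw
]

with_a, without_a = player_impact(season, "Player A")
print(f"PPG with Player A: {with_a:.2f}, without: {without_a:.2f}")
```

A real comparison would need far more matches than this, and would have to allow for who the opposition was in the games a player missed.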
But all this bears little resemblance to its modelling at the last World Cup. “We've a much greater idea of what is relevant, what works over a period of time,” says product director Andrew Dagnall.
Then there's some special sauce in its algorithms, backed up by its historical data which goes back to 1970. The company's strong suit is providing very fast analysis when team selection is announced, and in-play analysis (which is where the big growth is in betting markets). Bettorlogic has four 2.6GHz quad-core processor systems, scouring for new data every 30 seconds, and crunching away to give in-play updates every five minutes, looking at 40 divisions around the world. The setup is C#, .NET Framework, and SQL Server 2008.
“We take a feed of a line-up when it's known producing a performance factor based on the starting 11, anyone else in the squad, in terms of the teams they're about to play,” says Falconer. “There are some other products which are not entirely dissimilar, but that look at much more granular data such as yards covered and shots on target. We're much more about what is the relationship between a player and the performance of a team.
“It's much more relevant from a betting and a manager's perspective.”
Falconer believes the work his business is doing in identifying performance patterns is pioneering.
“In-play, we can analyse any one of 40 divisions worth of games, whatever their frequency, every five minutes. Whenever a goal is scored, we can look historically at how that event/decision has played out in the past. No-one else is anywhere near that.”
A typical scenario would be asking: if Arsenal were 1-0 up at home after 30 minutes against a bottom-five team, what has the goal count been historically, and by what margin do they typically win? “We're constantly updating that throughout the game,” says Falconer.
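A query of that kind amounts to filtering past games by the state of play at a given minute and summarising how they finished. The Python below is a rough sketch under stated assumptions, not a description of Bettorlogic's system (which the company says runs on C# and SQL Server): the `HistoricalMatch` structure, the rank-band labels and the five matches are all made up for the example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class HistoricalMatch:
    opponent_band: str  # hypothetical rank label, e.g. "bottom five"
    goals: list         # (minute, side) tuples, side is "home" or "away"


def score_at(match, minute):
    """Score (home, away) counting only goals up to and including `minute`."""
    home = sum(1 for m, side in match.goals if m <= minute and side == "home")
    away = sum(1 for m, side in match.goals if m <= minute and side == "away")
    return home, away


def scenario_summary(matches, minute, score, opponent_band):
    """Summarise how past games ended after being at `score` on `minute`."""
    similar = [
        m for m in matches
        if m.opponent_band == opponent_band and score_at(m, minute) == score
    ]
    if not similar:
        return None
    finals = [score_at(m, float("inf")) for m in similar]
    return {
        "sample_size": len(similar),
        "avg_total_goals": mean(h + a for h, a in finals),
        "avg_home_margin": mean(h - a for h, a in finals),
        "home_win_rate": mean(1 if h > a else 0 for h, a in finals),
    }


# Invented home games for illustration only.
history = [
    HistoricalMatch("bottom five", [(12, "home"), (55, "home"), (80, "away")]),
    HistoricalMatch("bottom five", [(25, "home"), (60, "home"), (75, "home")]),
    HistoricalMatch("bottom five", [(40, "home")]),  # still 0-0 at 30 minutes
    HistoricalMatch("top five", [(10, "home")]),     # wrong opponent band
    HistoricalMatch("bottom five", [(5, "home"), (70, "away")]),
]

summary = scenario_summary(history, minute=30, score=(1, 0),
                           opponent_band="bottom five")
print(summary)
```

Re-running the same query with the current minute and score each time a goal goes in is one simple way to keep such a summary up to date during a game.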
The modelling throws up interesting stats. Dagnall points out that, “When John Terry and Ricardo Carvalho play together in Chelsea's back four [they're defenders], the team score more goals.”
So, you scream, is all this information usable? Predictive modelling using sports and gambling data certainly has had its successes, starting with Blaise Pascal and Pierre de Fermat's contribution to the theory of probability in the 1650s. More up-to-date was the success of the Oakland A's baseball team in 2002, which used statistical analysis, and different criteria to those historically used to measure performance, to find players undervalued by the market. From the late 80s the 'Hong Kong Syndicate' pioneered the use of modelling to exploit betting on the insular Hong Kong horse racing scene.
And in the measure that counts, Bettorlogic's history shows between a 12 per cent and 16 per cent return on betting investment, at level stakes, if its customers follow its recommendations over time. But 'over time' is the key, and unquantifiable, point – like the stock market, this isn't going to be a smooth ride of consistent growth. And there are obvious recent examples of modelling fuck-ups in that arena.
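For anyone unsure what a return at level stakes means, the arithmetic is simple: stake the same amount on every bet, add up the profit and loss, and divide by the total staked. The sketch below assumes decimal odds and a one-unit stake. The five bets are invented and make no attempt to reproduce Bettorlogic's figures.

```python
def level_stakes_roi(bets, stake=1.0):
    """Return on investment when every bet carries the same stake.

    `bets` is a list of (decimal_odds, won) pairs. A winning bet returns
    stake * (odds - 1) in profit; a losing bet costs the stake.
    """
    if not bets:
        return None
    profit = sum(stake * (odds - 1) if won else -stake for odds, won in bets)
    total_staked = stake * len(bets)
    return profit / total_staked


# Invented bets for illustration only: (decimal odds, did it win?)
bets = [(2.0, True), (3.0, False), (1.5, True), (4.0, False), (2.5, True)]

roi = level_stakes_roi(bets)
print(f"ROI at level stakes: {roi:.0%}")
```

Five bets prove nothing, of course. The same calculation over a short run of results can swing wildly either way, which is why 'over time' matters.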
But here's an interesting stat. Many bookmakers use part of Bettorlogic's full service as a stimulus for their customers to bet. In a four-month test using Betfair customers, 5,000 got access to the product, against a control group of the same size and profile, which didn't. The difference in betting volume was around 35 per cent, but these “informed” punters didn't win more money.
“They were not systematic enough or organised enough to follow a pattern the number of times to make it work,” says Falconer.
“We're not saying we're soothsayers. We're saying here's this situation modelled historically and reliably. Here's the necessary information delivered systematically on a massive scale, in real time.” ®