Man City drags Big Data into Big Football
And Google+ grabs sweaty ballers for its Hangouts...
Posted in Management, 25th September 2012 09:02 GMT
Open ... and Shut Football is the world's most popular sport by a crushing margin. Yet for all the money and attention it gets, the beautiful game has remained doggedly anti-technology, eschewing video replays or goal-line technology despite the prevalence of such tools in other sports. One club, however, is opting to make technology a central hallmark of its game, and may well force other clubs to follow suit.
Unfortunately, that club is not my Arsenal.
No, it's Manchester City, the perennial "other club" in Manchester, which in recent years through barrels-full of oil money has bought an all-star team (and a title to go with it). The Citizens (or "Man City"), as the team is colloquially called, have recently set some soccer industry firsts, opening up their player data and more recently establishing intimate fan forums on Google+ Hangouts.
The Google+ Hangouts move is cute but unlikely to move the needle much on Man City's aspiration to retain its Premiership title. After all, only 10 participants can engage with Man City's chosen representatives (this week it's Judas Iscariot… former Arsenal star and now a Man City executive Patrick Vieira). The hangout is streamed live so others can watch, but it remains an interesting, if not necessarily sport-changing, sop to fans.
Man City's Big Data initiative, however, is much more promising.
Presented in .csv format, Man City's data trove is a "time-coded feed that lists all player action events within the game with a player, team, event type, minute and second for each action, together with the x/y/z co-ordinates for each event", thereby enabling "heat map, touch map, passing matrices and mapping attacking play and distribution".
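To make the idea concrete, here is a minimal sketch of how a touch map might be built from a feed of that shape. It is not Man City's actual schema: the column names, the sample rows, the player names and the 0-100 co-ordinate scale are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows. The header names and the 0-100 pitch
# co-ordinate scale are assumptions, not the real feed's layout.
SAMPLE_FEED = """player,team,event_type,minute,second,x,y,z
Player A,Home,pass,0,12,35.0,40.0,0.0
Player B,Home,pass,0,15,55.0,62.5,0.0
Player A,Home,shot,3,41,88.0,48.0,0.4
Player C,Away,tackle,5,2,30.0,20.0,0.0
Player A,Home,pass,7,30,38.0,45.0,0.0
"""


def touch_map(feed_text, player, cols=4, rows=3, pitch_x=100.0, pitch_y=100.0):
    """Count one player's events per pitch zone on a cols x rows grid."""
    grid = Counter()
    for event in csv.DictReader(io.StringIO(feed_text)):
        if event["player"] != player:
            continue
        # Scale each co-ordinate to a zone index, clamping events that
        # sit exactly on the far touchline into the last zone.
        col = min(int(float(event["x"]) / pitch_x * cols), cols - 1)
        row = min(int(float(event["y"]) / pitch_y * rows), rows - 1)
        grid[(col, row)] += 1
    return grid


if __name__ == "__main__":
    for zone, touches in sorted(touch_map(SAMPLE_FEED, "Player A").items()):
        print(zone, touches)
```

The same per-event loop could, under similar assumptions, feed a passing matrix by counting passer and receiver pairs instead of pitch zones.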
This is the same data that Man City credits with guiding it toward the best defensive record in the Premiership for the past two seasons. (The more cynical among us would argue that no amount of data could have helped the old cash-strapped Man City to that record, but why quibble just because my own club refuses TO SPEND SOME !%!%!% MONEY, ARSENE!!!)
Such data has been available for a fee to others, but hasn't been widely accessible to the public. This has now changed, and will likely spur other clubs to take similar actions.
Given that professionals have already had access to this data, the real value isn't in gifting opposing teams data about one's performance. It's rather a way to reach out to as-yet unknown but potentially useful sources of data crunching. As Gavin Fleig, Man City's head of performance analysis, explains, the decision to open up its player data is a way to enable the "Bill Jameses" of the world – James being the man whose sabermetric work in the 1970s arguably revolutionised baseball:
Bill James kick-started the analytics revolution in baseball. That made a real difference and has become integrated in that sport. Somewhere in the world there is football's Bill James, who has all the skills and wants to use them but hasn't got the data. We want to help find that Bill James, not necessarily for Manchester City but for the benefit of analytics in football. I don't want to be at another analytics conference in five years' time talking to people who would love to analyse the data but cannot develop their own concepts because all the data is not publicly available.
This is a great example of one cardinal principle of Big Data: "Don't throw any data away." Data storage is cheap while analysis of that data is potentially priceless, even (pardon the pun) game-changing.
Personally, I'm hopeful that the Citizens' data will be used by some enterprising Arsenal fan to crack the code on how to spend peanuts and still crack at least one Manchester club's stranglehold on the top of football. But whether or not this pipe dream materialises, it's impressive how Man City is challenging soccer's technology status quo, embracing technology as a way to interact with fans and anonymous data scientists. I don't love the team, but I love its technology leadership. ®
Matt Asay is senior vice president of business development at Nodeable, offering systems management for managing and analysing cloud-based data. He was formerly SVP of biz dev at HTML5 start-up Strobe and chief operating officer of Ubuntu commercial operation Canonical. With more than a decade spent in open source, Asay served as Alfresco's general manager for the Americas and vice president of business development, and he helped put Novell on its open source track. Asay is an emeritus board member of the Open Source Initiative (OSI). His column, Open...and Shut, appears three times a week on The Register.
COMMENTS
Re: Asay is a Gooner
It's why I'm so angry at the world. :-)
city fan in peace
Thought the Arsenal performance at the Etihad was pretty impressive - not many teams will get points there this season and Arsenal are looking far more like title challengers than they have for a few seasons.
As for the stats - Patrick Finch alludes to it. In baseball the game is defined by very narrow ranges of probability because each phase of play contains only one set of actions and variables. Cricket is similar. In football, by contrast, due to the continual interaction of the other 21 players on the pitch, plus officials, even the crowd - spotting statistical patterns is going to be that much harder. Liverpool failed because those players they bought depended on players around them with whom to interact. Some players can be removed from this team setting and placed in another and it will work. Perhaps Liverpool failed spectacularly because the metrics being measured are faulty.
In my own, amateur, observations of the game, there are broadly speaking four sets of actions
1. defensive / possession regaining actions
2. the transitional act from regaining to utilising
3. offensive / possession utilisation actions.
4. the transitional act of loss of possession
Possession regaining actions include closing players down, tracking back, blocking, defending, tackling, keeping formational shape when defending, marking runners, marking for corners, stepping up to catch players offside. And more besides. There will be measurements of a player's awareness, their movement, their tackling accuracy, and much more.
Transitional regaining acts include intercepting passes, clearances, what the player did with the ball - lump it clear or play towards one of their own - and the accuracy of that.
Offensive acts are runs made off the ball to pull defences around (really important - and also were those runs made in a realistic way so the player could have been reached by an accurate pass, or were they expecting the impossible from their team-mates?), breaks, runs made with the ball, passes, throw-ins, shots, movement - accuracy and effectiveness of all of these is important but not as important as the actual act itself. For example, in any 10 minute spell with possession, a winger can consistently make great runs and not receive a pass. This running causes a defender or two to follow him, leaving a gap. Alternatively the defenders realise the player is not going to receive a pass and may drift infield to do something more useful. At this point a decisive pass can play the under-utilised winger in and set up a goal-scoring opportunity. Obviously shots made / on target / straight at the keeper etc. There is luck involved too but the more possession and passing a team does the more likely they are to get into shooting scenarios, the more likely they are to score.
Last there is the loss of possession. Except where down to consistent carelessness, consistent poor positioning or consistent poor awareness this could be down to how your opponent plays. But it is important to note risks taken when a team is stretched - risky passes made by defenders cost goals, overly cautious passing by attacking players causes attacks to break down and lose momentum.
None of these lists are the least bit exhaustive and there will be subtleties and intricacies which as a non-player I don't see.
I don't know the answer. There is quality and consistency of decision-making, speed of thought, presence of mind, adaptability, inventiveness - how do you quantify them? Or is the metric something that is missed, but implied by all of these factors? Is it too chaotic to predict and only general patterns can be derived? Will we forever be subjected to the inane, cliche-ridden banality of football pundits who seem to have as little awareness in the commentary box as they had either as players or in management?
As mentioned by others, the long ball game was the result of a poor reading of stats and it has caused unparalleled damage to the English game. Having kids play on full-sized pitches with full-sized goals has created a nation of hit-and-hopers instead of any appreciation of the finer arts of playing...
I think there's something special about football's resistance to statistical analysis.
When my own beloved Liverpool tried to import "Moneyball" principles which worked so well in baseball, it went horribly wrong. They shelled out large fees on uncontested deals for Charlie Adam, Jordan Henderson, Stewart Downing and Andy Carroll based on a statistical analysis of their contributions on the pitch. None of them was especially successful; three have left after a single season. Even more intriguing, if you ignore the one statistic that actually matters (difference of goals scored and goals conceded), Liverpool did very well - by shots, possession etc.
This is nothing new. In the 80s, the "long ball" game was inspired by the statistic that most goals were scored from less than three passes (so why bother to pass it?). The result was that the English game took a great step backwards to becoming "head tennis", while other countries worked on developing skills involved in playing football.
I suspect it's the fluidity of the game, the continuous play, that makes it very hard to isolate key variables (and indeed, to enforce the advantage and offside rules with any degree of consistency). Most other sports have discretely defined passages of play.
Re: Yay! Football!
Awww not like football madra? That's fine but there's no need to be such a lil' bitch about it
Footy + Data = :-)
Opening up the data for the sake of it is highly commendable - fine work indeed from Citeh.
I do wonder how many man-hours* will be spent finding any actionable insight. Graphs and charts are all very well, but unless it makes a measurable difference, is it worthwhile, no matter how enjoyable? There's also the pesky issue of footballers being asked to implement whatever insights the nerds uncover, plus the fact that the opposition may not play nicely and sign up to the plan.
Right, I'm off to prove that Citeh will never beat the mighty Everton ever again...
* Yes, I said man-hours not person-hours. Get over it. Women are too smart to waste time looking at footy data.
