How can we use data in soccer?

In the book published last year, “The Numbers Game – Why Everything You Know About Soccer Is Wrong”, the authors Chris Anderson and David Sally set out to do one thing: call for a revolution in soccer that would help it adapt to the modern “Big Data” world, as other popular sports already have. They claim that soccer is probably the most old-fashioned and stubborn sport in the world. “That’s the way it’s always been done” are the seven words that dominate soccer.

I generally agree with them. Well, at least that’s the impression given by FIFA and UEFA… But why is soccer lagging behind?

Is it because soccer is a harder game? It is technically more difficult for the human body, and its rules give the game much more flexibility than, say, basketball or American football, in which every second of game time and every inch of the field is counted accurately. Too much fluidity and too little control make it complicated to collect, manipulate and apply the data.

Or it might just be a cultural issue. If the world of soccer were dominated by the US instead of Europe, it might already be enjoying the benefits of data analytics, and even the Moneyball story could have come out of soccer rather than baseball.

Whatever the reason, things now seem to be changing. As technologies for collecting and manipulating data develop faster than ever, and successful examples of using data in other sports appear one after another, I believe that more and more soccer professionals are ready to embrace the era of Big Data.

In this post, I want to talk about some ideas, inspired by the book, on how data analysis can be used in soccer. It’s not a book review. Besides topics from the book, I’ll also write about my own ideas and things I have seen elsewhere.


Analysis of playing minutes of Barça

This analysis was done BEFORE I began to learn any systematic techniques in statistics or data mining. Many aspects of this article are immature and need much improvement. But it was the starting point of my passion for data analytics, especially football analytics, so I’ll start my blog with it.

The motive of this analysis is to settle the debates over some issues from Barça’s last season using data. Data analysis can compensate for the vague impressions, short memories and biased opinions that usually come with qualitative analysis. I believe that the team has a much more advanced and comprehensive data analytics system; compared to that, this analysis is very simple and crude. But even with only basic tools and the limited data available online, we can at least get a general idea from it.

Here I analyze the playing minutes of Barça’s first team players over the last five years, in order to understand the issues of rotation, age structure and the situation of the homegrown players.

All data are taken from the Spanish football database at http://www.bdfutbol.com

Before going into the analysis, let’s define a concept for convenience: the first team player. We follow a practical definition rather than official status: those, and only those, who play at least 90 minutes in total across all official first-team games in a season are counted as first team players.
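To make the definition concrete, here is a minimal sketch in Python of how it could be applied. It assumes the cutoff applies to a player's total minutes over the season, and that the data is a list of (player, minutes) pairs, one per official game. The function name, the constant name and the sample records are all made up for illustration; they are not taken from bdfutbol.com or from the original analysis.

```python
from collections import defaultdict

# The 90-minute cutoff comes from the definition above; the name is hypothetical.
FIRST_TEAM_MIN_MINUTES = 90


def first_team_players(appearances):
    """Return the set of players with at least 90 total minutes in a season.

    `appearances` is an iterable of (player, minutes) pairs,
    one pair per official first-team game the player took part in.
    """
    totals = defaultdict(int)
    for player, minutes in appearances:
        totals[player] += minutes
    # "More than (including) 90 minutes" means the comparison is >=, not >.
    return {p for p, total in totals.items() if total >= FIRST_TEAM_MIN_MINUTES}


# Made-up records for illustration only, not real data.
sample = [
    ("Player A", 90),
    ("Player B", 45),
    ("Player B", 44),
    ("Player C", 60),
    ("Player C", 30),
]
print(sorted(first_team_players(sample)))  # ['Player A', 'Player C']
```

Player B, with 89 minutes in total, falls just short of the cutoff, while Player C reaches exactly 90 and is counted.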
