How to Be Ahead of Your Competition with Data
In our exclusive roundtable discussion, we spoke with some of the most innovative experts in sports analytics. Here we learn about their career experience and cutting-edge technical expertise.
To learn more, Download Data Samples and/or book a time with one of our experts.
“The key is explanation as machine learning techniques are a sort of meta-algorithm: they use data to automatically generate new algorithms, capable of performing a task they have been trained to,” said Cintia. “Such new algorithms tend to be black-boxes: they work, they produce predictions, but the underlying motivations leading to that prediction are not completely transparent. This process of knowledge extraction and explanation enforces a co-operation between human and machine skills, heading towards a continuous refinement for both.”
Having the most up-to-date and accurate data is the best way to keep prediction accuracy high. But what about the value of using data from past seasons? Malanga says the more past data, the better.
“We extensively make use of past seasons data to train our models and to backtest trading strategies,” said Malanga. “To stress it further, by having more data, we can make more experiments without increasing the risk of ‘overfitting’, i.e. the common curse of ‘remembering’ past data rather than ‘learning’ from it. There is also a clear alignment between technical and commercial values: the deeper the historical set on which we tuned our algorithms, the more credible and robust our strategies become to our clients' eyes.”
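One common way to backtest on past seasons without "remembering" them is a walk-forward evaluation: fit on earlier seasons only, then score on the next season the model has not seen. The sketch below is a minimal illustration of that idea, not the panelists' actual method. The season labels, the home-win flags, and the naive home-win-rate "model" are all hypothetical, and the season labels are assumed to sort chronologically as strings.

```python
from typing import Dict, Iterator, List, Tuple


def walk_forward_splits(
    seasons: List[str], min_train: int = 2
) -> Iterator[Tuple[List[str], str]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Each test season comes strictly after every season used for training,
    so the model is never scored on data it was fitted to.
    """
    ordered = sorted(seasons)  # assumes labels sort chronologically
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


def home_win_rate(flags: List[int]) -> float:
    """Naive stand-in 'model': the share of matches won by the home side."""
    return sum(flags) / len(flags)


def backtest(
    results: Dict[str, List[int]], min_train: int = 2
) -> List[Tuple[str, float, float, float]]:
    """Return (test season, predicted rate, actual rate, absolute error)."""
    report = []
    for train, test in walk_forward_splits(list(results), min_train):
        train_flags = [flag for season in train for flag in results[season]]
        predicted = home_win_rate(train_flags)
        actual = home_win_rate(results[test])
        report.append((test, predicted, actual, abs(predicted - actual)))
    return report


# Hypothetical data: 1 = home win, 0 = anything else, one list per season.
results = {
    "2015": [1, 0, 1, 1],
    "2016": [0, 0, 1, 1],
    "2017": [1, 1, 1, 0],
    "2018": [0, 1, 0, 0],
}

for season, predicted, actual, error in backtest(results):
    print(f"{season}: predicted={predicted:.3f} actual={actual:.3f} error={error:.3f}")
```

With a deeper archive the loop simply produces more train/test pairs, which is one way to read the point that more historical data allows more experiments.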
Extensive Geographical Data Coverage Helps You Avoid Blind Spots
Cintia told us that for companies planning to scale globally, extensive geographical data is key, especially for scouting.
“For performance monitoring and scouting, having a wide geographical coverage is fundamental,” said Cintia. “We discover new players since the beginning of their careers, and this is very valuable for clubs. Data has to be machine readable - it's not obvious, unfortunately - and should cover all the aspects of player and club performances: from matches to training load, financial aspects and media coverage.”
Cintia went on to explain that as international players from every continent and league become more common in football, having data from leagues across the globe is now extremely important.
“Historical data allows us to find new players, and to extend the possibility of performance analysis and comparison,” said Cintia. “As an instance, we found that the player with most goals scored in a single game is a Chinese female player. Now we know that it's possible to score 9 goals in a single match, it was in a game scored 16-0 to China over Turkmenistan. This could seem kind of fun, but we can do the same for plenty of similar metrics—it's really valuable.”
Extensive geographical data is also useful for training algorithms and for comparing data across different leagues.
“We tend to model each league and country as a universe per-se,” said Malanga. “Of course, some analysis may benefit by grouping together different leagues’ data. But in general, an extensive geographical data coverage would simply mean a wider set of leagues for which we train predictive algorithms.”
It Takes Large Data Sets to Get Ahead of Your Competition
Wyscout data can be combined with data from other providers for added statistical benefit. Meza explained the value of working with multiple data providers from the perspective of Twenty3 Sport.
“For us it's not about ‘combining’ the data to improve algorithms or models - in fact we keep data from data providers in completely different silos,” said Meza. “The reason we work with multiple data providers is that we want to provide technology to make use of the data and extract insight from it for clients, and different clients have licenses with different data providers, therefore we must tender for multiple data providers ourselves to increase our potential pool of customers. Also, we endeavor to set up our tools in a way where the details of the data are still available to the end users through our tools - for example, Wyscout's ‘tags’ on event data can be used in our tools, to also empower the provider to innovate in their data, and for that to make its way to the end users.”
In terms of using multiple data providers to get ahead of competitors, Cintia said he “would say it allows us to imagine and develop better products”, while Malanga concluded that “it allows us to mitigate the risk of corrupted data, plus it means we are able to capture a higher level of detail, as different companies have different data compiling procedures”.