Figure - uploaded by Raffaele Mattera
Example of the dataset for the English Premier League

Source publication
Article
Several studies deal with the development of advanced statistical methods for predicting football match results. These predictions are then used to construct profitable betting strategies. Even if the most popular bets are based on whether one expects that a team will win, lose, or draw in the next game, nowadays a variety of other outcomes are ava...

Contexts in source publication

Context 1
... Table 1 shows how the dataset for the English Premier League is built. While one variable (red) is already binary, the Goal/No Goal and the Under/Over variables have been transformed into binary events. ...
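The binary transformation mentioned in this context can be sketched as follows. This is only an illustration, not the authors' code: the `Match` record, the `binary_events` helper, and the field names are hypothetical. It assumes the usual betting definitions (Goal means both teams score; Over means the total number of goals exceeds a threshold), and the default threshold of 2.5 is an assumption, since the excerpt does not state which threshold the paper uses.

```python
from dataclasses import dataclass


@dataclass
class Match:
    """Hypothetical match record; field names are illustrative only."""
    home_goals: int
    away_goals: int
    red_cards: int  # total red cards shown in the match


def binary_events(m: Match, threshold: float = 2.5) -> dict:
    """Map a match result to 0/1 indicators.

    Assumed definitions (not confirmed by the source):
      red  = 1 if at least one red card was shown
      goal = 1 if both teams scored (Goal), 0 otherwise (No Goal)
      over = 1 if total goals exceed `threshold` (Over), 0 otherwise (Under)
    """
    total_goals = m.home_goals + m.away_goals
    return {
        "red": int(m.red_cards > 0),
        "goal": int(m.home_goals > 0 and m.away_goals > 0),
        "over": int(total_goals > threshold),
    }


if __name__ == "__main__":
    # A 2-1 result with no red cards
    print(binary_events(Match(home_goals=2, away_goals=1, red_cards=0)))
```

Changing `threshold` (for example to 1.5 or 3.5) gives the other Under/Over markets without touching the rest of the function.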
Context 2
... in 75% of the matches more than one goal is scored, while more than three goals are scored in only 30% of the matches. Finally, red cards occurred in only 0.7% of the matches within the sample. As with the Premier League, we also collected the last 13 seasons of the Italian Serie A. The dataset looks like the one in Table 1. The descriptive statistics of the Italian Serie A matches are shown in Table 3. ...

Citations

... • Predictive systems for soccer match outcomes are proposed, such as a simple framework based on scoring models [6] that is capable of obtaining accurate forecasts for binary outcomes in soccer matches. To analyze the usefulness of the proposed model, experiments were conducted with the English Premier League and the Italian Serie A, predicting events such as red cards in the match, the under/over score, and the goal/no goal outcome. ...
Article
The importance of Big Data and the analysis of this data in recent years is indisputable, and this boom has spread to all areas of life, including professional sports and, within this, soccer. The significant amounts of money involved in this sport have led to the need for the top clubs to employ these techniques to gain a competitive advantage over their competitors. Despite this, there is very little information on how these tools are used or what parameters they consider. Similarly, there are a multitude of amateur analyses that offer very few conclusions. They simply focus on collecting and presenting the data in the form of a comparison without any analysis or pre-processing. This work describes the implementation of an expert system based on fuzzy logic used to evaluate the talent of a soccer player at all levels, his/her aptitude and attitude, to face his/her individual and collective professional development. For this purpose, the above aspects will be evaluated specifically in the different aspects of the game, which will allow us to evaluate the performance of a soccer team and thus determine the probability of victory, draw, and defeat in a confrontation.
... Applying quantitative methods to sports data has raised the interest of the statistics and machine learning research communities. Statistical methods are nowadays commonly used for predicting match results (Angelini & De Angelis, 2017; Mattera, 2021), pricing players' value (Behravan & Razavi, 2021) and evaluating team performance (Sarlis & Tjortjis, 2020). Clustering is another significant application of machine learning techniques in sports analytics (Narizuka & Yamazaki, 2019; Ulas, 2021). ...
Article
Although there are many contributions in the time series clustering literature, few studies still deal with count time series data. This paper aims to develop a fuzzy clustering procedure for count time series data. We propose an Integer GARCH-based Fuzzy C-medoids (INGARCH-FCMd) method for clustering count time series based on a Mahalanobis distance between the parameters estimated by an INGARCH model. We show how the proposed clustering method works by clustering football teams according to the number of scored goals.
... Team performances have been analyzed previously by different authors, using data from different countries or competitions and based on substantially different quantitative approaches guided by the type of data available (Anders and Rotthoff 2011;Bar-Eli et al. 2006;Carmichael and Thomas 2005;Chowdhury 2015;Gómez-Déniz et al. 2019;Greenberg 2015;Mattera 2021;Mechtel et al. 2011;Titman et al. 2015). Regarding the effects and occurrence of red cards, most of the literature finds, as might be expected, that red cards produce a decrease in the performance of the penalized team while causing an increase in that of the opposing team, either in terms of the probability of winning the game or in terms of expected goals, points achieved or scoring rate; although the magnitude of the effect can vary considerably. ...
Article
The aim of the current study is to analyze the effects of red and yellow cards on the scoring rate in elite soccer. The sample was composed of 1826 matches in the top five European leagues. All events were structured in 5-min intervals and were analyzed by means of a Generalized Linear Mixed Model with Poisson distribution, considering the presence of correlated data, where the dependent variable is represented by scoring rate. Team strength and home advantage were considered implicitly by means of a transformation of the betting odds for each game. The model also took into account the goal difference and time evolution. Overall, we found that after a sending off, each team’s scoring rate changes significantly, damaging the penalised team and favouring its opponent. When the player who is sent off belongs to the Away team, the impact of a red card is more or less maintained over time intervals. The red card effect, on the other hand, tends to fade over time when the affected team is stronger. The relative difference in scoring rates is also affected by the goal difference and the difference in booked players, being slightly lower for the team going ahead if it has more booked players. Our approach allows estimating the expected cumulative scoring rate through time for various red card scenarios. Particularly if a red card is given with 30 min of remaining time, the expected impact is 0.39 goals if the guilty player is on the visiting team and 0.50 if he plays for the home team. Coaches and analysts could use this information to establish objectives for players and teams in training and matches and to be prepared for these very different scenarios of numerical superiority or inferiority.
Article
Objectives: The purpose of this study is to evaluate the pinnacle of football match key statistics as in‐play information for determining the match outcome of Europe's foremost leagues, namely those in England, Scotland, Spain, Germany, Italy, France, Portugal, Belgium, Turkey, the Netherlands, and Greece. The study analyzed a sample of 98,849 matches across all of these leagues from the 2002/2003 to 2023/2024 seasons. Methods: The techniques employed include the zero‐inflated Poisson regression model and generalized ordered logit/partial proportional odds (gologit/ppo) models. Results: The findings revealed that, for both home and away teams, the number of shots, shots on target, corners, and the changes from one season to another, as well as the occurrence of Covid‐19, are factors that encourage goal scoring. On the other hand, fouls committed, yellow cards, and red cards act as limiting factors for goal scoring. The effects are higher at full time than at halftime. However, the impact of the number of goals scored in the last match and the effect of Covid‐19 are negligible for the home and away teams, respectively. Moreover, when comparing the impacts specifically within home teams and within away teams, it was found that yellow and red cards are highly detrimental, while the positive impact of shots on target surpasses these and other factors in home teams. In contrast, for away teams, the negative impact of yellow and red cards is more significant than any other factor. Conclusion: Football match key statistics, including the number of shots, shots on target, corners, change from one season to another, fouls committed, yellow cards, red cards, last match outcome, and occurrence of Covid‐19, are essential determinants of the match outcome whether a team is at home or away, but the impact is higher during the second half of play.
Conference Paper
The purpose of this research is to present an overview of the conceptual framework of sports statistical data fabric techniques as an approach for using information technology systems for analyzing sports statistical data to develop athletes' potential. This literature review shows that developing advanced statistical methods for match prediction and athlete selection, using sports statistical data to analyze the correlation of the data, creates an information system in which sports statistics data fabric techniques provide excellent cognitive results. The information obtained is accurate. The results of the study form a new body of knowledge with in-depth conceptual frameworks in sports and a scope of information that serves as a basis for developing sports with specific characteristics of physical fitness. With the knowledge gained from the conceptual framework, this research can create new norms and knowledge in sports toward the development of athletes to their highest potential.
Chapter
With the rise of Big Data in sports, data-based performance analysis and a wide variety of possibilities have opened up. In the last 10 years, considerable progress has been made in sports analysis using Machine Learning algorithms. The current chapter defines one such method, ‘Logistic Regression’, as a method for analysing binary problems. It also gives a few hypothetical examples of how it can be used to answer specific problems in team and individual sports. Furthermore, the chapter outlines a study which has used logistic regression as one of the methods to analyse KPIs in Elite Goalkeepers in Football.
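As a rough illustration of logistic regression on a binary problem, the sketch below fits a one-feature model by plain gradient descent using only the Python standard library. It is not the chapter's method or data: the helper names (`fit_logistic`, `predict_proba`), the learning rate, the number of epochs, and the toy dataset are all assumptions made for the example.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Estimated probability that the binary outcome equals 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit weights and intercept by batch gradient descent on the log-loss.

    lr and epochs are illustrative defaults, not tuned values.
    """
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # gradient of log-loss w.r.t. the logit
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one numeric feature per observation and a 0/1 label.
X = [[1], [2], [3], [4], [5], [6], [7], [8]]
y = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(X, y)

if __name__ == "__main__":
    for x in ([1], [8]):
        print(x, round(predict_proba(x, w, b), 3))
```

In practice a library implementation (for example in scikit-learn or statsmodels) would normally be preferred; the hand-written loop is used here only to keep the example self-contained.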