Scientific Programme

Applied Sports Sciences

IS-AP02 - Machine learning in sports – Not everything that glitters is gold! - sponsored by adidas

Date: 03.07.2024, Time: 13:15 - 14:30, Lecture room: Clyde Auditorium

Description

Just a few years ago, artificial intelligence (AI) and its subfield machine learning (ML) were perceived almost like magic spells – offering miraculous capabilities unimagined with other approaches but inaccessible for subject matter “Muggles”. Since then, turnkey open statistical software has made ML accessible for sports scientists with little background knowledge or coding skills. At the same time, commercial “AI-based” applications for use in sports practice have multiplied. While this accessibility opens up numerous opportunities, ease and ubiquity entail the risk of overlooking specific methodological issues and limitations that have affected the field of sports science over the last few decades. Recent reviews have already pointed out common mistakes when using ML approaches within the domain of sports science. Likewise, from a practitioner’s perspective, basic ML literacy enables asking the appropriate questions when scrutinizing performance claims for proposed algorithms and applications. This invited symposium therefore aims to provide guidance for sports scientists and practitioners alike when adopting ML for their research questions. The first talk provides recommendations on how to set up an ML study, including guidance on when ML is the right choice for a specific research question or use-case, explains how to avoid common methodological mistakes, and points to critical questions when assessing performance claims. The second talk focuses on the distinction between prediction and inference, central to ML approaches, and addresses general misconceptions in this regard. The session closes with the presentation of a practical example using ML within a sports scientific setting, specifically the case of the Dutch National Short Track team, thereby providing an overview of how to report ML findings in sports science research.

Chair(s)

Matthias Kempe
University of Groningen, Department of Human Movement Sciences
Netherlands
Gregory Bogdanis
National & Kapodistrian University of Athens, School of Physical Education and Sport Science
Greece

Speaker A

Anne Hecksteden
Universität Innsbruck, Institute for Sport Science - Chair of Sports Medicine
Austria

ECSS Glasgow 2024: IS-AP02

Does this make sense? Assessing machine learning applications for sport related use-cases

Today, machine learning is a self-evident part of many life domains. Impressive performance in some areas, together with objectivity and scalability, may lead to the impression that data-driven approaches are generally superior to other ways of generating knowledge and competence (e.g. “normal” statistics or experience-based learning). This apparent edge is particularly attractive in competitive sport, where “AI”-based applications have become pervasive. While understanding and evaluating technical details is beyond the reach of most sport scientists and practitioners, it is possible to gauge the potential suitability of machine learning for a specific use-case and to critically scrutinize performance claims. To begin with, it is important to keep in mind that the performance of data-driven approaches relies (among other things) on two critical requirements: (i) large amounts of precedents (many previous cases to learn from) and (ii) an informative - not just large - set of explanatory variables (that is, the explanatory variables jointly explain most of the outcome). Only if these requirements are met can one hope to identify regularities that generalize beyond the training data. Note that neither a vast panel of explanatory variables (as e.g. in “omics” approaches), nor a sophisticated, highly flexible machine learning algorithm (e.g. a deep neural network), nor a large number of repeated measurements (e.g. from wearables) can make up for a limited number of precedents. Therefore, in many sport-related use-cases, a misfit between the model complexity required for useful performance and the number of available precedents (“data scarcity”) poses hard limits on the performance of data-driven approaches. Why is this “hard” limitation not recognized more generally? Arguably, the challenges of unambiguously verifying model performance play a major role. First and foremost, complex and/or probabilistic outcomes (e.g. recovery need or injury occurrence), which are common in sport, do not allow for a direct comparison of model output with the (later) observed ground truth. Moreover, model performance may differ dramatically between training set, cross-validation, out-of-sample, and out-of-population validation. While successful practical application requires model performance on new cases, data scarcity increases the risk of spuriously high performance on the training data (“overfitting”). Therefore, verified out-of-sample performance is crucial for sports practice and should always be reported and requested. Moreover, consciously scrutinizing the plausibility of claimed performance based on the predictability of the outcome, the set of explanatory variables, and common sense remains important (e.g. near-perfect prediction of contact injuries in team sports based on GPS data alone). This talk will illustrate the above considerations with practical examples and provide suggestions for fitting machine learning approaches to sport-related use-cases.
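The data-scarcity argument above can be made concrete with a minimal sketch (not part of the talk; all numbers are hypothetical): with more explanatory variables than precedents, even a plain linear model perfectly "predicts" an outcome that is pure noise on the training data, while performing at chance level out of sample.

```python
import numpy as np

# Hypothetical "omics-style" scarcity: 20 athletes (precedents) but
# 100 explanatory variables, and a binary outcome that is pure noise,
# so there is genuinely nothing to learn.
rng = np.random.default_rng(42)
n_train, n_test, n_vars = 20, 200, 100

X_train = rng.normal(size=(n_train, n_vars))
y_train = rng.integers(0, 2, size=n_train)   # random binary outcome
X_test = rng.normal(size=(n_test, n_vars))
y_test = rng.integers(0, 2, size=n_test)

# Unregularized linear model (minimum-norm least squares). With more
# variables than precedents it interpolates the training data exactly.
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

train_acc = np.mean((X_train @ w > 0.5) == y_train)
test_acc = np.mean((X_test @ w > 0.5) == y_test)

print(f"training accuracy:      {train_acc:.2f}")  # perfect fit
print(f"out-of-sample accuracy: {test_acc:.2f}")   # roughly chance level
```

The spuriously perfect training accuracy is exactly the overfitting pattern described above, which is why out-of-sample performance is the number worth requesting.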


Speaker B

Robert Rein
German Sport University Cologne, Institute of Cognitive and Team/Racket Sport Research
Germany


Understanding parameter inference versus predictive inference when applying machine learning models

Standard statistical practice in sports science research typically involves the estimation of unobservable model parameters based on a specific data model. Subsequently, inferences about model parameters, usually connected to a hypothesis-testing framework, are performed to identify data-generating processes and possible causal relationships. In general, linear model approaches, instantiated for example through Analysis of Variance-type analyses, represent the majority of analysis approaches across sports science disciplines. Accordingly, model-driven hypothesis tests using a frequentist theoretical approach form the main epistemological framework for interpreting research results and assessing their importance. However, with the increasing application of machine learning (ML) approaches to sports science data, this framework is overly restrictive and may also lead to erroneous procedures. First, a distinction must be made between an inferential approach and a decision-making approach, according to the aim of the research project. ML approaches are typically much more suitable when researchers are interested in building a decision-making procedure, as their strengths lie in prediction and less in parameter inference. ML methods therefore usually require a different approach to the analysis of the data and to the interpretation and assessment of the model, as many of these models act as black boxes, in contrast to standard linear models. Practices common under an inferential framework, like variable selection, may not only be unnecessary but may actually deteriorate performance, even when only less important variables are deleted. In turn, when using black-box algorithmic approaches, model interpretability necessarily becomes less important compared to predictive accuracy. Focusing more strongly on predictive properties has the added benefit of being of direct interest to practitioners.
A practitioner may not have much interest in what has been found for a specific sample, but would rather like to know what would happen if new observations were collected from their specific group of athletes. As prediction error and, more generally, model accuracy become central to the assessment of a chosen analysis, procedures like cross-validation, model stability checks, and model comparison, as well as concepts like the bias-variance trade-off and overfitting, become much more important than traditional goodness-of-fit tests and parameter inference. Unfortunately, knowledge and best practices concerning these procedures and their specific requirements are not commonly taught at sports science institutions, which at best hampers the application of ML to sports science data and at worst leads to erroneous applications. Thus, when adopting algorithm-centered machine learning in contrast to traditional model-centered approaches, different requirements have to be met by researchers to ensure that their conclusions are truly supported by their data.
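The contrast between goodness-of-fit and prediction error can be sketched in a few lines (illustrative only; the data and dimensions are hypothetical): an in-sample fit is optimistic by construction, whereas cross-validation estimates the error a practitioner would actually see on new observations.

```python
import numpy as np

# Hypothetical data: 60 athletes, 30 monitoring variables (many
# variables relative to athletes), a continuous outcome with a true
# linear signal plus noise.
rng = np.random.default_rng(0)
n, p = 60, 30
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = X @ beta + rng.normal(scale=1.0, size=n)

def ols_fit(X, y):
    """Ordinary least squares via np.linalg.lstsq."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

# In-sample (goodness-of-fit) error: optimistic by construction.
train_mse = np.mean((y - X @ ols_fit(X, y)) ** 2)

# 5-fold cross-validation: error on data the model never saw.
k = 5
folds = np.array_split(rng.permutation(n), k)
cv_errors = []
for fold in folds:
    mask = np.ones(n, dtype=bool)
    mask[fold] = False
    w = ols_fit(X[mask], y[mask])   # fit on the other k-1 folds
    cv_errors.append((y[fold] - X[fold] @ w) ** 2)
cv_mse = np.mean(np.concatenate(cv_errors))

print(f"in-sample MSE:       {train_mse:.2f}")
print(f"cross-validated MSE: {cv_mse:.2f}")  # noticeably larger
```

The gap between the two numbers is the optimism that traditional goodness-of-fit reporting hides, and it is precisely what the bias-variance trade-off and overfitting discussion above is about.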


Speaker C

Matthias Kempe
University of Groningen, Department of Human Movement Sciences
Netherlands


The Data Scientist and the Coach: A long-lasting marriage if done right

Using Machine Learning (ML) to synthesize the vast amount of available data, enabling coaches, trainers, and support staff to make educated decisions, might be its most influential use in sports practice. By now, professional sports teams monitor their athletes extensively, recording variables of training prescription and performance (external load) as well as the physiological and mental strain of these training sessions (internal load). In addition, they perform regular performance testing and record individual variables like age, injury history, growth spurt, etc. The complexity and possible interactions of these variables are hard to grasp, which is why Robertson argued for decision support systems that use ML to create recommendations for practitioners based on “historical” data [1]. Such a system creates simplified variables and/or visualizations by synthesizing various sources of information to enable better-informed decisions. Especially in terms of athlete monitoring and injury prevention, different approaches have been introduced over the last decade to serve this purpose. A recent review found eleven studies that used different data sources and ML algorithms to predict injuries [2]. While the quality of the implementation of the ML algorithms in these papers was good compared to earlier reviews in this field [3], the overall quality of the evidence ranked only from very low to moderate. And despite seeming accurate, these approaches aren't endorsed by practitioners. Therefore, this talk will give recommendations on how to design an ML-based approach that is helpful, useful, and accepted by coaches and support staff. The outline is given by a use case of the Dutch National Short Track team, applying a readiness-to-perform prediction to adapt daily training prescriptions.
To enable more sport scientists to apply such an approach and to facilitate reproducibility, appropriate reporting as well as data and code sharing should be stimulated. This means open science principles, as highlighted by Bullock et al. [4], should be promoted in these publications, including the use of reporting guidelines like TRIPOD and adherence to FAIR data principles. Thus, this talk will outline how to set up an ML project for practical implementation in a way that could also serve as a blueprint for other sport scientists.
[1] Robertson, P.S.: Man & machine: Adaptive tools for the contemporary performance analyst. J. Sports Sci. 00, 1–9 (2020). https://doi.org/10.1080/02640414.2020.1774143
[2] Van Eetvelde, H., Mendonça, L.D., Ley, C., Seil, R., Tischer, T.: Machine learning methods in sport injury prediction and prevention: a systematic review. J. Exp. Orthop. 8 (2021). https://doi.org/10.1186/s40634-021-00346-x
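A decision support system of the kind described above ultimately reduces many monitoring variables to one simplified, coach-facing recommendation. The sketch below is purely hypothetical (the variable names, thresholds, and traffic-light scheme are illustrative assumptions, not the Short Track team's actual model), but it shows the general shape of such a rule.

```python
# Hypothetical decision-support rule: synthesize a few monitoring
# variables into a single traffic-light readiness flag. All names
# and cut-offs here are illustrative assumptions.

def readiness_flag(acute_load, chronic_load, sleep_hours, soreness):
    """Return a traffic-light recommendation for today's training."""
    # Acute:chronic workload ratio as one simplified variable.
    acwr = acute_load / chronic_load if chronic_load else float("inf")
    risk = 0
    if acwr > 1.5:       # acute spike relative to chronic load
        risk += 1
    if sleep_hours < 6:  # poor recovery
        risk += 1
    if soreness >= 7:    # self-reported soreness on a 0-10 scale
        risk += 1
    return ["green", "amber", "red", "red"][risk]

print(readiness_flag(620, 500, 7.5, 3))  # green: no flags raised
print(readiness_flag(900, 500, 5.0, 8))  # red: all three flags raised
```

In practice the flags would come from a trained and validated prediction model rather than fixed cut-offs, but keeping the output this simple and transparent is what makes such a system usable, and acceptable, for coaches.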