Predictive Modeling of 4th Down Conversion in Power 5 Conferences: Football Data Analytics

Authors: Joshua Blinkoff¹, Michael Voeller¹, Scottie Graham² and Jeffrey Wilson³

¹Barrett Honors College, Arizona State University, Tempe, AZ
²Arizona State University, Sun Devils Athletics, Tempe, AZ
³Department of Economics, Arizona State University, Tempe, AZ

Corresponding Author:
Jeffrey R. Wilson, BA, MS, PhD
Department of Economics CPCOM 465D
Arizona State University/Tempe AZ 85287
jeffrey.wilson@asu.edu
480-213-4460

Dr. Jeffrey Wilson is a Professor of Statistics and the Faculty Athletics Representative to the PAC-12 and NCAA. His research includes binary logistic regression models and hierarchical data with random effects.


ABSTRACT

Purpose

In the sport of football, coaches face critical decisions at different times in the game. Often the coach makes the decision based on a gut feeling or the advice of an assistant. However, if each decision is supplemented with data, it is possible to increase the chances of success. This paper uses data (2015-2018) from games played by the 65 teams in the Power 5 conferences of NCAA Division I to present a prediction model for 4th down decisions.

Methods

A predictive logistic regression model is used to inform 4th down decisions. In particular, a logistic regression model with random effects that predicts the likelihood of converting on 4th down is presented. The adequacy of the model is assessed through calibration, discrimination, and bootstrap samples.

Results

Distance-to-go, pass or run, line of scrimmage, and the week of the season are significant factors in predicting a successful 4th down, with team as a random effect.

Conclusion

The paper demonstrates how analytics can improve decision-making in football. In these data, it increases the precision of decision-making by 36%.

Applications in Sport

Teams can use the model to facilitate similar decisions in other parts of the game. This can also be used in the recruiting of players.

Keywords: Analytics, Random effects, correlated data, logistic regression, hierarchical level data, decision-making

INTRODUCTION

Predictive modeling is an important tool in the emerging field of sports analytics, and businesses increasingly use data analytics to predict their outcomes. Predictive modeling consists of a collection of statistical techniques whose goal is to find a relationship between a target (or response) and several predictors (or drivers). Such modeling allows one to attach probabilities to future outcomes based on the predictors, with a certain degree of accuracy. The predictors are key components of the statistical relationship used to predict future values of the target variable. There are three categories of potential predictors: those unlikely to affect the outcome, those certain to affect the outcome, and those that may or may not affect the outcome. This analysis addresses all three categories, for correlated as well as independent outcomes, Wilson and Lorenz (1).

The use of data analytics has grown tremendously and has become prominent in sports, particularly in football and baseball, Ayers (2). In particular, data analytics has been embraced in the decision-making of football, a sport that has historically relied on instinct and gut reaction, Steinberg (3). During the course of a football game, the coaching staff makes critical decisions from play to play that greatly affect the team's likelihood of success. These decisions can be improved with data analytics.

American Football

In American football, two teams compete on a rectangular field 100 yards long (excluding the end zones) and 53⅓ yards wide, with the goal of scoring by advancing the ball into the opponent's end zone or by kicking a field goal. The teams take turns playing offense, while the defense attempts to prevent the offense from advancing down the field and reaching the defending team's end zone. If the offense carries or catches the ball across the goal line, it scores a touchdown and earns 6 points. If it instead kicks the ball between the goalposts (a field goal), it is awarded three points.

When a team obtains possession of the football, it begins what is referred to as a drive. Each drive starts with a first down, and the offensive team must advance 10 or more yards within 4 downs. If it does, the downs reset and the team has another four chances to advance at least 10 yards. Failure to convert on 4th down results in a turnover on downs, and the defending team takes over at that spot on the field. To play it safe on 4th down, the offensive team usually punts the ball down the field so that the other team starts its drive as close as possible to its own end zone. So, on 4th down, the offense must decide whether to call a play that attempts to gain the distance necessary for a 1st down, to punt the ball, or to kick a field goal if field position permits.

Historically, in the absence of data analytics, the decision to go for a 1st down was based on non-measurable factors such as instinct, confidence, or momentum as perceived by the coach or one of the coordinators. This traditional approach to coaching decisions is steadily declining in a growing number of sports, with a greater trend toward relying on historical data and predictive analytics to aid play-by-play selection and decision-making, Russo (4). Data analytics through statistical modeling provides an opportunity to quantify a decision and increase one's odds of success. The extra information helps minimize the risk of failure, as in the case of 4th down decisions, Gorski (5). This paper examines the plays on which the decision is made to go for it on 4th down and the factors affecting success, and shows how predictive modeling can yield a substantial gain in decision-making.

The coach at Pulaski Academy, a high school in Arkansas outside the Power 5 conferences, rarely punts. The decision is based on a model built on the offense's expected points, the expected probability of a 4th down conversion, the average expected punt distance, and the opponent's expected punt return, Dalen (6). A related model accounts for the probability of converting a 4th down at every distance, the expected points from each spot on the field, and the mean net yards of a punt, Hall (7). That model examines an entire season of NFL play-by-play data and includes 3rd down plays on which the offense punts on the next play, so that it represents all “do-or-die” downs. It uses play-by-play data but excludes random effects of any kind.

Establishing a prediction model is a process with several steps: problem definition and data inspection, coding of predictors, model specification, model estimation, assessment of model performance, model validation, and model presentation (8). Of these steps, model validation is critical for assessing model performance and ensuring the model's capability to predict future outcomes. Model validation is generally performed internally or externally. Common measures for model validation include calibration, which shows the agreement between predicted and observed outcomes; discrimination, which checks the concordance between predictions and observations; and bootstrapping, which validates model performance using a repeated sampling technique (9-12). The model in this paper follows this process.

In addition, the data were randomly split into a model development dataset and a model validation dataset. The development dataset was used for variable selection and functional form assessment, and the validation dataset was used to assess model performance. The model parameter estimates are based on the combined data, consisting of both the development and validation datasets. The prediction model is a generalized linear mixed model with a logit link, fit with PROC GLIMMIX. Model validation includes calibration using PROC SGPLOT, discrimination using the ROC option in PROC LOGISTIC, and sensitivity analysis using a SAS macro.
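The random split described above can be sketched in Python as follows. This is a minimal illustration, not the authors' SAS code: the 50/50 proportion, the seed, and the function name `split_dev_validation` are assumptions, since the paper does not report the split proportion.

```python
import random


def split_dev_validation(plays, dev_fraction=0.5, seed=2019):
    """Randomly partition plays into development and validation sets.

    dev_fraction and seed are illustrative assumptions; the paper does not
    report the proportion used.
    """
    rng = random.Random(seed)      # seeded generator for a reproducible split
    shuffled = list(plays)         # copy so the caller's sequence is untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * dev_fraction)
    return shuffled[:cut], shuffled[cut:]


# Usage: 2,322 placeholder play IDs split with the assumed 50/50 proportion.
dev, val = split_dev_validation(range(2322))
print(len(dev), len(val))  # 1161 1161
```

The sketch splits at the play level, as the text describes. Because plays from the same team are correlated, splitting by team instead is a possible alternative design.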

Consider the critical decisions on a 4th down play: the offense can try to maintain possession, punt the ball, or attempt a field goal. Once the decision is made to maintain possession and “go for it” on 4th down, one must decide whether to pass or to run. Also, the tenacity of the offense is a random effect that cannot be ignored in any modeling. A generalized linear mixed model is therefore fit; in particular, a correlated logistic regression model is constructed to predict the probability of success on 4th down. Play-by-play data for each of the 65 teams in the Power 5 conferences from four seasons of college football (2015-2018) are used in the model. The correlated logistic regression model includes several fixed effects and random effects that account for non-measurable factors in team performance and in the coaching experience of the team's offense. The data are analyzed using PROC GLIMMIX in SAS.

The proposed model provides a predicted probability obtained from a binary correlated logistic model with fixed effects for distance-to-go, line of scrimmage, run or pass, and the week of the season, and with a random effect for the team's offense. Conference did not show significant variation. The proposed model is not intended to dictate to offensive coordinators which specific play to run; rather, it provides an example of the use of a predicted probability at a certain point in the game. Similar models can be developed for other critical situations.

METHODS

Data acquisition

The data used for this model come from the 65 teams in the Power 5 conferences (ACC, Big Ten, Big XII, PAC-12, and SEC) over four seasons (2015, 2016, 2017, and 2018) and were obtained by web scraping the ESPN API. The observations in the data are the plays made in a game. Because the plays are produced by a similar mechanism, the observations are correlated. There are 2,322 4th down attempts to continue the drive. These attempts are restricted to situations in which the distance needed for a first down was seven yards or less. This cutoff was chosen after consulting an assistant coach at one of these 65 schools. Similar models can be developed with different exclusion criteria.
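The exclusion criterion can be expressed as a simple filter. The sketch below assumes a hypothetical `Play` record; its fields and the sample plays are invented for illustration and are not the ESPN API's actual schema. Only plays on which the offense ran or passed on 4th down with seven or fewer yards to go are kept, so punts and field goal attempts are dropped.

```python
from dataclasses import dataclass


@dataclass
class Play:
    # Hypothetical record layout; not the ESPN API's actual field names.
    team: str
    down: int
    distance: int    # yards needed for a first down
    play_type: str   # "pass", "run", "punt", or "field_goal"


MAX_DISTANCE = 7  # cutoff used in the paper


def fourth_down_attempts(plays, max_distance=MAX_DISTANCE):
    """Keep 4th-down plays on which the offense went for it within the cutoff."""
    return [
        p for p in plays
        if p.down == 4
        and p.play_type in ("pass", "run")
        and p.distance <= max_distance
    ]


plays = [
    Play("ASU", 4, 1, "run"),     # kept
    Play("ASU", 4, 9, "pass"),    # dropped: more than 7 yards to go
    Play("USC", 4, 3, "punt"),    # dropped: not an attempt to convert
    Play("USC", 3, 2, "pass"),    # dropped: not 4th down
    Play("UCLA", 4, 7, "pass"),   # kept: exactly at the cutoff
]
print([p.team for p in fourth_down_attempts(plays)])  # ['ASU', 'UCLA']
```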

Each attempt to “go for it” on 4th down is an event, and each event is a Bernoulli trial with an outcome of success or failure. However, these trials arise within a hierarchical structure: the plays of a team are generated by a similar mechanism, so the events are correlated. Correlated observations require special attention to the standard errors, which are usually larger than those computed under an assumption of independence. As a result, the usual statistical tests are affected, and the correlation should be addressed with appropriate models (1).
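The effect of correlation on standard errors can be illustrated with a small simulation, a sketch that is not part of the paper's analysis. Each simulated team draws its own success rate from a random-intercept logistic model, and the overall success rate then varies more across replications than the independent-binomial variance p(1 - p)/n predicts. The settings are assumptions loosely based on the paper: 65 teams, about 36 plays per team (2,322/65 is about 35.7), a 59% base rate, and a team variance of 0.075.

```python
import math
import random


def inv_logit(x):
    """Map a logit-scale value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def variance_ratio(n_teams=65, plays_per_team=36, base_rate=0.59,
                   team_sd=math.sqrt(0.075), n_reps=500, seed=1):
    """Empirical variance of the overall success rate under team random
    intercepts, divided by the binomial variance that assumes independence."""
    rng = random.Random(seed)
    beta0 = math.log(base_rate / (1.0 - base_rate))  # logit of the base rate
    n = n_teams * plays_per_team
    rates = []
    for _ in range(n_reps):
        successes = 0
        for _ in range(n_teams):
            p = inv_logit(beta0 + rng.gauss(0.0, team_sd))  # team-specific rate
            successes += sum(rng.random() < p for _ in range(plays_per_team))
        rates.append(successes / n)
    mean = sum(rates) / n_reps
    empirical_var = sum((r - mean) ** 2 for r in rates) / (n_reps - 1)
    naive_var = mean * (1.0 - mean) / n
    return empirical_var / naive_var


ratio_team = variance_ratio()                # with team random effects
ratio_indep = variance_ratio(team_sd=0.0)    # plays truly independent
print(round(ratio_team, 2), round(ratio_indep, 2))
```

A ratio noticeably above 1 means that standard errors computed as if plays were independent are too small, which is the reason the paper uses a mixed model.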

Random Effects Predictive Model: Success on 4th Down

An appropriate model for correlated binary responses is a generalized linear mixed model. One such model is the predictive logistic regression model with random effects, which on the logit scale is

$$\operatorname{logit}(p_{ij}) = \ln\left(\frac{p_{ij}}{1 - p_{ij}}\right) = \beta_0 + \sum_{k=1}^{4} \beta_k x_{kij} + \gamma_i, \qquad \gamma_i \sim N(0, \sigma_\gamma^2),$$

where logit is the natural logarithm of the odds, $p_{ij}$ is the probability that offense $i$ converts on 4th down on play $j$, $1 - p_{ij}$ is the probability that the team fails to convert on 4th down, $p_{ij}/(1 - p_{ij})$ denotes the odds of converting on 4th down, $\beta_0$ is the intercept, and $\beta_k$ denotes the regression coefficient for the impact of predictor $x_k$, $k = 1, \ldots, 4$ (distance-to-go, line of scrimmage, pass, and week of the season). The regression coefficients determine how strongly each covariate affects the predicted probability $p_{ij}$. The random effect $\gamma_i$ measures the effect of the $i$th offense and reflects the unmeasurable factors not explained by the fixed effects. Each play in the database arose from a hierarchical structure of three levels, with conference as the primary level, team as the secondary level, and each play as the observational unit: plays are nested in teams, and teams are nested in conferences. Because conference did not show significant variation, only the team random effect is retained. Such a random effects model is fit in SAS with PROC GLIMMIX.
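To show how a fitted model of this form turns into a probability, the sketch below evaluates the random-intercept logistic model, logit(p) = β0 + Σ βk xk + γi, by applying the inverse logit. The slopes are the logit estimates from Table 3. The intercept β0 is not reported in the paper, so `HYPOTHETICAL_INTERCEPT` is a placeholder assumption and the printed probabilities are illustrative only; the line of scrimmage is assumed to be coded as yards from the offense's own goal line, as in Table 2.

```python
import math

# Fixed-effect slopes (logit scale) from Table 3.
BETA = {
    "distance": -0.167,   # per yard to go
    "yard_line": -0.006,  # per yard of line of scrimmage
    "pass": -0.335,       # 1 = pass, 0 = run
    "week": -0.020,       # per week of the season
}

# Placeholder only: the paper does not report the intercept beta_0.
HYPOTHETICAL_INTERCEPT = 1.5


def conversion_probability(x, intercept=HYPOTHETICAL_INTERCEPT, team_effect=0.0):
    """Inverse logit of beta_0 + sum_k beta_k * x_k + gamma_i.

    team_effect plays the role of the team random effect gamma_i;
    0.0 gives the prediction for an average offense.
    """
    eta = intercept + team_effect + sum(BETA[k] * x[k] for k in BETA)
    return 1.0 / (1.0 + math.exp(-eta))


run_play = {"distance": 1, "yard_line": 66, "pass": 0, "week": 6}
pass_play = {**run_play, "pass": 1}
print(round(conversion_probability(run_play), 3))
print(round(conversion_probability(pass_play), 3))
```

Whatever the intercept, switching from run to pass multiplies the odds of converting by exp(-0.335), and a positive team effect raises the probability for that offense.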

RESULTS

Data Description

The data cover the 2015-2018 seasons of NCAA Division I college football. There are 2,322 plays on which teams decided to go for it on 4th down with a distance-to-go of 7 yards or less, as shown in Table 1. The most frequent 4th down attempts occurred with less than 2 yards to go, [0, 2) yards. As the distance-to-go increases, fewer attempts are made to go for it on 4th down.

Table 1: Frequency Distribution of Plays by Distance [2015-2018]

Distance (yards)	Number of Plays	% of Plays
0	91	3.92
1	965	41.56
2	366	15.76
3	245	10.55
4	213	9.17
5	169	7.28
6	143	6.16
7	130	5.60
Total	2322	100.00
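The percentage column in Table 1 is each count divided by the 2,322 attempts. The short check below, an illustration that transcribes the table's counts, recomputes the column and the share of attempts with fewer than 2 yards to go.

```python
# Counts from Table 1: distance-to-go (yards) -> number of 4th down attempts.
COUNTS = {0: 91, 1: 965, 2: 366, 3: 245, 4: 213, 5: 169, 6: 143, 7: 130}


def percent_distribution(counts):
    """Return each category's share of the total, in percent, to 2 decimals."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 2) for k, v in counts.items()}


pct = percent_distribution(COUNTS)
print(pct)
short = (COUNTS[0] + COUNTS[1]) / sum(COUNTS.values())
print(f"fewer than 2 yards: {100 * short:.1f}%")  # fewer than 2 yards: 45.5%
```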

Four predictors of particular importance are distance-to-go, line of scrimmage, pass or run, and the week of the season. The week of the season reflects a belief about familiarity with plays: the longer the season goes, the more familiar defenses are with an offense's playbook. The means and standard deviations of distance-to-go and line of scrimmage are shown in Table 2. The mean distance is 2.52 yards with a standard deviation of 1.93. The mean line of scrimmage is approximately 66, that is, the opponent's 34-yard line (100 - 66 = 34), with a standard deviation of 19.09. This indicates that teams are more likely to go for it on the opponent's side of the field. This is expected, since going for it there is less risky: if the offense fails to convert on 4th down, the opponent takes over in its own territory and has a longer distance to go to score. About 44.6% of the plays were pass plays.

Table 2: Mean and Standard Deviation for Predictor Variables, 2015-2018

Predictor	Mean	Standard Deviation
Distance-to-go	2.52	1.93
Line of scrimmage	66.17	19.09

Predictive Model

A correlated logistic regression model with random effects is fit to the data. The fixed effects are distance-to-go, line of scrimmage, pass, and the week of the season. Conference was not a significant random effect, but the offensive team is a significant random effect. The generalized linear mixed model is fit using SAS PROC GLIMMIX.

The results of predicting the probability of converting on 4th down are summarized in Table 3. As distance-to-go increases, the team is less likely to convert on 4th down: each extra yard multiplies the odds of not converting by 1.18. A team choosing to pass on 4th down is less likely to be successful; the odds of not converting with a pass are 1.39 times those with a run. As the season progresses, a team is less likely to convert on 4th down; the odds of not converting increase by a factor of 1.02 for each week. These odds are presented in a forest plot in Figure 1. Note that these factors are the reciprocals of the odds ratios for converting; for example, an odds ratio of 0.72 corresponds to 1/0.72 = 1.39.

The variance of the team random effect was 0.075 with a standard error of 0.032, which gives a z-value of 0.075/0.032 = 2.34. Thus, the team random effect represents a significant unmeasurable factor.
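The z-value for the team variance is simple arithmetic; the sketch below also attaches a one-sided normal p-value, since a variance cannot be negative and only the upper tail is relevant. This p-value is an approximation added for illustration: Wald tests for variance components are unreliable near zero, and a likelihood-based test, such as one requested with the COVTEST statement in PROC GLIMMIX, is often preferred.

```python
import math


def variance_wald_z(estimate, std_error):
    """Wald z for a variance component and its one-sided normal p-value."""
    z = estimate / std_error
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) for Z ~ N(0, 1)
    return z, p_one_sided


z, p = variance_wald_z(0.075, 0.032)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```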

Table 3: Parameter Estimates for Logistic Regression Model with Random Effects

Effect	Logit Estimate	Standard Error	p-value	95% Confidence Limits (Odds Ratio)
Distance-to-go	-0.167	0.012	<.0001	[0.826,0.867]
Yard Line	-0.006	0.002	0.0085	[0.990,0.999]
Pass	-0.335	0.088	0.0002	[0.602,0.851]
Week	-0.020	0.011	0.0832	[0.959,1.003]

Figure 1: Forest Plots outlining the covariate effects
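The odds ratios behind the forest plot, and the reciprocals quoted in the text, follow from the Table 3 logit estimates: OR = exp(β), the factor for failing is 1/OR, and a Wald 95% interval is exp(β ± 1.96·SE). The sketch below is an illustration rather than the authors' code. Because the published estimates are rounded, the recomputed limits agree with Table 3 only to within about 0.002, and 1/exp(-0.335) comes out near 1.40 rather than the 1.39 obtained from the rounded odds ratio 0.72.

```python
import math

# (logit estimate, standard error) transcribed from Table 3.
TABLE3 = {
    "Distance-to-go": (-0.167, 0.012),
    "Yard Line": (-0.006, 0.002),
    "Pass": (-0.335, 0.088),
    "Week": (-0.020, 0.011),
}


def odds_ratio_summary(estimate, se, z=1.96):
    """Return (OR for converting, factor for failing = 1/OR, Wald 95% CI of OR)."""
    odds_ratio = math.exp(estimate)
    lower = math.exp(estimate - z * se)
    upper = math.exp(estimate + z * se)
    return odds_ratio, 1.0 / odds_ratio, (lower, upper)


for name, (b, se) in TABLE3.items():
    or_, inv, (lo, hi) = odds_ratio_summary(b, se)
    print(f"{name:15s} OR={or_:.3f}  1/OR={inv:.2f}  CI=[{lo:.3f}, {hi:.3f}]")
```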

Adequacy of the Model

The goodness of fit of the model is usually assessed by calibration, discrimination, or bootstrap samples, among other methods. Calibration demonstrates the agreement between the observed outcomes and the predictions made by the model. In performing calibration, the study population is divided into risk deciles based on the predicted probabilities, and the expected and observed numbers of outcomes in each decile are compared. A calibration plot displays the agreement between the observed and the expected. The calibration is good, as the plot lies close to the 45-degree line, as shown in Figure 2.
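The decile comparison described above can be sketched as follows. The function name and the synthetic, perfectly calibrated example data are assumptions for illustration; in the paper the predictions come from the GLIMMIX model and the plot is drawn with PROC SGPLOT. Each returned tuple holds the decile size, the mean predicted probability, and the observed success rate, which are the coordinates of one point on a calibration plot.

```python
import random


def calibration_by_decile(predicted, observed, n_groups=10):
    """Sort plays by predicted probability, cut them into n_groups risk groups,
    and return (size, mean predicted, observed rate) for each group."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if len(predicted) < n_groups:
        raise ValueError("need at least one play per group")
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    groups = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        mean_pred = sum(predicted[i] for i in idx) / len(idx)
        obs_rate = sum(observed[i] for i in idx) / len(idx)
        groups.append((len(idx), mean_pred, obs_rate))
    return groups


# Synthetic example: outcomes drawn from the predicted probabilities themselves,
# so the points should fall near the 45-degree line.
rng = random.Random(7)
preds = [rng.uniform(0.2, 0.9) for _ in range(20000)]
outcomes = [1 if rng.random() < p else 0 for p in preds]
for size, mean_pred, obs_rate in calibration_by_decile(preds, outcomes):
    print(f"n={size:5d}  expected={mean_pred:.3f}  observed={obs_rate:.3f}")
```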

Figure 2: Calibration of the model: agreement between observed and predicted outcomes

Discrimination evaluates the ability of a prediction model to distinguish plays that were successful from plays that were not. A common measure of discrimination is the area under the receiver operating characteristic (ROC) curve (AUC). The AUC treats each predicted probability from the model as a threshold and calculates the sensitivity and specificity at each threshold; the ROC curve plots sensitivity against one minus specificity over all possible thresholds. The resulting c-statistic (AUC) is 0.72, indicating a good model (12).
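The c-statistic equals the probability that a randomly chosen successful play received a higher predicted probability than a randomly chosen failed play, which is the same quantity as the area under the ROC curve. The pairwise sketch below is for illustration only (the paper uses the ROC option in PROC LOGISTIC). It is quadratic in the number of plays, which is manageable for 2,322 plays; a rank-based formula would be faster for larger data.

```python
def c_statistic(predicted, observed):
    """Concordance probability between successes (1) and failures (0).

    Counts the (success, failure) pairs in which the success has the higher
    predicted probability; tied predictions count one half.
    """
    pos = [p for p, y in zip(predicted, observed) if y == 1]
    neg = [p for p, y in zip(predicted, observed) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    return score / (len(pos) * len(neg))


print(c_statistic([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```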

Figure 3: ROC Curve: a measure of the effectiveness of the model

An assessment of model adequacy through bootstrap resampling relies on repeated estimates of a statistic from a large number of bootstrap samples (13). The bootstrap samples are generated with replacement from the original data (14,15). The model estimates are applied to each bootstrap sample to calculate the c-statistic within that sample. A sensitivity analysis then assesses how robust model performance is across the different distance-to-go groupings.
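A minimal sketch of this bootstrap follows, assuming the predicted probabilities are held fixed and only the (prediction, outcome) pairs are resampled, as the paragraph describes. The function names, the 200 replicates, the percentile interval, and the synthetic data are illustrative choices, not the paper's settings. Refitting the model in each bootstrap sample, as in optimism-corrected validation (13), would be a more demanding variant.

```python
import random


def c_statistic(predicted, observed):
    """Concordance (AUC); ties in predicted probability count one half."""
    pos = [p for p, y in zip(predicted, observed) if y == 1]
    neg = [p for p, y in zip(predicted, observed) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_c(predicted, observed, n_boot=200, seed=42):
    """Resample plays with replacement and recompute the c-statistic each time.

    Returns the sorted bootstrap c-statistics and a 95% percentile interval.
    """
    rng = random.Random(seed)
    n = len(predicted)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # n plays with replacement
        ys = [observed[i] for i in idx]
        if 0 < sum(ys) < n:  # c is undefined unless both outcomes occur
            stats.append(c_statistic([predicted[i] for i in idx], ys))
    stats.sort()
    lower = stats[int(0.025 * n_boot)]
    upper = stats[int(0.975 * n_boot) - 1]
    return stats, (lower, upper)


# Synthetic example data (not the paper's plays).
rng = random.Random(3)
preds = [rng.uniform(0.2, 0.9) for _ in range(200)]
outcomes = [1 if rng.random() < p else 0 for p in preds]
stats, (lo, hi) = bootstrap_c(preds, outcomes)
print(f"c = {c_statistic(preds, outcomes):.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```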

Predictive Probability

The coach's implicit model assumes that each time the team goes for it on 4th down, it will be successful [100%: 0%]. However, the actual result is [59%: 41%] of success to failure. The prediction model result is [63.5%: 36.5%] of success to failure, as shown in Table 4.

Table 4: Comparison of Models (Coach, Actual & Predictive)

Model	Success to Convert	Failure to Convert
Coach	100%	0%
Actual	59%	41%
Predictive	63.5%	36.5%

The results show that distance-to-go, pass or run, week, and line of scrimmage are important to the decision-making process. If the information from these factors were used, the coaches would have decreased their losses by 36.5%. The coach's decision takes a single value (always 1, go for it), the actual result is binary (0 or 1), and the prediction is a probability in the interval [0, 1]. When using the predictive model, the user needs to determine a threshold of comfort. Possible thresholds include 50%, the prior probability (59% in this case), or a self-chosen comfort threshold.
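Turning a predicted probability into a go/no-go call requires the comfort threshold discussed above. The sketch below uses invented probabilities and outcomes for five plays, and the function name `threshold_summary` is hypothetical. A threshold of 0 reproduces the coach's rule of always going for it, while 0.50 and 0.59 are thresholds mentioned in the text; precision here is the share of "go" calls that converted.

```python
def threshold_summary(probabilities, outcomes, threshold):
    """Classify each play as 'go' when p >= threshold and tally the results."""
    tp = fp = fn = tn = 0
    for p, y in zip(probabilities, outcomes):
        go = p >= threshold
        if go and y == 1:
            tp += 1          # model says go, play converted
        elif go:
            fp += 1          # model says go, play failed
        elif y == 1:
            fn += 1          # model says hold back, play converted
        else:
            tn += 1          # model says hold back, play failed
    precision = tp / (tp + fp) if tp + fp else float("nan")
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "precision": precision}


# Invented predicted probabilities and outcomes (1 = converted) for five plays.
probs = [0.70, 0.62, 0.55, 0.45, 0.32]
converted = [1, 1, 0, 1, 0]
for t in (0.0, 0.50, 0.59):
    print(t, threshold_summary(probs, converted, t))
```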

The model predicts the actual success rate closely in most cases, except when the distance-to-go is less than 1 yard, as shown in Table 5. The coach's model fails increasingly as the distance-to-go increases. The coach's intuition is successful for less than 1 yard to go. However, when the distance-to-go is 1 yard or more, the predictions based on football analytics are substantially better than the coach's intuition.

Table 5: Model and Coach Success by Distance to go

Distance	Number of Plays	Coach Probability Success	Actual Probability Success	Model Probability Success
0	91	1.000	0.989	0.699
1	965	1.000	0.684	0.670
2	366	1.000	0.585	0.621
3	245	1.000	0.531	0.549
4	213	1.000	0.469	0.490
5	169	1.000	0.420	0.430
6	143	1.000	0.420	0.370
7	130	1.000	0.362	0.319
Total	2322
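The comparison in Table 5 can be checked directly. For each distance, the sketch below measures how far the coach's implied probability (always 1) and the model's probability fall from the actual success rate; the dictionary simply transcribes Table 5. As a consistency check, the play-weighted actual success rate is about 59%, matching Table 4.

```python
# Table 5 transcribed: distance -> (plays, coach, actual, model)
TABLE5 = {
    0: (91, 1.000, 0.989, 0.699),
    1: (965, 1.000, 0.684, 0.670),
    2: (366, 1.000, 0.585, 0.621),
    3: (245, 1.000, 0.531, 0.549),
    4: (213, 1.000, 0.469, 0.490),
    5: (169, 1.000, 0.420, 0.430),
    6: (143, 1.000, 0.420, 0.370),
    7: (130, 1.000, 0.362, 0.319),
}


def closer_to_actual(table):
    """For each distance, report which probability is nearer the actual rate."""
    result = {}
    for distance, (_, coach, actual, model) in table.items():
        coach_err = abs(coach - actual)
        model_err = abs(model - actual)
        winner = "model" if model_err < coach_err else "coach"
        result[distance] = (winner, round(coach_err, 3), round(model_err, 3))
    return result


total_plays = sum(n for n, _, _, _ in TABLE5.values())
actual_rate = sum(n * actual for n, _, actual, _ in TABLE5.values()) / total_plays

for distance, row in closer_to_actual(TABLE5).items():
    print(distance, row)
print(total_plays, round(actual_rate, 3))  # 2322 0.591
```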

DISCUSSION

A predictive model for 4th down plays can be useful to coaches. A model incorporating non-measurable random effects for a team's offense together with distance-to-go, run or pass, line of scrimmage, and week of the season predicts success better than the coach's intuition alone, as Tables 4 and 5 show. It can be run before each week's game and updated after each game as more data become available.

Other factors not available in the study data could make the predictions even more accurate. Examples include the special teams or kicking unit as a fixed effect or, preferably, the coaching staff as a random effect. However, the available data do not record head coaching or other staff changes during the 2015-2018 seasons.

CONCLUSION

This paper applies common predictive modeling techniques to outcomes in the game of football. The model showed good performance based on calibration, discrimination, and sensitivity analysis.

APPLICATION IN SPORT

Teams can use the model to facilitate similar decisions in other parts of the game. This can also be used in the recruitment of players.

ACKNOWLEDGEMENT

We thank ASU head football coach Herman Edwards and his staff for providing an avenue to test this model. We thank Vice President for Athletics and Athletics Director Ray Anderson and his staff for their support and critical review of the research and model fit.

REFERENCES

1. Wilson, J.R. and Lorenz, K. (2015). Modeling Binary Correlated Responses using SAS, SPSS and R. Springer.
2. Ayers, R. (2018, January 24). How big data is revolutionizing sports. Dataconomy. Retrieved from: https://dataconomy.com/2018/01/big-data-revolutionizing-favorite-sports-teams/
3. Steinberg, L. (2015, August 18). Changing the game: The rise of sports analytics. Forbes. Retrieved from: https://www.forbes.com/sites/leighsteinberg/2015/08/18/changing-the-game-the-rise-of-sports-analytics/#13a9d89b4c1f
4. Russo, R. D. (2018, October 25). Going for it: 4th-down makes fuel football scoring surge. AP News. Retrieved from: https://www.apnews.com/13ab8f9043f44acfbcafcf1f24c8b078
5. Gorski, C. (2017, February 3). Math to football coaches: ‘Be more aggressive on 4th down’. Inside Science. Retrieved from: https://www.insidescience.org/news/math-football-coaches-be-more-aggressive-4th-down
6. Dalen, P. (2013, November 15). Conventional wisdom be damned: The math behind Pulaski Academy’s offense. Football Study Hall. Retrieved from: https://www.footballstudyhall.com/2013/11/15/5105958/fourth-down-pulaski-academy-kevin-kelley
7. Hall, R. (2016, August 29). Should NFL teams punt so often? Retrieved from: https://www.stepwisedigressions.com/single-post/2016/08/29/should-nfl-teams-punt-so-often
8. Steyerberg, E.W. and Vergouwe, Y. (2014). Towards better clinical prediction models: Seven steps for development and an ABCD for validation. Eur Heart J, 35(29): 1925-31.
9. Chu, P.D. and Wang, W. (2019). Empirical study on relationship between sports and analytics and success in regular season and postseason in Major League Baseball. Journal of Sports Analytics, 5(3): 205-222.
10. Steyerberg, E.W. (2009). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer.
11. Chernick, M.R. (1999). Bootstrap Methods: A Practitioner’s Guide. John Wiley & Sons.
12. Trahan, K. (2015, July 15). 6 charts that show college football conferences have their own offensive identities. SB Nation. Retrieved from: https://www.sbnation.com/college-football/2015/7/15/8821257/college-football-conference-offenses-styles
13. Harrell, F.E., Jr. (2015). Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. Springer.
14. Duchnowski, M. (2017). Predictive models: Storing, scoring and evaluating. SAS Global Forum 2017, Paper 1334-2017. Retrieved from: http://support.sas.com/resources/papers/proceedings17/1334-2017.pdf
15. Efron, B. (1979). Bootstrap methods: Another look at the jackknife (1977 Rietz Lecture). Ann Stat, 7(1): 1-26.