Which global tennis rating better measures player skill? Evidence from the 2022 USTA Junior National Championships

Authors: Rebecca L. Mayew¹ and William J. Mayew²

¹USPTA Tennis Coach, Cary, NC, USA
²Fuqua School of Business, Duke University, Durham, NC, USA

Corresponding Author:

William J. Mayew
Fuqua School of Business
Duke University
100 Fuqua Drive
Durham, NC 27708
919-660-7781
wmayew@duke.edu

Rebecca L. Mayew, MS is a tennis coach certified by the United States Professional Tennis Association (USPTA) with interests in player development and player performance measurement.

William J. Mayew, MS, PhD is a Professor of Business Administration at Duke University. His research focuses on performance measurement of financial analysts and public corporations as well as the prediction of financial reporting fraud.


ABSTRACT

Assessing relative player skill is important in many aspects of tennis. In 2008, the Universal Tennis Rating (UTR) was introduced as a global tennis player skill rating that put all players, regardless of gender, age or geographic location, on a common scale. The International Tennis Federation (ITF) recently launched a competitor rating called the World Tennis Number (WTN). The purpose of this paper is to provide evidence on which rating is a superior measure of player skill. We assume better skilled players are more likely to win tennis matches and examine whether UTR or WTN ratings better predict head-to-head match success using 1,532 matches played by 870 participants at the 2022 United States Tennis Association (USTA) Junior National Championships. We observe classification accuracy of 73.9% and 70.4% for UTR and WTN ratings, respectively. Both classification accuracy levels are statistically greater than chance and approximate the accuracy level observed for bookmakers at the professional level. UTR and WTN rating classification accuracy does not statistically differ between ratings in the sample overall, by age division, by gender, by match format, or by the magnitude of player rating differences. We conclude that UTR and WTN ratings are equivalent measures of player skill based upon their ability to predict match outcomes. These findings provide initial empirical evidence important to tennis organizations making rating adoption decisions, tennis coaches seeking play parity, tournament directors seeding players and college coaches screening potential recruits. We provide mapping functions between UTR and WTN ratings for situations where players have one rating but not the other.

Keywords: Universal Tennis Rating (UTR), World Tennis Number (WTN), junior tennis, match forecasting, classification accuracy

INTRODUCTION

Assessing relative player skill is important in many aspects of sports administration. Player rankings serve as one popular tool for assessing whether a given player is more skilled than another. In professional tennis, rankings are used for tournament selection and seeding to help ensure competitive balance. Competitive matches are more entertaining because the outcome is more uncertain, and spectators are known to have preferences for uncertainty in match outcomes (9). Player rankings for touring tennis professionals are provided by the Association of Tennis Professionals for men (1) and Women’s Tennis Association for women (38), and these rankings have been extensively studied in the literature (7).

At the junior (i.e., pre-professional, 18 years old and under) level, player rankings also exist, but not for the purpose of maximizing spectator enjoyment. Rather, junior rankings serve the purpose of ensuring fairer matchups and facilitating player skill development. For junior players in the United States wanting to play competitively, the United States Tennis Association (USTA) offers gender and age specific rankings at the district, sectional and national level, where ranking points accumulate from playing USTA sanctioned events (36). An inherent limitation of player rankings at the junior level, however, is that relative skill comparisons can only be made among players who are part of the same ranking. Consider a tennis coach who is interested in maximizing the development of a 16-year-old boy with a USTA national ranking of #50 (where #1 is the highest) and who is seeking a comparably skilled practice partner for him. Locally available practice partners include a 16-year-old girl with a national USTA ranking of #5, an 18-year-old boy with a USTA national ranking of #100, a 16-year-old boy with a USTA national ranking of #200, and a 16-year-old boy without a USTA national ranking at all. Which player should the coach choose?

This question is difficult to answer because rankings are not comparable across genders (the 16-year-old girl) or age divisions (the 18-year-old boy). The #200 ranked 16-year-old boy may not necessarily be a less skilled practice partner. If he does not exclusively play USTA events to obtain ranking points, but rather plays both USTA events and high school tennis where no USTA ranking points are available, the relatively low national ranking may be deceiving. Further still, the 16-year-old boy without a USTA national ranking may be an exceptionally skilled player but not have a USTA ranking because he exclusively plays International Tennis Federation (ITF) tournaments globally rather than USTA events in the United States.

To overcome these inherent limitations of rankings and to facilitate skill comparisons more universally, the Universal Tennis Rating (UTR) was introduced in 2008. UTR ratings are designed to promote fair and competitive play by providing a measure of player skill on a common scale regardless of age, gender or geographic location (30). Owned by Universal Tennis, LLC, UTR ratings have been widely adopted, with more than 2.3 million UTR ratings for players spanning roughly 125 countries (24). In 2020 Universal Tennis, LLC, announced its own world tour where players qualify and are seeded based on their UTR rating (26).

Despite the existence and popularity of UTR ratings, we know of no research investigating how well UTR ratings capture player skill. Moreover, in 2019, the ITF announced it was developing a competitor rating called the World Tennis Number (WTN) (14). In 2022, the ITF began using WTN as part of its official acceptance criteria for tournaments on the ITF junior world tour circuit (15). Additionally the USTA officially adopted WTN, as opposed to UTR, for use in junior tournaments (37).

Given two competing global rating systems now exist, how does each rating perform as a measure of junior tennis player skill and is one rating superior to the other? The purpose of this paper is to provide initial empirical evidence to answer this question. Universal Tennis, LLC, refers to its own UTR rating as “the world’s most accurate tennis rating system” in press releases (32) but we are aware of no existing empirical evidence that documents superiority of UTR ratings over WTN ratings. It is not possible to make theoretical predictions regarding how well UTR ratings or WTN ratings capture player skill based on the underlying algorithms as they are proprietary. Superiority of one rating over another cannot easily be inferred from adoption by tennis organizations thus far either. On the international front, Tennis Australia has adopted UTR as its official rating system (27) in contrast to the USTA’s adoption of WTN (34). Adoption also varies domestically. Within the United States, junior players competing as part of their high school team use UTR as the official rating (24), which contrasts with junior players participating in USTA tournaments that use WTN. For United States college players, the Intercollegiate Tennis Association (ITA) partnered with Universal Tennis in 2017, referring to the partnership as affirming the UTR rating as “the one reliable metric anyone can use to gauge a tennis player’s skill” (12). Nonetheless the ITA switched its partnership to the ITF in 2023, making WTN the new official rating system for college tennis players in the United States (13).

Answering our research question is important to organizations that oversee junior tennis, junior tennis tournament directors, junior coaches interested in skill development and college coaches who recruit junior tennis players. For our empirical strategy, we assume higher skilled players are more likely to win a match in head-to-head competition and assess whether UTR or WTN ratings more accurately identify match success.

METHODS

Sample

We analyzed match results from the 2022 USTA Junior National Championships. We chose this particular tournament for five reasons. First, we studied a junior tournament as opposed to a professional tournament due to data availability. While UTR and WTN ratings exist for professional players, only UTR ratings are publicly observable. When a player becomes a professional, the WTN rating changes from a numeric rating to a non-numeric designation of “Pro Zone”. Second, among junior tournaments, the Junior National Championships was a national level tournament as opposed to a district or sectional tournament. National tournaments allow for a more representative sample of junior tennis players in the United States. Additionally, qualifying for national tournaments requires players to have played and won more matches than are typically required for district or sectional tournaments. Because WTN ratings for players in the United States rely in part upon past USTA matches, using a national tournament helps ensure a critical mass of matches is available for constructing the WTN rating.

Third, among all national tournaments held by the USTA, the Junior National Championships arguably offered the highest incentives for players to give their best effort because this is the only national tournament where the winner of the 18 and under (16 and under) division receives entrance into the main draw of the US Open (Junior US Open). Fourth, over 1,500 matches were available for analysis in this tournament, which should facilitate appropriate statistical power for our analysis given other published research studying professional Grand Slam matches has used samples as small as 999 (16) to 1,081 matches (19). Finally, from a practical standpoint, WTN ratings were introduced in the United States in June of 2022 and this tournament offered one of the first opportunities to empirically examine WTN ratings.

The 2022 Junior National Championships were played in Kalamazoo, Michigan from August 5-14, 2022 for boys (USTA Tournament ID 22-59861) and in San Diego, California from August 6-14, 2022 for girls (USTA Tournament ID 22-97375). Each location hosted hard court matches in both a 16 years and under division (16u) and an 18 years and under division (18u). A 256 player draw was used for all age divisions, which implies 256 matches in the main draw inclusive of a match to determine third place played by the losers in the semifinal round. The consolation draw was comprised of 251 matches. Each age division at each location therefore had 507 matches, implying 2,028 matches overall. Additionally, while a 256 player draw was used, only 224 players were accepted for participation because the top 32 players were given first round byes. This implies the full sample of participants was 896 (224×4). To identify individual participant names and who won each match, we reviewed the publicly available draw sheets by entering the Tournament ID into the USTA tournament website (35).
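The match and participant counts in the preceding paragraph follow from simple arithmetic. The short sketch below reproduces them using only the figures stated in the text; the constant names are illustrative and are not taken from any USTA data source.

```python
# Draw arithmetic for the tournament, using only figures stated in the text.
DRAW_SIZE = 256            # bracket slots per age division
BYES = 32                  # top players given first-round byes
CONSOLATION_MATCHES = 251  # consolation draw size reported in the text
DIVISIONS = 4              # boys 16u, boys 18u, girls 16u, girls 18u

# The text counts a 256-slot bracket as 255 bracket matches plus one
# third-place match played by the losing semifinalists.
main_draw_matches = (DRAW_SIZE - 1) + 1
matches_per_division = main_draw_matches + CONSOLATION_MATCHES
total_matches = matches_per_division * DIVISIONS

# Players accepted per division are the slots not taken up by byes.
players_per_division = DRAW_SIZE - BYES
total_players = players_per_division * DIVISIONS

print(main_draw_matches, matches_per_division, total_matches)  # 256 507 2028
print(players_per_division, total_players)                     # 224 896
```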

UTR ratings were obtained from player profile pages via a power subscription to the UTR App (31). WTN ratings were obtained from player pages on the WTN rating website (40). We collected UTR and WTN ratings as of the morning of August 5, 2022 (Eastern Time), prior to any matches starting. The WTN rating was missing for 26 players. The players with missing WTN ratings comprised 2 boys from the 16u division, 16 boys from the 18u division and 8 girls from the 18u division. Missing values pertain primarily to top players who had already begun playing professional level tournaments. We required non-missing UTR and WTN ratings for each player in order to make comparisons. We removed 6 matches with no determinable skill differential because the two players’ WTN ratings or UTR ratings were exactly equal. Overall, these criteria resulted in a final sample of 1,532 matches and 870 unique players.
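The two sample filters described above can be sketched as a predicate on a pair of players. This is a minimal illustration rather than the authors' actual procedure: the `Player` record and the `keep_match` function are hypothetical names, and the ratings in the usage lines are made-up values.

```python
from typing import NamedTuple, Optional


class Player(NamedTuple):
    """Hypothetical record holding one player's pre-tournament ratings."""
    utr: Optional[float]  # None when no UTR rating is available
    wtn: Optional[float]  # None when no numeric WTN rating is available


def keep_match(a: Player, b: Player) -> bool:
    """Return True if a match between a and b passes both sample filters."""
    # Filter 1: both players need non-missing UTR and WTN ratings.
    if None in (a.utr, a.wtn, b.utr, b.wtn):
        return False
    # Filter 2: drop matches where either rating is exactly tied, because
    # no favored player can be identified under that rating.
    if a.utr == b.utr or a.wtn == b.wtn:
        return False
    return True


# Made-up examples: a usable match, a missing WTN rating, and a UTR tie.
print(keep_match(Player(10.2, 14.8), Player(9.6, 16.1)))   # True
print(keep_match(Player(13.1, None), Player(9.6, 16.1)))   # False
print(keep_match(Player(10.2, 14.8), Player(10.2, 15.3)))  # False
```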

UTR Ratings data

UTR ratings are owned and operated by Universal Tennis, LLC, and were originally developed by Dave Howell, a tennis coach interested in facilitating more level-based play to maximize junior tennis player development. A proprietary algorithm rates players on a scale between 1.00 (lowest skill) and 16.50 (highest skill) based on up to 30 match results from the last 12 months (22). Players with values from 1-4 are referred to as beginners, 5-8 as having intermediate skills, 9-12 as advanced and greater than 13 as professionally skilled (29). Match results considered when deriving a player’s UTR rating can come from many sources and include, but are not limited to, USTA matches, ITF matches, high school matches and matches played as part of Universal Tennis events (2). UTR ratings are available in real time and are updated daily. A player’s UTR rating can increase by winning more games than expected for the match. Given the pre-match UTR rating differential between players, the UTR algorithm expects a certain percentage of total games to be won by each player. The player who performs better than expected will experience an increase in their UTR rating while the opposing player’s UTR rating will decrease. The algorithm’s expected value is not disclosed to the players.

WTN Ratings data

WTN ratings are owned and operated by the ITF. Like UTR, WTN is a skill rating system based on a proprietary algorithm and is available in real time. WTN ratings, however, are on a different scale than UTR and range from 40.00 (lowest skill) to 1.00 (highest skill). Players with WTN ratings from 40-30 are considered novices, 30-20 as having intermediate skill, 20-10 as having advanced skill, 10-3 as top level amateurs, and 3-1 are considered professionally skilled (33). Also in contrast to UTR ratings, WTN ratings are updated weekly, and are based on match data from ITF events in addition to match data shared with the ITF by a player’s national association from 2016 onward (39). For example, in the United States, USTA match results are provided for inclusion in WTN ratings and in Great Britain, Lawn Tennis Association matches are provided for inclusion. Compared with UTR ratings, WTN ratings are derived using a longer match history, but the types of matches considered are more narrowly restricted as they come from the player’s national association and ITF events only.

A player’s WTN rating can improve (i.e., move numerically closer to 1.00) by obtaining set scores better than expected in each set played during a match. For a given pre-match WTN rating differential, the WTN algorithm expects a certain match outcome. The WTN rating will improve for the player who performs better than expected and worsen for the other player. Players do not observe the WTN algorithm expectations. A final difference between UTR and WTN ratings is that UTR ratings are publicly displayed for all players including professionals, while WTN ratings cease being numerically displayed when a player becomes a professional.

Statistical analysis

A number of studies have examined match outcomes of professional tennis players using paired comparison models (3, 5, 8, 16, 19). These studies commonly utilize regression analysis to estimate the likelihood that the favored player wins as a function of the skill difference between the two players competing. Following this literature, we estimated the following logistic regressions for match m with robust standard errors via the statistical software STATA/SE 17.0:

Pr(FAV_UTR_WIN_m = 1) = Λ(α₀ + α₁·UTR_DIFF_m)   (1a)
Pr(FAV_WTN_WIN_m = 1) = Λ(β₀ + β₁·WTN_DIFF_m)   (1b)

where Λ(z) = 1/(1 + e^(−z)) is the logistic function and α₀ and β₀ are intercepts.

The dependent variable in equation 1a (1b), FAV_UTR_WIN (FAV_WTN_WIN), is an indicator variable that equals 1 if the player favored to win per their UTR (WTN) rating wins the match and 0 otherwise. The independent variable in equation 1a (1b), UTR_DIFF (WTN_DIFF), is the difference in the UTR (WTN) rating between the favored player and the unfavored player. Higher (lower) UTR (WTN) values indicate more skill. Therefore, as UTR_DIFF (WTN_DIFF) becomes more positive (negative) the skill difference between the favored and unfavored player increases, which should translate into a higher likelihood of the favored player winning the match. Observing α₁>0 (β₁<0) when estimating equation 1a (1b) would indicate that UTR (WTN) rating differences can help predict match outcomes. However, the magnitude of the α₁ (β₁) coefficient in equation 1a (1b) does not indicate how accurate UTR (WTN) ratings are, nor is α₁ comparable to β₁ because UTR and WTN ratings are on different scales.
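The variable definitions above can be made concrete with a short sketch. The helper names `utr_variables` and `wtn_variables` are hypothetical, the ratings in the usage lines are made up, and matches with tied ratings are assumed to have been removed beforehand.

```python
from typing import Tuple


def utr_variables(winner_utr: float, loser_utr: float) -> Tuple[int, float]:
    """Return (FAV_UTR_WIN, UTR_DIFF) for one match; higher UTR is favored."""
    favored = max(winner_utr, loser_utr)
    unfavored = min(winner_utr, loser_utr)
    fav_win = 1 if winner_utr > loser_utr else 0
    return fav_win, favored - unfavored  # UTR_DIFF is positive


def wtn_variables(winner_wtn: float, loser_wtn: float) -> Tuple[int, float]:
    """Return (FAV_WTN_WIN, WTN_DIFF) for one match; lower WTN is favored."""
    favored = min(winner_wtn, loser_wtn)
    unfavored = max(winner_wtn, loser_wtn)
    fav_win = 1 if winner_wtn < loser_wtn else 0
    return fav_win, favored - unfavored  # WTN_DIFF is negative


# Made-up match 1: the higher-rated UTR player wins, so FAV_UTR_WIN = 1.
fav_win, diff = utr_variables(winner_utr=10.5, loser_utr=9.8)
print(fav_win, round(diff, 2))  # 1 0.7

# Made-up match 2: the winner has the worse (higher) WTN, so FAV_WTN_WIN = 0.
print(wtn_variables(winner_wtn=18.0, loser_wtn=15.5))  # (0, -2.5)
```

Defining both variables relative to the favored player is what makes the expected sign of the slope positive for UTR_DIFF and negative for WTN_DIFF.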

As a result, we assessed rating classification accuracy via the area under the receiver operator characteristic curve (AUC). In signal detection theory, the receiver operator characteristic (ROC) curve is designed to assess how well a given continuous predictor variable can discriminate between outcomes as well as facilitate a comparison among alternative predictors (11). A ROC curve captures the probability of detecting a true signal against a false signal. Deriving the curve requires estimation of the true positive rate (i.e., sensitivity) and the false positive rate (i.e., 1-specificity) for a continuum of cutoff points, c, applied to the fitted value from an estimated logistic regression. Fitted values from logistic regressions, such as those outlined in equations 1a and 1b, are bounded between zero and one by construction. Suppose c = 0.75. All observations where fitted values are equal to or above (below) 0.75 are considered situations where the higher rated player is predicted to be the winner (loser). A two-by-two contingency table can be constructed at c = 0.75 comparing the prediction of whether the higher rated player won or not versus whether the higher rated player actually won or not. With these data, sensitivity and specificity can be calculated at c = 0.75 and the process can be repeated for all values from c = 0 to c = 1. Plotting sensitivity against 1-specificity for all possible cutoff points generates the ROC curve and the area under this curve is the AUC.
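The cutoff sweep described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the Stata routine used for the reported estimates: the function name `roc_auc` is hypothetical, the fitted values are made up, and the area is computed with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_auc(fitted: Sequence[float], won: Sequence[int]) -> float:
    """Area under the ROC curve from fitted values and 0/1 outcomes."""
    positives = sum(won)
    negatives = len(won) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one win and one loss")

    # Start at (0, 0): a cutoff above every fitted value predicts no wins.
    points: List[Tuple[float, float]] = [(0.0, 0.0)]
    # Lower the cutoff c through every distinct fitted value.
    for c in sorted(set(fitted), reverse=True):
        true_pos = sum(1 for f, y in zip(fitted, won) if f >= c and y == 1)
        false_pos = sum(1 for f, y in zip(fitted, won) if f >= c and y == 0)
        # x = 1 - specificity, y = sensitivity at this cutoff.
        points.append((false_pos / negatives, true_pos / positives))

    # Trapezoidal area under the piecewise-linear ROC curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up fitted values: one of the four win/loss pairs is ordered wrongly.
print(roc_auc([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 0.75
```

Because each regression has a single predictor, the fitted values are a monotone function of the rating difference, so ranking matches by the rating difference itself (or by its negative when the slope is negative) gives the same AUC.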

AUC values equal to 0.50 represent classification accuracy at random chance levels and an AUC equal to 1.00 indicates perfect classification accuracy. Intermediate AUC values are commonly characterized as acceptable when between 0.70 and 0.80, excellent when between 0.80 and 0.90 and outstanding when over 0.90 (11). AUCs have been used in the medical sciences to assess how well diagnostic tests predict whether a patient is diseased or not (42). In the social sciences, AUCs have been used to assess how well financial statement measures can predict fraud in equity security markets (10) and how well post-conviction risk assessments predict future arrests in the criminal justice system (20). We used the AUC to ascertain how well UTR and WTN rating differences predict match outcomes in their own right and also to assess whether one rating was more accurate than the other.

RESULTS

Descriptive statistics

Descriptive statistics of the 870 sample players are provided in Panel A of Table 2. The average sample UTR rating was 9.83, ranging from 5.59 (lowest skill) to 13.82 (highest skill). The average WTN rating was 15.63, with a range between 26.06 (lowest skill) and 4.37 (highest skill). Together, under either UTR or WTN, the lowest skilled player in our sample was above the beginner/novice category, as one would expect for a prestigious national junior tournament. The highest skilled players border on the high end of the amateur level and the low end of the professional level, which again is expected given the winner of the 18u division will be accepted into the main draw of the US Open, which is a professional tournament. Overall, our sample can be characterized as capturing the entire rating range above novice and below professional for both UTR and WTN. The average player in our sample would be characterized as advanced by UTR standards and intermediate by WTN standards. Whether the subjective term “advanced” has a systematically different meaning than “intermediate” is not something we can assess. However, we can statistically examine the association between ratings. We found UTR and WTN ratings to be highly negatively correlated (ρ = -0.92, p-value < 0.001). This extremely high correlation is suggestive evidence that UTR and WTN ratings will likely perform similarly as predictors of match outcomes.

Descriptive statistics at the match level are displayed in Panel B of Table 2. We found that the favored player based on UTR ratings wins the match 75% of the time, very similar in magnitude to the 76% observed for WTN ratings. These percentages fall within the range documented in the literature using professional players. The favored player won 71.2% of the time among Grand Slam matches studied in del Corral and Prieto-Rodriguez (8), and between 75.8% and 81.8% of the time in Boulier and Stekler (3).

Logistic Regression Estimation

Estimation of 1a is provided in column 1 of Table 3 Panel A. We found a positive and statistically significant coefficient on UTR_DIFF (α₁=2.066, p-value < 0.001), suggesting that as the difference in UTR rating between the favored and unfavored player increases, the likelihood the favored player wins increases. Estimation of 1b is provided in column 1 of Table 3, Panel B, where we observed a negative and statistically significant coefficient on WTN rating differences (β₁=-0.474, p-value < 0.001). Because smaller WTN values indicate higher skill, as WTN_DIFF becomes more negative, the skill difference of the favored player increases as does the likelihood that the favored player wins. While these coefficient estimates establish both UTR and WTN ratings are statistically significant predictors of match outcomes, the coefficients cannot be directly compared to draw an inference on which is a better predictor. To draw such an inference requires the AUC derived from the regression estimations.

The first two rows of Figure 1 display the AUC for UTR and WTN ratings respectively, along with black bars designating the related 95% confidence intervals. The classification accuracy of UTR rating differences was 73.9% (95% CI, 71.2%-76.6%) compared with 70.4% for WTN (95% CI, 67.5%-73.2%). The 95% confidence intervals in both cases did not overlap with chance AUC levels of 0.50, but did overlap with each other. This implies that overall, UTR and WTN rating differences between competitors can predict match outcomes at better than chance levels, but UTR and WTN ratings exhibit statistically equivalent classification accuracy.
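The interval comparison in the preceding paragraph can be checked directly from the reported bounds. The sketch below only restates that comparison; the helper names are hypothetical, and overlap of confidence intervals is the criterion used in the text rather than a formal test of the difference between two AUCs.

```python
from typing import Tuple

Interval = Tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Return True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(interval: Interval, value: float) -> bool:
    """Return True if value lies inside the closed interval."""
    return interval[0] <= value <= interval[1]


# 95% confidence intervals for the overall-sample AUC, as reported in the text.
UTR_CI = (0.712, 0.766)
WTN_CI = (0.675, 0.732)
CHANCE = 0.50

print(contains(UTR_CI, CHANCE), contains(WTN_CI, CHANCE))  # False False
print(intervals_overlap(UTR_CI, WTN_CI))                   # True
```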

Figure 1. Classification Accuracy Comparisons.
Note. This figure reports classification accuracy of WTN and UTR ratings for match outcomes using the area under the receiver operator characteristic curve (AUC). AUC values of 0.500 represent chance levels and 1.000 represents perfect classification accuracy. Each gray bar in each row graphically depicts the AUC, with the actual AUC value displayed in white font and the 95% confidence interval error bars displayed in black. N is the number of matches used in the estimation of the AUC. Rows 1 and 2 display AUC estimates for both ratings using the overall sample, rows 3-6 display AUC estimates separately by rating and player gender, rows 7-10 display AUC estimates separately by rating and age group (18 and under or 16 and under), rows 11-14 display AUC estimates separately by rating and whether the match format dictates a 10-point tiebreaker (consolation) or a full set is played (main draw) in the event a third set is reached, and rows 15-18 display AUC values separately by rating and player skill difference, where UTR rating differences above the sample median and WTN rating differences below the sample median are considered large differences and all others are considered small differences. The number of observations is not equal in rows 15 and 16 or in rows 17 and 18 because multiple rating differences are tied with the sample median.

While the AUC serves as our measure of classification accuracy and enables statistical tests of our observed AUC both between ratings and against chance, we note that AUC is not the only way to assess classification accuracy. An alternative measure is a Brier score, which is popular in the extant literature studying professional tennis players. A Brier score, B, is defined as B = (1/N) Σ_{m=1}^{N} (P_m − X_m)², where P_m is the predicted probability that the favored player wins match m, X_m equals one if the favored player actually wins match m and zero otherwise, and N is the total number of matches. With Brier scores, a value of 0.00 represents perfect accuracy and 0.25 represents random chance. Table 3 reveals that the overall sample Brier score was 0.162 for UTR ratings and 0.167 for WTN ratings. Both Brier scores imply UTR and WTN ratings exhibited classification accuracy rates above chance levels and very similar to one another. We therefore reach the same conclusion with Brier scores as we do with AUC.
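The Brier score definition above translates directly into code. The following is a minimal sketch with a hypothetical function name and made-up inputs; it also shows why 0.25 is the chance benchmark, since a constant prediction of 0.5 misses every outcome by exactly 0.5.

```python
from typing import Sequence


def brier_score(predicted: Sequence[float], won: Sequence[int]) -> float:
    """Mean squared gap between predicted win probability and 0/1 outcome."""
    if len(predicted) != len(won) or len(won) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - x) ** 2 for p, x in zip(predicted, won)) / len(won)


# A coin-flip forecast: each squared error is 0.5 ** 2 = 0.25.
print(brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))  # 0.25

# A perfect forecast has no error.
print(brier_score([1.0, 0.0], [1, 0]))  # 0.0
```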

Brier scores have been reported in research analyzing professional tennis players, which enabled us to compare our findings to professional tennis matches. In an analysis of Grand Slam match outcomes from 2005-2008, del Corral and Prieto-Rodriguez (8) reported Brier scores ranging from 0.167 to 0.205. We note that the regression models used in del Corral and Prieto-Rodriguez (8) to predict match outcomes incorporated differences in player rankings as well as a host of additional covariates including differences in player height, age, and handedness. Such player characteristics are not publicly available for the junior players we study and so we cannot include them in equation 1a or 1b. However, the fact that our models using a single covariate achieve Brier scores similar in magnitude to models using multiple covariates suggests UTR and WTN ratings are powerful classifiers.

As another point of reference for our Brier scores, we considered the work of Lisi and Zanella (19) that used bookmaker odds as a predictor of match outcomes at Grand Slam events. They documented a Brier score of 0.166 for bookmakers, which implies classification accuracy of both UTR and WTN ratings at the junior level is nearly identical to the classification accuracy of bookmakers at the professional level.

Exploratory subsample analysis

Both UTR and WTN ratings are marketed as skill measures that do not depend on gender or age. Nonetheless, physical maturation and tennis play can vary by both gender and age through adolescence (23, 28) and our sample consists of adolescents. Gender has also been shown to impact match competitiveness at the professional level (6, 9, 17). To assess whether the overall findings vary by gender or age, we re-estimated equations 1a and 1b separately for boys and girls matches and separately for 16u and 18u matches. Results are reported in columns 2 through 5 in Table 3. With respect to gender, we found the classification accuracy of WTN ratings for girls (boys) was 69.4% (71.4%), while it was 73.0% (74.9%) for UTR ratings. Figure 1 reveals the 95% confidence interval overlaps for all four estimates, implying classification accuracies were equivalent statistically. Similar inferences were observed when partitioning by age. The AUC ranged from 69.6% for 16u matches using WTN ratings to 74.8% for 16u matches using UTR ratings. All confidence intervals overlapped, implying no statistically significant differences in classification accuracy.

While the results thus far suggest no differences in the classification accuracy of UTR relative to WTN ratings, it is conceivable differences exist on the basis of measurement precision. WTN ranges from 40 to 1 while UTR only ranges from 1 to 16.5. Without knowing the actual algorithm that generates these values it is not possible to make formal predictions about differential precision in the measures. However, if these differences in scale imply differences in precision, it may be the case that WTN ratings outperform UTR ratings only in situations where the prediction task is especially difficult. We considered two situations.

The first situation was based upon whether the match was played as part of the main draw or the consolation draw. Winners are determined in main draw matches by the best of three tiebreak sets. In the consolation draw, winners are also determined by the best of three tiebreak sets, but if a third set is reached a ten-point tiebreaker is played instead of a full third set. As a result, there is relatively less tennis played to determine match outcomes in the consolation draw, making the outcome more stochastic and the prediction task more difficult. WTN ratings may therefore exhibit superiority over UTR ratings when we focus on consolation draw matches. Re-estimation of equations 1a and 1b by whether the match was part of the main draw or consolation draw is presented in columns 6 and 7 of Table 3. The AUC values for UTR and WTN ratings in the main draw were 75.4% and 72.6%, respectively. Consolation draw AUC values were lower at 71.9% and 67.6% for UTR and WTN ratings, respectively. However, Figure 1 revealed the 95% confidence intervals overlap for WTN and UTR ratings in both the main draw and the consolation draw, suggesting no superiority in WTN rating classification accuracy.

The second situation we considered was whether there was a large or small difference in the ratings between players. We define large and small differences by reference to the sample median differences reported in Panel B of Table 2. Matches where UTR_DIFF was greater than (less than or equal to) the median of 0.63 were considered large (small) differences. Matches where WTN_DIFF was less than (greater than or equal to) the median of -2.20 were considered large (small) differences. Larger rating differences should better distinguish the more skilled player relative to the less skilled player. Smaller differences, however, make it more difficult to ascertain with certainty who the better player is, making the prediction task more difficult. If WTN provides a more precise measure of player skill, WTN may exhibit superior classification accuracy when player rating differences are relatively small. Re-estimation of equation 1a and 1b by the magnitude of the rating differential is provided in the final two columns of Table 3. We found that when the skill difference was large, the AUC equaled 69.5% and 66.7% for UTR and WTN ratings, respectively, and Figure 1 revealed these reject equivalence to chance levels. For small differences the AUC values decreased to 59.8% and 54.8% for UTR and WTN ratings, respectively, and Figure 1 revealed both values also reject chance levels. However, the 95% confidence intervals overlapped for matches with large skill differences and matches with small skill differences, suggesting that UTR and WTN ratings provided statistically equivalent classification accuracy regardless of magnitudes in skill rating differences.
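The median split described above can be written as two small rules. The sketch below uses the sample medians reported in the text; the function names are hypothetical and the rating differences in the usage lines are made up. Note that a difference exactly equal to the median falls in the small group under both ratings.

```python
UTR_DIFF_MEDIAN = 0.63   # sample median of UTR_DIFF reported in the text
WTN_DIFF_MEDIAN = -2.20  # sample median of WTN_DIFF reported in the text


def utr_gap(utr_diff: float) -> str:
    """Classify a UTR rating difference; larger positive means a bigger gap."""
    return "large" if utr_diff > UTR_DIFF_MEDIAN else "small"


def wtn_gap(wtn_diff: float) -> str:
    """Classify a WTN rating difference; more negative means a bigger gap."""
    return "large" if wtn_diff < WTN_DIFF_MEDIAN else "small"


print(utr_gap(1.10), utr_gap(0.63))    # large small
print(wtn_gap(-3.50), wtn_gap(-2.20))  # large small
```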

DISCUSSION

This study provided, to our knowledge, the first empirical evidence comparing UTR and WTN ratings as measures of player skill. Under the assumption that more skilled players were more likely to win a match in head-to-head competition, we assessed the classification accuracy of UTR and WTN ratings for predicting match outcomes. Differences in competitor UTR ratings predicted match outcomes with 73.9% accuracy, compared with 70.4% for WTN ratings. Classification accuracy exceeded chance levels of 50% in both cases, but the accuracy rates were not statistically different from one another. Moreover, we found no statistical differences in classification accuracy between UTR and WTN ratings when we investigated match outcomes by age division, gender, match format, or by the magnitude of the difference in player ratings. The lack of superiority in classification accuracy of one rating over another was consistent with both ratings capturing player skill to a very similar extent. Overall, we conclude that UTR and WTN ratings are equivalent measures of player skill for the sample analyzed here.

To execute our analysis comparing UTR and WTN ratings, we required sample observations to have both ratings. We acknowledge, however, that situations can arise where a junior player does not have both a UTR and a WTN rating. For example, consider a player in the United States who currently only plays high school tennis but wishes to play a USTA event that will seed players based on WTN ratings. Such a player will have a UTR rating because high schools utilize UTR ratings (24) but will not have a WTN rating as a result of not having played USTA matches previously. How should a tournament director handle seeding such a player? From our data, we can estimate mapping functions between UTR and WTN ratings by estimating an Ordinary Least Squares regression where the dependent variable is the WTN rating and the independent variable is the UTR rating. That estimation reveals that the predicted WTN rating is equal to (39.1836 – 2.3973*UTR). Inputting the sample average UTR rating of 9.83 from Table 2 into this formula recovers the sample average WTN rating. We expect situations where a player has a WTN rating but not a UTR rating to be relatively rare in the United States at this point because USTA matches are inputs to both the UTR and WTN rating algorithms. For completeness, we nonetheless regress UTR ratings on WTN ratings and find that the predicted UTR rating is equal to (15.3567 – 0.3539*WTN). We caution that our mapping functions are derived from player ratings at a single national tournament, and as such care should be taken when applying them to UTR and WTN ratings that fall outside the range of values we study.
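As an illustration, the two mapping functions can be written as a short sketch. The coefficients are the regression estimates reported above; the function names are hypothetical, and the same caution about ratings outside the sample range applies.

```python
def predicted_wtn_from_utr(utr: float) -> float:
    """Predicted WTN rating from a UTR rating (OLS of WTN on UTR)."""
    return 39.1836 - 2.3973 * utr


def predicted_utr_from_wtn(wtn: float) -> float:
    """Predicted UTR rating from a WTN rating (OLS of UTR on WTN)."""
    return 15.3567 - 0.3539 * wtn


if __name__ == "__main__":
    # Sample average UTR rating from Table 2.
    print(round(predicted_wtn_from_utr(9.83), 2))  # 15.62
```

Because the two functions come from separate regressions, they are not exact inverses of one another, so converting a rating in one direction and then back will generally not return the starting value exactly.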

Our findings are subject to the following limitations. First, how our results generalize outside the United States is unclear. As more countries adopt WTN ratings, repeating the analysis presented here outside of the United States will help assess whether classification accuracy varies across geographic locations. Second, our sample of junior national level tennis players does not allow us to speak to whether rating-based classification accuracy differs for matches between novices or between professionals. Repeating our analysis for local and sectional USTA tournaments would facilitate an analysis of more novice players, but analysis of professional matches remains infeasible given the lack of WTN data. If at some point WTN ratings become publicly available at the professional level, we could assess whether ratings add any predictive power beyond the growing list of specific player and match characteristics that are associated with winning tennis matches (41). For example, Ma, Liu, Tan and Ma (21) established the importance of player-specific performance for professional match outcomes, including the number of aces, first serve percentages and break points converted. Such rich point-level data are, however, not currently available for amateurs, making such an analysis for junior tournaments impossible. Third, Ma, Liu, Tan and Ma (21) conducted a longitudinal analysis highlighting that certain determinants of winning have changed over time at the professional level. In the context of our study, examining future instances of the USTA Junior National Championships will help establish whether the classification accuracy of UTR and WTN ratings varies over time.

CONCLUSIONS

Given that two global tennis player skill ratings now exist, we assessed whether one rating was superior to the other by comparing their classification accuracy for match outcomes. Using a sample of matches from a junior national tournament, we found both UTR ratings and WTN ratings exhibited classification accuracy levels that exceeded chance levels and were similar in magnitude to the accuracy levels obtained using bookmaker odds at the professional level. These findings suggested both UTR and WTN ratings were useful for measuring relative player skill. UTR and WTN rating classification accuracy levels did not statistically differ from one another, suggesting use of either rating was appropriate for assessing relative player skill. Our evidence using junior level tennis matches extends the literature studying how relative player rankings predict match outcomes at the professional level (7) and is of practical importance given the number of youth tennis players recently topped 6 million in the United States (25).

APPLICATIONS IN SPORT

Our findings provide useful information for tennis associations, tennis coaches, tournament directors, college coaches and academics who study tennis players. For tennis associations considering adopting an official rating, we find that neither rating dominates the other on the dimension of assessing the skill of junior competitors. For tennis coaches interested in organizing practices to ensure play parity, tournament directors accepting and seeding players, and college coaches screening potential recruits, our evidence suggests that the two ratings are equivalent in terms of assessing player skill. For academics who use player ratings when studying the determinants of tennis player skill (18) or developing tennis handicaps in junior tennis as a function of player skill differences (4), our findings imply that using either UTR or WTN ratings as a proxy for player skill is appropriate.

Acknowledgements

We received excellent research assistance from Roshan Banapuram Raja and Qiru (Vincent) Zhang. We appreciate helpful comments from Kyle Bunds, Jon Casper, Julio del Corral, Ales Filipcic, Will Mayew and Ian Mayew. The views expressed in this paper are those of the authors and do not represent positions of the United States Professional Tennis Association.

REFERENCES

ATP. (2022). Association of Tennis Professionals Singles Rankings. https://www.atptour.com/en/rankings/singles
Bader, S. (2022). What events count towards my UTR rating? https://support.universaltennis.com/en/support/solutions/articles/9000138934-what-events-count-towards-my-utr-rating-
Boulier, B. L., & Stekler, H. O. (1999). Are sports seedings good predictors?: an evaluation. International Journal of Forecasting, 15(1), 83-91. DOI: 10.1016/S0169-2070(98)00067-3.
Chan, T. C., & Singal, R. (2018). A Bayesian regression approach to handicapping tennis players based on a rating system. Journal of Quantitative Analysis in Sports, 14(3), 131-141. DOI: 10.1515/jqas-2017-0103.
Clarke, S. R., & Dyte, D. (2000). Using official ratings to simulate major tennis tournaments. International transactions in operational research, 7(6), 585-594. DOI: 10.1016/S0969-6016(00)00036-8.
del Corral, J. (2009). Competitive balance and match uncertainty in grand-slam tennis: effects of seeding system, gender, and court surface. Journal of Sports Economics, 10(6), 563-581. DOI: 10.1177/1527002509334650.
del Corral, J. (2019). Hitting the Ball Forward: The Economics of Racquet Sports. The SAGE Handbook of Sports Economics, 452. DOI: 10.4135/9781526470447.
del Corral, J., & Prieto-Rodríguez, J. (2010). Are differences in ranks good predictors for Grand Slam tennis matches?. International Journal of Forecasting, 26(3), 551-563. DOI: 10.1016/j.ijforecast.2009.12.006.
Gomez-Gonzalez, C., & del Corral, J. (2020). Professional tennis in the twenty-first century: Hawk-Eye on competitive balance. In Outcome Uncertainty in Sporting Events (pp. 27-43). Edward Elgar Publishing.
Hobson, J. L., Mayew, W. J., & Venkatachalam, M. (2012). Analyzing speech to detect financial misreporting. Journal of Accounting Research, 50(2), 349-392. DOI: 10.1111/j.1475-679X.2011.00433.x.
Hosmer Jr, D. W., & Lemeshow, S. (2000). Applied Logistic Regression. New York: Wiley-Interscience.
ITA. (2017). Intercollegiate Tennis Association and Universal Tennis announce five-year partnership agreement. https://www.itatennis.co/ita-archives/AboutITA/News/Intercollegiate_Tennis_Association_And_Universal_Tennis_Announce_Five-Year_Partnership_Agreement.html.
ITA. (2023). Intercollegiate Tennis Association Adopts ITF World Tennis Number as Exclusive Official Rating for College Tennis. https://www.wearecollegetennis.com/2023/01/05/wtn-named-official-rating-of-college-tennis/.
ITF. (2019). ITF and national associations announce world tennis number project. https://www.itftennis.com/en/news-and-media/articles/itf-and-national-associations-announce-world-tennis-number-project/.
ITF. (2022). ITF World Tennis Number FAQs for Junior World Tennis Tour. https://www.itftennis.com/media/7971/wtn-faqs-updated-0605.pdf.
Klaassen, F. J., & Magnus, J. R. (2003). Forecasting the winner of a tennis match. European Journal of Operational Research, 148(2), 257-267. DOI: 10.1016/S0377-2217(02)00682-3.
Krumer, A., Rosenboim, M., & Shapir, O. M. (2016). Gender, competitiveness, and physical characteristics: Evidence from professional tennis. Journal of Sports Economics, 17(3), 234-259. DOI: 10.1177/1527002514528516.
Kurtz, J. A., Grazer, J., Alban, B., & Marino, M. (2019). Ability for tennis specific variables and agility for determining the Universal Tennis Ranking (UTR). The Sports Journal.
Lisi, F. (2017). Tennis betting: can statistics beat bookmakers?. Electronic Journal of Applied Statistical Analysis, 10(3), 790-808. DOI: 10.1285/i20705948v10n3p790.
Lowenkamp, C. T., Holsinger, A. M., & Cohen, T. H. (2015). PCRA revisited: Testing the validity of the Federal Post Conviction Risk Assessment (PCRA). Psychological Services, 12(2), 149.
Ma, S. M., Liu, C. C., Tan, Y., & Ma, S. C. (2013). Winning matches in Grand Slam men’s singles: An analysis of player performance-related variables from 1991 to 2008. Journal of sports sciences, 31(11), 1147-1155. DOI: 10.1080/02640414.2013.775472.
Mali, M. (2021). UTR rating basics and how it works. https://blog.universaltennis.com/back-to-basics-what-is-utr-and-how-does-it-work/.
Martínez-Gallego, R., Villafaina, S., Crespo, M., & Fuentes-García, J. P. (2022). Gender and age influence in pre-competitive and post-competitive anxiety in young tennis players. Sustainability, 14(9), 4966. DOI: 10.3390/su14094966.
NFHS. (2021). NFHS announces Universal Tennis as new official partner. https://nfhs.org/articles/nfhs-announces-universal-tennis-as-new-official-partner/.
Racquet Sports Industry. (2021). 2021 TIA Tennis Industry Participation Report. Tennis Industry Magazine. http://www.tennisindustrymag.com/issues/202109/industry-research.pdf.
Rossingh, D. (2020). Larry Ellison-backed Universal Tennis invests $20 million in three-year global tour. Forbes. https://www.forbes.com/sites/daniellerossingh/2020/12/07/ellison-backed-universal-tennis-invests-20m-in-three-year-global-tour/.
Sanford, J. (2022). Tennis Australia becomes first governing body to adopt UTR as official rating system nationwide. Tennis.com. https://www.tennis.com/baseline/articles/tennis-australia-partners-universal-tennis-rating-sytem-first-governing-body.
Ucuz, I., Ucar, C., Gurer, H., & Yildiz, S. (2020). Hypothalamo-pituitary-adrenal axis activity and related factors in adolescent during a tennis tournament. DOI: 10.5455/annalsmedres.2020.08.801.
Universal Tennis. (2018). How UTR Rating Works. https://blog.universaltennis.com/how-utr-works/.
Universal Tennis. (2022). How UTR works. https://www.universaltennis.com/how-utr-works.
Universal Tennis. (2022). Universal Tennis App. https://app.universaltennis.com/.
Universal Tennis. (2022). Universal Tennis launches into pickleball with event management software and rating. https://www.tennis.com/baseline/articles/universal-tennis-expands-into-pickleball-with-new-rating-system-event-management.
USTA. (2022). ITF World Tennis Number. https://www.usta.com/en/home/play/itf-world-tennis-number.html.
USTA. (2022). ITF World Tennis Number FAQs. https://customercare.usta.com/hc/en-us/articles/4414716969492-ITF-World-Tennis-Number-FAQs.
USTA. (2022). Tournaments Lookup. https://playtennis.usta.com/tournaments.
USTA. (2022). United States Tennis Association Junior Tournaments Rankings. https://www.usta.com/en/home/play/rankings.html.
USTA. (2022). USTA launches ITF World Tennis Number widget online. https://www.usta.com/en/home/stay-current/national/usta-launches-itf-world-tennis-number-widget-online.html.
WTA. (2022). Women’s Tennis Association Singles Rankings. https://www.wtatennis.com/rankings/singles.
WTN. (2022). ITF World Tennis Number frequently asked questions. https://worldtennisnumber.com/eng/faq.
WTN. (2022). WTN Search players. https://worldtennisnumber.com/eng/player-search.
Yue, J. C., Chou, E. P., Hsieh, M. H., & Hsiao, L. C. (2022). A study of forecasting tennis matches via the Glicko model. Plos one, 17(4), e0266838. DOI: 10.1371/journal.pone.0266838.
Zou, K. H., O’Malley, A. J., & Mauri, L. (2007). Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models. Circulation, 115(5), 654-657. DOI: 10.1161/CIRCULATIONAHA.105.594929.