A New Market Research Approach in Sport-Data Mining

Submitted by: Chen-Yueh Chen & Yi-Hsiu Lin

Introduction

Numerous businesses have shown that data mining can produce substantial and lucrative results. For example, Wal-Mart used data mining and found a link between the sales of babies’ diapers and beer. Based on this result, Wal-Mart placed beer close to the diapers, which resulted in a significant increase in beer sales (Saban, 2001). Another salient example is American Express, which built a data mining model to examine millions of records and calculate “purchase scores,” that is, customers’ propensity to make purchases. The scores not only provided merchants with valuable information but also reduced American Express’ marketing expenses (Saban, 2001). Such successes suggest that further research on data mining is warranted.

Although data mining has been widely and successfully used in business operations, data mining in sport is still in its infancy (Fielitz & Scott, 2003; Lefton, 2003). In other words, the sports industry has generally been a poor and light user of data mining (Jutkins, 1998), and few papers on data mining in sport were found in sport journals. However, Lewis (2004) pointed out that data mining will become a critical component of selling and marketing sports teams. Similarly, Martin (2005) expected data mining to become mainstream in sports as an effective complementary marketing tool. As a result, data mining warrants sport marketing researchers’ attention and efforts.

The purpose of the present article is to advocate the use of data mining in the sport industry so that sport organizations can achieve their marketing goals effectively. The article is organized as follows: first, definitions and benefits of data mining are discussed; second, successful applications of data mining in sport are illustrated; third, data mining techniques that are appropriate and potentially useful in the sport industry are described, followed by discussions and conclusions.

Definitions and Benefits of Data Mining

Data mining is a process of extracting previously unknown, valid, actionable, and ultimately comprehensible information from large databases and then using the information to make crucial business decisions (Cabena et al., 1998). From a different perspective, Kotler (2003) described data mining as “involving the use of sophisticated statistical and mathematical techniques such as cluster analysis, automatic interaction detection, predictive modeling, and neural networking” (p. 54). Most definitions of data mining fall into one of these two categories. Combining the two, data mining is the process of using sophisticated mathematical or statistical models to extract valuable, valid, and actionable information from a database to accomplish an organization’s goals (for similar definitions, see also Berry & Linoff, 2004; Hair, Anderson, Tatham, & Black, 1998).

The benefits of executing data mining are as follows: implementing up-selling, increasing season-ticket sales, monitoring season-ticket usage, raising transplanted-fan ticket sales, and executing cross-selling (James, 2004). Additionally, other benefits include (a) retaining current customers, (b) determining customers’ lifetime value, (c) developing relationships with customers, (d) improving delivery of sales promotion, (e) reinforcing consumers’ purchase decisions, (f) customizing consumer services, (g) facilitating marketing research, (h) profiling the customers, and (i) identifying the best customers for an organization (Aaker, Kumar, & Day, 2000; “Happy Customer,” 2004; Kotler, 2003).

Cases of Executing Data Mining in Sport

Although data mining has not been as widely employed in sport as it has in business, various successful applications in sport still exist. The following observations demonstrate how effective data mining can be for sport organizations and how sport organizations benefit from implementing data mining.

Dick and Sack (2003) conducted a study about effective marketing techniques in the NBA. They contended that data mining was a more effective and efficient way to ensure that advertising messages are received by the target markets. Several NBA teams such as the Cleveland Cavaliers, the Seattle SuperSonics, the Portland Trail Blazers, and the Miami Heat have successfully utilized data mining. The Cleveland Cavaliers created a database that includes customers’ names, addresses, telephone numbers, and other detailed information on the products purchased. By analyzing that database, the Cleveland Cavaliers consistently gave a follow-up call to those who bought tickets from Ticketmaster to determine whether they were interested in other games or events (Bonvissuto, 2005). The Seattle SuperSonics also developed a data mining program to raise revenues and increase the number of season ticket holders. Additionally, the Portland Trail Blazers analyzed their customer database to help forecast advertising revenues and spot ticket-sale trends (Whiting, 2001). Finally, Miami Heat officials contend that data mining reaches a targeted audience more effectively than traditional advertising or traditional mass-media marketing. By using data mining, the overall Miami Heat season-ticket renewal rate in 2005 was expected to be around eighty-five percent (Lombardo, 2005).

Data mining can also be applied by coaches to identify player patterns that box scores do not reveal, which helps win games by extracting relevant information from the database. In a 1997 playoff series, the Orlando Magic discovered Darrell Armstrong’s talent through data mining and inserted him into the starting lineup. The coach increased Armstrong’s responsibility in this series because the data showed that the probability of an Orlando Magic win increased when Armstrong was on the court. The Orlando Magic went on to win two consecutive games, and Armstrong won the Sixth Man Award in 1999 (Restivo, 1999). In addition, Brian James, assistant coach of the Toronto Raptors, employed a data mining application to anticipate what kinds of plays opponents would use. Utilizing data mining in this way makes it easier for coaches to decide when and how to position their players for maximum effect (Baltazar, 2000). Francett (1997) and Hudgins-Bonafield (1997) stated that data mining applications help coaches analyze huge amounts of data to reveal winning player combinations. Moreover, the data mining approach to postgame analysis takes much less time than the traditional approach of repeatedly rewinding videotape; in short, data mining makes analysis more efficient.

In summary, major league teams have adopted data mining to increase ticket sales revenues and season-ticket renewal rates, and coaches have used it to achieve their own goals and objectives. Information extracted from records of players’ performance enables coaches to position and direct their players in a game. Consequently, data mining is a powerful technique with flexible applicability in sport.

Proposed Techniques for Data Mining in Sport

This section briefly presents an overview of the statistical models and techniques frequently used in data mining for marketing, sales, and customer relationship management. The tasks performed in data mining include classification, estimation, prediction, and profiling (Berry & Linoff, 2004). The overview includes the definitions and properties of the models along with the circumstances under which each model should be used. These models include (a) Discriminant Analysis, (b) Logistic Regression, (c) Decision Trees, (d) Artificial Neural Networks, (e) Collaborative Filtering, (f) Market Basket Analysis, and (g) Survival Analysis.

Discriminant Analysis

Discriminant analysis is a statistical method that uses linear functions of the independent variables to distinguish groups. It is the appropriate statistical technique when the dependent variable is categorical and the independent variables are continuous (Hair, Jr., Anderson, Tatham, & Black, 1998; Tabachnick & Fidell, 2001). It is an old and extensively used parametric approach to classification. It works by comparing a weighted sum of the input variables to a constant value; the weights and the constant are determined in such a way that the least square error of misclassification is minimized (Tabachnick & Fidell, 2001). Sport organizations can use it, for example, to classify customers as high-, medium-, or low-value customers in terms of their monetary contribution to the sport organization. This enables a sport organization to allocate marketing resources more effectively.
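
As an illustration only, the weighted-sum idea can be sketched in a few lines of Python. The sketch assumes two continuous predictors and uses the common textbook form of the linear classification functions (group means, a pooled covariance matrix, and prior probabilities), which may differ in detail from the estimation procedure described in the cited sources. The function names, the customer records, and the value tiers are all hypothetical.

```python
import math

def fit_lda(rows, labels):
    """Fit linear classification functions for two continuous predictors,
    assuming all groups share one (pooled) covariance matrix."""
    classes = sorted(set(labels))
    n = len(rows)
    means, priors = {}, {}
    for c in classes:
        pts = [r for r, y in zip(rows, labels) if y == c]
        means[c] = (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
        priors[c] = len(pts) / n

    # Pooled within-group covariance matrix (2 x 2).
    s11 = s12 = s22 = 0.0
    for r, y in zip(rows, labels):
        d1, d2 = r[0] - means[y][0], r[1] - means[y][1]
        s11 += d1 * d1
        s12 += d1 * d2
        s22 += d2 * d2
    dof = n - len(classes)
    s11, s12, s22 = s11 / dof, s12 / dof, s22 / dof
    det = s11 * s22 - s12 * s12
    inv = ((s22 / det, -s12 / det), (-s12 / det, s11 / det))

    # One weight vector and one constant per group.
    model = {}
    for c in classes:
        m1, m2 = means[c]
        w = (inv[0][0] * m1 + inv[0][1] * m2,
             inv[1][0] * m1 + inv[1][1] * m2)
        const = -0.5 * (w[0] * m1 + w[1] * m2) + math.log(priors[c])
        model[c] = (w, const)
    return model

def classify(model, x):
    """Assign x to the group whose weighted sum plus constant is largest."""
    def score(c):
        w, const = model[c]
        return w[0] * x[0] + w[1] * x[1] + const
    return max(model, key=score)

# Hypothetical customers: (games attended, annual spend in hundreds of dollars).
customers = [(2, 1.0), (3, 1.5), (1, 0.8), (4, 1.2),
             (10, 5.0), (12, 6.5), (9, 5.5), (11, 4.8),
             (25, 15.0), (30, 18.0), (28, 14.0), (27, 17.5)]
value_tier = ["low"] * 4 + ["medium"] * 4 + ["high"] * 4

model = fit_lda(customers, value_tier)
print(classify(model, (10, 5.5)))
```

With real data, a sport organization would likely rely on an established statistical package rather than hand-written code; the sketch is meant only to make the role of the weights and constants concrete.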

Logistic Regression

Logistic regression is a widely used technique for classifying subjects into two mutually exclusive and exhaustive categories (Ratner, 2003). In logistic regression, maximum likelihood estimation is used to fit the model, which then yields the estimated probability that a subject belongs to a group. Logistic regression is often used as a benchmark in the field of data mining when comparing the accuracy of model predictions. Professional sport teams can employ it to investigate the characteristics of season ticket holders who end up terminating their season ticket purchases and to predict the probability of termination.
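
The following minimal sketch shows how such a probability might be estimated. It fits a one-predictor logistic model by plain gradient ascent on the log-likelihood, which is one simple numerical route to the maximum likelihood estimates; statistical packages typically use faster algorithms. The predictor (share of home games attended), the outcome data, the learning rate, and the function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, steps=5000):
    """Estimate intercept b0 and slope b1 by gradient ascent on the
    average log-likelihood of a one-predictor logistic model."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus predicted
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def lapse_probability(b0, b1, x):
    """Predicted probability that a holder does not renew."""
    return sigmoid(b0 + b1 * x)

# Hypothetical season ticket holders: share of home games attended last
# season, and whether the holder lapsed (1) or renewed (0).
attendance_share = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
lapsed = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(attendance_share, lapsed)
for share in (0.1, 0.5, 0.9):
    print(share, round(lapse_probability(b0, b1, share), 3))
```

In this invented data set the fitted slope is negative, so holders who attended fewer games receive a higher predicted probability of lapsing, which is the kind of signal a retention campaign could act on.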

Decision Trees

Decision trees are one of the most popular methods in data mining and are frequently used for data exploration (Borisov, Chikalov, Eruhimov, & Tuv, 2005). A decision tree divides a large collection of heterogeneous data into successively smaller sets of homogeneous data by applying a sequence of simple decision rules with respect to a selected target variable (Berry & Linoff, 2004). In essence, decision trees partition the data by employing independent variables to identify the subgroups that contribute most to the dependent variable (Chakrapani, 2004). This technique can also be used to classify and/or predict in sport settings.
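
The partitioning process can be sketched as follows. This is a deliberately small illustration, not a production algorithm: it assumes numeric predictors, uses Gini impurity as the measure of homogeneity (one common choice among several), and stops at a fixed depth. The records, labels, and function names are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0 when a set contains a single class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature index, threshold) giving the lowest weighted
    impurity, or None if no split improves on the unsplit set."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score - 1e-12:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the decision rules from the root down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical fans: (games attended, distance from arena in km).
fans = [(2, 10), (3, 60), (5, 20), (12, 15), (15, 30),
        (14, 80), (18, 90), (20, 10), (11, 25), (4, 70)]
outcome = ["lapse", "lapse", "lapse", "renew", "renew",
           "lapse", "lapse", "renew", "renew", "lapse"]

tree = build_tree(fans, outcome)
print(predict(tree, (16, 20)), predict(tree, (16, 85)))
```

Each internal node of the resulting tree is a simple rule such as "attended more than a given number of games," which is why trees are often favored when results must be explained to marketing staff.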

Artificial Neural Networks

Artificial neural networks (ANNs) are computer-intensive computational techniques that simulate the function of neural activity in a human brain (Chakrapani, 2004). To put it differently, ANNs are computational tools for data exploration and model development that help identify patterns or structures in the data (Smith & Gupta, 2002). ANNs consist of three layers of processing units: an input layer, a hidden layer, and an output layer. When the final decision is binary (0 or 1), the value of the output layer is the predicted value of the decision: if the output value is 0.5 or above, the decision is assumed to be an acceptance, while if it is below 0.5, the decision is a rejection (Kumar & Olmeda, 1999). Compared to traditional statistical methods, which are usually linear, ANNs use a non-linear approach (Cho & Ngai, 2003) and do not depend on a set of specified procedures. ANNs have been widely used in recent years as a classification technique and have been applied to a variety of business fields including bond rating, bankruptcy prediction, and stock market prediction. ANNs are considered superior to regression-type models because of their ability to detect non-linear relationships and to adapt to changing input (Chakrapani, 2004). Sport organizations can use them to predict and classify their customers to better allocate marketing resources, that is, to segment the market and target customers more accurately.
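
To make the three-layer structure and the 0.5 decision rule concrete, the sketch below runs a forward pass through a tiny network with two inputs, two hidden units, and one output unit. The weights are set by hand for illustration rather than learned from data, and the two binary inputs are hypothetical customer attributes. The pattern encoded here (accept when exactly one attribute is present) cannot be separated by a single linear boundary, which is the kind of non-linear relationship the hidden layer is meant to capture.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hand-set (not trained) weights: each row is (weight_1, weight_2, bias).
HIDDEN = [(20.0, 20.0, -10.0),    # fires when at least one input is 1
          (-20.0, -20.0, 30.0)]   # fires unless both inputs are 1
OUTPUT = (20.0, 20.0, -30.0)      # fires only when both hidden units fire

def forward(x1, x2):
    """Input layer -> hidden layer -> output layer."""
    hidden = [sigmoid(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN]
    w1, w2, b = OUTPUT
    return sigmoid(w1 * hidden[0] + w2 * hidden[1] + b)

def decide(x1, x2):
    """Apply the decision rule: accept at 0.5 or above, otherwise reject."""
    return "accept" if forward(x1, x2) >= 0.5 else "reject"

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, decide(*pair))
```

In practice the weights would be estimated from historical customer data by a training algorithm such as backpropagation, which is omitted here to keep the example short.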

Collaborative Filtering

Collaborative filtering (CF) is a new technique in the area of data mining that helps people make choices based on other people’s choices. Similarly, Berry and Linoff (2004) described collaborative filtering as an approach to making and providing personalized recommendations. The collaborative filtering approach starts by evaluating a history of customer product preferences as well as demographics and ends by determining similarities so that people who may like the same products are grouped together (Berry & Linoff, 2004). Namely, this approach uses the reactions or preferences of others within the database, together with their similarity to the target customer, to generate recommendations. Professional sport teams can utilize it to make recommendations or promote sporting events and sport merchandise to their customers based on what other customers purchased or consumed.
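
A minimal sketch of this idea is given below. It assumes a user-based variant of collaborative filtering with cosine similarity, which is only one of several ways to measure how alike two customers are, and it ignores demographics. The fans, products, preference scores, and function names are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two customers' preference dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(target, prefs, top_n=2):
    """Score items the target has not tried by the preferences of other
    customers, weighted by each customer's similarity to the target."""
    scores = {}
    for other, other_prefs in prefs.items():
        if other == target:
            continue
        sim = cosine(prefs[target], other_prefs)
        for item, rating in other_prefs.items():
            if item not in prefs[target]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical preference scores (1 = low, 5 = high) per fan and product.
purchases = {
    "fan_a": {"jersey": 5, "playoff_ticket": 4},
    "fan_b": {"jersey": 5, "playoff_ticket": 5, "scarf": 4},
    "fan_c": {"cap": 5, "jersey": 1, "family_pack": 4},
    "fan_d": {"playoff_ticket": 4, "scarf": 5, "jersey": 4},
}

print(recommend("fan_a", purchases))
```

Here the fans most similar to fan_a both rated the scarf highly, so the scarf is ranked first among the items fan_a has not yet bought.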

Market Basket Analysis

Market basket analysis is a data mining technique that aims to understand point-of-sale transaction data (Berry & Linoff, 2004). In other words, market basket analysis deals with business problems such as which products tend to be purchased together and which are most appropriate for promotion (Berry & Linoff, 2004). To perform market basket analysis, three levels of market basket data are required: customers, orders, and items. Customer data refers to customer information, including customers’ IDs, names, addresses, and so on. Order data represents a single purchase event by a customer, including the total amount of the purchase, the payment type, and any other data relevant to the transaction. Item data contains the price paid for the purchased item and the number of items (Berry & Linoff, 2004). By using this technique, sport organizations can decide which products to promote, segment their customers, and identify relationships among product items.
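
One simple way to find products that tend to be purchased together is to compute support, confidence, and lift for every pair of items, as sketched below. These three measures are standard in association analysis, but the sketch is an assumption about how the analysis might be carried out; it works at the order level only and leaves out the customer and item detail described above. The orders and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_rules(orders):
    """Return (lhs, rhs, support, confidence, lift) for each ordered pair
    of items that appears together in at least one order."""
    n = len(orders)
    item_counts = Counter(item for order in orders for item in set(order))
    pair_counts = Counter(pair for order in orders
                          for pair in combinations(sorted(set(order)), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n                       # share of orders with both
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # P(rhs | lhs)
            lift = confidence / (item_counts[rhs] / n)
            rules.append((lhs, rhs, support, confidence, lift))
    return rules

# Hypothetical orders from a team store and concession stands.
orders = [
    ["jersey", "cap"],
    ["jersey", "cap", "scarf"],
    ["ticket", "hot_dog"],
    ["ticket", "hot_dog", "beer"],
    ["jersey", "scarf"],
    ["ticket", "beer"],
    ["cap", "jersey"],
    ["hot_dog", "beer"],
]

rules = pair_rules(orders)
for lhs, rhs, support, confidence, lift in sorted(rules, key=lambda r: -r[4])[:3]:
    print(f"{lhs} -> {rhs}: support={support:.3f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 indicates that two items appear together more often than would be expected if they were bought independently, which makes the pair a candidate for joint promotion or adjacent placement.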

Survival Analysis

Survival analysis, also known as event history analysis, reliability analysis, time-to-failure analysis, and duration analysis, was developed to deal not only with the probability that a certain event will occur but also with when it will occur (Harrison & Ansell, 2002). Namely, it deals with the time between events (Drye, Wetherill, & Pinnock, 2001). For example, survival analysis can estimate when existing customers will re-attend professional sporting events based on their past game attendance records, which provides valuable information for professional sport teams in terms of promotional decision-making.
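
As one concrete instance, the sketch below uses the Kaplan-Meier estimator, a widely used survival analysis method that the cited sources do not specifically prescribe. It estimates, for each elapsed time, the share of customers who have not yet returned to a game, while allowing for customers who had still not returned when the data were collected (censored observations). The data and the function name are hypothetical.

```python
def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each time with at least one observed event,
    where S(t) is the estimated share still waiting after time t."""
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        events = sum(1 for d, o in zip(durations, observed) if d == t and o)
        leaving = sum(1 for d in durations if d == t)
        if events:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # both events and censored cases leave the risk set
    return curve

# Hypothetical fans: days since last game, and whether the fan has
# returned (True) or had not yet returned when last observed (False).
days_to_return = [7, 14, 14, 21, 30, 30, 45, 60]
returned = [True, True, False, True, True, True, False, True]

curve = kaplan_meier(days_to_return, returned)
for t, s in curve:
    print(t, round(s, 3))
```

Reading along the curve, the first time at which the estimate falls to 0.5 or below gives a rough median time to return, which could guide when to send a reminder or promotional offer.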

Discussions and Conclusions

Both advances in information technology and organizations’ needs have facilitated the upsurge of data mining. Even though data mining has been successfully adopted to accomplish a number of organizations’ marketing goals and objectives in business, it is still in its infancy in the domain of sport. However, the lack of use of data mining in the sport business does not mean that it is inapplicable or unimportant there. Instead, it is a great opportunity for sport businesses to adopt data mining and benefit from implementing it. With correct and appropriate use of data mining, sports organizations can benefit from the strategies and tactics developed from analyzing customer databases.

Models and algorithms for data mining are being developed rapidly to address a variety of practical problems. Various models have been commonly and successfully employed to solve real-world problems, and the tasks performed vary from model to model. Consequently, no rule of thumb exists that identifies the best model for solving a practical problem. In other words, the selection of the model depends heavily on the type of problem, the data structure an organization possesses, and the objective of the organization. Therefore, it is critical to examine organizational goals and data structure thoroughly before choosing data mining techniques.

References

Aaker, D. A., Kumar, V., & Day, G. S. (2000). Marketing research (7th ed.). NY: John Wiley & Sons, Inc.
Baltazar, H. (2000). NBA coaches’ latest weapon: Data mining. PC Week, 17(10), 69.
Berry, M. J. A., & Linoff, G. S. (2004). Data mining techniques: For marketing, sales, and customer relationship management (2nd ed.). Indiana: Wiley Publishing, Inc.
Bonvissuto, K. (2005). Cavaliers forge fan friendships with strategic database use. Crain’s Cleveland Business, 26(8).
Borisov, A., Chikalov, I., Eruhimov, V., & Tuv, E. (2005). Performance and scalability analysis of tree-based models in large-scale data-mining problems. International Technology Journal, 9(2), 143-151.
Cabena, P., Hadjinian, P., Stadler, R., Verhees, J., & Zanasi, A. (1998). Discovering data mining: From concept to implementation. NJ: Prentice Hall.
Chakrapani, C. (2004). Statistics in market research. New York: Oxford University Press Inc.
Cho, V., & Ngai, E. (2003). Data mining for selection of insurance sales agents. Expert Systems, 20(3), 123-132.
Dick, R., & Sack, A. L. (2003). NBA marketing directors’ perceptions of effective marketing techniques: A longitudinal perspective. International Sports Journal, 7(1), 88-99.
Drye, T., Wetherill, G., & Pinnock, A. (2001). When are customers in the market? Applying survival analysis to marketing challenges. Journal of Targeting, Measurement, and Analysis for Marketing, 10(2), 179-188.
Fielitz, L., & Scott, D. (2003). Prediction of physical performance using data mining. Research Quarterly for Exercise and Sport, 74(1), 25.
Francett, B. (1997). The NBA gets a jump on data mining. Software Magazine, 17(9).
Hair, Jr., J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis (5th ed.). Upper Saddle River, NJ: Prentice Hall.
Happy Customer. (2004, April 1). Happy customer: Creative ways to keep your season ticket holders happy throughout the year. The Migala Report. Retrieved May 17, 2005, from http://www.migalareport.com/apr04_story4_pf.cfm
Harrison, T., & Ansell, J. (2002). Customer retention in the insurance industry: Using survival analysis to predict cross-selling opportunities. Journal of Financial Service Marketing, 6(3), 229-239.
Hudgins-Bonafield, C. (1997). Data mining software scores high with the NBA. Network Computing, 8(11).
James, V. L. (2004). Build fan base from your database. Street & Smith’s SportsBusiness Journal. Retrieved June 24, 2005, from SportsBusiness Journal archive database.
Jutkins, R. (1998). Direct marketing: A new strategy in the sports industry. Direct Marketing, 60(10), 34-35.
Kotler, P. (2003). Marketing management (11th ed.). Upper Saddle River, NJ: Pearson Education, Inc.
Kumar, A., & Olmeda, I. (1999). A study of composite or hybrid classifiers for knowledge discovery. Journal on Computing, 11(3), 267-277.
Lefton, T. (2003). NFL flexes database marketing muscle to sell tix, boost viewership. Street & Smith’s SportsBusiness Journal. Retrieved June 24, 2005, from SportsBusiness Journal archive database.
Lewis, J. (2004). Career spotlight: Tom Glick. Street & Smith’s SportsBusiness Journal. Retrieved September 24, 2005, from SportsBusiness Journal archive database.
Lombardo, J. (2005). A new ball game in South Florida: Heat puts twist on marketing. Street & Smith’s SportsBusiness Journal. Retrieved June 24, 2005, from SportsBusiness Journal archive database.
Martin, L. (2005). Customer relations: An overview of how customer data can drive incremental sales for your organization. The Migala Report. Retrieved November 3, 2005, from http://www.migalareport.com/nov05_story5_pf.cfm
Ratner, B. (2003). Statistical modeling and analysis for database marketing: Effective techniques for mining big data. New York: Chapman & Hall/CRC.
Restivo, K. (1999). Professional sports teams turn to resellers for an edge. Computer Dealer News, 15(28).
Saban, K. A. (2001). The data mining process: At a critical crossroads in development. Journal of Database Marketing, 8(2), 157-167.
Smith, K., & Gupta, J. (2002). Neural networks in business: Techniques and applications. PA: Idea Group Publishing.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). MA: Allyn and Bacon.
Whiting, R. (2001). Customers come into focus with combination software. Retrieved June 11, 2005, from the LexisNexis database.