A Case Study Exploring Self-Team Evaluations and Feedback through Team-Designed Behavior Scales

Submitted by Robert Brill, Fernando Cifuentes and Logan Stano

Robert Brill is an Associate Professor of Psychology at Moravian College where he teaches courses and conducts research in Industrial / Organizational Psychology and Sports Psychology. He also consults with a number of organizations in the Lehigh Valley area of Pennsylvania. Fernando Cifuentes and Logan Stano are Psychology majors and student researchers at Moravian College.

ABSTRACT

PURPOSE: This case study explored a feedback intervention that incorporated team-generated scales built on best-practice research principles from industrial / organizational psychology. METHOD: A college men’s soccer team developed behaviorally anchored rating scales on 14 performance dimensions, and then provided self and team member ratings on each dimension. Each player received feedback comparing the team’s average ratings of him with his self-ratings. Player perceptions were assessed prior to scale development, prior to ratings, and after feedback was received. RESULTS: Findings indicate that the experience was challenging but positive; perceptions of potential and current ability changed significantly in opposite directions between ratings and feedback, suggesting that players experienced a simultaneous improvement in motivation and a reality check on their perceived potential. CONCLUSION: The data suggest that this feedback intervention may be a worthwhile endeavor to help motivate individuals and strengthen team cohesion. APPLICATIONS IN SPORT: To supplement a coach’s feedback and unify teammate performance expectations, the creation and administration of behavior-based self and peer ratings may be a needed and viable option. If so, this case study offers a model for attempting such an intervention.

INTRODUCTION

In athletics, the challenge of credible feedback is often relegated solely to the coach, who may or may not be perceived as a reliable source of feedback. Also, individuals may see their performance and the standards that define performance effectiveness quite differently from their coach or teammates. This presents a twofold challenge and opportunity regarding feedback. First, the team can strive to achieve a unified set of performance expectations; second, self and peer ratings can be employed to provide more credible and accepted feedback on performance. Building on empirically supported best practices within industrial / organizational (I/O) psychology, the present case study explored an intervention process to address both feedback challenges (unified performance standards and reliable self and peer rating comparisons) by guiding a men’s college soccer team to a) create a performance model that instilled a common frame of reference about performance standards for the team; and b) use the performance model to administer evaluation tools providing feedback in which team average ratings were compared to self-ratings.

Feedback has been touted as a critical part of performance management systems within business and industry, and shown to yield positive outcomes in both worker attitudes and performance behavior (5,7,11,19).  Not surprisingly, the benefits of feedback on athletic performance have been an area of scientific interest for quite some time as well (2,21).  The current case study explores a feedback intervention that employs three best practice attributes that flow from the research literature within organizational psychology: behavior-based scales, multisource peer ratings as comparison for self-ratings, and a process to facilitate a team-generated performance evaluation tool.

Within business and industry, behavior-based scales have a long history of improving the reliability, accuracy and usefulness of performance appraisals, from their inception (17) to contemporary times (14). However, feedback strategies explored in athletic contexts typically employ a rating tool based on pre-structured generalized dimensions (e.g., effort, motivation, team player, etc.) with vague adjective scales, rather than behavioral anchors. Early research indicated that more specific and detailed behavioral anchors had a positive effect on the attitudes and reactions of participants to the evaluation process, produced higher face validity, and subsequently improved motivational response to the feedback (1,10). The job-relatedness of behavior-based scales seems to strengthen recipients’ belief that the feedback is more genuine and complete than alternative forms, and also facilitates the establishment of specific goals for improvement by indicating which types of behaviors should be performed and which should be avoided (4,22).

Self-evaluations are often inflated as a fear response to being perceived as less than adequate (1,5). Multi-source peer feedback has been shown to improve rating accuracy and credibility by overcoming some of the ambiguity underlying individual subjective judgments about teammates (4,9,14). Systems incorporating multiple perspectives, including peer assessments, have become increasingly important to modern organizations (8). Incorporating self and team ratings can reap benefits within an athletic team by challenging misperceptions within one’s self-awareness, identifying expectation gaps, and arming a coach with key information to manage conflicts and provide specific action plans for improvement in a more open, honest and engaging manner, paralleling outcomes found in workplace settings (12). In particular, teammate ratings can take the coach’s evaluative input out of the equation, at least initially, in a constructive feedback session. Those who participate in multi-source feedback have been shown to be more likely to use feedback and to participate in activities to develop perceived deficiencies, thus improving their performance (18).

In addition to improving the evaluation scales themselves, team input into the development of behavior-based scales can be both a cohesion-building and a motivating experience. The process begins with subject-matter experts (in this case the player athletes) providing input about the relevant performance dimensions, offering examples of actual performance behaviors called “critical incidents”, and classifying whether the examples represent high, medium, or low performance within each relevant dimension of the job or sport (1,14). When team input is used to develop an evaluation tool, team members feel more ownership of and satisfaction with the evaluation criteria and potentially work harder to meet the standards they developed (14,15). In addition, research suggests that gaining collective input consistently yields more accurate ratings when compared against established “true scores” of performance (20). It does so by establishing a common frame of reference regarding performance standards, which may subsequently yield shared mental models of performance for the players. Such models have been found to be associated with higher performing work pairs (13), and are speculated to be an important part of the cognitive characteristics of high performing sports teams (16).

In the current case study, we expect the players to have a powerful experience in crafting the feedback instrument, providing self and teammate ratings, as well as receiving feedback.  We anticipate the process will be met with a mix of positive expectations coupled with concerns reflecting evaluation apprehension.

METHOD

Participants

During their off-season, returning members of a college-level men’s soccer program (n = 24) participated in a three-phase feedback intervention in which they 1) developed an evaluation tool, 2) rated themselves and each teammate using that scale, and 3) received the aggregated feedback from the coach. This experience was initiated by the head coach of the team, who collaborated with the researchers throughout the process and made each phase mandatory for returning players.

Procedure

In phase one, players were empowered to create their own performance evaluation tool with the understanding that they would eventually rate themselves and their teammates on the instrument they collectively created. Players were randomly assigned to three focus group sessions in which researchers facilitated their input in identifying and defining relevant performance dimensions and generating critical behaviors within those dimensions. During each focus group, players were presented with a 12-dimension performance model assembled by the researchers (Motivation, Self-discipline, Selflessness, Task Persistence, Consistency, Intrinsic Standards, Dependability, Mental Toughness, Trustworthiness, Coachability, and Leadership) and a draft definition of each. Based on player input, Trustworthiness was deemed redundant with other dimensions and thus dropped, while three additional performance dimensions were added (Positive Practice Transfer, Fear of Failure, Team Chemistry). Focus group suggestions also led to some wording revisions for five of the existing definitions. Players then generated specific, concrete behaviors they felt defined poor, average, and excellent displays within each dimension. Disagreements were talked through to consensus; items that could not be agreed upon were not used. After the focus groups, researchers aligned the items on a ten-point scale based on their interpretation of the items as discussed in the focus groups. An example of one of these scales is provided in Figure 1.

In phase two, players were given an overview of each scale, one at a time, presented by the researchers on an overhead projector and distributed as a handout. Once they were fully oriented to and familiar with the scale, players rated themselves and each teammate on that scale, each time circling their own name on the listing to indicate the self-rating. The players were assured that all of their teammate ratings would be kept anonymous by being reported and shared only as an overall average of the other 23 teammate ratings for each dimension. The rating sheet for each dimension also provided an opportunity for brief qualitative comments to supplement the numerical rating. Ratings were collected prior to the introduction of the next dimension. Due to the complexity of the scales, ratings were collected over three rating sessions (i.e., ratings generated for 4 or 5 dimensions per session). Each session was held in a large classroom in which players were asked to spread out in order to ensure privacy and anonymity of ratings.

In phase three, researchers generated a feedback report for each player: an Excel spreadsheet in which each dimension was listed along with the self-rating, the average rating provided by teammates, and any comments accompanying that dimension. Just prior to leaving for their summer break, players participated in an individual feedback session with the head coach, receiving and discussing the summary report, which the coach reported as highly effective and powerful. He found it very helpful to be armed with data-driven feedback, which as he pointed out, “takes me out of the equation and allows for a more honest discussion of their strengths and weaknesses based on their teammates’ perspective”.
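The structure of such a report can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the nested dictionary layout, the function name `feedback_report`, and the three-player example with invented scores are hypothetical and are not taken from the study's spreadsheet. For each dimension it pairs the self-rating with the mean of the ratings given by the other teammates, and reports the gap between the two.

```python
from statistics import mean

def feedback_report(ratings, player):
    """Return {dimension: (self_rating, teammate_average, gap)} for one player.

    Assumed layout: ratings[rater][ratee][dimension] -> score on the ten-point scale.
    """
    report = {}
    for dimension, self_score in ratings[player][player].items():
        # Average only the scores given by the other teammates, so the
        # self-rating never contributes to the team average.
        peer_scores = [
            ratings[rater][player][dimension]
            for rater in ratings
            if rater != player
        ]
        team_avg = mean(peer_scores)
        report[dimension] = (self_score, team_avg, self_score - team_avg)
    return report

# Hypothetical three-player example; names and scores are invented.
ratings = {
    "A": {"A": {"Motivation": 9}, "B": {"Motivation": 6}, "C": {"Motivation": 7}},
    "B": {"A": {"Motivation": 7}, "B": {"Motivation": 8}, "C": {"Motivation": 5}},
    "C": {"A": {"Motivation": 6}, "B": {"Motivation": 7}, "C": {"Motivation": 8}},
}
report_a = feedback_report(ratings, "A")
print(report_a)  # {'Motivation': (9, 6.5, 2.5)}
```

A positive gap marks a dimension on which the player rated himself higher than his teammates did, which is the kind of disparity the feedback session was designed to surface.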

Concurrently with the intervention phases, players also completed self-report measures of desirable athletic outcomes (e.g., perceived ability, potential, motivation, dedication, etc.) for themselves and the team before phase one, after phase two, and after phase three when they returned to school in the Fall semester. Responses used a seven-point scale (0 = Extremely Low / Weak to 6 = Extremely High / Strong). Researchers also collected perceptions of the evaluation system (e.g., benefits of feedback, commitment to feedback process, etc.) after phase one and after phase two. These items were designed as personal statements, such as “I feel committed to this process of mutual feedback”, “This type of feedback will help me improve”, and “I am concerned that this process of feedback will be a waste of time”. Response scales employed five levels of disagreement/agreement (i.e., 1 = Strongly Disagree to 5 = Strongly Agree).

RESULTS

Prior research has shown that using subject matter experts in a focus group context to generate behavior items within each dimension yields high content validity for instruments measuring police performance (3), nursing performance (6), and teamwork in classroom settings (14). Although the present study was not able to demonstrate predictive validity, future studies should investigate associations between these types of rating scales and objective performance measures. Inter-rater reliability was assessed by calculating the correlation between each of the 276 possible pairs of raters across all player ratings within each dimension. This yielded a fairly strong average inter-rater reliability within 13 of the dimensions (ranging from average r = .47 [SD = .18] to average r = .59 [SD = .13]), with one anomalous scale, team chemistry, at average r = .37 (SD = .22). Players seem to have agreed less on this scale, possibly due to the complex and potentially ambiguous nature of its anchors.
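The pairwise approach to inter-rater reliability can be sketched in Python as follows. This is a minimal illustration and not the authors' analysis script: the function names, the data layout (one list of scores per rater, with ratees in a fixed order), and the small example are assumptions, and the text does not say whether self-ratings were included in each rater's vector. The sketch also confirms that 24 raters produce 276 distinct rater pairs.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation of two equal-length score lists.

    Raises ZeroDivisionError if either rater gave identical scores to everyone.
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def pairwise_reliability(ratings_by_rater):
    """Return (number of rater pairs, mean r, SD of r) for one dimension.

    Assumed layout: ratings_by_rater[rater] -> list of scores, one per ratee,
    with ratees in the same order for every rater.
    """
    rs = [
        pearson_r(ratings_by_rater[a], ratings_by_rater[b])
        for a, b in combinations(sorted(ratings_by_rater), 2)
    ]
    return len(rs), mean(rs), stdev(rs)

# 24 raters give 24 * 23 / 2 distinct pairs.
n_pairs_24 = len(list(combinations(range(24), 2)))
print(n_pairs_24)  # 276

# Hypothetical example: three raters scoring the same four teammates.
example = {
    "R1": [2, 4, 6, 8],
    "R2": [1, 3, 5, 9],
    "R3": [3, 5, 4, 8],
}
n_pairs, avg_r, sd_r = pairwise_reliability(example)
```

Running the function once per dimension would give the kind of average r and SD values reported above.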

Player perceptions of their current ability level changed significantly across the three phases, F(2, 46) = 3.91, p < .05, η2 = .21, rising significantly after the scale development and rating phases (M = 4.19) compared to baseline (M = 3.81). This ascending trend continued after feedback (M = 4.44), but the additional increase was not significant. Self-assessment of the player’s potential also changed significantly across phases, F(2, 46) = 4.03, p < .05, η2 = .18, rising significantly from baseline (M = 5.30) to after phase two (M = 5.65), but dropping significantly after feedback was received (M = 5.19). These trends are shown in Figure 2 and seem to reflect the mixed emotions that come from such an engaging and disclosing process, particularly the enhanced efficacy in players’ perceptions of their ability and the reality check that may accompany disparities between self and team evaluations. No significant differences were found among the other four self-reported desirable athletic outcome measures (i.e., team chemistry, motivation, dedication, and confidence in having the needed direction to improve).
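For readers less familiar with the analysis, the reported F(2, 46) is consistent with a one-way repeated-measures design in which 24 players are each measured at three phases. The Python sketch below shows how such an F statistic and its degrees of freedom are computed. It is an illustration only: the function name and the four-player data set are hypothetical, the study's raw data are not available here, and no p-value is computed because that would require an F distribution from a statistics library.

```python
from statistics import mean

def repeated_measures_anova(scores):
    """One-way repeated-measures ANOVA.

    scores: list of per-player lists, one score per phase.
    Returns (F, df_phases, df_error).
    """
    n = len(scores)       # number of players
    k = len(scores[0])    # number of phases
    grand = mean(x for row in scores for x in row)
    phase_means = [mean(row[j] for row in scores) for j in range(k)]
    player_means = [mean(row) for row in scores]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_phases = n * sum((m - grand) ** 2 for m in phase_means)
    ss_players = k * sum((m - grand) ** 2 for m in player_means)
    # Removing between-player variance leaves the phase-by-player residual.
    ss_error = ss_total - ss_phases - ss_players

    df_phases = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_phases / df_phases) / (ss_error / df_error)
    return f_stat, df_phases, df_error

# Hypothetical scores for four players across three phases (invented data).
toy = [[3, 4, 5], [2, 4, 4], [4, 5, 7], [3, 3, 4]]
f_stat, df1, df2 = repeated_measures_anova(toy)

# Degrees of freedom for 24 players and 3 phases.
df_study = (3 - 1, (3 - 1) * (24 - 1))
print(df_study)  # (2, 46)
```

The degrees of freedom for 24 players and three phases come out as (2, 46), matching the values reported with both F tests.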

Moving from self-perceptions to perceptions of the process, as shown in Table 1, players reported high levels of satisfaction and optimism about the potential impact of the scale they created.  Negative items were endorsed by some but with relatively low frequency.  Self-reports remained similarly positive after employing the scales to provide feedback from generated self and teammate ratings.  Although there were no significant changes in perceptions of the process from the second phase of creating the instrument to actually receiving the feedback, this seems likely due to the strong level of positive feelings already expressed at the initial phase of scale development (i.e., ceiling effect).

DISCUSSION

The results suggest that the time and effort invested by teams in creating and using their own behavior-based performance evaluation tool may be warranted and may create positive psychological benefits for athletes, particularly greater confidence in their perceived abilities combined with a more realistic sense of their potential. Players reported high levels of confidence in and empowerment from the process. Interview comments from the coach suggest the process produced a highly effective tool for him to motivate his players toward achieving greater commitment to, and realization of, performance excellence. Thus, the present case study yielded strong positive outcomes that paralleled what Pain and colleagues discovered when using weekly feedback about the general performance environment of a soccer team (15), although the present study focused instead on a one-time intense evaluation of individual player feedback. Results from this study also converge with findings within the industrial / organizational psychology literature on the positive effects of multi-source ratings in a team context and of credible feedback from evaluative instruments for which the performers have a sense of ownership (4,12,14).

As with all case studies, the primary limitations are the lack of generalizability and the inability to isolate any clear causal impact on desirable outcomes. Also, although we successfully implemented a process facilitating team-generated behavior-based scales as drawn from the industrial / organizational psychology literature, we chose to eliminate one phase that is sometimes applied. The early research guidelines prescribe a very laborious and time-intensive step in which each critical incident behavior is evaluated by each worker (i.e., player) as to the level it reflects within the dimension (1). The average rating by the team would then be a more representative indication of the team’s collective perception of the behavior’s value. Due to other team priorities and based on our sense of cost-effectiveness, the more efficient process described above was deemed prudent. Also not formally introduced as part of the current study’s intervention, but clearly ripe with potential, would be a subsequent deliberately structured goal-setting process in which players use the ratings and behavior-based content in areas needing improvement to create a specific goal for improvement aligned with the language of the scales. Complementing behavior-based assessment with goal setting has been a prevalent and successful part of performance management systems in the workplace (5,11).
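The omitted step described above, in which every player rates each critical incident and the team average determines the incident's position on the scale, could be sketched in Python as follows. This is a hypothetical illustration of a step the study did not perform: the function name, the incident wording, and the ratings are all invented, and reporting the standard deviation alongside the mean is an added assumption intended to show how much players agree on each incident.

```python
from statistics import mean, stdev

def incident_scale_values(incident_ratings):
    """Return {incident: (mean rating, SD)} for each critical incident.

    Assumed layout: incident_ratings[incident] -> list with one rating per
    player, each on the ten-point scale.
    """
    return {
        incident: (mean(ratings), stdev(ratings))
        for incident, ratings in incident_ratings.items()
    }

# Invented incidents and ratings from four hypothetical players.
incident_ratings = {
    "Arrives early and warms up without being told": [9, 8, 9, 10],
    "Skips optional training sessions": [2, 3, 1, 2],
}
values = incident_scale_values(incident_ratings)
for incident, (avg, sd) in sorted(values.items(), key=lambda kv: kv[1][0]):
    print(f"{avg:4.1f} (SD {sd:.2f})  {incident}")
```

The mean gives a team-derived anchor position in place of the researchers' own placement, and a large SD would flag an incident on which players disagree about the performance level it represents.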

The next step is to look at performance impact as an outcome.  Researchers attempted to develop a quasi-control group by identifying matching players within the same conference based on class year, position, and playing minutes; but the available options did not yield sufficient matches.  A pre-test baseline was also not viable for performance comparison since playing time increased dramatically for the majority of players.  Future efforts can also attempt to employ an experimental and/or longitudinal design to try to isolate and better understand the effects of the feedback intervention on attitudinal, motivational, as well as team dynamic indicators, such as the shared mental models discussed in both the industrial / organizational (13) and sports psychology (16) research spheres.

CONCLUSIONS

The critical role of feedback for the conscientious, motivated athlete can be enhanced by engaging teams in the development of behavior-based scales so that they experience shared mental models of performance expectations.  These scales can then be used to administer self and teammate ratings as compelling sources of feedback to motivate and guide players toward individual and team goal-setting and performance improvement.

APPLICATIONS IN SPORT

Coaches are often content, and perhaps even complacent, in a style in which they see themselves as the sole source of credible feedback with the potential impact of change.  A tool to supplement and improve the feedback function for athletes and the team collectively could employ a team-developed behavior-based performance evaluation instrument that would allow athletes to develop a shared sense of performance expectations and reflect upon self-assessments relative to how they are viewed by their teammates.

ACKNOWLEDGMENTS

The researchers would like to thank the Moravian College Men’s Soccer team, especially their head coach, Mr. Todd Ervin, for all their cooperation and commitment to the project.  In addition, we wish to thank Moravian College for their support of the project through a SOAR (Student Opportunities in Academic Research) Grant.

REFERENCES

1. Bernardin, H., & Beatty, R. (1984). Performance appraisal: Assessing human behavior at work. Boston: Kent.

2. Carpentier, J., & Mageau, G. (2013). When change-oriented feedback enhances motivation, well-being and performance: A look at autonomy-supportive feedback in sport. Psychology of Sport and Exercise, 14(3), 423-435.

3. Catano, V. M., Darr, W., & Campbell, C. A. (2007). Performance appraisal of behavior-based competencies: A reliable and valid procedure. Personnel Psychology, 60, 201-230. doi:10.1111/j.1744-6570.2007.00070.x

4. Cushing, A., Abbott, S., Lothian, D., Hall, A., & Westwood, O. R. (2011). Peer feedback as an aid to learning—What do we want? Feedback. When do we want it? Now! Medical Teacher, 33(2), e105-e112. doi:10.3109/0142159X.2011.542522

5. DeNisi, A., & Pritchard, R. (2006). Performance appraisal, performance management, and improving individual performance: A motivational framework. Management and Organization Review, 2, 253-277.

6. Greenslade, J. H., & Jimmieson, N. L. (2007). Distinguishing between task and contextual performance for nurses: Development of a job performance scale. Journal of Advanced Nursing, 58, 602-611. doi:10.1111/j.1365-2648.2007.04256.x

7. Heslin, P., Vandewalle, D., & Latham, G. (2006). Keen to help? Managers’ implicit person theories and their subsequent employee coaching. Personnel Psychology, 59, 871-902.

8. Hoffman, B., Lance, C., Bynum, B., & Gentry, W. (2010). Rater source effects are alive and well after all. Personnel Psychology, 63, 119-151.

9. Holt, J. E., Kinchin, G., & Clarke, G. (2012). Effects of peer-assessed feedback, goal setting and a group contingency on performance and learning by 10–12-year-old academy soccer players. Physical Education and Sport Pedagogy, 17(3), 231-250. doi:10.1080/17408989.2012.690568

10. Hom, P., DeNisi, A., Kinicki, A., & Bannister, B. (1982). Effectiveness of performance feedback from behaviorally anchored rating scales. Journal of Applied Psychology, 67(5), 568-576. doi:10.1037/0021-9010.67.5.568

11. Krenn, B., Würth, S., & Hergovich, A. (2013). The impact of feedback on goal setting and task performance: Testing the feedback intervention theory. Swiss Journal of Psychology, 72(2), 79-89. doi:10.1024/1421-0185/a000101

12. Levy, P. E., & Steelman, L. A. (1997). Performance appraisal for team-based organizations: A prototypical multiple rater system. In M. M. Beyerlein, D. A. Johnson, & S. T. Beyerlein (Eds.), Advances in interdisciplinary studies of work teams, Vol. 4 (pp. 141-165). US: Elsevier Science/JAI Press.

13. Mathieu, J., Heffner, T., Goodwin, G., Salas, E., & Cannon-Bowers, J. (2000). The influence of shared mental models on team process and performance. Journal of Applied Psychology, 85(2), 273-283.

14. Ohland, M. W., Loughry, M. L., Woehr, D. J., Bullard, L. G., Felder, R. M., Finelli, C. J., & Schmucker, D. G. (2012). The comprehensive assessment of team member effectiveness: Development of a behaviorally anchored rating scale for self- and peer evaluation. Academy of Management Learning & Education, 11(4), 609-630. doi:10.5465/amle.2010.0177

15. Pain, M., Harwood, C., & Mullen, R. (2012). Improving the performance environment of a soccer team during a competitive season: An exploratory action research study. The Sport Psychologist, 26(3), 390-411.

16. Reimer, T., Park, E., & Hinsz, V. (2006). Shared and coordinated cognition in competitive and dynamic task environments: An information-processing perspective for team sports. International Journal of Sport and Exercise Psychology, 4(4), 376-400.

17. Smith, C., & Kendall, L. (1963). Retranslation of expectations: An approach to the construction of unambiguous anchors for rating scales. Journal of Applied Psychology, 47, 149-155.

18. Smither, J. W., London, M., & Richmond, K. (2005). The relationship between leaders’ personality and their reactions to and use of multisource feedback: A longitudinal study. Group & Organization Management, 30(2), 181-210. doi:10.1177/1059601103254912

19. Steelman, L., Levy, P., & Snell, A. (2004). The feedback environment scale (FES): Construct definition, measurement, and validation. Educational and Psychological Measurement, 64(1), 165-184.

20. Sulsky, L., & Day, D. (1994). Effects of frame-of-reference and cognitive categorization: An empirical investigation of rater memory issues. Journal of Applied Psychology, 79, 535-543.

21. Tauer, J., & Harackiewicz, J. (1999). Winning isn’t everything: Competition, achievement orientation, and intrinsic motivation. Journal of Experimental Social Psychology, 35, 209-238.

22. Wright, M., Phillips-Bute, B., Petrusa, E., Griffin, K., Hobbs, G., & Taekman, J. (2009). Assessing teamwork in medical education and practice: Relating behavioural teamwork ratings and clinical performance. Medical Teacher, 31(1), 30-38.

 
