NURSE EDUCATION TODAY
Peer evaluation: more questions than answers
Nicole Rousseau, Post-Graduate Student, Department of Nursing Studies, University of Edinburgh
Edith Cote, Lecturer, Universite Laval, Quebec
Andree Quenville, Clinical Instructor, Universite Laval, Quebec
Woolley has well documented the chaotic state of clinical evaluation.¹ Among the numerous methods suggested or tested to compensate for the subjectivity of the instructor's judgment, self-evaluation is often proposed. Peer evaluation, on the other hand, has received little attention from nurse educators. The purposes of this article are: (1) to explain why peer evaluation was retained as one of the methods used to grade students in a community health course; (2) to describe how this method was implemented and what results were obtained; and (3) to discuss, on the basis of the results and the literature review, some of the issues related to peer evaluation as well as to raise questions that remain unanswered.

Peer evaluation, as referred to here, has been examined over three years with several groups of third-year (final-year) basic baccalaureate nursing students in a community health nursing course. This is a required course whose major component is a five-week full-time training period in a clinical setting. The goal of this clinical experience is to allow students to go through one of the steps of programme planning in community health. The community health programme to be worked on is chosen jointly by the staff of the clinical setting (a department of community health), the teachers responsible for the course and the clinical instructor. Therefore, students become involved in an on-going community health programme.
REASONS FOR RETAINING PEER EVALUATION
Our reasons for retaining peer evaluation fall into two categories: practical and educational. Each instructor and teacher must supervise eight or nine students, ie two teams, working on two different projects, frequently in different locations. Thus, direct supervision of each student is impossible most of the time and is not considered necessary. Peer evaluation appeared to be one solution to this practical problem as it would increase, we thought, the fairness of an evaluation based upon short and occasional observations.

In addition to these practical reasons, peer evaluation was seen as a means to give students an opportunity to become critical about their peers' performance. It seemed to us that it was important to develop this ability among final-year students since, as professional nurses, they would either be responsible for the evaluation of health care personnel or participate in the evaluation of such personnel. They have to learn that decisions must be based upon as objective an evaluation as possible and that these decisions may have far-reaching consequences. Therefore one of our main goals was to develop professional maturity among students.
DESCRIPTION OF THE METHOD AND OF THE DATA ANALYSIS
Students worked in teams of four or five and, as a team, had to produce a written report at the end of the course. The complete evaluation included an instructor's evaluation of each student's ability to apply the concepts underlying programme planning to the project (30 %), a joint evaluation of the final report by the instructor and one of the teachers responsible for the course (40 %), an instructor's evaluation of each student's contribution to the project (15 %) and, finally, a student's evaluation of the contribution of each member of the team to the project: peer evaluation (15 %). These four methods of evaluation were based upon criteria developed by the teachers.

Peer evaluation worked as follows: at the end of the training period, the students were asked to fill in an evaluation form for each member of their team. This form consisted of a modified Likert-type scale using a list of 10 criteria which described varied aspects of the quantitative as well as the qualitative contribution of the student to the project. The scale included four categories, each having an identified weight.
Example

The student executed the tasks given by the group:
A Always 150
C Sometimes 50

The contribution of the student to the solution of problems arising during the course of the project was:
A Very Good 150
C Good 80
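The four-part weighting described above (30 %, 40 %, 15 % and 15 %) amounts to a simple weighted sum. The following is a minimal sketch only, not the authors' procedure: the component names, the 0-100 scale, the marks and the use of a plain mean to combine several peer ratings are all assumptions introduced for illustration, since the article does not state how the ratings from different team members were merged.

```python
from statistics import mean

# Weights taken from the text: 30 % + 40 % + 15 % + 15 % = 100 %.
# The dictionary keys are hypothetical labels, not the authors' terms.
WEIGHTS = {
    "concepts": 0.30,
    "final_report": 0.40,
    "instructor_contribution": 0.15,
    "peer_evaluation": 0.15,
}


def peer_score(ratings):
    """Combine the ratings a student received from team members.

    Assumption: a plain mean; the article does not say how several
    peer ratings were merged into one mark.
    """
    if not ratings:
        raise ValueError("at least one peer rating is required")
    return mean(ratings)


def final_grade(scores):
    """Return the weighted course grade from component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical marks for one student, each on a 0-100 scale.
scores = {
    "concepts": 80.0,
    "final_report": 70.0,
    "instructor_contribution": 90.0,
    "peer_evaluation": peer_score([60.0, 70.0, 50.0]),
}
print(f"final grade: {final_grade(scores):.1f}")
```

With these weights, peers jointly control 15 % of a student's mark, the same share as the instructor's own rating of that student's contribution.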
It must be noted that students formed the teams themselves, thus indirectly selecting their raters.

Two minor modifications were made to this instrument after one year of use: its grading value was increased from 10 % to 15 %, and the students were asked to write examples of the behaviour they were referring to when choosing a category for each criterion. The reason for the first modification was that the students liked this method but found that its value was too low. The second modification was aimed at solving the problem of leniency on the part of a few students. It was hypothesised that if they had to specify the behaviour on which they based a grade, the students would be more reluctant to give everyone an A.

The completed forms were handed to the teachers at the end of the course. The rater was not required to identify herself (himself) on the form. The students were free to examine the evaluation forms completed by their peers, but none of them used this opportunity.

The results presented here deal with one fundamental question: Are students competent to evaluate their colleagues? To answer this question, three types of analysis have been done.
1. A simple linear correlation between the grade given by the students and the grade given by the instructor was calculated. It was assumed that the instructor was a competent evaluator; so, the higher the correlation, the more competent an evaluator was the student. We believe the instructor's evaluation was the best standard available in our context because, having had the opportunity to observe several groups of students, the instructor was in a good position to compare and rate individual performances. We agree that this assumption presents some weaknesses, even more so since, as mentioned earlier, the instructor's observations were occasional.
2. The frequency of examples of the ratee's contribution given by the rating students was calculated. It was assumed that if they had a good understanding of the criteria and took their job seriously, the students would provide frequent examples.
3. An analysis of the examples provided was undertaken to examine if they were related to the criterion and if they were useful in terms of their formative value for the ratee. Evaluators must not only be competent in determining the right grade, but also be able to help students to progress in their learning. Each example was thus examined and classified as 'related' or 'not related', 'useful' or 'not useful'. This classification was done by the three authors simply using group consensus. An example was qualified as 'related' if it described a behaviour that exemplified the criterion under consideration; it was classified as 'useful' if the authors felt that it could help the ratee to improve her (his) future performance with regard to the behaviour under consideration.

In addition to these analyses, two other aspects were of interest to us. In the third year of the programme, the students were divided into two groups, each one following a sequence of clinical nursing. Consequently, the community health nursing course was given twice a year. The results obtained with students beginning their final year were compared with those of students near graduation in order to test the following hypothesis: having benefited from several months of training in varied clinical settings, students near graduation should be more competent evaluators than their colleagues beginning their final year. To test this hypothesis, the coefficients of correlation between peer evaluation and instructor's evaluation were compared for students beginning their final year and those finishing it. Also compared were: (a) the frequency of examples of the ratee's contribution; (b) the proportion of related and useful examples.

Finally, as mentioned in the description of the method, we believed that peer evaluation would be done more seriously, hence that the correlation would be higher, if the raters were asked to give examples of the contribution of the ratees with regard to each criterion.
To test this final hypothesis, the coefficients of correlation obtained for the first year of the experiment (examples were not requested) were compared with those of the last two years of the experiment (examples were requested).

RESULTS OBTAINED
The data were provided by 154 students: 56, 48 and 50 students respectively in each year of the experiment.
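The simple linear correlation used in the first type of analysis is conventionally computed as a Pearson product-moment coefficient. The sketch below assumes that is what was meant; the function name and the grades are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two paired lists."""
    if len(xs) != len(ys):
        raise ValueError("both lists must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two pairs are required")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical grades (out of 15) for five students: the mark given
# by their peers and the mark given by the instructor.
peer_grades = [12.0, 14.5, 13.0, 15.0, 11.5]
instructor_grades = [11.0, 14.0, 12.5, 14.5, 12.0]

r = pearson_r(peer_grades, instructor_grades)
print(f"r = {r:.4f}")
```

A coefficient near 1 would mean that peers and the instructor rank students in much the same way; as noted in the method, this says nothing about accuracy if the instructor's own judgment is a weak standard.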
1. Correlation between peer evaluation and instructor's evaluation
The coefficient of correlation for the total of the three years is 0.5110, but if we separate the first year from the last two years, we obtain coefficients of 0.1752 and 0.6783 respectively. The coefficient of correlation was also calculated for both the first and the last groups supervised during the first and the third year of the experiment. It was not calculated for the second year because the two training periods had been too close to each other that particular year. Therefore, we would not have been able to observe a difference in the evaluating students' competency that could possibly be attributable to an increase in their professional maturity. The results obtained are shown in Table 1.

Table 1: Comparison of the coefficients of correlation between peer evaluation and instructors' evaluation obtained from students beginning and finishing their final year.
2. Frequency of examples of the ratee's contribution given by the rating students
The total frequency of examples could not be calculated here because the data were missing for some of the students in the second year of the experiment. (It must be remembered also that students were not asked to provide examples during the first year.) Therefore only the data from the third year of the experiment were analysed. In total, 35.6 % of the forms completed during the third year contained examples. Only 25 % of the forms completed by the near-graduation students contained examples of the ratee's contribution, while 48.6 % of those completed by students beginning their final year provided such examples.

3. Quality of the examples provided by the students
In the third year of the experiment, students provided a total of 222 examples, of which 77.9 % were judged related to the criteria and 61.7 % were considered useful. When comparing the quality of these examples between students beginning and students completing their final year, we obtained the results presented in Table 2.

Table 2: Comparison of the quality of the examples provided by students beginning and finishing their final year.
Quality students Beginning students of the examples
91* (61.9 %)
Number of examples provided
*These are not mutually exclusive.
DISCUSSION
As stated in the introduction, the discussion is based upon the results obtained and a literature review. This review is limited to a search with ERIC using 'peer evaluation' as the descriptor and to an inventory of International Nursing Index going back to 1977 inclusive.

To measure students' competence in evaluating their peers, Morton and Macbeth calculated the correlation coefficients between staff, peer and self assessments of fourth-year students in surgery, the staff assessment being the mean of four individual assessments made by the surgeon supervisors (the equivalent of our clinical instructors). They obtained a coefficient 'r' of 0.53 (P
It is more encouraging to examine the quality of the examples provided. Indeed, if 77.9 % of them were judged related to the criteria and 61.7 % considered useful, we can conclude that those students who take time and effort to evaluate their peers seriously can do it competently. Who are those students? Lutkus has measured his students' ability to rate papers. Following are his results and conclusion:
'The correlations obtained between paper grades and the writer's ability as a rater ranged between r = 0.632 and r = 0.703 (P
students are able or willing to provide examples. Thus the question of how to develop this conscientiousness and competency remains unanswered. We must also keep in mind that all these results use the same basis for comparison, ie the teacher's judgment. Is this a valid point of reference? What are the alternatives?
CONCLUSION
One of our main reasons for retaining peer evaluation was to provide our final-year students with an opportunity to develop their professional maturity through an evaluation of their peers' performance. We have discovered that peer evaluation was worth using and that its validity could be improved by refining the methods of application. Asking evaluators to justify their rating with examples is an improvement but is insufficient to reach our intended goal. Through our analysis of the examples, we realised that we had a wealth of pertinent and useful comments that were completely wasted, since the very person to whom they should have been addressed (the student being evaluated) had never seen them.

Two positive aspects of peer evaluation are that students like it and that it compensates to a certain extent for the inability of the instructor to evaluate her students on a continuous basis. Furthermore, several students have told us that they had learned much in doing their peers' evaluation and that having to give examples had taught them that evaluating was not an easy task. Thus peer evaluation seems to have pedagogical value, at least for the raters, and we believe that it should be given more attention by nurse educators.
Acknowledgements Acknowledgements are expressed to Dr Lisbeth Hockey for her useful comments in reviewing this paper.
References
1. Woolley, A S 1977. The long and tortured history of clinical evaluation, Nursing Outlook, 25, 5, pp. 308-315.
2. Morton, J E, Macbeth, W A A G 1977. Correlations between staff, peer and self-assessments of fourth-year students in surgery, Medical Education, 11, pp. 167-170.
3. Parker, R C, Kristol, D S 1976. Student peer evaluation, Journal of Chemical Education, 53, 3, p. 177.
4. Idem p. 178.
5. Lutkus, A 1978. Using PEERRATE: A computerised system of student term paper grading: paper read at the American Psychological Association Convention, Toronto, Canada.
6. Idem p. 8.