
Student Evaluations



I agree with most of the comments about student evaluations of
teaching (SET on our campus) and their general lack of
meaningfulness. But I hope Ludwik is wrong in his final comments
about where the change in attitude has to come from. I have embarked
on a one-person campaign on our campus to enlighten my colleagues
about alternative ways of evaluating teaching. This was the result of
my being denied an award (of $1000!) two years ago, because I
unintentionally failed to include my SET results; the award is given
annually to about 30% of our faculty for excellence in teaching. Many
faculty here really are interested in finding a better way to
evaluate teaching but don't know how to proceed.

Here is the real dilemma (as I see it): I can show by pre- and
post-testing with instruments like the Force Concept Inventory that
my students do indeed learn something in my class. This is NOT true
in general in many other subjects. For example, in talking to
colleagues in History, I find they can only come up with trivial
kinds of pre- and post-testing on facts; no one seems to have a clear
idea how to test conceptual knowledge in History. As I gather from
the literature, MOST pre- and post-testing (in physics and other
subjects) HAS BEEN UNSUCCESSFUL IN SHOWING ANY IMPROVEMENT IN STUDENT
KNOWLEDGE. Historically, faculty have been very reluctant to embrace
other methods of evaluating student learning because the available
tools do NOT show, in most cases, that students have learned
anything.
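
To make the first claim concrete, here is a minimal sketch (in
Python, with invented scores rather than my actual class data) of the
quantity usually used to show that learning occurred: the
class-average normalized gain <g> of Ref. 7, i.e. the pre-to-post
improvement expressed as a fraction of the maximum possible
improvement.

    # Class-average normalized gain (Ref. 7):
    #   <g> = (<post%> - <pre%>) / (100 - <pre%>)
    # Scores below are invented for illustration; they assume the
    # 30-item revised FCI of Ref. 6.

    def normalized_gain(pre_scores, post_scores, max_score=30):
        """Average normalized gain from matched pre/post scores."""
        n = len(pre_scores)
        pre_pct = 100.0 * sum(pre_scores) / (n * max_score)
        post_pct = 100.0 * sum(post_scores) / (n * max_score)
        return (post_pct - pre_pct) / (100.0 - pre_pct)

    pre  = [8, 11, 9, 14, 12, 10]     # hypothetical pre-test scores
    post = [17, 21, 15, 24, 20, 18]   # same students, post-test
    print("<g> = %.2f" % normalized_gain(pre, post))

For scale, Ref. 7 reports <g> of roughly 0.2 for traditional courses
and roughly 0.5 for interactive-engagement courses, so even a crude
calculation like this one says something a SET score cannot.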

Here is what I have been sending around to colleagues and
administrators as an argument for better evaluation techniques
(please feel free to criticize). I quote from my annual report
(required of all faculty each year):

A. Methods of evaluation
After spending a significant amount of time studying the research
literature on evaluating teaching effectiveness (see the
bibliographic sample below), particularly the literature related to
teaching physics, I have concluded that the following is a fair, if
not complete, categorization of various methods of evaluation. (See
Ref. 19, below, for further comments on the first four.)
1. Course evaluation surveys from current students.
2. Teaching evaluation surveys from course alumni (either
senior surveys or post-graduation surveys).
3. Peer evaluation by faculty.
4. Portfolios.
5. Pre- and post-testing.

Of these choices, 2 and 4 seem more appropriate for specialized,
upper-level courses (which I do not typically teach). An interesting
reference applicable to peer evaluation of physics teaching is 16,
below; performing an adequate and meaningful peer evaluation is not a
trivial matter.

Against student evaluations: There is a vast literature (the
bibliography below represents only a tiny sampling) both in support
of and in opposition to method 1 (current student evaluations, or
SET), from which I think it is fair to conclude at least the
following:
a) Student evaluations are useful for improving feedback and
communication between students and faculty (but this unfortunately
comes at the end of the course);
b) There are confounding variables which are difficult to
separate in the use of student evaluations. For example, higher
grades do correlate positively with better student evaluations. But
it could be argued either that giving higher grades results in higher
student evaluations, or equally well that higher evaluations indicate
better teaching, which in turn results in better grades; the
correlation by itself is inconclusive;
c) The design and interpretation of most student evaluations
are probably flawed, since it is seldom determined in advance whether
the evaluations are to be used as a tool for improving teaching or as
an evaluation mechanism for making personnel decisions about
instructors. Frequently the same evaluation is expected to function
as both;
d) There is a weak, positive correlation between high student
evaluations and high student performance as determined using other
evaluation tools (a numerical illustration follows this list);
e) Most of the claims (both positive and negative) about the
validity of student evaluations in establishing teaching
effectiveness have been tested using other means of evaluation, for
example pre- and post-testing (method 5) of subject material learned
by the student.
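
To put point (d) in concrete terms, here is a minimal sketch of the
kind of check involved. The SET averages and gains below are invented
for illustration; they are not data from any study in the
bibliography or from my own classes.

    # Pearson correlation between section-average SET scores and
    # class-average normalized gains.  All numbers are invented.
    import math

    set_scores = [3.2, 3.8, 4.1, 4.4, 3.5, 4.0]        # hypothetical SET averages (1-5 scale)
    gains      = [0.21, 0.35, 0.30, 0.52, 0.28, 0.41]  # hypothetical normalized gains <g>

    def pearson_r(xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    print("r = %.2f" % pearson_r(set_scores, gains))

Even when such a correlation comes out clearly positive, it settles
nothing about which way the causation runs, which is exactly the
difficulty described in points (b) and (d) above.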

Given the above comments and conclusions, it seems clear to me that
pre- and post-testing, when available, is the more effective
evaluation process when compared to SET scores, particularly in
introductory physics, where the goal is to convey particular concepts
which the student either does or does not understand. A
well-established diagnostic test in mechanics is available (the Force
Concept Inventory; see Refs. 5, 6, 7, 8, 9, 10), and I have used that
as my primary evaluation tool this year. Further discussion can be
found below.

There are several groups working on baseline and diagnostic tests for
other areas in physics, for example in electricity and magnetism.
These tests are not yet available, so I did not do pre- and
post-testing on other topics or in other courses this year. One could
project that teaching methods which improved scores in the topical
area of mechanics would also be effective in improving student
understanding in other areas, but this remains to be investigated.

As a supplement to pre- and post-testing, and to facilitate
instructor/student communication, I have used two different types of
student evaluation. One is the standard SET given at the end of the
semester. The other is an anonymous, on-line feedback page where
students can express frustration or offer suggestions during the
semester (when it is more useful).
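
For anyone curious about the feedback page: the idea is very simple,
and the sketch below (my own illustration in present-day Python, not
the actual page I use; the storage file name is made up) shows the
essentials. It serves a comment box and appends each submission to a
plain text file, storing only the date and the comment text.

    # Minimal sketch of an anonymous in-semester feedback page
    # (illustration only, not the page actually in use).
    from wsgiref.simple_server import make_server
    from urllib.parse import parse_qs
    import datetime

    FORM = (b'<form method="post">'
            b'<p>Comments or frustrations (nothing identifying is stored):</p>'
            b'<textarea name="comment" rows="6" cols="60"></textarea><br>'
            b'<input type="submit" value="Send anonymously"></form>')

    def app(environ, start_response):
        body = FORM
        if environ["REQUEST_METHOD"] == "POST":
            size = int(environ.get("CONTENT_LENGTH") or 0)
            data = parse_qs(environ["wsgi.input"].read(size).decode())
            comment = data.get("comment", [""])[0].strip()
            if comment:
                # Store only a date stamp and the comment itself.
                with open("feedback.txt", "a") as f:  # hypothetical file
                    f.write("%s\n%s\n---\n" % (datetime.date.today(), comment))
                body = b"<p>Thank you; your comment was recorded anonymously.</p>" + FORM
        start_response("200 OK", [("Content-Type", "text/html")])
        return [body]

    if __name__ == "__main__":
        make_server("", 8000, app).serve_forever()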

Bibliography:
1. Wilbert J. McKeachie, "Student Ratings: The Validity of Use,"
American Psychologist, November 1997, pp. 1218-1225.
2. Anthony G. Greenwald and Gerald M. Gillmore, "Grading Leniency Is
a Removable Contaminant of Student Ratings," Journal of Educational
Psychology, Vol. 89, No. 4 (1997), pp. 1209-1216.
3. Anthony G. Greenwald and Gerald M. Gillmore, "No Pain, No Gain?
The Importance of Measuring Course Workload in Student Ratings of
Instruction," Journal of Educational Psychology, Vol. 89, No. 4
(1997), pp. 743-751.
4. W. M. Williams and S. J. Ceci, "How'm I Doing?", Change,
Sept./Oct. 1997, pp. 13-23.
5. I. Halloun and D. Hestenes, "The initial knowledge state of
college physics students," Am. J. Phys. 53 (1985), p. 1043.
6. D. Hestenes, M. Wells, and G. Swackhamer, "Force Concept
Inventory," Phys. Teach. 30 (1992), p. 141. [A revised 1995 version
due to I. Halloun, R. R. Hake, E. P. Mosca, and D. Hestenes is in E.
Mazur, "Peer Instruction: A User's Manual" (Prentice Hall, 1997) and
on the Web (password protected) at
<http://modeling.la.asu.edu/modeling.html>.]
7. R. R. Hake, "Interactive-engagement vs traditional methods: A
six-thousand-student survey of mechanics test data for introductory
physics courses," Am. J. Phys., Jan. 1998, and on the Web at
<http://carini.physics.indiana.edu/SDI/>; see also
<http://www.aahe.org/hake.htm>.
8. D. Hestenes and M. Wells, "A Mechanics Baseline Test," Phys.
Teach. 30 (1992), p. 159.
9. R. J. Beichner, "Testing student interpretation of kinematics
graphs," Am. J. Phys. 62 (1994), p. 750.
10. D. R. Sokoloff and R. K. Thornton, "Using Interactive Lecture
Demonstrations to Create an Active Learning Environment," Phys.
Teach. 35 (1997), p. 340.
11. P. A. Cohen, "Student Ratings of Instruction and Student
Achievement: A Meta-analysis of Multisection Validity Studies,"
Review of Educational Research 51 (1981), p. 281.
12. H. W. Marsh, "Students' Evaluations of University Teaching:
Dimensionality, Reliability, Validity, Potential Biases, and
Utility," J. of Educational Psychology 76 (1984), p. 707.
13. K. A. Feldman, "The Association Between Student Ratings of
Specific Instructional Dimensions and Student Achievement: Refining
and Extending the Synthesis of Data from Multisection Validity
Studies," Research in Higher Education 30 (1989), p. 583.
14. W. J. McKeachie, "Instructional Evaluation: Current Issues and
Possible Improvements," J. of Higher Education 58(3) (1987), p. 344.
15. R. R. Hake and J. C. Swihart, "Diagnostic Student Computerized
Evaluation of Multicomponent Courses (DISCOE)," Teach. Learning
(Indiana University, January 1979, available on request).
16. S. Tobias and R. R. Hake, "Professors as physics students: What
can they teach us?", Am. J. Phys. 56 (1988), p. 786.
17. Society for a Return to Academic Standards:
http://www.bus.lsu.edu/accounting/faculty/lcrumbley/sfrtas.html
18. R. M. Felder, "What do they know, anyway?", Chemical Engineering
Education 26(3), p. 134.
19. D. Styer,
http://www.physics.oberlin.edu/~dstyer/EvaluationOfTeaching.html
20. A. B. Arons, "Teaching Introductory Physics" (John Wiley and
Sons, 1997).



Date: Sun, 26 Dec 1999 22:57:23 -0500
From: Ludwik Kowalski <KowalskiL@MAIL.MONTCLAIR.EDU>
Subject: Re: Student Evaluations of Teaching

If it were up to me I would evaluate the performance of teachers on
the basis of how well their students know what is expected of them.
To make this possible, two prerequisites must be met.

1) We must agree (or be forced to agree) on what the expected
knowledge in each department and at each level is. It should be
defined by taking into consideration how much average students can
realistically learn in each semester. Textbooks should clearly define
the common obligatory core, even when optional chapters are included
(for above-average and curious students).

2) Evaluation of students (final exams and final grades) must be
separated from teaching. By this I mean that those who teach
should not be involved in examinations. In this way it would no
longer be possible to have situations in which we say that "this
or that will not be on the exam because we skipped it".

Sooner or later society will realize that teachers, like other
professionals, should be evaluated on the basis of results, not on
the basis of how students feel about teachers. I was often less than
perfect, in terms of passing those who should have failed, but
nobody ever criticized me for this kind of malpractice. I agree
with most of what Mark wrote in the

IrascibleProfessor.com/comments-12-26-99.htm

But an individual teacher, or even a group of teachers (such as
Phys-Lers), can do very little to change the existing situation. The
reform should come from above and be highly coordinated at the
national level. The American Physical Society, and similar
organizations in other areas, should get together and design a
realistic plan for imposing desirable changes. Several broader
issues, such as what to do with those who fail, should also be
addressed.

Ludwik Kowalski

--------------------------------

-----------------------------------------------------
kyle forinash 812-941-2390
kforinas@ius.edu
Natural Science Division
Indiana University Southeast
New Albany, IN 47150
http://Physics.ius.edu/
-----------------------------------------------------