
Student Evaluations (was "Re: On Correlations")



In his 7/14/00 POD(1) post "Re: On Correlations," Ed Nuhfer writes:

"Let me add the irreverent thought (a personal bias I admit) that
giving student ratings credibility for what they DONT measure . . . .
(student learning). . . . has contributed to our not seeing the
urgent need to directly measure learning, and levels of thinking --
things we have expressed in these discussions that we view as
important. We need tools that do these things well and in practical
ways."

I agree completely with Ed's view. Because many university
administrators and even some faculty remain convinced, despite all
evidence to the contrary, that STUDENT EVALUATIONS ARE LEGITIMATE
MEASURES OF STUDENT LEARNING, there is little incentive to attempt
more direct but less convenient measures. Besides, the most obvious
way to measure student learning in a course, viz., pre-post testing,
is well known by education/psychology specialists to be useless.(1)
As far as I know, only physicists and astronomers have been naïve
enough to attempt pre-post testing of student learning.

I addressed the issue of student evaluations in an 11/24/97 Phys-L(2)
post "Re: Are student evaluations useful?" and again in a 10/13/98
AERA-D(3) post "Re: teacher evals: Worth more than $0.02 comments."
Here are:

I. An edited and updated version of the AERA-D post.

II. A response by student-evaluation champion Lawrence Roche.

BTW:

1. Robert Haskell has written extensively on ethical and legal
aspects of student evaluations.(27,28)

2. If you wish to respond to this very long post, PLEASE, out of
courtesy to other list subscribers, avoid the finger-jerk reaction of
hitting the reply button!!(29)

Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<rrhake@earthlink.net>
<http://www.physics.indiana.edu/~hake>


I. Hake's 10/13/98 AERA-D(3) post "Re: teacher evals: Worth more than
$0.02 comments." (Edited and updated on 7/16/00)

hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
In his 13 Oct 1998 AERA-D posting "Re: teacher evals: Worth more
than $0.02 comments," Lawrence Roche defends student evaluations as
valid measures of teaching effectiveness.

Kyle Forinash, a physics professor at Indiana University Southeast,
in his 11/17/97 posting "Are Student Evaluations Useful" to
PhysLrnR(4) stated that according to a quote being passed around on
his campus:

"....student ratings are the single most valid source of data on
teaching effectiveness. In fact, as Marsh and Roche(5) point out,
there is little evidence of the validity of any other sources of
data."

IF "teaching effectiveness" is equated with student learning then, at
least for introductory physics, I think it is very doubtful that:

(a) student ratings are the most valid source of data on teaching
effectiveness, or

(b) "there is little evidence of the validity of any other sources of data."

Halloun and Hestenes,(6) evidently unaware of the uselessness of
pre-post testing,(1, 7-10) found that four professors with good or
superlative student evaluations, but who utilized traditional
passive-student lectures, were almost totally ineffective in
imparting any conceptual understanding of Newtonian mechanics to
students in introductory physics courses at Arizona State University.
Here student learning was measured by pre- to post-test gains on the
Halloun-Hestenes Mechanics Diagnostic (MD) test of conceptual
understanding (6). More recent pre/post testing(11-14) using the MD
test and the rather similar Force Concept Inventory(11) shows that
much higher gains [a nearly two-standard-deviation difference(12) in
the NORMALIZED gain

<g> = (<%Posttest> - <%Pretest>)/(100% - <%Pretest>)]

can be achieved in interactive-engagement courses(11-14).
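
For readers who want to check their own courses, the normalized gain
is easy to compute from class-average pretest and posttest
percentages. Here is a minimal sketch in Python; the class averages
below are hypothetical, chosen only to illustrate the traditional vs.
interactive-engagement contrast discussed above, and are not data
from refs. 11-14.

# Minimal sketch (not from the cited papers): computing the
# normalized gain <g> from class-average pre- and post-test
# percentages. The example averages are hypothetical.

def normalized_gain(pre_percent: float, post_percent: float) -> float:
    """Return <g> = (%post - %pre) / (100% - %pre), i.e., the gain
    achieved as a fraction of the maximum possible gain."""
    if pre_percent >= 100.0:
        raise ValueError("pretest average must be below 100%")
    return (post_percent - pre_percent) / (100.0 - pre_percent)

if __name__ == "__main__":
    # Hypothetical class averages, for illustration only.
    traditional = normalized_gain(pre_percent=40.0, post_percent=55.0)
    interactive = normalized_gain(pre_percent=40.0, post_percent=75.0)
    print("traditional lecture course:    <g> = %.2f" % traditional)  # 0.25
    print("interactive-engagement course: <g> = %.2f" % interactive)  # 0.58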

Among other measures of student learning in introductory physics that
would appear superior to student evaluations are (a) the Mechanics
Baseline test(15), (b) the Test of Understanding Graphs in
Kinematics(16), (c) the Force and Motion Conceptual Evaluation(17),
(d) the Conceptual Survey of Electricity and Magnetism(18), and (e)
Advanced Placement exams.

As for references which purport to show the validity of student
evaluations in gauging student learning, Cohen(19) in a much quoted
meta-analysis of 41 studies on 68 separate multisection courses
claimed that "the average correlation between an overall instructor
rating and student achievement was +0.43; the average correlation
between an overall course rating and student achievement was +0.47
... the results ... provide strong support for the validity of
student ratings as measures of teaching effectiveness." However, in
my opinion, there are at least three problems with Cohen's analysis:

(a) the grading satisfaction hypothesis discussed by Marsh(20)
(students who are aware that they will receive high grades reward
their instructors with favorable evaluations) could account, at least
in part, for the positive correlation, as also indicated by Roche in
his 10/13/98 AERA-D posting,

(b) there was no pretesting to disclose initial knowledge states of
the students [possibly because it is well known to
education/psychology specialists(1, 7-10) that pre-post testing is
useless],

(c) the quality of the "achievement tests" was not examined (were
they of the plug-in-regurgitation type so common in introductory
physics courses?).

With regard to deficiency "c", Feldman(21), in a review and
reanalysis of Cohen's data, points out that "McKeachie.. (ref. 22)
... has recently reminded educational researchers and practitioners
that the achievement tests assessing student learning in the sorts of
studies reviewed here typically measure lower-level educational
objectives such as memory of facts and definitions rather than
higher-level outcomes such as critical thinking and problem solving
that are usually taken as important in higher education."

If "teaching effectiveness" is gauged not just by student learning
but also by affective outcomes (e.g., students' enthusiasm,
satisfaction, motivation, interest, and appreciation) then, in my
opinion, teaching evaluations can serve useful purposes, especially
as diagnostic tools given relatively early in the semester (see e.g.,
ref. 23).

In ref. 13 I list various impediments to the effective implementation
of interactive-engagement methods in introductory physics classes.
Among these is the administrative misuse of student evaluations to
gauge the cognitive (rather than just the affective) impact of
courses.
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh



II. In response to the above post, Lawrence Roche, in his 10/16/98
AERA-D post "Teacher evals: Corrections" & Meta-concoctions" wrote
(using AIP rather than Roche's APA format references):

rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr
"Hake's critique of student ratings validity research based on Cohen's
original meta-analysis(19) deserves a brief rebuff - Evidence for
validity most certainly does NOT rest on Cohen's original study.(19)
Cohen(24) subsequently critically analysed and refined his analysis;
see also Abrami, d'Apollonia & Cohen(25) on the multisection validity
paradigm generally. I strongly recommend the discussion of validity
by Marsh & Dunkin,(26) which is much more extensive on most topics
than Marsh & I(5)"
rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr

Although I have not had the time to read refs. 24-26, my guess, based
on two decades' worth of perusing the repetitious psychology/education
literature on student evaluations is that the premise STUDENT
EVALUATIONS ARE LEGITIMATE MEASURES OF STUDENT LEARNING is not
adequately demonstrated.

Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<rrhake@earthlink.net>
<http://www.physics.indiana.edu/~hake>



REFERENCES AND FOOTNOTES
1. POD Network - Professional and Organizational Development in
Higher Education <http://www.podnetwork.org/main.html>
a. Discussion Group Information <http://www.podnetwork.org/elec/maillist.html>
b. Discussion Group Homepage
<http://www.podnetwork.org>

To subscribe to POD send the message "sub pod <your name>" (without
the quotes and carets) to <listproc@catfish.valdosta.edu>.

Although POD has essentially no archive, one can send the message
"GET POD log0007" to <listproc@catfish.valdosta.edu> to access:

a. the recent discussion on student evaluations (see especially the
thread "Re: Chrony: '. . . Stopped Reading Stu Eval's'") [regarding
the editorial of 7/7/00 in the Chronicle of Higher Education's "Point
of View," entitled "Why I Stopped Reading My Student Evaluations," by
Lucia Perillo, associate professor of creative writing and MacArthur
Foundation fellow at Southern Illinois University].

b. the uselessness of pre-post testing (see the thread "Problem
Solving in Physics" initiated by an MIT physics professor's question
about how he might pre-post test to see whether or not
problem-solving ability had been increased by his course).

2. Phys-L <http://purcell.phy.nau.edu/phys-l/> is a forum with about
650 subscribers interested in physics education. For posts on
teacher evaluations go to the archives
<http://mailgate.nau.edu/archives/phys-l.html>, click on "Search the
archives," and then type "evaluations" into the subject slot to
obtain 94 hits as of 7/16/00.

3. American Educational Research Association, Section D - Measurement
and Research Methodology <http://www.aera.net/divisions/d>.
For posts on teacher evaluations (a) go to the archives
<http://lists.asu.edu/archives/aera-d.html>, (b) click on "Search the
archives," and (c) then type "teacher evals" into the subject slot to
obtain 53 hits as of 7/16/00.

To see my own contribution in the light of meaningful measures of
student learning, add to "a" - "c" above: "d" type "hake" into the
author slot.

To see student-evaluation champion Lawrence Roche's rebuttals add to
"a"-"c" above: "d" type "Roche" into the author slot.

For yet more AERA-D posts by Roche, simply type "Roche" in the author
slot (with all other slots blank) to obtain 44 hits as of 7/16/2000.
In his latest post of 10/15/99, Roche steadfastly maintains, despite
evidence to the contrary from physics-education research, that
"student ratings seem to be the MOST valid tool available ...... (to
measure teaching effectiveness)." (His CAPS)

4. PhysLrnR with archive at
<http://listserv.boisestate.edu/archives/physlrnr.html> is a
discussion list with about 475 subscribers interested in
physics-education research. The archives go back only to 1998 and
appear to carry few, if any, posts on student evaluations.

5. H.W. Marsh & L.A. Roche, "Making students' evaluations of teaching
effectiveness effective," American Psychologist, 52, 1187-1197(1997).

6. I. Halloun and D. Hestenes, "The initial knowledge state of
college physics students," Am. J. Phys. 53, 1043 (1985).

7. T.D. Cook and D.T. Campbell, "Quasi-Experimentation: Design &
Analysis Issues for Field Settings" (Houghton Mifflin, 1979).

8. L. Cronbach & L. Furby, "How we should measure 'change'--or
should we?" Psychological Bulletin 74, 68-80 (1970).

9. R. Kaplan, & D. Saccuzzo, "Psychological Testing: Principles,
applications, and issues" (Brooks/Cole, 1997), pp. 113-115.

10. R.E. Slavin, "Research Methods in Education" (Allyn and Bacon,
2nd ed., 1992).

11. D. Hestenes, M. Wells, and G. Swackhamer, "Force Concept
Inventory," Phys. Teach. 30, 141 (1992) [a revised 1995 version due
to I. Halloun, R.R. Hake, E.P. Mosca, and D. Hestenes is in E. Mazur,
"Peer Instruction: A User's Manual" (Prentice Hall, 1997)and on the
Web (password protected) at
<http://modeling.la.asu.edu/modeling.html>].

12. R.R. Hake, "Interactive-engagement vs traditional methods: A
six-thousand-student survey of mechanics test data for introductory
physics courses," Am. J. Phys. 66, 64 (1998) and on the Web at
<http://www.physics.indiana.edu/~SDI/>; see also
<http://www.aahe.org/hake.htm>.

13. R.R. Hake, "Interactive-engagement methods in introductory
mechanics courses," on the Web at
<http://www.physics.indiana.edu/~sdi/> and submitted on 6/19/98 to
the "Physics Education Research Supplement to AJP"(PERS).

14. My 7/14/00 AERA-D post "Is it Time to Reassess the Value of
Pre-post Testing?" lists 11 recent physics-education-research
articles, all consistent with refs. 12 & 13 [despite the claim of
education/psychology specialists(1, 7-10) that conclusions based on
pre-post testing cannot be generalized].

15. D. Hestenes and M. Wells, "A Mechanics Baseline Test," Phys.
Teach. 30, 159 (1992).

16. R.J. Beichner, "Testing student interpretation of kinematics
graphs," Am. J. Phys. 62, 750 (1994).

17. R.K. Thornton & D.R. Sokoloff, "Assessing student learning of
Newton's laws: The Force and Motion Conceptual Evaluation and the
Evaluation of Active Learning Laboratory and Lecture Curricula," Am.
J. Phys. 66(4), 338-352 (1998).

18. D.P. Maloney, T.L. O'Kuma, C.J. Heiggelke, and A. Van Heuvelen,
AAPT Announcer 29(2), 82 (1999).

19. P.A. Cohen, "Student ratings of Instruction and Student
Achievement: A Meta-analysis of Multisection Validity Studies,"
Review of Educational Research 51, 281 (1981).

20. H.W. Marsh, "Students' Evaluations of University Teaching:
Dimensionality, Reliability, Validity, Potential Biases, and
Utility," J. of Educational Psychology 76, 707 (1984).

21. K.A. Feldman, "The Association Between Student Ratings of
Specific Instructional Dimensions and Student Achievement: Refining
and Extending the Synthesis of Data from Multisection Validity
Studies," Research on Higher Education 30, 583 (1989).

22. W.J. McKeachie, "Instructional Evaluation: Current Issues and
possible improvements," J. of Higher Education 58(3), 344 (1987).

23. R.R. Hake and J.C. Swihart, "Diagnostic Student Computerized
Evaluation of Multicomponent Courses (DISCOE)," Teaching & Learning
(Indiana University, January 1979); on the web at
<http://www.physics.indiana.edu/~hake>.

24. P.A. Cohen, "A critical analysis and reanalysis of the
multisection validity meta-analysis," paper presented at the 1987
Annual Meeting of the American Educational Research Association,
Washington, D.C., April 1987 (ERIC Document No. ED 283 876).

25. P.C. Abrami, S. d'Apollonia, and P.A. Cohen, "Validity of student
ratings of instruction: What we know and what we do not," Journal of
Educational Psychology 82, 219-231 (1990).

26. H.W. Marsh & M. Dunkin, "Students' evaluations of university
teaching: A multidimensional perspective," in "Higher Education:
Handbook of Theory and Research," Vol. 8 (Agathon, 1992), pp. 143-234.
[Reprinted in H.W. Marsh & M. Dunkin, "Students' evaluations of
university teaching: A multidimensional perspective," in R. P. Perry
& J. C. Smart (eds.), "Effective Teaching in Higher education:
Research and Practice" (Agathon, 1997) pp. 241-320.]

27. R.E. Haskell:

a. "Academic Freedom, Tenure, and Student Evaluation of Faculty:
Galloping Polls in the 21st Century," Education Policy Analysis
Archives (EPAA) 5(6), 1997 at <http://olam.ed.asu.edu/epaa/v5n6.html>.

b. Part II, "Views from the Court" EPAA 5(17), 1997 at
<http://olam.ed.asu.edu/epaa/v5n17.html>,

c. Part III, "Analysis And Implications of Views From The Court in
Relation to Accuracy and Psychometric Validity," EPAA 5(18), 1997 at
<http://olam.ed.asu.edu/epaa/v5n18.html>,

d. Part IV, "Analysis and Implications of Views From the Court in
Relation to Academic Freedom, Standards, and Quality Instruction,"
EPAA 5(21), 1997 at <http://olam.ed.asu.edu/epaa/v5n21.html>;

28. Exchanges on ref. 27:

a. Theall's Comments, EPAA 5(8,c2), 1997 at
<http://olam.ed.asu.edu/epaa/v5n8c2.html>,
b. Haskell's Response To Theall, EPAA 5(8,c3), 1997 at
<http://olam.ed.asu.edu/epaa/v5n8c3.html>;

a. Stake's Comments, EPAA 5(8), 1997 at
<http://olam.ed.asu.edu/epaa/v5n8.html>,
b. Haskell's Response To Stake, EPAA 5(8,c1), 1997 at
<http://olam.ed.asu.edu/epaa/v5n8c1.html>.

29. Why distribute yet again the replied-to post and clutter
everyone's hard drive? Some sage Netiquette advice is given by the
Phys-L list at <http://purcell.phy.nau.edu/phys-l/#etiquette>:

"Quote Sparingly: avoid excessively large replies created by quoting
complete original messages (a real problem for DIGEST-mode readers).
Instead, select and keep only appropriate quoted text, indicating
what is quoted and what is not from the original message (many email
programs will do this automatically) and manually pruning out
irrelevant sections. Indicate deletions. Leave enough original
material so you are not enigmatic."