If you reply to this long (15 kB) post please don't hit the reply
button unless you prune the copy of this post that may appear in your
reply down to a few relevant lines, otherwise the entire already
archived post may be needlessly resent to subscribers.
*********************************************
ABSTRACT: As far as I know there have been no rigorous measurements of
correlations of student evaluation of teaching (SET) ratings in large
data sets over many courses with
(a) academic expectations, (b) course delivery techniques, or (c)
gains on standardized tests of student learning, such as the Force
Concept Inventory (FCI). However, anecdotal evidence suggests a
NEGATIVE correlation of normalized gains on the FCI with SET scores.
It is argued, yet again, that SET's may be valid for gauging the
important *affective* impact of courses and for providing diagnostic
feedback to *teachers*, but they are NOT valid as measures of higher
education's primary concern: students' higher-order learning. In fact
the gross misuse of SET's as gauges of student learning is, in my
view, one of the institutional factors that thwarts substantive
educational reform.
*********************************************
Gary Turner, in his Phys-L post of 26 Oct 2006 16:31:15-0500 titled
"Research into student evaluations" wrote [bracketed by lines
"TTTTTTT. . ."; slightly edited]:
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
Can anyone recommend any research articles on the nature of the
correlation between student evaluations of teaching (SET's) and any
of the following:
a. academic expectations;
b. course delivery techniques (e.g., lecture vs active);
c. gains on standardized tests, such as the Force Concept Inventory
(FCI) [Hestenes et al. (1992)]?
TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
As far as I know (please correct me if I'm wrong) there have been no
rigorous measurements of such correlations in large data sets over
many courses. But for anecdotal evidence suggesting a NEGATIVE
correlation of normalized gains on the FCI with SET scores see,
e.g., "Re: What if students learn better in a course they don't
like?" [Hake (2006a)].
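For readers who want to see what such a measurement would involve,
here is a minimal sketch (my illustration, not code from any of the
cited studies). It assumes the usual definition of the class-average
normalized gain, <g> = (%post - %pre) / (100 - %pre), and an ordinary
Pearson correlation coefficient. The function names and the
per-course inputs are hypothetical, and no real data are implied.

```python
from math import sqrt


def normalized_gain(pre_pct, post_pct):
    """Class-average normalized gain <g> for one course.

    pre_pct, post_pct: class-average pre- and post-test scores in percent.
    Assumed definition: <g> = (post - pre) / (100 - pre).
    """
    if not (0 <= pre_pct < 100):
        raise ValueError("pre-test average must be in [0, 100)")
    return (post_pct - pre_pct) / (100.0 - pre_pct)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / sqrt(sxx * syy)
```

Given one <g> value and one mean SET rating per course, calling
pearson_r on the two lists would yield the correlation in question. A
rigorous study would of course also need many courses, comparable SET
instruments, and an estimate of the uncertainty in the coefficient.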
Nevertheless, SET enthusiasts sometimes justify the use of SET's for
gauging the cognitive impact of courses by citing measured
correlations of "achievement" on course exams and final grades with
SET ratings. But are such correlations significant with regard to
what *should* be the primary concern of higher education: students'
higher-order learning? [Shavelson & Huang (2003)].
In "The Physics Education Reform Effort: A Possible Model for Higher
Education?" [Hake (2005)], I wrote [see that article for the
references other than Hake (2002a)]:
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Investigation of the extent to which a paradigm shift from teaching
to learning . . . [Barr & Tagg (1995)]. . . is taking place requires
measurement of students' learning in college classrooms. But Wilbert
McKeachie (1987) has pointed out that the time-honored gauge of
student learning - COURSE EXAMS AND FINAL GRADES - TYPICALLY MEASURES
LOWER-LEVEL EDUCATIONAL OBJECTIVES such as memory of facts and
definitions rather than higher-level outcomes such as critical
thinking and problem solving. The same criticism (Hake 2002a) as to
assessing only lower-level learning applies to Student Evaluations of
Teaching (SET's), since their primary justification as measures of
student learning appears to lie in the modest correlation of
overall ratings of course (+ 0.47) and instructor (+ 0.43) with
"achievement" AS MEASURED BY COURSE EXAMS OR FINAL GRADES (Cohen
1981).
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
In response to Turner's post, David Marx replied on 26 Oct 2006
[slightly edited; my CAPS]:
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
I recommend a look at [Seldin (2006)].
STUDENT EVALUATIONS ARE (surprisingly) VALID MEASURES OF TEACHING
PERFORMANCE. There are a lot of misconceptions about student evals.
This book includes studies of evals.
MMMMMMMMMMMMMMMMMMMMMMMMMMMMMMMM
SET's are a valid measure of teaching performance? The crucial
question is "Valid for what?" In Hake (2002a) I wrote [see that
article for references other than Hake & Swihart (1979) and Hake
(2002b)]:
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
I think SET's can be "valid" in the sense that they can be useful for
gauging the *affective* impact of a course and for providing
diagnostic feedback to *teachers* [see, e.g., Hake & Swihart (1979)]
to assist them in making mid-course corrections. However IMHO, SET's
are NOT valid in their widespread use by *administrators* to gauge
the cognitive impact of courses [see, e.g., Williams & Ceci (1997);
Hake (2000; 2002c,d); Johnson (2002)]. In fact the gross misuse of
SET's as gauges of student learning is, in my view, one of the
institutional factors that thwarts substantive educational reform
(Hake 2002b, Lesson #12).
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
Hake, R.R. 2002b. "Lessons from the physics education reform
effort," Ecology and Society 5(2): 28; online at
<http://www.ecologyandsociety.org/vol5/iss2/art28/>. Ecology and
Society (formerly Conservation Ecology) is a free online
"peer-reviewed journal of integrative science and fundamental policy
research" with about 11,000 subscribers in about 108 countries.
Hake, R.R. 2006a. "Re: What if students learn better in a course they
don't like?" online at <http://tinyurl.com/yfgu2g>. Post of 29 Jun
2006 14:27:59-0700.
Hake, R.R. 2006b. "A Possible Model For Higher Education: The Physics
Reform Effort (Author's Executive Summary)," Spark (American
Astronomical Society Newsletter), June, online at
<http://www.aas.org/education/spark/SparkJune06.pdf> (1.9MB). Scroll
down about 4/5 of the way to the end of the newsletter.
Hake, R.R. 2006c. "SET's Are Not Valid Gauges of Teaching
Performance," online at <http://tinyurl.com/rtfqw>. Post of 20 Jun
2006 07:50:58-0700. ABSTRACT: It is argued that if universities value
teaching that leads to student higher-level learning, then student
evaluations of teaching (SET's) do NOT afford valid evidence of
teaching performance. Instead, institutions should consider the
DIRECT measure of students' higher-level *domain-specific* learning
through pre/post testing using (a) valid and consistently reliable
tests *devised by disciplinary experts*, and (b) traditional courses
as controls.
Hake, R.R. 2006d. "SET's Are Not Valid Gauges of Teaching Performance
#2," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0606&L=pod&O=D&P=13806>.
Post of 21 Jun 2006 21:12:19-0700. ABSTRACT: I respond in order to 12
points made by Michael Scriven in his thoughtful response to Hake
(2006c).
Hake, R.R. 2006e. "SET's Are Not Valid Gauges of Teaching Performance
#3," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0606&L=pod&O=D&P=14961>.
Post of 25 Jun 2006 20:58:34-0700. ABSTRACT: I respond in order to 5
points made by Wilbert (Bill) McKeachie (WM) in his thoughtful
response to Hake (2006c).
Hake, R.R. 2006f. "SET's Are Not Valid Gauges of Teaching Performance
#4," online at
<http://listserv.nd.edu/cgi-bin/wa?A2=ind0606&L=pod&P=R15773&I=-3>.
Post of 27 Jun 2006 17:24:18-0700. ABSTRACT: I respond in order to 6
points made by Michael Theall in his response to Hake (2006c).
Hake, R.R. 2006g. "Re: Adjunct Faculty: Improving Results," online at
<http://lsv.uky.edu/scripts/wa.exe?A2=ind0607&L=ASSESS&P=R2&I=-3>.
Post of 3 Jul 2006 17:00:06-0700. ABSTRACT: I respond in order to 20
points made by Dan Tompkins in his ASSESS posts of 7-29 June 2006,
titled "Re: Adjunct Faculty: Improving Results," "Re: SET's Are Not
Valid Gauges of Teaching Performance," and "Re: What if students
learn better in a course they don't like?" A subtitle might be "Is It
Possible to Construct a 'Philosophy Concept Test' of students'
higher-level learning?"
Hestenes, D., M. Wells, & G. Swackhamer, 1992. "Force Concept
Inventory," Phys. Teach. 30: 141-158; online (except for the test
itself) at
<http://modeling.asu.edu/R&E/Research.html>. The 1995 revision by
Halloun, Hake, Mosca, & Hestenes is online (password protected) at
the same URL, and is available in English, Spanish, German,
Malaysian, Chinese, Finnish, French, Turkish, Swedish, and Russian.
Pallett, W. 2006. "Uses and Abuses of Student Ratings," Chapter 4 of
Seldin (2006). At <http://www.ankerpub.com/SeldinEFP-Preface.pdf> it
is stated that "William Pallett examines the uses and abuses of
student ratings. He contends that they are a valuable resource but
should count just 30% to 50% in the overall evaluation of teaching,
that such ratings can serve multiple purposes, that administrators
sometimes make too much of too little difference in ratings, and that
student rating results should be categorized into no more than three
to five groups."
Reis, R. 2006. "Uses and Abuses of Student Ratings," Tomorrow's
Professor Msg. #756. This posting looks at Pallett (2006).
Free subscriptions to "Tomorrow's Professor" are available at
<https://mailman.stanford.edu/mailman/listinfo/tomorrows-professor>.
Shavelson, R.J. & L. Huang. 2003. "Responding Responsibly To the
Frenzy to Assess Learning in Higher Education," Change Magazine,
January/February; online at
<http://www.stanford.edu/dept/SUSE/SEAL/>. See the first "highlight."
Seldin, P. ed. 2006. "Evaluating Faculty Performance: A Practical
Guide to Assessing Teaching, Research, and Service," Anker
Publishing. Anker information at <http://tinyurl.com/y7bffv>. [Amazon
and Barnes & Noble appear to be aware of only the 1999 version of
this book, while Anker gives no indication that previous editions
exist.]