
Can Pre/post Testing Inform Curriculum Design? (was "Can Biologists Learn. . .") - PART 1



PART 1
If you respond to this long (29K) post PLEASE DON'T HIT THE REPLY
BUTTON - the bane of discussion lists - and thereby inflict it yet
again on list subscribers.

Please excuse this cross-post, in the interests of interdisciplinary
synergy, to discussion lists with archives at:

ASSESS <http://lsv.uky.edu/archives/assess.html>,
Biopi-L <http://listserv.ksu.edu/archives/biopi-l.html>,
Chemed-L <http://mailer.uwf.edu/archives/chemed-l.html>,
EVALTALK <http://bama.ua.edu/archives/evaltalk.html>,
FYA <http://listserv.sc.edu/archives/fya-list.html>,
Phys-L <http://lists.nau.edu/archives/phys-l.html>,
PhysLrnR <http://listserv.boisestate.edu/archives/physlrnr.html>,
Physhare <http://lists.psu.edu/archives/physhare.html>,
POD <http://listserv.nd.edu/archives/pod.html>

In his PhysLrnR post of 5 Sep 2003 21:35:47 -0600 titled "Re: Can
Biologists Learn Anything from PAER?" [PAER = Physics/Astronomy
Education Research], Mike Zeilik first quotes Paul Camp's (2003)
PhysLrnR post:

"The learning process is far more complex than can possibly be captured in
pre/post settings. The reason these tests are not very informative
for curriculum design is because pre/post in an inherently coarse
measure of success or failure. It determines whether or not a concept
has been acquired, not how it has been acquired. The developmental
process is far more interesting and requires a finer scale data
collection effort to reveal."

Then Mike writes (my CAPS):

"I don't know anyone who has claimed to the contrary. We made very clear (I
hope!) the limitations of the Astronomy Diagnostic Test (ADT) . . .
.[see e.g., Zeilik (2002). . . , for example. WHAT I FIND FRUSTRATING
IS THAT SOME FACULTY USE THE ARGUMENTS SUCH AS THIS TO STALL THE
DEVELOPMENT OF ANY ASSESSMENTS OR TO RATIONALIZE NOT USING THEM.

NO ONE HAS CLAIMED TO THE CONTRARY?? Mike is evidently unaware of:

(a) my own criticism (Hake 1998b) of the Pride, Vokos, & McDermott
Position (PVMP) (1998) [which is very similar to Camp's Position (CP)
and to the Andy Johnson Position (AJP) (2003) - the AJP was countered
by Hake (2003b)],

(b) my comment [Hake (2002a), references keyed to the present
REFERENCES list; see also Hake (2003e)] regarding non-classical test
theory:

"It should be remarked that most of the analysis of the FCI (Force
Concept Inventory), MD (Mechanics Diagnostic test), FMCE (Force
Motion Concept Inventory), CSEM (Conceptual Survey of Electricity and
Magnetism), and other physics assessment tests has been done within
the framework of "Classical Test Theory" in which only the number of
correct answers is considered in the scoring. However more
sophisticated analyses are being developed [e.g., Bao & Redish (2001)
for the FCI, and Thornton (1995) for the FMCE]; and other
psychometric approaches such as Item Response and Rasch analyses may
offer advantages in some cases [see e.g., Hake (2002b) for a few
references to the voluminous psychometric literature."
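
To make the quoted contrast concrete, here is a minimal Python sketch
(not taken from any of the cited papers; all abilities, difficulties,
and response patterns are invented) of the difference between
Classical-Test-Theory scoring, which keeps only the number of correct
answers, and a one-parameter (Rasch) item response model, which
assigns each full response pattern a likelihood given a student
ability and per-item difficulties:

import math

def ctt_score(responses):
    """Classical Test Theory: only the number correct matters."""
    return sum(responses)

def rasch_p(theta, b):
    """Rasch model: P(correct) = 1 / (1 + exp(-(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Invented item difficulties (in logits), easiest to hardest.
difficulties = [-1.0, 0.0, 1.5]

# Two invented response patterns, each 2-of-3 correct (1 = correct).
patterns = {"missed hardest item": [1, 1, 0],
            "missed easiest item": [0, 1, 1]}

theta = 0.5  # invented student ability
for label, pattern in patterns.items():
    likelihood = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        likelihood *= p if x else (1.0 - p)
    print(f"{label}: CTT score = {ctt_score(pattern)}, "
          f"Rasch likelihood = {likelihood:.3f}")

Both patterns get the same CTT score, but their Rasch likelihoods
differ by an order of magnitude (about 0.37 vs. 0.03) - precisely the
sort of information about WHICH items were missed that number-correct
scoring throws away.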

A similar point was recently made by Bob Beichner (2003) in a
PhysLrnR post: "[Bao & Redish (2001)] provides insight into the
'mixed state' of students as they move from a naive toward a more
conventional understanding of different topics. It looks not just at
right/wrong question scores like traditional factor analysis, but
instead uses specific incorrect answers as basis vectors in a
multidimensional space, modeling (in a limited fashion) what students
think about an idea."
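
To give the flavor of such a "model analysis," here is a minimal
Python sketch in the spirit of Bao & Redish (2001) - NOT their actual
procedure, code, or data; the question set, the answer-to-model
classification, and the class responses below are all invented for
illustration:

import numpy as np

# Hypothetical setup: 4 questions, each answer choice pre-classified
# as evidence for model 0 (Newtonian), 1 (impetus-like), or 2 (other).
model_of_choice = [
    {"A": 0, "B": 1, "C": 1, "D": 2, "E": 2},  # question 1
    {"A": 1, "B": 0, "C": 2, "D": 1, "E": 2},  # question 2
    {"A": 2, "B": 1, "C": 0, "D": 2, "E": 1},  # question 3
    {"A": 0, "B": 2, "C": 1, "D": 1, "E": 2},  # question 4
]
N_MODELS = 3

def student_state(answers):
    """Map one student's answer letters to a unit 'model state' vector
    with components sqrt(n_k/m), where n_k counts answers classified
    under model k and m is the number of questions."""
    counts = np.zeros(N_MODELS)
    for q, choice in enumerate(answers):
        counts[model_of_choice[q][choice]] += 1
    return np.sqrt(counts / len(answers))

# Invented responses for a three-student "class."
class_answers = [["A", "B", "C", "A"],   # consistently Newtonian
                 ["B", "A", "E", "C"],   # consistently impetus-like
                 ["A", "A", "C", "C"]]   # a mixed state

# Class "density matrix": the average of the outer products u u^T.
D = np.mean([np.outer(u, u) for u in map(student_state, class_answers)],
            axis=0)

# Eigenvectors of D with large eigenvalues indicate the dominant
# (possibly mixed) model states in the class - structure that a plain
# right/wrong total score cannot show.
eigvals, eigvecs = np.linalg.eigh(D)
print("dominant eigenvalue:", round(eigvals[-1], 3))
print("dominant model state:", np.round(eigvecs[:, -1], 3))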

However, I agree with Mike that PVMP/CP/AJP is used by faculty to
"stall the development of any assessments or to rationalize not using
them." I also suspect that PVMP/CP/AJP was at least partially
responsible for the lamentable failure of the NRC's expert education
committees [NRC (1997; 1999; 2003a,b); McCray et al. (2003);
Pellegrino et al. (2001); Shavelson & Towne (2002)] to:

(a) recognize the landmark qualitative/quantitative physics education
research of Halloun and Hestenes (1985a,b) in developing the
Mechanics Diagnostic (MD) test [precursor to the Force Concept
Inventory (FCI) of Hestenes et al. (1992)],

(b) learn from the lessons of the physics education reform effort
[Hake (2002a)], especially:

"Lesson #3. High-quality standardized tests of the cognitive and
affective impact of courses are essential for gauging the
effectiveness of non-traditional educational methods relative to
traditional methods."

Elaborating on Lesson #3, I wrote in Hake (2002a) [bracketed between
lines "HHHHHHH. . . ."; see that article for references other than
Hake (2003a,b,c)]:

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
As indicated in the introduction, so great is the inertia of the
educational establishment (Lesson #13) that THREE DECADES OF
PHYSICS-EDUCATION RESEARCH DEMONSTRATING THE FUTILITY OF THE
PASSIVE-STUDENT LECTURE IN INTRODUCTORY COURSES WERE IGNORED UNTIL
HIGH-QUALITY STANDARDIZED TESTS THAT COULD EASILY BE ADMINISTERED TO
THOUSANDS OF STUDENTS BECAME AVAILABLE. These tests are yielding
increasingly convincing evidence that interactive engagement (IE)
methods enhance conceptual understanding and problem solving
abilities far more than do traditional methods. . . . As far as I
know, disciplines other than physics, astronomy (Adams et al. 2000;
Zeilik et al. 1997, 1998, 1999), and possibly economics (Saunders
1991, Kennedy & Siegfried 1997, Chizmar & Ostrosky 1998, Allgood and
Walstad 1999) have yet to develop any such tests and therefore cannot
effectively gauge either the need for or the efficacy of their
reform efforts. . . . [But more recently pre/post testing is beginning
to gain a foothold in other disciplines (Hake 2003a,b,c)]. . . In my
opinion, ALL DISCIPLINES SHOULD CONSIDER THE CONSTRUCTION OF
HIGH-QUALITY STANDARDIZED TESTS OF ESSENTIAL INTRODUCTORY COURSE
CONCEPTS.
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

According to Hestenes (1998): "Some critics doubt that hard evidence
is possible in education. The FCI results stand as a counterexample.
. . . [for arguments that hard evidence (both quantitative and
qualitative) in education IS possible see, e.g., Hestenes (1979);
Redish (1990); Hake (2002), "Can Educational Research be Scientific
Research?"; and Shavelson & Towne (2002)]. . . . The data base of
results from the FCI is enormous and growing rapidly. I have direct
knowledge of data on more than 20,000 students and 300 physics
classes spanning the range from high school to graduate school.
Judging from the steady stream of FCI reports at AAPT meetings, the
FCI has undoubtedly been used in hundreds of other physics
classrooms. Richard Hake has compiled, analyzed and published a
substantial portion of that data. [Hake (1998a)]. Halloun and I are
analyzing more extensive data on high school students and teachers.
Altogether, the data provides overwhelming support for our original
conclusions. . . . [(1) "....the student's initial qualitative,
common-sense beliefs about motion and .... (its).... causes have a
large effect on performance in physics, but conventional instruction
induces only a small change in those beliefs." (2) "Considering the
wide differences in the teaching styles of the four
professors....(involved in the study)....the basic knowledge gain
under conventional instruction is essentially independent of the
professor."] . . . . . THE DATA BASE IS NOW SO BROAD THAT THE
UNSETTLING MESSAGE IT BRINGS CAN NO LONGER BE ATTRIBUTED TO BIAS OR
INCOMPETENCE OF THE ORIGINAL INVESTIGATORS." (My CAPS.)

In further support of the importance of pre/post testing with valid
and consistently reliable multiple-choice tests such as the FCI, I
criticized the Pride/Vokos/McDermott Position (PVMP) in Hake (1998b):

HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH
According to Pride et al. (1998) "The results . . . [of Pride et al.
(1998)]. . . demonstrate that responses to multiple choice questions
often do not give an accurate indication of the level of
understanding and that questions that require students to explain
their reasoning are necessary. . . Good performance on a multiple
choice test may be a necessary condition, but it is not a sufficient
criterion for making this judgment. . . (of functional understanding
of the material). . . .broad assessment instruments are not sensitive
to fine structure and thus may not accurately reveal the extent of
student learning. Moreover, SUCH INFORMATION DOES NOT CONTRIBUTE TO A
RESEARCH BASE THAT IS USEFUL FOR THE DESIGN OF INSTRUCTIONAL
MATERIALS." (MY CAPS.)

If it IS true that broad assessment instruments such as the FCI/MD
and MB [Mechanics Baseline (Hestenes & Wells 1992)] are NOT useful
for the design of instructional material but only for increasing
"faculty awareness of the failure of many students to distinguish
between Newtonian concepts and erroneous common sense beliefs, both
before and after instruction in physics" [Pride et al. (1998)], then
the value of surveys such as this one [Hake (1998a,b; 2002a; 2003d)]
is rather limited.

I think that most physics-education researchers would agree that
Multiple Choice (MC) tests, even those as carefully crafted as the
MD/FCI and MB, cannot probe students' conceptual understanding as
deeply as can the searching (and labor-intensive) analyses of:

(a) student interviews conducted by physics experts, or, arguably,

(b) well-designed, free-response problem exams.

In my opinion, MC tests, interviews, problem exams, and case studies
all have their advantages, disadvantages, and trade-offs and should
be used in combination so as to be mutually supportive whenever
possible. The FCI/MD questions, answers, and distractors were, in
fact, developed from extensive interview data [Halloun & Hestenes
(1985a,b); Hestenes et al. (1992)].

The present survey, in addition to the MD/FCI and MB test results,
gathers information from detailed questionnaire (Hake 1997) responses
of instructors, and invokes supplementary case studies (e-mail and
telephone interviews) in situations where questionable or unexpected
test results were initially obtained.

The advantage of carefully designed MC tests (especially if
supplemented with other research and testing procedures) is that
they allow a standardized measurement with uniform grading over a
large population, and thus may afford a more practical route to
evaluating the effectiveness of methods used in large-enrollment
introductory courses at one or many institutions than individual
interviews, individually graded exams, or case studies by themselves.

The present survey:

(a) strongly suggests that classroom use of IE methods [i.e., those
designed at least in part to promote conceptual understanding through
interactive engagement of students in heads-on (always) and hands-on
(usually) activities which yield immediate feedback through
discussion with peers and/or instructors] can increase mechanics
course effectiveness in both conceptual understanding and
problem-solving well beyond that achieved with traditional (T)
methods;

(b) shows that, for the survey courses, current IE methods fail to
produce normalized gains in the High-g [<g> above 0.69] region (a
computational sketch of <g> follows this list), suggesting the need
for improvement of IE strategies in content and/or implementation;

(c) gives references to the surveyed IE methods, materials,
instructors, and institutions;

(d) discusses the various implementation problems that appear to have
occurred; and

(e) suggests ways to overcome those problems.
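
[For readers unfamiliar with the normalized gain invoked in (b): <g>
is defined in Hake (1998a) as the ratio of the actual average gain to
the maximum possible average gain, <g> = (<%post> - <%pre>)/(100 -
<%pre>), computed from class-average percent-correct scores. A
minimal Python sketch, with INVENTED class averages:

def normalized_gain(pre_avg, post_avg):
    """Hake (1998a) class-average normalized gain:
    actual gain / maximum possible gain, from percent-correct
    class averages on the pre- and post-test."""
    return (post_avg - pre_avg) / (100.0 - pre_avg)

# Invented class averages (percent correct) for a traditional (T)
# and an interactive-engagement (IE) course.
for label, pre, post in [("T course", 45.0, 57.0),
                         ("IE course", 45.0, 72.0)]:
    g = normalized_gain(pre, post)
    region = ("High-g" if g > 0.69 else
              "Medium-g" if g >= 0.3 else "Low-g")
    print(f"{label}: <g> = {g:.2f} ({region})")

Here the invented T course lands in the Low-g region (<g> = 0.22) and
the invented IE course in the Medium-g region (<g> = 0.49); per point
(b), no survey course, T or IE, reached the High-g region.]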

In my opinion, the foregoing information and suggestions ARE of
potential value in designing instructional materials: e.g., they
indicate that current materials need to be improved, and that new
materials should be designed to promote interactive engagement while
avoiding the survey-indicated implementation pitfalls.

Therefore, I DISAGREE WITH PRIDE et al. THAT BROAD ASSESSMENT
INSTRUMENTS DO NOT "CONTRIBUTE TO A RESEARCH BASE THAT IS USEFUL FOR
THE DESIGN OF INSTRUCTIONAL MATERIALS."
HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH

Because of the unfortunate and unjustified suppression of Hake
(1998b), the above message never got to Pride et al., the PER
community, physics teachers, or the members of NRC's expert education
panels [NRC (1997; 1999; 2003a,b); McCray et al. (2003)].

PERHAPS PRIDE ET AL., PAUL CAMP, OR ANDY JOHNSON WOULD LIKE TO
RESPOND TO MY CRITICISM OF THEIR POSITIONS.

Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<rrhake@earthlink.net>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>

CONTINUED IN PART 2