
[Phys-L] Re: Teacher quality and pre/post tests - PART 1



PART 1

If you object to cross-posting as a way to tunnel through
intra-disciplinary barriers, dislike referencing, or have no interest
in this subject, please hit "delete" now. And if you respond to this
long (32 kB) post, please don't hit the reply button unless you prune
the original message normally contained in your reply down to a few
lines; otherwise you may inflict this entire post yet again on
suffering list subscribers.

In his AP-Physics post of 4 Jun 2005 15:54:40 -0700 titled "Re:
Teacher quality and pre/post tests," Boris Korsunsky, of TPT's
"Physics Challenges" fame, adduced five arguments (A-E below) to
which I shall respond:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
A. Boris wrote: "The pre/post testing method of assessing educational
progress has obvious advantages and just as obvious (and very
serious) shortcoming: inevitable teaching to the (post-)test."

In Hake (1998a) I analyzed sources of error in the average normalized
gains <g>'s as calculated from average pretests <pre> and average
posttests <post> for the 48 "interactive engagement" and 14
"traditional" courses of that survey. Relevant excerpts from that
paper follow [bracketed by lines "hhhhhhhhhh. . . ."]:

hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
V. ERRORS IN THE NORMALIZED GAIN
A. Statistical Fluctuations ("Random Errors")
. . . . . . . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . . . . . . .

B. Systematic Error
. . . . . . . . . . . . . . . . . . . . . . . . .
2. Teaching to the Test and Test-question Leakage.
Considering the elemental nature of the FCI questions,. . . both the
average. . . [of the average normalized gains <g> of 48 Interactive
Engagement courses]. . . <<g>>(48IE) = 0.48 ± 0.14, and maximum <g> =
0.69 are disappointingly low, and below those which might be expected
if teaching to the test or test-question leakage were important
influences.

Of the 48 data sets [Hake (1998b)] for IE courses: (a) 27 were supplied
by respondents to our requests for data, of which 22 (81%) were
accompanied by a completed survey questionnaire, (b) 13 have been
discussed in the literature, and (c) 5 are Indiana University courses
of which I have firsthand knowledge.

All survey-form respondents indicated that they thought they had avoided
"teaching to the test" in answering the question "To what extent do
you think you were able to avoid 'teaching to the test(s)' (i.e.,
going over experiments, questions, or problems identical or nearly
identical to the test items)?" Likewise, published reports of the
courses in group "b" and my own knowledge of courses in group "c"
suggest an absence of "teaching to the test" in the restricted sense
indicated in the question.

(In the broadest sense, IE courses all "teach to the test" to some
extent if this means teaching so as to give students some
understanding of the basic concepts of Newtonian mechanics as
examined on the FCI/MD tests. However, this is the bias we are
attempting to measure.)

There has been no evidence of test-question leakage in the Indiana
posttest results (e.g., significant mismatches for individual
students between FCI scores and other course grades). So far there
has been only one report [Hestenes et al. (1992)] of such leakage in
the literature - as indicated in Hake (1998ab), the suspect data were
excised from the survey.
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
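
For list subscribers who don't have the paper at hand: the quantity
discussed above is the class-average normalized gain of Hake (1998a),
<g> = (<%post> - <%pre>)/(100 - <%pre>), i.e., the actual average
gain divided by the maximum possible average gain. A minimal sketch
in Python (the class percentages below are invented purely for
illustration):

def avg_normalized_gain(pre_percent, post_percent):
    # Class-average normalized gain <g> [Hake (1998a)]: actual average
    # gain divided by the maximum possible average gain.
    return (post_percent - pre_percent) / (100.0 - pre_percent)

# Invented class averages: 45% on the FCI pretest, 72% on the posttest.
print(avg_normalized_gain(45.0, 72.0))   # -> about 0.49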

Boris might like to contest my discussion of the teaching-to-the-test
error possibility, as well as my other defenses of the
pre/post-testing holy grail [e.g., Hake (2002a, 2002b)] against the
infidel hordes.


BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
B. Boris wrote: "I am still amused by the original FCI article in TPT
. . .[Hestenes et al. (1992)]. . . where the authors note that
*their own* classes produced the greatest gains on the FCI even
though "we all took care not to teach to the test" (I can't locate
the article right now, and it is a very loose quote but I am sure I
got the gist of it)."

Since Hestenes et al. (1992) is online and thus only a click away
from one's computer screen at <http://modeling.asu.edu/R&E/FCI.PDF>
(100 kB), there's no need to rummage through one's chaotic files for a
hard copy. Perusal of the online copy suggests that the quote Boris
refers to may be from page 6:

"The 'Force Concept Inventory' test has been given to more than 1500
high-school students and more than 500 university students. Results
are displayed in Table III along with post test scores on the
'Mechanics Baseline' described in a companion paper [Hestenes & Wells
(1992)]. For the purpose of rough comparison, the Baseline test can
be regarded as a problem-solving test involving basic Newtonian
concepts. Except for two of the authors (Wells and Swackhamer), all
teachers with class test results in Table III were blind to both
tests when their teaching was done. Both Wells and Swackhamer were
scrupulously careful not to teach to the tests in their own classes."

Boris may believe that:

1. Teaching to the test is **rampant**:

(a) the blindfolds of all teachers with class test results in Table
III (other than Wells and Swackhamer) had holes in them,

(b) Wells and Swackhamer DID teach to the test - despite the claim of
Hestenes et al.,

(c) most instructors of other IE courses subjected to pre/post
testing have taught to the test, and

(d) this pervasive teaching to the test is responsible for the two
standard deviation superiority of <<g>>(IE) to <<g>>(T) observed by
Hake (1998a,b) and many other physics education research groups as
referenced in Hake (2002c,d).

2. PERs have been wasting their time developing interactive
engagement methods [for a listing see Hake (1998b)] whose superiority
to traditional methods as gauged by <g> is illusory.

3. Pre/post testers in fields such as astronomy, economics, biology,
chemistry, computer science, engineering, and physics [see, e.g.,
Hake (2004a,b,c; 2005c,d)] have all been wasting their time.

BTW, any subscribers who, despite Boris's skepticism, might wish to
try pre/post testing in their own classrooms should be aware:

(a) of the diagnostic test information available at NCSU (2005) and
FLAG (2005);

(b) that in "Assessment of Physics Teaching Methods" [Hake (2002d)],
I summarized pre/post test administration and reporting procedures
that have been proven effective during two decades of pre/post
testing in introductory mechanics courses;

(c) of Aaron Titus's (2005) great "assessment analysis" site that
takes the drudgery out of statistical analysis.
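
For anyone curious what such "assessment analysis" bookkeeping
amounts to, here is a rough sketch in Python (NOT Titus's actual
tool, and with invented scores), assuming matched pretest/posttest
percentage scores for each student:

from statistics import mean, stdev

pre  = [30, 45, 50, 25, 60, 40]   # invented pretest percentages
post = [55, 70, 80, 50, 85, 65]   # invented posttest percentages, same students

# Class-average normalized gain <g> [Hake (1998a)], computed from the
# class averages, not by averaging single-student gains:
g_class = (mean(post) - mean(pre)) / (100.0 - mean(pre))

# Single-student normalized gains, useful for examining the spread:
g_students = [(b - a) / (100.0 - a) for a, b in zip(pre, post)]

print(f"<g> = {g_class:.2f}; single-student g: "
      f"{mean(g_students):.2f} +/- {stdev(g_students):.2f}")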


CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
C. Boris wrote: "Just how does one *not* teach to the test that one fully
intends to give at the end of the year? Imagine that we AP teachers
are judged on how much our students gain on a *given* AP test given
*before* and *after* the year. Would the teachers teach a good
physics course or would they teach how to solve these three problems?
Call me cynical but I vote for the latter . . . especially if their
pay is 'performance-based.' "

In the latest addition to my "Archimedean Lever" saga [Hake (2005a) -
AP Physics, Phys-L, and Physhare were mercifully spared] I seemed to
agree with Boris (so I now think I may have been wrong ;->). I wrote:

hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
I agree with Lasry and Pelton that whether or not it's possible to
measure the *relative* effectiveness of a teacher in promoting
student learning (as by *normalized* gains) depends on the context of
the measurement, but disagree that there is validity to Berliner's
concerns **as he presented them in Berliner (2005)**.

As I indicated in Hake (2005b), according to "Campbell's Law"
[Campbell (1975), Nichols & Berliner (2005)]:

"The more any quantitative social indicator is used for social decision
making, the more subject it will be to corruption pressures and the
more apt it will be to distort and corrupt the social processes it is
intended to monitor."

*On the basis of "Campbell's Law"* one could argue that any attempt
to measure the relative effectiveness of a teacher in promoting
student learning *in the context of high-stakes testing* is doomed to
failure.

However, Berliner's argument that "no current system is acceptable
for assessing this dimension of teacher quality" doesn't appear to be
based on "Campbell's Law," but rather on "psychometric problems." I
assume Berliner [as others in the Psychology/Education/Psychometric
(PEP) community, e.g., Cronbach & Furby (1970), Suskie (2004a)] would
claim that such "psychometric problems" exist even in the low-stakes
formative pre/post testing assessments of physics education research.
(These are formative in the sense that they are being used in an
ongoing process to improve physics instruction nationwide, and not
to arrive at a summative evaluation of individual course or
individual teacher merit.)
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh


DDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDDD
D. Boris wrote: "I am oversimplifying, of course - but I think that
the "PEP". . . [Psychology/Education/Psychometric}. . . community
(death by acronym?) has every right to be skeptical (not
"dismissive") of the PERPS (physics education research, problem
solving) community's favorite pre/post test method of assessing
educational progress. Given that I count myself as a member of *both*
communities, I hope nobody takes my humble opinion personally."

The pre/post paranoic PEP community's standard objections to pre/post
testing were listed by Linda Suskie (2004a), and contested (I
modestly forbear use of the word "demolished") by Hake (2004a) and
Scriven (2004). Boris might like to contest the contestation.


EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
E. Boris wrote: "Since the context of Richard's message was teacher quality,
I would like to add that, while I agree that it is, indeed, important
to improve it, academic results of the students (however they are
measured) are determined by a huge number of factors outside of the
teachers' control. For instance, statistical models created for MCAS
scores (Massachusetts state competency tests) showed (as far as I
know) that the factors most closely correlated with students scores
are *not* related to the quality of the schools and/or teachers. One
of the most significant ones was the size of the town library;
another was educational use of the home computer and having one's own
room for study - all of which are proxies for SES, of course."

Yes, I know, and economist Eric Hanushek's (1989) statistical
analyses keep telling us that money spent on schools doesn't matter
insofar as student achievement is concerned. I am skeptical of the
results of both Hanushek and MCAS. As an antidote for the misleading
conclusions of statistical analyses of resources such as "money
spent," "quality of schools," "quality of teachers," "use of home
computer," class size, etc., usually by education-ignorant policy
analysts, see the insightful article by Cohen et al. (2002).


Richard Hake, Emeritus Professor of Physics, Indiana University
24245 Hatteras Street, Woodland Hills, CA 91367
<rrhake@earthlink.net>
<http://www.physics.indiana.edu/~hake>
<http://www.physics.indiana.edu/~sdi>

" . . . I know from both experience and research that the teacher is
at the heart of student learning and school improvement by virtue of
being the classroom authority and gatekeeper for change. Thus the
preparation, induction, and career development of teachers remain the
Archimedean lever for both short- and long-term improvement of public
schools."
Larry Cuban. 2003. "Why Is It So Hard To Get Good Schools?" Teachers
College Press.

REFERENCES are in PART 2