If you respond to this long (11 kB) post, PLEASE DON'T HIT THE REPLY
BUTTON UNLESS YOU PRUNE THE ORIGINAL MESSAGE NORMALLY CONTAINED IN
YOUR REPLY DOWN TO A FEW LINES, otherwise you may inflict this entire
post yet again on suffering list subscribers.
In his Phys-L post of 16 Apr 2005 of the above title, Jack Uretsky
(2005) wrote [bracketed by lines "UUUUUUU. . . . ."]:
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
I don't understand. If you can't control the variables, how can you get
statistically "valid" results?
My experimentalist friends (in particle physics) now divide
uncertainties into a "statistical" part and a "systematic" part.
These two parts must be combined somehow (how to combine is still IMO
an unsolved problem) in order to arrive at a meaningful statement of
uncertainty. When several experiments measure the same quantity,
then a comparison of quoted uncertainties gives one an intuitive
feeling for the uncertainty of our knowledge of the quantity being
measured.
I have seen nothing in the field of measuring teaching techniques that
tells me that we have much insight into how to make "statistically valid"
comparisons.
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
[I shall politely forgo criticism of Jack's reply-button-pushing,
archive-clogging re-infliction on suffering Phys-L'ers of two ENTIRE
already archived posts, Shapiro (2005) and Hake (2005a) - why not use
"webcites" (Hake 2005b)?]
Jack appears to be either oblivious to or dismissive of:
(a) the approximately two-standard deviation difference in average
normalized gains <g> between interactive-engagement (IE) and
traditional (T) mechanics courses observed by myself [Hake (1998a,b)]
and MANY OTHER RESEARCH GROUPS [referenced in Hake (2002a,b)];
(b) Cohen's (1988) effect size d = 2.43 for the average <g>'s of IE
and T courses calculated in Hake (2002a) for the data of Hake
(1998a,b).
Cohen's (1988, p. 24) rule of thumb - based on typical results in
social science research - is that d = 0.2, 0.5, and 0.8 imply
respectively "small," "medium," and "large" effects. But Cohen
cautions that the adjectives "are relative, not only to each other,
but to the area of behavioral science or even more particularly to
the specific content and research method being employed in any given
investigation." Eight reasons for the unusually high d = 2.43 are
given in Hake (2004).
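As a minimal sketch of how an effect size such as d is computed from
two group means and standard deviations, and of how Cohen's
rule-of-thumb labels apply, consider the Python below. The function
names, the use of the root-mean-square of the two standard deviations
as the denominator, and the summary values (0.48 +/- 0.14 for IE,
0.23 +/- 0.04 for T) are assumptions for illustration. They are not
given in this post and should be checked against Hake (1998a, 2002a).

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Effect size d: difference of means over the RMS of the two SDs.

    Assumption: the two standard deviations are combined as
    sqrt((sd_a**2 + sd_b**2) / 2); other pooling conventions exist.
    """
    rms_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_a - mean_b) / rms_sd


def cohen_label(d):
    """Map |d| onto Cohen's rule-of-thumb adjectives (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"  # hypothetical label; Cohen names no such band


# Assumed (illustrative) average normalized gains and standard deviations.
d = cohens_d(mean_a=0.48, sd_a=0.14, mean_b=0.23, sd_b=0.04)
print(round(d, 2), cohen_label(d))
```

Note that the label depends only on the size of d, so Cohen's caution
about field-dependent interpretation still applies to whatever
adjective the function returns.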
It's possible that Jack seeks statistical validation through Null
Hypothesis Significance Testing (NHST) - the significance of which
has been under attack for many years. The effect size is commonly
used in meta-analyses, and strongly recommended by many psychologists
and biologists [for references see Hake (2002a)] as a preferred
alternative (or at least addition) to the usually inappropriate
t-tests and p values associated with null-hypothesis testing.
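One reason often given for preferring effect sizes is that a p value
depends on sample size as well as on the size of the effect. The
Python sketch below illustrates this with a deliberately simplified
case: a two-sided z-test for two equal-sized groups with known, equal
standard deviations. The function name, the z-test simplification,
and the numbers are assumptions for illustration only, not an
analysis of any data discussed here.

```python
import math


def two_sided_p(d, n_per_group):
    """Two-sided p value for a known-variance two-sample z-test.

    d is the standardized mean difference (difference / common SD) and
    n_per_group is the number of observations in each group. For equal
    groups the test statistic is z = d * sqrt(n_per_group / 2).
    """
    z = abs(d) * math.sqrt(n_per_group / 2.0)
    # P(|Z| > z) for a standard normal variable Z.
    return math.erfc(z / math.sqrt(2.0))


# Same tiny effect size, very different sample sizes.
tiny_d = 0.05
for n in (100, 10_000, 100_000):
    print(n, two_sided_p(tiny_d, n))
```

With the effect size held fixed, the p value falls as n grows, so a
"significant" result by itself says little about how large the
effect is.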
As related in Hake (2002a), Carver (1993) subjected the Michelson &
Morley (1887) data to a simple analysis of variance (ANOVA) and found
**statistical significance** associated with the direction the light
was traveling (p < 0.001)! He writes [my *emphasis*]:
"It is interesting to speculate how the course of history might have
changed if Michelson and Morley had been trained to use this *corrupt
form of the scientific method,* that is, testing the null hypothesis
first. They might have concluded that there was evidence of
*significant* differences in the speed of light associated with its
direction and that therefore there was evidence for the luminiferous
ether . . . . Fortunately Michelson and Morley . . .(first) . . . .
interpreted their data with respect to their research hypothesis."
Consistent with the scientific methodology of physical scientists
such as Michelson/Morley, Rozeboom (1960) wrote: ". . . the primary
aim of a scientific experiment is not to precipitate decisions, but
to make an appropriate adjustment in the degree to which one accepts,
or believes, the hypothesis or hypotheses being tested."
For a plethora of other anti-NHST references see Hake (2002a). For a
mildly pro-NHST discussion see Wainer & Robinson (2003). The latter
conclude that "NHST is most often useful as an adjunct to other
results (e.g., effect sizes) rather than as a stand-alone result."
Examples of NHST analyses of physics education research results
include Beichner (1994), Beichner et al. (1999), Cheng et al. (2004),
and Hake (2002a).
REFERENCES
Beichner, R.J. 1994. "Testing student interpretation of kinematics
graphs," Am. J. Phys. 62(8): 750-762.
Beichner, R., L. Bernold, E. Burniston, P. Dail, R. Felder, J.
Gastineau, M. Gjertsen, and J. Risley. 1999. "Case study of the
physics component of an integrated curriculum," Physics Ed. Res.
Supplement to Am. J. Phys. 67(7): S16-S24.
Carver, R.P. 1993. "The case against statistical significance
testing, revisited," Journal of Experimental Education 61(4): 287-292.
Cheng, K.K., B.A. Thacker, R.L. Cardenas, & C. Crouch. 2004. "Using
an online homework system enhances students' learning of physics
concepts in an introductory course," Am. J. Phys. 72(11): 1447-1453.
Cohen, J. 1988. "Statistical power analysis for the behavioral
sciences." Lawrence Erlbaum, 2nd ed.
Hake, R.R. 2002a. "Lessons from the physics education reform effort,"
Ecology and Society 5(2): 28; online at
<http://www.ecologyandsociety.org/vol5/iss2/art28/>. Ecology and
Society (formerly Conservation Ecology) is a free online
"peer-reviewed journal of integrative science and fundamental policy
research" with about 11,000 subscribers in about 108 countries.
Hake, R.R. 2005a. "Should Randomized Control Trials Be the Gold
Standard of Educational Research?" online at
<http://lists.asu.edu/cgi-bin/wa?A2=ind0504&L=aera-l&T=0&O=D&P=1945>.
Post of 15 Apr 2005 to AERA-C, AERA-D, AERA-G, AERA-H, AERA-J,
AERA-K, AERA-L, AP-Physics, ASSESS, Biopi-L, Chemed-L, EvalTalk,
Math-Learn, Phys-L, Physhare, POD, STLHE-L, & TIPS.
Michelson, A.A. & E.W. Morley. 1887. "On the relative motion of the
earth and the luminiferous ether," American Journal of Science
134: 333-345.
Rozeboom, W.W. 1960. "The fallacy of the null-hypothesis significance
test," Psychological Bulletin 57: 416-428; online at
<http://psychclassics.yorku.ca/Rozeboom/>.