
Re: science for all?



The problem is that I have used some standard jargon from the educational
literature and alluded to some results from that literature.

Effect size is generally defined as the change in the mean on an evaluation
divided by the standard deviation of the score distribution. In most of the
educational literature effect sizes are less than 1.0, and a curriculum which
achieves anything over 0.5 is usually considered to be very effective.
Normally this would be used to compare the effects of two different
curricula, and the effect size would be calculated for the difference between
the curricula. Obviously the effect size is not a valid comparison tool when
the students come in with a statistically zero score, or a score that could
be produced by random guessing. Most teachers and education researchers
should be familiar with effect size, so it will convey meaning. Whether or
not this is the best way to compare curricula can certainly be questioned,
but it is the method currently in common use.
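
As a purely illustrative example (made-up numbers, not taken from any study
discussed here): if a class averages 6.0 on the pretest with a standard
deviation of 3.0 and averages 9.0 on the posttest, the effect size is
(9.0 - 6.0)/3.0 = 1.0.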

The initial curve (Lawson test) in JCST (figure 2) essentially looks like a
normal distribution. The final one is also similar, but shifted over by about
1 SD; I am judging this by eye from the published curve. The result is that
the number of students who would be classified as concrete is dramatically
reduced. I have found for the Lawson test that when one looks at individual
student scores they do not all move up; rather, each student moves a
different amount, with some making dramatic gains and others none at all.
The curve in JCST unfortunately moves so far to the right that the right-hand
tail is cut off by saturation on the test. Again, the best way to communicate
about this is to look at the same article. To communicate effectively about
an article it is helpful if both people have read it.

Incidentally, the effect sizes when comparing conventional physics courses
with "interactive engagement" courses are generally greater than 1.0 and can
be as large as 3.0. If one calculates the Hake normalized gain for the data,
(posttest - pretest)/(maximum score - pretest), one comes up with 0.66 for
the JCST article, or a 66% normalized gain. This latter analysis is now being
used to compare evaluations of various types of curricula in physics. While
the absolute number has some meaning by itself, comparisons between different
classes seem to be independent of the initial knowledge state of the learner
and strongly dependent on the pedagogy used. The main defect of the Hake
analysis is that the normalized gain on a physics evaluation is highly
dependent on thinking ability as measured by the Lawson test.
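
For concreteness, here is a minimal sketch in Python of how the two
quantities are computed from class data. The scores and the 15-point maximum
are invented for illustration; none of these numbers come from the JCST
article, and real class data are far noisier.

    import statistics

    def effect_size(pre, post):
        # Change in the class mean divided by the pretest ("original curve")
        # standard deviation, as described above.
        return (statistics.mean(post) - statistics.mean(pre)) / statistics.stdev(pre)

    def hake_gain(pre, post, max_score):
        # (posttest - pretest) / (max score - pretest), using class means.
        return (statistics.mean(post) - statistics.mean(pre)) / (max_score - statistics.mean(pre))

    # Hypothetical scores on a hypothetical 15-point evaluation.
    pre  = [4, 5, 6, 6, 7, 8]
    post = [9, 10, 11, 12, 12, 14]
    print(effect_size(pre, post))      # about 3.8 with these invented numbers
    print(hake_gain(pre, post, 15))    # about 0.59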

Whether or not the data exactly follow the theory you outlined is indeed
problematical, but the effect size analysis is routinely used when talking
about test results for students. This allows one to compare different
student treatments in a fairly standard manner. A given distribution may
not be exactly normal, and I don't think the idea that the ideal
distribution is the same for each student has any meaning. Students
generally fall in the same place on the curve and do not fluctuate over the
curve substantially when retested, if you have a well-designed test.
Students do not behave like gas molecules. One must also look at the error
on the mean to see if the gain comparisons are significant. If one has a
fairly large number of students, the error on the mean will be small.
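
To put numbers on that (purely for illustration, not from any of the studies
mentioned): with a standard deviation of 3.0 and 600 students, the error on
the mean is about 3.0/sqrt(600), or roughly 0.12, so a shift of the mean by a
full standard deviation is many standard errors and is not plausibly a random
fluctuation.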

I am a bit puzzled about how you would ask for the probability that they
came from the same unknown distribution. Student test scores generally rise
rather than fall after instruction, so the posttest curve should not be the
same as the pretest curve. Perhaps you could describe how you would analyze
the distribution of student test scores? Or better yet, try pre- and
post-testing your students with one of the standard tests and report the
results in the fashion that you think is correct.

BTW, if you want to see how good treatments can have a long-term effect on
student achievement, read "Really Raising Standards" by Shayer & Adey. They
reference articles from refereed journals, and there is now evidence that
well-designed interventions similar to some of the reform physics curricula
can have a large effect several years after the intervention.

More comments are interspersed below, following the specific questions.

John M. Clement
Houston, TX

Jack Wrote:
OK. This seems very confused. Maybe it's just a language difficulty, so
let's start by agreeing on the mathematics. ^{2} means "squared", \mu means
the Greek letter mu, and n(\mu,\sigma^{2}) stands for a normal distribution
having mean \mu and standard deviation \sigma. For more details see, e.g.,
Hogg and Craig, <An Introduction to Mathematical Statistics> ("H&C").

There is a theorem that a set of samples of size N from a normal
distribution having mean \mu and standard deviation \sigma will have means
that are distributed according to n(\mu,\sigma^{2}/N). "Standard deviation"
has two different meanings here. One, \sigma, characterizes the original
distribution from which the samples are taken. The other, the square root of
\sigma^{2}/N, characterizes the distribution of means of the samples. John
clearly intends the first meaning.
But therein lies a difficulty. For John's usage to be correct we
must treat each student as a "sample" from a distribution that is:
1. Normal
2. The same for each student.
This would be tough to show.
A better approach is to view the two samples (before and after)
and ask for the probability that they come from the same (unknown)
distribution. Also relevant is the nature of the distribution of each
sample.
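
[For concreteness, one minimal sketch of such a comparison, assuming the
SciPy library is available; the two-sample Kolmogorov-Smirnov test here is
only an illustration of the idea, not necessarily the test Jack has in mind,
and the scores are invented:]

    from scipy.stats import ks_2samp

    # Invented pretest and posttest scores for the same class.
    pretest  = [3, 4, 5, 5, 6, 6, 7, 8, 9, 10]
    posttest = [6, 7, 8, 8, 9, 10, 10, 11, 12, 13]

    # A small p-value means the two samples are unlikely to have been
    # drawn from the same (unknown) distribution.
    statistic, p_value = ks_2samp(pretest, posttest)
    print(statistic, p_value)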

On Sun, 23 Dec 2001, John Clement wrote:

The effect is essentially an effect size of 1.0.
Please translate that sentence. "Size" in what units?
Essentially the mean of the curve was moved over by an amount equal to the
standard deviation of the original curve. For more information I would
suggest reading the cited paper.
That wouldn't help me understand what <you> are saying.

Viewing the graph would make it quite clear.


When I looked at the actual curve I noticed that the final curve
seemed to show that most concrete students were moved to higher thinking
levels.
Are you saying that the second distribution was bimodal?

No, just that on the posttest few students had scores below 5, while on the
pretest a substantial number were in this region. Students who test as
concrete thinkers have substantial difficulty with physics concepts, but
they can probably fare better in conventional biology courses.

An educational effect this large is considered to be VERY large.
Considered by whom? On what basis?

This is by experience with measuring student achievement. The literature
generally shows effect sizes considerably below 1.0.


The particular evaluation is not testing knowledge but the thinking ability
of the students. I have data spanning several years for students at our
school, and this evaluation seldom shows a decrease for individual students
and never shows a decrease for the average over a group. The questions
involving proportional thinking do show some backsliding, but as many
students improve as backslide. I have only seen one student who showed any
serious negative gain, and she was a foreign student with a language
problem. I have also seen many students show dramatic gain.

Yes, but do <you> have data taken after a time lapse?

As I said, the evaluation for thinking does not show the normal sag of a
conventional test. It appears to be extremely stable. Yes, I have data over
several years for the same students, so it is indeed after a long time
lapse. Perhaps this was not obvious, as I did not use the word "same" in the
previous message. The Lawson evaluation is NOT testing content knowledge,
but rather the ability to reason scientifically. I have also correlated
individual answers to questions, and over time I have found that scores on
individual answers are also extremely stable, with the exception of
proportional reasoning. Once scores increase they stay increased. I also
have some data for the Lawson evaluation given in the spring in Chemistry,
and data for the same students the following year in their math courses. The
students who are not taking physics generally show no gain on the Lawson
test, but the physics students show substantial gain. The non-physics
students also do not show a decrease in Lawson scores on the average.
Individual students can show either loss or gain on the proportional
reasoning questions, but the gainers and losers balance. All other questions
are stable.


One standard deviation change does not mean a 30% chance that it was due to
random fluctuations. The particular study covered about 600 students, so the
error in the mean is quite small (SD / sqrt(600)).
Depends on which SD you're talking about. See above.

I would like to point out that improvement in content understanding can be
extremely stable even years later when the correct teaching techniques are
used. FCI results have been found to be very stable after an interactive
engagement style course, up to 3 years later. See G. Francis, J. Adams, and
E. Noonan (1998), "Do They Stay Fixed?", The Physics Teacher 36, 488-490.
If this is the Minnesota group, I believe that I critiqued that
paper in a previous posting (which I apparently did not preserve).

While it is easy to criticize a particular study, it is to my knowledge the
only long-term study of gain on the FCI. None of the "traditional" teachers
have tried to do such a study. As such it is far superior to anecdotal
evidence. I have found that FCI and FMCE scores can show a slight drop when
repeated after a period of time, but I have not observed the sort of extreme
drop characteristic of memorized knowledge. Rather than criticize, why not
do the experiment yourself and publish your results?

However, I will also agree that when students rely purely on memorization
methods, content material decays within about 2 weeks. Priscilla Laws
pointed out at a workshop that they observe rising gains on evaluations for
about 2 weeks after the concepts were presented using the type of labs that
they have found to be effective. This is consistent with Shayer & Adey's
theory and observations ("Really Raising Standards"). Incidentally, my own
testing shows that after a 2 week Christmas break and several weeks of a new
topic, FMCE results show only a small drop. I have also tried telling
students the answers to a question or two the day before an evaluation, and
many still miss them. The non-Newtonian distractors are often too hard to
overcome with a little prepping.
Isn't that what they're for?

Certainly, and one can sometimes prep them so well that the scores rise, but
this effect is very transient, and generally goes away in about 2 weeks.


I gave them the answers as part of a test review, but did not tell them that
the answer I gave was exactly the answer to a question on the test.
Thereby defeating the principle that you are testing for grasp of concepts?

No, confirming that I am testing for grasp of concepts. The prepping had
little effect on their answers. This was done specifically to see whether
some modest prepping had a significant effect on the scores for the type of
questions on the FMCE.



John M. Clement
Houston, TX



1. How did 1 SD get to be "very large"? And 1 SD of what? Usually in
educational circles people seem to quote the change in the mean of a group.
1 SD means about a 30% chance that the change was a random fluctuation. The
physicists I know usually insist on 4 SD as the threshold for evidence of a
new effect. Also, these statistical measures can be utterly misleading if
one does not know the distributions.

2. Much more interesting than the improvement at the end of a course would
be the results of testing after a summer, or even a semester, break. My
experience is that the apparent "improvements" dissipate very rapidly.

Regards,
Jack


On Sun, 23 Dec 2001, John Clement wrote (in part):

_________________________________________snip___________________________

Despite the fact that many students enter testing at the concrete level, it
is possible to structure the course so that they improve their thinking.
This is now being routinely done in the intro biology course at AzState.
Anton Lawson has been a prolific publisher of papers on the subject, and has
been pushing this idea for years. One of the latest papers published by one
of Lawson's collaborators mentioned that the course at AzState pushed up
student thinking by about 1 SD, which is considered to be a very large
increase. The article is Wycoff, "Changing the Culture of Undergraduate
Science Teaching", Journal of College Science Teaching XXX(5), pp. 306-312.

I can tell you the statistics for my course. About 30% of the incoming
students are at the concrete level, but less than 15% are still at that
level when they leave. Likewise, only about 15% test as formal thinkers
coming in, and about 30% test as formal thinkers going out. One might
presume that courses at major universities which have a large number of
physics, engineering, or pre-med students would have a high proportion of
formal thinkers; I would guess 40-50% or more. At community colleges the
number would be more like 20%, and for physics-for-poets courses or courses
with a large number of elementary ed majors the number may be below 10%.
This is purely conjecture, because to my knowledge this sort of survey has
not been systematically carried out. About 30% of the students entering the
AzState general studies bio course tested at the formal level, but the vast
majority were at that level at the end of the course (if I read their graph
correctly in the above-mentioned paper). If we can push thinking levels up,
and as a result have more success, shouldn't we do this?

____________________________________________________________________________




--
"But as much as I love and respect you, I will beat you and I will kill
you, because that is what I must do. Tonight it is only you and me, fish.
It is your strength against my intelligence. It is a veritable potpourri
of metaphor, every nuance of which is fraught with meaning."
Greg Nagan from "The Old Man and the Sea" in
<The 5-MINUTE ILIAD and Other Classics>