
[Phys-L] validity ... or ... be careful what you test for, you might get it



I borrow the following from
http://benmuse.typepad.com/ben_muse/2005/01/be_careful_what_1.html

Robert Massie explains:

"...There is no evidence that Indefatigable, Queen Mary, and
Invincible blew up because German 11-inch or 12-inch shells
penetrated their armored hulls and burst inside their magazines.
Rather, the almost certain cause of these cataclysmic explosions was
that the turret systems of British battle cruisers lacked adequate
flashtight arrangements and that, in each of these ships, a shell
bursting inside the upper turret had ignited powder waiting to be
loaded into the guns, sending a bolt of flame flashing unimpeded down
the sixty-foot hoist into the powder magazines. Assuming this to be
true, blame lay not with the design of British ships but with the
deliberate decision by captains and gunnery officers to discard the
flashproof scuttles originally built into British dreadnoughts. The
Royal Navy made a cult of gunnery. To win peacetime gunnery
competitions, gun crews were encouraged to fire as rapidly as
possible. Quick loading and firing required a constant supply of
ammunition at the breech of the gun, and thus a continuous flow of
powder bags moving out of the magazines and up the hoists to the
guns. Safety became secondary; gunnery officers began leaving
magazine doors and scuttles open to facilitate movement; eventually,
in some ships, these cumbersome barriers were removed. But for this
weakness none of the three battle cruisers might have been lost."

This is relevant to the notion of "validity" of a test.
http://en.wikipedia.org/wiki/Test_validity

The peacetime rapid-fire test is completely "valid" if you think the
purpose is to predict the outcome of another peacetime rapid-fire test
... but not if the purpose is to protect your ships, your crews, and
your nation from disaster.

The same logic applies to the current crop of rapid-fire multiple-guess
trivia tests. Such a test is completely "valid" if the purpose is to
predict the outcome of another rapid-fire multiple-guess trivia test ...
but not if the purpose is to protect your students and your nation from
disaster.

This applies to the big state-mandated tests, and also to things like
the FCI.

I'm not picking on the FCI for being brief and narrow; it's intended
to be brief and narrow. Instead, I'm pointing out that it's insane to
use "FCI gain" as a measure of the effectiveness of this-or-that teaching
method. It's like bragging about how well you can look under a lamp-post.
It may be the most finely-made lamp-post in the whole world, but it's
still just a lamp-post ... in the middle of Yellowstone in the daytime.
I suggest there are other areas you might want to look at.

As one example among many:

On 06/28/2012 12:43 PM, chuck britton wrote:
> I believe that I can design two 'flotation devices' (boat/barge) and
> could have EITHER answer be correct.

And that's why I liked it ... and why I said it is not a "clicker"
question. Converting it to a closed-book rapid-fire question would
ruin it. In the real world, people deal with underspecified questions
by digging up additional information, but this is not allowed in the
typical "standardized" testing situation or "clicker" situation.

It is common for people (including me) to complain about multiple-guess
questions, so the question arises: should we consider the barge+slab
question to be a yes-or-no question? Well, maybe or maybe not, depending
on how you look at it:
-- In its original context, it was a well-posed yes-or-no question.
-- In the form that it comes to us, stripped of the contextual details,
it does not have a yes-or-no answer, since the only reasonable response
is "Hey wait a minute, we need more information."
-- On the third hand, the proper plan of attack is to dig up enough
information so as to _make_ a yes-or-no answer possible.

From this we conclude that the closed-book rapid-fire format is a big
part of the problem. It magnifies the problems created by the multiple-
guess format.

I do /not/ consider the barge+slab question to be a trick question. In its
original context, it was a perfectly reasonable question: Would jettisoning
this-here slab make things better or worse?
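To make the underspecification concrete, here is a sketch of one common
version of the puzzle: a slab is jettisoned overboard from a barge floating
in a small pond, and we ask whether the pond level rises, falls, or stays
put. The thread does not restate the original question, so take this as my
own illustrative reading, with invented numbers; the point is only that the
answer flips depending on a detail (the slab's density) that a stripped-down
version of the question never supplies.

```python
# Sketch of the barge+slab puzzle (illustrative parameters, not from
# the thread). While the slab rides on the barge, it displaces water
# equal to its MASS; once overboard, a sinking slab displaces only its
# VOLUME, while a floating slab still displaces its mass.

RHO_WATER = 1000.0  # density of water, kg/m^3

def pond_level_change(slab_mass, slab_density, pond_area):
    """Change in pond water level (m) when the slab is jettisoned.

    slab_mass in kg, slab_density in kg/m^3, pond_area in m^2.
    Negative means the pond level drops.
    """
    # On the barge: displacement supports the slab's full weight.
    displaced_on_barge = slab_mass / RHO_WATER          # m^3
    if slab_density > RHO_WATER:
        # Slab sinks: it displaces only its own volume.
        displaced_overboard = slab_mass / slab_density  # m^3
    else:
        # Slab floats: it still displaces its mass in water.
        displaced_overboard = slab_mass / RHO_WATER     # m^3
    return (displaced_overboard - displaced_on_barge) / pond_area

# A 1000 kg concrete slab (density ~2400 kg/m^3) sinks: level drops.
print(pond_level_change(1000.0, 2400.0, pond_area=100.0))
# A 1000 kg foam slab (density ~200 kg/m^3) floats: no change at all.
print(pond_level_change(1000.0, 200.0, pond_area=100.0))
```

Two physically reasonable slabs, two different answers; without being told
the slab's density, "rise or fall?" has no yes-or-no answer, which is
exactly the situation described above.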

In the real world, it is common for questions that are well-posed in one
context to become ill-posed in another context. If you *assumed* that all
questions in this world were well posed, then you might think this was a
nasty tricky unfair question ... but I insist the fault here is with such
assumptions, not with the question that was asked.

It would be unfair to grade students on how well they handle ill-posed
questions, if they have never been taught how to do it ... so the obvious
approach is pre-test, teach, and then post-test. That will measure the
"gain" with respect to something that actually matters.

Additional examples of underspecified problems include
http://www.av8n.com/physics/ill-posed.htm#sec-snap-string
and
http://www.av8n.com/physics/ill-posed.htm#sec-yo-yo

These have the advantage of being so simple that you can do them on the
first day of class, to make it clear that this class is dramatically
different from other classes ... that actual thinking will be required.