
Re: [Phys-l] A challenge.



As I recall, the process of assessing the value of questions, and of whether there is a positive correlation between performance on the overall test and performance on any given question, is called item analysis. See, for example, http://www.statsoft.com/textbook/streliab.html, which I found by googling "item analysis".
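
For concreteness, here is a minimal sketch in Python of the item-total correlation at the heart of item analysis (the response data are made up, and the "corrected" variant, which removes the item from its own total, is just one common choice):

import numpy as np

# Made-up 0/1 scored responses: rows are students, columns are test items.
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])
totals = responses.sum(axis=1)

for j in range(responses.shape[1]):
    # Corrected item-total correlation: correlate the item with the total
    # score minus that item, so the item is not correlated with itself.
    r = np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
    print(f"item {j}: corrected item-total r = {r:+.2f}")

A question with a strongly positive r behaves the way the rest of the test does; a near-zero or negative r is the kind of item the discussion below is about.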

joe

Joseph J. Bellina, Jr. Ph.D.
Professor of Physics
Saint Mary's College
Notre Dame, IN 46556

On Nov 28, 2009, at 1:24 PM, chuck britton wrote:

A few lifetimes ago I was offered the opportunity (owing to my
supposed linguistic skills) to become a translator for the US Army. I
took my chances and chose to wait things out. Unbeknownst to me,
those responsible for the most ambitious multiple-choice testing
program known to humankind were arranging to keep me away from the
jungles of Viet Nam.

We were responsible for creating and validating the Military
Occupational Specialty skills test for every enlisted soldier in the
army. During these times (late '60's - early '70's) the population of
Army test takers outstripped that of the ETS.

To get to the point at hand - yes, there are inevitably some items
that have reverse correlation with overall test scores. These were
called Non Functioning Items (NFI's) and received quite a bit of
special attention.
I worked closely with the Test Psychologists to make sure that I was
coming up with the proper statistics.
I actually wrote the code so that NFI's would be deleted during the
initial scoring runs but was told in no uncertain terms that this was
not to be. We needed to IDENTIFY the NFI's and the Test Psych folks
would decide which ones were 'valid' and which ones indeed needed to
be deleted from the final scoring run.

I found this adventure in mainframe programming (assembler and COBOL)
to be much more rewarding than my imagined Translator experiences
would have been (pushing folks out of helicopter doors until someone
decided to make up something to tell us that we might want to hear).

NFI's exist - some are valid - some are not.
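
A rough sketch of that flag-but-don't-delete workflow in modern terms (Python; the data, function names, and cutoff are invented for illustration, not the Army's actual procedure):

import numpy as np

def corrected_item_total_r(responses):
    # Item-total correlation of each 0/1 item, with the item removed from the total.
    totals = responses.sum(axis=1)
    return np.array([np.corrcoef(responses[:, j], totals - responses[:, j])[0, 1]
                     for j in range(responses.shape[1])])

def initial_scoring_run(responses):
    # First pass: score everyone, and only FLAG reverse-correlating items.
    scores = responses.sum(axis=1)
    candidate_nfis = np.flatnonzero(corrected_item_total_r(responses) < 0)
    return scores, candidate_nfis

def final_scoring_run(responses, items_ruled_invalid):
    # Second pass: drop only the items the reviewers actually rejected.
    keep = [j for j in range(responses.shape[1]) if j not in set(items_ruled_invalid)]
    return responses[:, keep].sum(axis=1)

# Invented 0/1 response matrix: 200 examinees x 8 items.
responses = np.random.default_rng(7).integers(0, 2, size=(200, 8))
scores, candidates = initial_scoring_run(responses)
print("candidate NFIs for the test psychologists:", candidates)
# ...and only after a human decision:
final_scores = final_scoring_run(responses, items_ruled_invalid=candidates[:1])

The point, as above, is that the code only reports candidates; it never removes an item on its own.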


At 10:52 PM -0500 11/27/09, Dan Crowe wrote:
I think that Jack has missed the point of my concern. I'm not
concerned about the competitive nature of the test; I'm concerned
about the fairness of the test. There could very well be _valid_
questions that would tend to increase the scores of students who
otherwise would receive lower scores on the _biased_ test that
results when such questions are systematically rejected. Such
biased tests are less valid because they don't measure what they
purport to measure. The criteria for question rejection should not
reduce the validity of the test.

Daniel Crowe
On Fri, 27 Nov 2009, Dan Crowe wrote:

Is anyone else bothered by the policy that reinforces the "smart"
vs. "dumb" kids' scores? I can imagine that there are valid
questions that otherwise higher-scoring students miss more
frequently than otherwise lower-scoring students. If such
questions exist, then systematically rejecting such questions
reduces the validity of the entire test.

Daniel Crowe
Items are developed by ETS and screened by the California Department
of Education and by its Assessment Review Panel. If that all goes
well, the item is field-tested. The item must perform well on a
series of psychometric measures. Most importantly, the item must be
neither too easy nor too hard, based on student performance on the
item. And the item must discriminate well. That is, students who
perform well on the test overall should perform well on the item. And
students who don't perform well on the test overall should perform
poorly on the item. There are always some items that "smart" kids get
wrong and "dumb" kids get right. Those items are rejected.
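
In the same spirit, a hedged sketch of that kind of screen (Python; the cutoffs below are common classical-test-theory rules of thumb, not the actual ETS or California criteria):

import numpy as np

def screen_items(responses, p_min=0.25, p_max=0.90, d_min=0.20):
    # Flag items that are too easy, too hard, or that discriminate poorly.
    # Cutoffs here are illustrative rules of thumb, not any program's real criteria.
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    k = max(1, int(0.27 * len(totals)))          # classical upper/lower 27% groups
    low, high = order[:k], order[-k:]
    verdicts = []
    for j in range(responses.shape[1]):
        p = responses[:, j].mean()               # difficulty: proportion answering correctly
        d = responses[high, j].mean() - responses[low, j].mean()   # discrimination index
        verdicts.append((j, p, d, (p_min <= p <= p_max) and (d >= d_min)))
    return verdicts

# Invented field-test data: 500 students x 6 dichotomously scored items.
responses = np.random.default_rng(3).integers(0, 2, size=(500, 6))
for j, p, d, keep in screen_items(responses):
    print(f"item {j}: p = {p:.2f}, D = {d:+.2f} -> {'keep' if keep else 'reject'}")

An item that "smart" kids miss and "dumb" kids get right shows up here as a negative D; whether such an item should automatically be rejected is exactly the question Dan raises above.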

<snip>