
Re: [Phys-l] A challenge.



A few lifetimes ago I was offered the opportunity (owing to my supposed linguistic skills) to become a translator for the US Army. I took my chances and chose to wait things out. Unbeknownst to me, those responsible for the most ambitious multiple-choice testing program known to humankind were arranging to keep me away from the jungles of Viet Nam.

We were responsible for creating and validating the Military Occupational Specialty skills tests for every enlisted soldier in the Army. During those years (late '60s to early '70s) the population of Army test takers outstripped that of the ETS.

To get to the point at hand: yes, there are inevitably some items that have a negative (reverse) correlation with overall test scores. These were called Non-Functioning Items (NFIs) and received quite a bit of special attention.
I worked closely with the Test Psychologists to make sure that I was coming up with the proper statistics.
I actually wrote the code so that NFIs would be deleted during the initial scoring runs, but was told in no uncertain terms that this was not to be. We needed to IDENTIFY the NFIs, and the Test Psych folks would decide which ones were 'valid' and which ones indeed needed to be deleted from the final scoring run.
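
For anyone curious what "identifying NFIs" amounts to in modern terms, here is a minimal sketch (Python, purely for illustration; the real code was assembler and COBOL on a mainframe). It flags items whose corrected item-total (point-biserial) correlation is negative. I'm not claiming this reproduces our exact statistics, just the general idea; the data and names here are made up.

import numpy as np

def flag_nfis(responses):
    """responses: 2-D array, rows = examinees, columns = items, entries 0/1."""
    responses = np.asarray(responses, dtype=float)
    totals = responses.sum(axis=1)
    flagged = []
    for j in range(responses.shape[1]):
        item = responses[:, j]
        rest = totals - item                  # corrected total: exclude the item itself
        if item.std() == 0 or rest.std() == 0:
            continue                          # no variation, correlation undefined
        r = np.corrcoef(item, rest)[0, 1]     # item-total (point-biserial) correlation
        if r < 0:
            flagged.append((j, r))            # identify only; humans decide what gets deleted
    return flagged

# Tiny made-up example: the last item is answered correctly mostly by low scorers.
data = [[1, 1, 1, 0],
        [1, 1, 1, 0],
        [1, 1, 1, 1],
        [1, 1, 0, 1],
        [1, 0, 0, 1],
        [0, 0, 0, 1]]
print(flag_nfis(data))   # -> [(3, -0.61...)]: only the last item correlates negatively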

I found this adventure in mainframe programming (assembler and COBOL) to be much more rewarding than my imagined translator experiences would have been (pushing folks out of helicopter doors until someone decided to make up something to tell us that we might want to hear).

NFIs exist; some are valid, some are not.


At 10:52 PM -0500 11/27/09, Dan Crowe wrote:
I think that Jack has missed the point of my concern. I'm not concerned about the competitive nature of the test; I'm concerned about the fairness of the test. There could very well be _valid_ questions that would tend to increase the scores of students who otherwise would receive lower scores on the _biased_ test that results when such questions are systematically rejected. Such biased tests are less valid because they don't measure what they purport to measure. The criteria for question rejection should not reduce the validity of the test.

Daniel Crowe
On Fri, 27 Nov 2009, Dan Crowe wrote:

Is anyone else bothered by the policy that reinforces the "smart" vs. "dumb" kids' scores? I can imagine that there are valid questions that otherwise higher-scoring students miss more frequently than otherwise lower-scoring students. If such questions exist, then systematically rejecting such questions reduces the validity of the entire test.

Daniel Crowe

> Items are developed by ETS and screened by the California Department
> of Education and by its Assessment Review Panel. If that all goes
> well, the item is field-tested. The item must perform well on a
> series of psychometric measures. Most importantly, the item must be
> neither too easy nor too hard, based on student performance on the
> item. And the item must discriminate well. That is, students who
> perform well on the test overall should perform well on the item. And
> students who don't perform well on the test overall should perform
> poorly on the item. There are always some items that "smart" kids get
> wrong and "dumb" kids get right. Those items are rejected.

> <snip>
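
A footnote on the quoted screening criteria: in classical test-analysis terms, "neither too easy nor too hard" is the item difficulty (proportion answering correctly), and "discriminates well" is usually assessed by comparing the top- and bottom-scoring groups. A rough sketch follows, with made-up cutoffs; I have no idea what ETS's actual thresholds or procedures are.

import numpy as np

def screen_item(responses, item_index, group_frac=0.27):
    """responses: rows = examinees, columns = items, entries 0/1 (illustrative layout)."""
    responses = np.asarray(responses, dtype=float)
    item = responses[:, item_index]
    totals = responses.sum(axis=1)

    difficulty = item.mean()                      # proportion answering the item correctly

    order = np.argsort(totals)                    # examinees ranked by total score
    n = max(1, int(round(group_frac * len(totals))))
    lower = item[order[:n]].mean()                # proportion correct among low scorers
    upper = item[order[-n:]].mean()               # proportion correct among high scorers
    discrimination = upper - lower                # classical D index, ranges -1 to +1

    too_easy_or_hard = not (0.25 <= difficulty <= 0.90)   # made-up cutoffs
    poor_discrimination = discrimination < 0.20           # made-up cutoff
    return difficulty, discrimination, (too_easy_or_hard or poor_discrimination)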