Re: [Phys-L] Tetrad excel equation
- From: John Denker <email@example.com>
- Date: Fri, 10 Nov 2017 21:38:54 -0700
On 11/10/2017 01:58 PM, Bill Norwood via Phys-l wrote:
> I need to build an equation by
> which correct answers and incorrect answers are tabulated from a sample set
> and a value given to the likelihood that a difference can be determined
> between two different items.
The question as stated is underspecified. Let me take a guess
as to what the intent was, and then outline a solution.
It seems to me there are /two/ samples involved. In the community
that uses tetrad tests, the samples are often called panels. I
will use the two terms interchangeably.
My guess is that the score ("correct answers and incorrect answers")
from panel A is to be used to infer the amount of difference
between the two categories ("items"). Then that difference is
to be used to infer what the score from panel B will be, and
to determine how large panel B needs to be so that the score
will not have too much random scatter, when we compare one
instance of panel B to another (B1, B2, et cetera).
I suggest that before we worry about the size of panel B,
we need to worry about the size of panel A, and it is
not obvious that the two should be the same. They play
different roles.
Please refer to the graph:
The black circles show the results of a panel of size 1000.
The abscissa is the true difference between categories, and
the ordinate is the tetrad score.
The magenta line is meant to indicate /roughly/ what the results
would be for a panel of infinite size. In statistics, this would
be called the /population/. A panel of size N corresponds to
drawing a /sample/ of N panelists from this population.
As you can see, there is a fair amount of scatter in the finite-panel
results. The scatter is even worse for smaller panels, as you can
If I understand the intent of the question, the first step is to
take the results of panel A and interpret them as an amount of
difference. If panel A has only 100 panelists, a score of 0.4
could plausibly correspond to any difference from 0 to 0.8 or
so. A score of 1.0 could plausibly correspond to any difference
between 3.3 or so and infinity.
The numbers in the previous paragraph come from reading the graph
inverse-wise, i.e. starting with an ordinate and finding the
corresponding range of abscissas.
Now for the second part of the job. If/when you have a value for
the difference, you can calculate the population tetrad score
(i.e. the magenta line). Then it becomes a relatively conventional
exercise in sampling theory to predict how much scatter there will
be when a sample (i.e. panel) of size N is drawn from this population.
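
The scatter calculation can be sketched in a few lines of Python. This is
only an illustration, not anything taken from the spreadsheet: it assumes
each panelist answers correctly, independently, with probability p (the
population score), so that the number of correct answers is binomial. The
function names are made up for this example, and the Wilson score interval
is just one standard choice of interval that stays inside [0, 1].

```python
import math

def binomial_std_error(p, n):
    """Standard deviation of the observed score for a panel of size n,
    assuming independent panelists who each succeed with probability p."""
    return math.sqrt(p * (1.0 - p) / n)

def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for an observed proportion p_hat from n trials.
    z = 1.96 corresponds to roughly 95% coverage. Unlike the symmetric
    p_hat +/- z*SE recipe, the result never leaves the range [0, 1]."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n
                         + z * z / (4.0 * n * n)) / denom
    # Clamp to guard against tiny floating-point overshoot at the edges.
    return max(0.0, center - half), min(1.0, center + half)

if __name__ == "__main__":
    for n in (100, 1000):
        # Population score of 1/3 is the zero-difference case.
        print(n, binomial_std_error(1.0 / 3.0, n))
```

The standard error shrinks like 1/sqrt(N), so a panel of 100 has about three
times the scatter of a panel of 1000. For proportions near 0 or 1 the
symmetric interval can extend outside [0, 1], whereas the Wilson interval
does not.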
I say "relatively" conventional because lots of people who ought
to know better routinely get this wrong. I get tired of seeing
public-opinion polls where one of the options is polling at 2%,
and the alleged margin of error is 4%. Gimme a break. Probabilities
can't be negative, so 2% ± 4% is obviously nonsense. It's easy
to guess what calculation they are doing to produce that number,
and it's the wrong calculation. For the next level of detail on
A spreadsheet was requested:
It's basically a Monte Carlo simulation of the panel, for a given
amount of difference. That is, the spreadsheet makes it easy to
compute /one/ point in the graph. You can collect data by changing
the amount of difference (in the yellow-highlighted cell) and
hitting the F9 ("recalculate") key to get a new set of random
numbers. Then save the results as an ordered pair (copy and
You could do that, but I'm way too lazy to do that much handwork,
so I wrote a Perl program to loop over all difference-values.
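
For readers who want to try this themselves, here is a minimal Python sketch
of such a loop. It is not the Perl program mentioned above, and the decision
model is an assumption: each item is perceived as a single number, two items
are drawn from a unit-variance Gaussian centered at 0 and two from one
centered at the difference, and the panelist is counted correct when the two
lowest percepts belong to the same category. The names tetrad_trial,
panel_score and sweep are hypothetical.

```python
import random

def tetrad_trial(diff, rng):
    """Simulate one panelist. Assumed model: one-dimensional percepts with
    unit-variance Gaussian noise; the grouping is correct when the two
    items of one category both rank below both items of the other."""
    a = [rng.gauss(0.0, 1.0) for _ in range(2)]
    b = [rng.gauss(diff, 1.0) for _ in range(2)]
    return max(a) < min(b) or max(b) < min(a)

def panel_score(diff, n_panelists, rng):
    """Fraction of correct answers in one simulated panel."""
    hits = sum(tetrad_trial(diff, rng) for _ in range(n_panelists))
    return hits / n_panelists

def sweep(diffs, n_panelists, seed=0):
    """Return (difference, score) pairs, one simulated panel per difference."""
    rng = random.Random(seed)
    return [(d, panel_score(d, n_panelists, rng)) for d in diffs]

if __name__ == "__main__":
    # Differences from 0 to 4 in steps of 0.25, panel of size 1000.
    for d, score in sweep([0.25 * i for i in range(17)], 1000):
        print(f"{d:5.2f}  {score:6.3f}")
```

Changing the seed plays the same role as hitting F9 in the spreadsheet: it
produces a fresh set of random numbers and hence a new instance of the panel.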
The magenta curve is an ad hoc phenomenological two-parameter fit
to the data. In other words, a kludge. It seems likely that some
statistician has derived an exact expression, but locating it is
more work than I feel like doing at the moment. For purposes of
understanding what's going on, and for estimating the required
sizes of the panels, the kludge is good enough.
As a sanity check, it is easy to prove that when the difference
is zero, the tetrad score is 1/3 (for the infinite population).
Also, obviously, when the difference is huge the score is 1.0.
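
The 1/3 value can be checked by brute force. There are three ways to split
four items into two pairs and only one of them is correct, so with zero
difference, symmetry gives 1/3. The sketch below confirms this by
enumeration, assuming the same one-dimensional rule as before (correct when
both items of one category rank below both items of the other); with zero
difference all 24 rank orderings are equally likely. The function name is
made up for this example.

```python
from fractions import Fraction
from itertools import permutations

def zero_difference_score():
    """Exact population score at zero difference, by enumerating all
    equally likely rank orderings of the four items."""
    hits = 0
    total = 0
    for ranks in permutations(range(4)):
        a = ranks[:2]   # ranks of the two items from the first category
        b = ranks[2:]   # ranks of the two items from the second category
        total += 1
        if max(a) < min(b) or max(b) < min(a):
            hits += 1
    return Fraction(hits, total)

if __name__ == "__main__":
    print(zero_difference_score())
```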
Last but not least: The whole idea of testing an "unspecified
attribute" is just begging for trouble. People wouldn't bother
using methods like this when the difference is large, so it is
safe to assume that we are looking at small, ill-controlled,
higher-order correction terms. Assuming that there is only one
such term, i.e. assuming that the difference is one-dimensional,
seems like a very rash assumption. The items could subtly
differ as to size, shape, color, odor, shelf-life, and ten other
Now we are neck-deep in human factors issues, because depending
on the /framing/, different panelists will focus on different
attributes. This introduces horrific uncontrolled variables,
and invalidates all predictions made in the standard way using
tetrad tests and all similar unspecified-attribute tests.