
Re: A paradox? Why not?



brian whatcott wrote:


A question that has not been voiced:
are these results significant?
Could they have happened by chance?

Hmm, let me try to address this. Here are
the data again. The added col 5 shows the
"error bars" for the percentages in col 4,
as explained below.

***************************************
Data for col 2 col 3 col 4 col 5
airline A on-time delayed % del +/-

Los Angeles 497 62 11.1 1.4
Phoenix 221 12 5.2 1.5
San Diego 212 20 8.6 1.9
San Francisco 503 102 16.9 1.7
Seattle 1841 305 14.2 0.8
***************************************

Data for col 2 col 3 col 4 col 5
airline B on-time delayed % del +/-

Los Angeles 694 117 14.4 1.3
Phoenix 4840 415 7.9 0.4
San Diego 383 65 14.5 1.8
San Francisco 320 129 28.7 1.6
Seattle 201 61 23.3 3.0
***************************************

1) Let us start with the claim that "airline B
wins over A on the combined cities." This is backed
by the 13.3% of delayed flights for airline A (501
delayed out of 3775 flights) versus 10.9% (787
delayed out of 7225 flights) for airline B. Can the
difference of 2.4% (13.3% - 10.9%) be due to chance?

In interpreting the output of a G.M. (Geiger-Muller)
counter we would say that the standard deviation on
a count of 501 is 22.4 [sqrt(501)]. This translates
into a 0.6% uncertainty on the 13.3% figure. Likewise,
the standard deviation on 787 is 28.0 [sqrt(787)],
which translates into a 0.4% uncertainty on the
10.9% figure.
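As a check, the combined-cities percentages and their error bars can be recomputed in a short Python sketch. The counts are taken from the tables above, and the sqrt(N) counting rule is the one assumed in the text:

```python
import math

# Combined totals from the tables above: (delayed, total flights)
a_delayed, a_total = 501, 3775   # airline A
b_delayed, b_total = 787, 7225   # airline B

def pct_and_err(delayed, total):
    """Percent delayed, with the sqrt(N) (Poisson) error bar on the
    delayed count propagated to the percentage."""
    pct = 100.0 * delayed / total
    err = 100.0 * math.sqrt(delayed) / total
    return pct, err

pa, ea = pct_and_err(a_delayed, a_total)   # about 13.3 +/- 0.6
pb, eb = pct_and_err(b_delayed, b_total)   # about 10.9 +/- 0.4
```

This reproduces the 13.3 +/- 0.6 and 10.9 +/- 0.4 figures quoted above.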

The difference of 2.4% could be due to chance,
but the probability is very small. Therefore, at
some level of confidence, I am willing to accept
the better rating of airline B. Is the sqrt()
rule acceptable? Yes, if the delayed departures
are independent random events occurring at
constant probabilities for each airline in each
airport.
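To put "the probability is very small" in numbers: assuming the two error bars are independent, they add in quadrature, and the 2.4% difference can be expressed in units of that combined uncertainty. A minimal sketch, using the 13.3 +/- 0.6 and 10.9 +/- 0.4 figures from above:

```python
import math

# Combined results quoted above: 13.3 +/- 0.6 (A) vs 10.9 +/- 0.4 (B)
diff = 13.3 - 10.9
sigma = math.sqrt(0.6**2 + 0.4**2)   # independent errors add in quadrature
z = diff / sigma                     # about 3.3 standard deviations
```

A difference of roughly 3.3 standard deviations is indeed very unlikely to arise by chance alone.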

2) The opposite claim, "airline A wins over B,"
based on an airport-by-airport analysis, is more
shaky. This is illustrated in column 5. Each
value in col 5 is the error bar (one standard
deviation) on the corresponding number in
col 4. They were obtained in the same way as
above. [For example, sqrt(62) is about 8, or
13% of the delayed departures. And 13% of
11.1% is 1.4%, as in col 5, etc.]
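The col 4 and col 5 entries can be regenerated the same way. A sketch for airline A, with the on-time and delayed counts copied from the table above:

```python
import math

# (city, on-time, delayed) for airline A, copied from the table above
airline_a = [
    ("Los Angeles",   497,  62),
    ("Phoenix",       221,  12),
    ("San Diego",     212,  20),
    ("San Francisco", 503, 102),
    ("Seattle",      1841, 305),
]

def delay_pct_with_error(on_time, delayed):
    """Percent delayed (col 4) and its one-standard-deviation
    error bar (col 5), using the sqrt(N) rule on the delayed count."""
    total = on_time + delayed
    pct = 100.0 * delayed / total
    err = pct * math.sqrt(delayed) / delayed   # relative sqrt(N) error
    return round(pct, 1), round(err, 1)

table = {city: delay_pct_with_error(ot, d) for city, ot, d in airline_a}
# e.g. table["Los Angeles"] == (11.1, 1.4), matching cols 4 and 5
```

The same function applied to the airline B counts reproduces the second table.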

One may reasonably argue that not all individual
airport differences are significant. But we have
five comparisons, all leading to the same conclusion,
and this strengthens the opposite claim. A
formal statistical "two samples" analysis, mentioned
by Jack, would probably confirm the validity of the
claim [rejecting the null hypothesis], but this
remains to be done.
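A sketch of what such a "two samples" analysis might look like for a single airport (a standard two-proportion z statistic, which is my choice here, not something specified in the thread), together with a crude sign-test view of all five airports pointing the same way:

```python
import math

def two_prop_z(d1, n1, d2, n2):
    """Two-proportion z statistic for delayed fractions d1/n1 vs d2/n2,
    using the pooled proportion under the null hypothesis."""
    p1, p2 = d1 / n1, d2 / n2
    p = (d1 + d2) / (n1 + n2)                        # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # null standard error
    return (p1 - p2) / se

# Los Angeles: airline A 62 delayed of 559 vs airline B 117 of 811
z_la = two_prop_z(62, 559, 117, 811)   # about -1.8: not significant alone

# Sign-test view: if there were no real difference, all five airports
# favoring the same airline would happen with probability (1/2)**5.
p_sign = 0.5 ** 5                      # about 3%
```

So Los Angeles alone is below two standard deviations, consistent with the caveat above, while the agreement of all five comparisons is itself unlikely under the null hypothesis.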

By not accepting the conclusion of the airport-
by-airport analysis one would miss a strong
hint about the existence of a hidden variable.
How many discoveries were missed in that
way? And how many "discoveries" turned out
to be irreproducible in light of better data?
Another kind of uncertainty principle?

E. Rutherford was well aware of this dilemma.
His position was to reject discoveries whose
justification leaned too much on statistical
analysis. Collecting more data, however, is
not always possible in practice.

Ludwik Kowalski