
Statistics; was science for all?



Let me address the controversy (see below) between the
conclusion of Jack U. (~10%) and that of Brian W. (~1%).
Presumably both are based on the same fictitious data
invented by Brian. Therefore the conclusions should be
identical. But they are not.

Brian’s sample #1 (25 random numbers from a presumably
Gaussian distribution) yielded the sample mean m1 = 7 and
the sample standard deviation s1 = 3. Brian’s sample #2 (also
25 random numbers) yielded m2 = 10 and s2 = 3. Is the difference,
10-7=3, significant or is it coincidental? That is the question.
More specifically, what is the probability that the difference is
coincidental (due to the random nature of the data)?

Because only the sample standard deviations are known, we must
use t rather than z statistics. Why not? I agree with Brian that
the standard errors for the two samples are s1/sqrt(25) = 3/5 = 0.6
and s2/sqrt(25) = 3/5 = 0.6. The expected standard error of the
difference between m1 and m2 (assuming many samples of the same
size were collected), according to a textbook (David Moore,
Standard Statistics), is:

SE = sqrt(s1^2/25 + s2^2/25) = sqrt(9/25 + 9/25) = 0.849

This leads to the two-sample statistic t = (m2-m1)/SE = 3.53.

That value is so large that an accidental origin of the observed
difference between the mean values is very unlikely. For 48
degrees of freedom my table of critical t values gives about 3.5
at a probability of 0.0005, so the probability of a chance
coincidence is about 0.0005. My table does not go to
probabilities smaller than 0.0005.

The conclusion is that the difference between 10 and 7 is
highly significant. My t value agrees with Brian's 3.53, but the
probability is far below the p=0.1 reported by Jack and also
below the 1% level quoted by Brian. I suppose we are talking about
different things; I am concerned with the odds favoring the null
hypothesis.

If somebody asked me about the error bar to be placed around
the observed difference of 3 then I would say it is plus or minus
SE=0.849, or about 28%. This would be close to Jack’s earlier
rough estimate of 30%.
By the way, the textbook states that this kind of analysis is
extremely "robust"; it is not very sensitive to the assumed
Gaussian nature of the data. Any bell-shaped data can be analyzed
as shown above. I suppose that this conclusion could easily be
tested by using computer simulations.
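
The robustness claim can be probed with a small simulation. The
sketch below is only one way to set it up, not the textbook's
procedure. It draws two samples of 25 from the same distribution
(so the null hypothesis is true), computes t with the SE formula
above, and counts how often |t| exceeds 2.68, the 1% critical
value for 48 degrees of freedom quoted by Brian below. The
triangular distribution, the number of trials, the seed, and all
function names are assumptions chosen for illustration.

```python
import math
import random


def mean_and_variance(xs):
    """Sample mean and sample variance (n - 1 in the denominator)."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, var


def two_sample_t(a, b):
    """t = (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    ma, va = mean_and_variance(a)
    mb, vb = mean_and_variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se


def false_alarm_rate(draw, n=25, trials=10000, t_crit=2.68, seed=1):
    """Fraction of trials with |t| > t_crit when both samples
    come from the same distribution (null hypothesis true)."""
    rng = random.Random(seed)  # fixed seed: an arbitrary choice
    hits = 0
    for _ in range(trials):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(two_sample_t(a, b)) > t_crit:
            hits += 1
    return hits / trials


def gaussian(rng):
    # Gaussian with mean 7 and standard deviation 3
    return rng.gauss(7.0, 3.0)


def triangular(rng):
    # Symmetric, bell-like but not Gaussian; s.d. is about 2.9
    return rng.triangular(0.0, 14.0, 7.0)


rate_gauss = false_alarm_rate(gaussian)
rate_tri = false_alarm_rate(triangular)
print("Gaussian   false-alarm rate:", rate_gauss)
print("Triangular false-alarm rate:", rate_tri)
```

If the analysis is robust, both rates should come out near 0.01.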
Ludwik Kowalski

**********************************************
Jack Uretsky wrote:
Hi all, getting back to statistics:
Well, Brian may have been aware of his barbaric error when he
wrote "I casually assume the standard error..." The point is that
the sigma of the distribution is an unknown quantity, and one
should redo the t-test taking that fact into account. Using his
numbers I find that the probability that the two means are the
same is about 10% - somewhat less than the 30% of my previous
rough estimate, but far larger than Brian's 1%.

The t-test for the difference of two means when the s.d. of the
underlying distribution is unknown is described in Hogg & Craig,
Section 6.4. Of course, all this assumes that the samples consist
of normally distributed random variables, an assumption that is
almost certainly untrue.

On Mon, 31 Dec 2001, Brian Whatcott wrote:

At 08:54 AM 12/31/01, Jack Uretsky wrote:
On Wed, 26 Dec 2001, John Clement wrote:
_________________________________________snip_


Jack's criticism is so obviously wrong, in my view, that I progressed no
further.
I will now illustrate what I see as his error.

Clement has described a statistic, effect size, (M1 - M2) / SD where
a value 0.5 is an effective change.

To test his proposition,
I take twenty-five samples from a pile and find the mean is 7
and the standard deviation is 3.
I take twenty-five samples from a processed pile and find the mean is 10
and the standard deviation is 3.

Am I justified in concluding the piles are significantly different using
Clement's statistic, effect size = 1??

I casually assume the standard error of the difference of the sample
means is sqrt[(3/5)^2 + (3/5)^2] = 0.849 or ~ 0.85
t statistic = (10-7)/0.85 = 3.53
Degrees of Freedom = 25 -1 + 25 -1 = 48
A table gives the critical t value for significance at the 1% level,
for 48 degrees of freedom, as 2.68
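
The arithmetic above can be reproduced with a few lines of Python.
This is only a sketch using the summary numbers in this message.
The function names are made up for illustration, and the
two-sided probability is obtained by numerically integrating the
Student t density (Simpson's rule) rather than read from a table.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # P(0 < T < |t|)
    return 1.0 - 2.0 * central


# Summary numbers from the message above
n1 = n2 = 25
m1, s1 = 7.0, 3.0
m2, s2 = 10.0, 3.0

effect_size = (m2 - m1) / s1               # Clement's (M1 - M2) / SD
se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # sqrt[(3/5)^2 + (3/5)^2]
t = (m2 - m1) / se
df = (n1 - 1) + (n2 - 1)
p = two_sided_p(t, df)

print("effect size =", effect_size)
print("SE =", round(se, 3), " t =", round(t, 2), " df =", df)
print("two-sided p =", p)
print("exceeds 1% critical value 2.68:", t > 2.68)
```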

I conclude there is a significant difference at the 1% level.
Jack concludes there is a 30% chance this difference could have been
a chance effect.
I conclude that Jack is mistaking the statistics for an individual item for
those of an ensemble.