
Re: [Phys-l] Basic statistics



On Nov 9, 2006, at 11:35 AM, John Denker wrote:

On 11/09/2006 06:24 AM, Ludwik Kowalski wrote:
.... Should the
error bar be 1.2 or should it be 0.4,

It depends.

The usual advice is to say what you mean, and mean what you
say ... but that is sometimes hard to do, because the standard
terminology in this area is a mess.

In the following I will use my own colorful but nonstandard
terminology.

There are *three* different probabilities in play:
a) the statistics of the underlying distribution from which samples
are drawn.
b) the statistics of the individual measurements ("atoms")
c) the statistics of clusters ("molecules") containing 9 atomic
measurements.

In http://blake.montclair.edu/~kowalskil/basicstat.html
it would be helpful to clearly distinguish the three different
concepts.

In any case:

a) The underlying distribution has some mean μ and some standard
deviation σ, which we might never know exactly.
b) We can estimate μ based on one atomic measurement.
c) We can estimate μ based on a cluster of 9 measurements.
d) We cannot estimate σ from a single atomic measurement.
e) We can estimate σ from a multi-atom cluster.

Here's the crux of the matter: Given enough measurements, our
estimate μ' will be a rather precise estimate of μ. This process
is known as signal averaging. The uncertainty in μ' is denoted
Δμ' and might be markedly less than σ, the standard deviation of
the underlying distribution.

Among other things, that means:
-- If we increase the number of measurements, there should not
be any systematic drift in μ' i.e. our estimate of μ. (It may
wander around randomly, but should not systematically drift.)
-- If we increase the number of measurements, there should not
be any systematic drift in σ' i.e. our estimate of σ.
-- If we increase the number of measurements, there will be a
systematic decrease in Δμ' i.e. the uncertainty of our estimate
of μ.
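
The three behaviors listed above can be checked numerically. What follows is only a minimal sketch, not part of the original message: it assumes a Gaussian underlying distribution with μ = 10.0 and σ = 1.2 (illustrative values), and the function name summarize and the sample sizes are hypothetical.

```python
import math
import random
import statistics


def summarize(samples):
    """Return (mu_est, sigma_est, delta_mu_est) for a list of measurements."""
    n = len(samples)
    mu_est = statistics.fmean(samples)        # mu': estimate of the mean
    sigma_est = statistics.stdev(samples)     # sigma': sample std. dev. (n - 1 in the denominator)
    delta_mu_est = sigma_est / math.sqrt(n)   # Delta mu': uncertainty of mu'
    return mu_est, sigma_est, delta_mu_est


# Assumed underlying distribution (illustrative values only).
MU, SIGMA = 10.0, 1.2
rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(MU, SIGMA) for _ in range(10000)]

for n in (9, 100, 10000):
    mu_est, sigma_est, delta_mu_est = summarize(data[:n])
    print(f"n={n:6d}  mu'={mu_est:7.3f}  sigma'={sigma_est:6.3f}  delta_mu'={delta_mu_est:7.4f}")
```

As n grows, μ' and σ' should wander around 10.0 and 1.2 without drifting, while Δμ' should shrink roughly as 1/sqrt(n).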

Therefore, the bottom line is that you have a choice:
*) If you choose to consider μ to be the object of interest, then
you should report μ' ± Δμ', which tells people how well you have
estimated μ.
*) If you choose to consider the underlying distribution to be
the object of interest, then you should report μ' ± σ'.

In the context of elementary physics labs the situation is simplified because we assume that true values do not fluctuate. Distributions are assumed to be due to random errors, which means they are taken to be Gaussian. The question "1.2 or 0.4?" was posed in that context. Both answers cannot possibly be correct.

I am modifying the unit to account for what Timothy wrote. It amounts to replacing every s by S = s/sqrt(n). That is the easiest way to make changes without going through corrections. S is usually called the "standard deviation of the mean" while s is the "standard deviation in a sample." Confusion probably results from abbreviations of long names. I decided to call s the "standard deviation" and to invent a distinct short name for the standard deviation of the mean. Atomicity (symbol A) would be fine. But perhaps there is a better word. Please help.
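
To make the substitution concrete, here is a small sketch of S = s/sqrt(n). It assumes that the 1.2 in the question is the sample standard deviation s and that n = 9, the cluster size mentioned in the quoted message; the function name standard_deviation_of_mean is hypothetical.

```python
import math


def standard_deviation_of_mean(s, n):
    """Return S = s / sqrt(n) for sample standard deviation s and n measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)


s = 1.2  # assumed: standard deviation in a sample
n = 9    # assumed: number of measurements in the sample
S = standard_deviation_of_mean(s, n)
print(round(S, 3))  # 0.4
```

Under these assumptions the two numbers in the question are s and S for the same data.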

Ludwik Kowalski
Let the perfect not be the enemy of the good.