
Re: [Phys-L] statistics of measurements question



On 02/26/2013 03:05 PM, Aburr@aol.com wrote:
> I have some statistical questions regarding means and standard deviation
> (sd) on which I would appreciate answers and comments. A common way of
> thinking about simple measurements such as time (t) for any operation is to
> consider the population to be all possible t measurements from which you take a
> random sample using the mean and sd of the sample to infer the mean and sd
> of the population. All this predicated on the population being a gaussian
> distribution. Yet it is clear that the population is not gaussian (if for
> no other reason than that there can be no negative times.) Does one just
> ignore the very small tail of the gaussian here?

That's one of those simple questions with a complicated answer ...
or perhaps ten different answers, depending on as-yet unspecified
details. Let me say a few things that /might/ be helpful:

1) First and foremost, keep in mind that a very great deal of what
goes on in science consists of building /models/ to explain whatever
is going on.

I mention that because "standard deviation" might be relevant to
some part of the model ... but standard deviation is not a model
unto itself! You need a lot more than that.


2) At a much more prosaic level: Not all models are Gaussian. You
could, for instance, have a model that says the data has a /flat/
distribution, /evenly/ distributed over some interval from A to B.

Note that the standard deviation is perfectly well defined for
distributions other than Gaussian distributions. It is an easy
exercise to calculate the standard deviation for the aforementioned
flat distribution. It comes out to |A-B|/sqrt(12), which is
proportional to |A-B| but is not the same thing.
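
For anyone who wants to check the flat-distribution exercise numerically, here is a minimal sketch. The function names, the sample size, and the seed are made up for illustration; the only claim being tested is that the standard deviation of a flat distribution on [A, B] is |A-B|/sqrt(12).

```python
import math
import random
import statistics

def uniform_sd(a: float, b: float) -> float:
    """Standard deviation of a flat distribution on [a, b]: |a - b| / sqrt(12)."""
    return abs(a - b) / math.sqrt(12.0)

def simulated_uniform_sd(a: float, b: float, n: int = 100_000, seed: int = 0) -> float:
    """Draw n values evenly distributed on [a, b] and return their standard deviation."""
    rng = random.Random(seed)  # fixed seed so the check is repeatable
    samples = [rng.uniform(a, b) for _ in range(n)]
    return statistics.pstdev(samples)

if __name__ == "__main__":
    a, b = 2.0, 5.0  # arbitrary interval, chosen only for the demo
    print("formula:   ", uniform_sd(a, b))
    print("simulation:", simulated_uniform_sd(a, b))
```

The two numbers should agree to a couple of decimal places, and both are noticeably smaller than the full width |A-B|.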


3) In *some* cases you can have a Gaussian model that predicts that
it is vanishingly unlikely that the observed time will be negative,
in which case using the model does not violate the constraint and/or
the intuition that the times cannot be negative.
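
To put a number on "vanishingly unlikely", here is a small sketch that computes the probability a Gaussian model assigns to negative times. The function name and the example values of mean and sd are assumptions chosen for illustration; the formula is just the standard Gaussian tail probability.

```python
import math

def prob_negative(mean: float, sd: float) -> float:
    """P(T < 0) for T ~ Normal(mean, sd), using the complementary error function."""
    return 0.5 * math.erfc(mean / (sd * math.sqrt(2.0)))

if __name__ == "__main__":
    # Hypothetical (mean, sd) pairs, in the same time units.
    for mean, sd in [(10.0, 1.0), (3.0, 1.0), (1.0, 1.0)]:
        print(f"mean={mean}, sd={sd}: P(t < 0) = {prob_negative(mean, sd):.3g}")
```

When the mean is ten standard deviations above zero, the negative tail is below 1e-20 and can safely be ignored. When the mean is only one standard deviation above zero, the tail is roughly 0.16, and the Gaussian model is in real conflict with the constraint.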


4) At the opposite extreme, I can easily imagine situations where
the observed times *can* be negative ... for instance if two
independent observers are timestamping the events and the elapsed
time is calculated by subtracting the somewhat-noisy timestamps.
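
The timestamp scenario is easy to simulate. The sketch below assumes each observer's timestamp has independent Gaussian jitter with the same sd; that noise model, the function name, and the numbers are all assumptions for illustration, not anything measured.

```python
import random

def fraction_negative(true_elapsed: float, jitter_sd: float,
                      n: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated trials where (end - start) comes out negative."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    negatives = 0
    for _ in range(n):
        start = rng.gauss(0.0, jitter_sd)            # noisy start timestamp
        end = rng.gauss(true_elapsed, jitter_sd)     # noisy end timestamp
        if end - start < 0.0:
            negatives += 1
    return negatives / n

if __name__ == "__main__":
    print(fraction_negative(true_elapsed=1.0, jitter_sd=1.0))
    print(fraction_negative(true_elapsed=10.0, jitter_sd=0.1))
```

With jitter comparable to the true elapsed time, something like a quarter of the computed intervals are negative. With jitter small compared to the elapsed time, none are.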


5) It may be that the start-timestamp and end-timestamp exhibit
nontrivial /correlations/ ... in which case a single standard
deviation per variable no longer tells the whole story. The
standard deviation is the square root of the variance ... but
when there are N variables there is in general an N by N
covariance matrix. Neglecting the off-diagonal elements is a
very bad idea.
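
Here is a sketch of what goes wrong when the off-diagonal elements are neglected. It assumes a toy model in which both timestamps share a large common clock error plus a small private error each; the model, the names, and the numbers are hypothetical. For the difference of two variables, the variance is var(start) + var(end) - 2 cov(start, end).

```python
import random

def covariance(xs, ys):
    """Population covariance of two equal-length lists; covariance(x, x) is the variance."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)

def elapsed_time_variances(shared_sd=1.0, private_sd=0.1, n=50_000, seed=0):
    """Return (naive, full, direct) variances of the elapsed time end - start."""
    rng = random.Random(seed)
    starts, ends = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, shared_sd)  # clock error common to both timestamps
        starts.append(shared + rng.gauss(0.0, private_sd))
        ends.append(5.0 + shared + rng.gauss(0.0, private_sd))
    var_start = covariance(starts, starts)
    var_end = covariance(ends, ends)
    cov_start_end = covariance(starts, ends)
    naive = var_start + var_end                        # diagonal elements only
    full = var_start + var_end - 2.0 * cov_start_end   # includes off-diagonal term
    diffs = [e - s for s, e in zip(starts, ends)]
    direct = covariance(diffs, diffs)                  # variance of the differences
    return naive, full, direct

if __name__ == "__main__":
    naive, full, direct = elapsed_time_variances()
    print("naive (diagonal only):", naive)
    print("full covariance:      ", full)
    print("direct from diffs:    ", direct)
```

In this toy model the shared error cancels in the subtraction, so the diagonal-only estimate overstates the variance of the elapsed time by roughly a factor of 100, while the full-covariance result matches the variance computed directly from the differences.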


6) Last but not least, it is fairly well known that the standard
deviation of the /sample/ is perfectly well defined, but is a
terrible estimator of the standard deviation of the underlying
/population/. For starters, it is a biased estimator, and even
if we take steps to remove the bias it is still quite a noisy
estimator.
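
Both complaints (bias and noise) show up in a quick simulation. The sketch below assumes a Gaussian population with true sd equal to 1 and a sample size of 5; those choices, and the function name, are assumptions for illustration. It compares the divide-by-n and divide-by-(n-1) forms of the sample standard deviation.

```python
import random
import statistics

def sd_estimator_summary(n: int = 5, trials: int = 10_000, seed: int = 0):
    """Repeatedly sample n points from Normal(0, 1) and summarize the sd estimates.

    Returns (mean of divide-by-n sd, mean of divide-by-(n-1) sd,
             scatter of the divide-by-(n-1) sd across trials).
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    by_n, by_n_minus_1 = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        by_n.append(statistics.pstdev(sample))         # divides by n
        by_n_minus_1.append(statistics.stdev(sample))  # divides by n - 1
    return (statistics.fmean(by_n),
            statistics.fmean(by_n_minus_1),
            statistics.pstdev(by_n_minus_1))

if __name__ == "__main__":
    mean_n, mean_n1, scatter = sd_estimator_summary()
    print("true sd:                    1.0")
    print("average divide-by-n sd:    ", mean_n)
    print("average divide-by-(n-1) sd:", mean_n1)
    print("trial-to-trial scatter:    ", scatter)
```

For this small sample size the divide-by-n form averages well below the true value. Dividing by n-1 makes the /variance/ unbiased, but the square root is still biased somewhat low, and in either case the estimate scatters by a large fraction of the true sd from one sample to the next.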

=================

Many of these issues are discussed at
http://www.av8n.com/physics/uncertainty.htm
and
http://www.av8n.com/physics/probability-intro.htm

=================

If that doesn't answer the question, please re-ask the question.
Often there are answers to specific practical problems even
when the generic philosophical problem is unanswerable.