
Re: [Phys-L] peculiar data; was: uncertainty



On 01/19/2015 11:48 AM, Paul Lulai wrote:
> In this activity, if we include the 164.xx value in our measurements,
> then the result would be that our 3 crank method for measuring the
> mass of an object in lab would have an uncertainty of roughly 20
> grams. Really?

Here's how I think of it:

Suppose some guy
a) Assumes that the result "must" be reported in the
form of mean ± std_deviation, and
b) Keeps all the data, including outliers.

Then that guy has a problem. However, my point is that the
problem is with (a), not (b). As previously discussed, a
lot of distributions in this world are non-Gaussian.

This is one of the big reasons why I recommend graphical
methods as opposed to mechanical mathematical methods.
*) Graphically, you can draw a non-Gaussian distribution.
In this case, you can draw a /mixture/ of a Gaussian plus
an outlier. In other cases you can draw a two-hump
"camel" distribution, or a multi-spike "stegosaurus"
distribution if necessary.
*) Mechanically, if you are committed to grinding out
the mean and standard_deviation -- and nothing else --
your options are much more limited.
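
To make the contrast concrete, here is a minimal sketch in Python.
The readings are invented for illustration (they are not the class
data), and the helper names summarize() and text_histogram() are
hypothetical. It grinds out mean and standard deviation with and
without a single outlier, then prints a crude text histogram of the
same readings so the shape of the distribution can be seen directly.

```python
import statistics

# Hypothetical readings in grams: 24 well-behaved values plus one outlier.
# These numbers are made up for illustration only.
core = [100.0 + 0.1 * k for k in range(-12, 12)]
data = core + [164.0]

def summarize(xs):
    """Return (mean, sample standard deviation) of the readings."""
    return statistics.mean(xs), statistics.stdev(xs)

def text_histogram(xs, lo, hi, nbins):
    """Crude text histogram: one row per bin, one '#' per reading."""
    width = (hi - lo) / nbins
    counts = [0] * nbins
    for x in xs:
        i = min(int((x - lo) / width), nbins - 1)
        counts[i] += 1
    return ["%7.1f | %s" % (lo + i * width, "#" * c)
            for i, c in enumerate(counts)]

mean_core, sd_core = summarize(core)
mean_all, sd_all = summarize(data)
rows = text_histogram(data, 95.0, 170.0, 15)

print("core only    : %.2f +- %.2f" % (mean_core, sd_core))
print("with outlier : %.2f +- %.2f" % (mean_all, sd_all))
print("\n".join(rows))
```

The single mean +- std_deviation line hides the fact that almost all
of the readings sit in a narrow clump; the histogram shows the clump
and the outlier as two separate features.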

This may be easier to visualize in two dimensions than
one. Here is some data that would not be well fit by
one Gaussian. It would be better fit by two, and even
better by three:
http://tinyurl.com/kfxebll
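
As a rough numerical counterpart to the picture, the following sketch
compares a one-Gaussian fit with a two-Gaussian mixture fit. It makes
several assumptions that are not in the data above: it is
one-dimensional for brevity, the two-hump data set is synthetic, the
fit uses a plain EM iteration, and the function names are
hypothetical.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(xs, comps):
    """comps is a list of (weight, mu, sigma) tuples."""
    return sum(math.log(sum(w * normal_pdf(x, m, s) for w, m, s in comps))
               for x in xs)

def fit_one_gaussian(xs):
    """Maximum-likelihood fit of a single Gaussian."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(1.0, mu, sigma)]

def fit_two_gaussians(xs, steps=100):
    """Plain EM for a two-component mixture, started from the lower
    and upper halves of the sorted data (an arbitrary starting guess)."""
    s = sorted(xs)
    half = len(s) // 2
    _, mu_lo, sd_lo = fit_one_gaussian(s[:half])[0]
    _, mu_hi, sd_hi = fit_one_gaussian(s[half:])[0]
    comps = [(0.5, mu_lo, sd_lo), (0.5, mu_hi, sd_hi)]
    for _ in range(steps):
        # E step: how much each component is responsible for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, sd) for w, m, sd in comps]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M step: re-estimate weight, mean, and width of each component.
        new_comps = []
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mu = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu) ** 2 for r, x in zip(resp, xs)) / nk
            new_comps.append((nk / len(xs), mu, max(math.sqrt(var), 1e-6)))
        comps = new_comps
    return comps

# Synthetic two-hump ("camel") data: humps near 0 and near 6.
rng = random.Random(0)
xs = ([rng.gauss(0.0, 1.0) for _ in range(100)] +
      [rng.gauss(6.0, 1.0) for _ in range(100)])

comps1 = fit_one_gaussian(xs)
comps2 = fit_two_gaussians(xs)
ll_one = log_likelihood(xs, comps1)
ll_two = log_likelihood(xs, comps2)
print("log-likelihood, one Gaussian :", round(ll_one, 1))
print("log-likelihood, two Gaussians:", round(ll_two, 1))
```

The single Gaussian has to stretch across both humps, so it assigns
low probability everywhere; the mixture puts one component on each
hump and scores a higher log-likelihood.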

Remember, whenever you are talking about uncertainty,
you are talking about a /distribution/ ... not a
number, but a distribution. There is no such thing
as an uncertain number.
++ If it's a number, it's not uncertain.
++ If it's uncertain, it's not a number.
++ You can have a distribution over numbers, in which
case the uncertainty is in the distribution ... not
in any particular number that may be drawn from the
distribution.

If you imagine the answer to be a "number" (x) or a "number"
(x±y) then you have lost the game before you begin. It
is better to think of the answer as a distribution. If
the distribution is not a Gaussian, that's no big deal.
If it's a Gaussian plus outliers, you can use mechanical
mathematical methods to describe the Gaussian part ...
but the outliers are still part of /the/ distribution.
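
One possible way to do that mechanically is sketched below. It is an
illustration under stated assumptions, not a prescription: the split
uses the median and the median absolute deviation (MAD), the cut of 5
scaled-MAD units is arbitrary, the readings are invented, and the
function name describe() is hypothetical. The point is that the
report has two parts, a description of the Gaussian-like core and a
list of the outliers, and both parts together describe the
distribution.

```python
import statistics

def describe(xs, cut=5.0):
    """Describe readings as a Gaussian-like core plus listed outliers.

    Readings farther than `cut` scaled-MAD units from the median are
    listed separately. Nothing is discarded. If more than half the
    readings are identical the MAD is zero and every other reading is
    listed as an outlier, so use with care.
    """
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    scale = 1.4826 * mad  # equals sigma for an ideal Gaussian
    core = [x for x in xs if abs(x - med) <= cut * scale]
    outliers = [x for x in xs if abs(x - med) > cut * scale]
    return {
        "core_mean": statistics.mean(core),
        "core_sd": statistics.stdev(core),
        "core_count": len(core),
        "outliers": outliers,
        "outlier_fraction": len(outliers) / len(xs),
    }

# Hypothetical readings in grams, invented for illustration.
data = [100.0 + 0.1 * k for k in range(-12, 12)] + [164.0]
report = describe(data)
print("core: %.2f +- %.2f (%d readings)"
      % (report["core_mean"], report["core_sd"], report["core_count"]))
print("outliers:", report["outliers"],
      "(fraction %.2f)" % report["outlier_fraction"])
```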

Tangential but important remark: This experiment does
not crank three times. If anything, it cranks 75 times.
This is important because with only three samples it
would be
-- dramatically more difficult to decide what is
an outlier and what is just a coincidence.
-- hard to decide whether the samples fit a Gaussian or not.
-- impossible to use a two-Gaussian mixture model.
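
A small sketch of the first point, using a z-score as the outlier
criterion. That criterion is my choice for illustration, and the
readings and function names are hypothetical. For n readings, the
largest possible distance of any reading from the sample mean,
measured in units of the sample standard deviation, is
(n-1)/sqrt(n). With n=3 that bound is about 1.15, so even a wildly
discrepant reading can never look more than about 1.15 standard
deviations out.

```python
import math
import statistics

def max_abs_z(xs):
    """Largest |x - mean| / sample_std over the readings."""
    m = statistics.mean(xs)
    s = statistics.stdev(xs)
    return max(abs(x - m) for x in xs) / s

def z_bound(n):
    """Upper limit on max_abs_z for any data set of n readings."""
    return (n - 1) / math.sqrt(n)

# Hypothetical readings in grams, invented for illustration.
three = [99.9, 100.1, 164.0]
many = [100.0 + 0.03 * k for k in range(-37, 37)] + [164.0]  # 75 readings

z_three = max_abs_z(three)
z_many = max_abs_z(many)
print("n = 3 : max z = %.3f, bound = %.3f" % (z_three, z_bound(3)))
print("n = 75: max z = %.3f, bound = %.3f" % (z_many, z_bound(len(many))))
```

With 75 readings the same 164 stands many standard deviations away
from the rest, which is what makes it recognizable as an outlier.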