
[Phys-l] data analysis : pendulum period versus length



On 05/11/2011 04:29 AM, Josh Gates wrote:

> If I roll a pair of dice and observe five dots, then x_i=5 with no
> uncertainty. That works with dice, but how about the measurement of
> the period of a pendulum?

The dice and the pendulum are the same in principle. The
mathematics is slightly different for a discrete distribution
(dice spots) versus a continuous distribution (pendulum period)
... but only slightly. And you can simplify things by making
the pendulum measurement discrete: It might make a lot of
sense to measure the period with a digital stopwatch. The
data might look like:
x_1 = 1.235 (seconds)
x_2 = 1.244
x_3 = 1.253
x_4 = 1.238

Each observation is discrete. It is just as digital as the
number of spots on the dice. The same goes for measuring
the length: by the time you have written down the observed
length as a decimal number (or as a multiple of 1/16th of an
inch) it is a discrete observation. This is true whether or
not you think the underlying physical time and distance are
quantized.
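
To make the point concrete, here is a tiny Python illustration
(the "true" length is a made-up number):

  physical_length = 12.3371890123               # made-up "true" value, inches
  recorded = round(physical_length * 16) / 16   # written down to 1/16 inch
  print(recorded)                # 12.3125: one of a discrete set of values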

If you managed to compute the time/sqrt(distance) ratio using
an analog computer, we would need to have a different conversation,
but I doubt that is relevant to the question that was asked.

Also notice that in my data tabulated above, the scatter is
huge compared to the quantization error, so the details of
how the quantization was done don't matter. This is how it
should be. The rule is, use plenty of guard digits. This
is just one of the ways in which the usual "significant figures"
rules are profoundly pernicious: They /require/ you to keep
rounding off until roundoff error is the dominant contribution
to the uncertainty ... whereas in a well-designed experiment
and/or calculation, roundoff error is never the dominant
contribution.
http://www.av8n.com/physics/uncertainty.htm
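
Here is a minimal Python sketch (mine, for illustration) that
compares the scatter of the four readings tabulated above to the
RMS quantization error of the last stopwatch digit:

  import statistics

  readings = [1.235, 1.244, 1.253, 1.238]   # seconds, from the table above
  scatter = statistics.stdev(readings)      # sample standard deviation
  step = 0.001                              # stopwatch resolution
  quant = step / 12**0.5                    # RMS error of uniform rounding

  print(scatter)          # ~0.0079 s
  print(quant)            # ~0.00029 s
  print(scatter / quant)  # ~27: the scatter dominates; the quantization
                          # is down in the noise, as it should be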

> Is there a _simple_ measure that you'd suggest for characterizing the
> overlap of the distributions? I can get them as far as conceptually
> 'getting' that distributions that don't overlap aren't likely the
> same, and those with significant overlap are, but I haven't yet found
> a _simple_ measure that will allow them to compare (at least, not one
> that I'm super happy with: I've had them look for overlap within 1
> sigma of the mean, which was easy to do, but not necessarily on good
> footing. ...or maybe it is? Conceptually, I'm OK with it, but I'd
> like a comparison measure without doing T tests or similar).

If you don't want to use a mathematical test _or even if you do_,
you should treat the pendulum as a multi-dimensional statistical
problem. As always, the first step should be to look at the data.
Here is a plot of some Monte Carlo data:
http://www.av8n.com/physics/img48/pendulum-model.png

You see there is some scatter in the observations of the length,
and some scatter in the observations of the period. There is
also a model (shown in red) that tells us the theoretical
relationship of period versus length. The real question IMHO
is how well the data fits the model. More precisely, the
question is whether
a) the data fits the model /within the scatter/ which
means there is no sign of systematic error, or
b) there is evidence for some systematic error.

In the example that I plotted out, it looks like there might be some
systematic error. It is unlikely that any systematic error in
the timing would go unnoticed for long, so we suspect there
might be some systematic error in the way the length is being
measured. OTOH there is some chance that there is no systematic
error and the discrepancy we see is just a fluke, and we can't
rule this out without more data; see below (*) for more
on this.

To summarize: sometimes people overlook the basics:

Simplicio: The apparatus isn't working.
Salviati: Did you try plugging it in?

Simplicio: I don't know how to analyze this data.
Salviati: Did you try plotting it out and looking at it?

At the not-quite-basic level we have:
Did you try building a Monte Carlo model so you
can see what the data /should/ look like, with
and without systematic error?

My plot does in fact show data from a Monte Carlo model. It
is built using a gnumeric spreadsheet. I can sit there and
repeatedly re-run the simulation by hitting F9. (*) That's
how I know what can be considered a fluke and what can't.
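
For readers who prefer code to a spreadsheet, here is a minimal
Python sketch of the same idea. The noise levels and the length
bias are made-up parameters, and re-running the script plays the
role of hitting F9:

  import math
  import random

  g = 9.8   # m/s^2

  def observe(true_lengths, sigma_L=0.005, sigma_T=0.01, bias_L=0.0):
      """One simulated data set: (observed length, observed period) pairs."""
      data = []
      for L in true_lengths:
          T = 2 * math.pi * math.sqrt(L / g)        # the model
          data.append((L + bias_L + random.gauss(0, sigma_L),
                       T + random.gauss(0, sigma_T)))
      return data

  lengths = [0.2, 0.4, 0.6, 0.8, 1.0]               # meters

  print(observe(lengths))                # what the data /should/ look like
  print(observe(lengths, bias_L=0.02))   # ditto, with a 2 cm systematic
                                         # error in the length measurement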

> There's uncertainty in the measurement, as well as uncertainty in the
> distribution of the expected value of the measurement.

Note that the terminology is ambiguous here. Actually it's
worse than that, because _ambi-_ means two, and there are
more than two interpretations here:
-- The uncertainty of a single observation (which is zero).
-- The scatter, i.e. the empirical width of the /set/ of
observations.
-- The width of the underlying distribution from which the
observations were drawn. This may include contributions
from systematic error, which don't show up in the scatter.

As before, I emphasize that there is *no* uncertainty in
any single observation x_i. On the first observation, the
reading on the stopwatch was 1.235 and I am absolutely
certain about that. There will be some scatter in a /set/
of observations, but that is a property of the set, not a
property of any particular element of the set.
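
A quick numerical illustration of the last item above (all the
numbers here are made up): a constant systematic offset moves the
whole set of observations, but leaves the scatter untouched, so
the scatter alone cannot reveal it.

  import random
  import statistics

  random.seed(0)
  noise = [random.gauss(0, 0.008) for _ in range(1000)]

  unbiased = [1.243 + e for e in noise]   # no systematic error
  biased   = [1.293 + e for e in noise]   # 50 ms systematic offset

  print(statistics.stdev(unbiased))   # ~0.008
  print(statistics.stdev(biased))     # the same ~0.008; the offset is
                                      # invisible in the scatter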

=================

The /general/ problem of how to compare two probability
distributions is very tricky. I know about ten schemes for
doing that. None of them are simple. And the choice of
which one to use is very sensitive to the details of what
you are comparing and what you intend to do with the answer.
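
Just to give the flavor of one such scheme (this is my
illustration, not a recommendation): the overlapping coefficient,
i.e. the shared area under two probability densities, evaluated
here numerically for two Gaussians.

  import math

  def gauss_pdf(x, mu, sigma):
      z = (x - mu) / sigma
      return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

  def overlap(mu1, s1, mu2, s2, n=100_000):
      """Area under min(p1, p2): 1.0 if identical, 0.0 if disjoint."""
      lo = min(mu1 - 6*s1, mu2 - 6*s2)
      hi = max(mu1 + 6*s1, mu2 + 6*s2)
      dx = (hi - lo) / n
      return sum(min(gauss_pdf(lo + i*dx, mu1, s1),
                     gauss_pdf(lo + i*dx, mu2, s2)) for i in range(n)) * dx

  print(overlap(1.243, 0.008, 1.243, 0.008))  # ~1.0: identical
  print(overlap(1.243, 0.008, 1.280, 0.008))  # ~0.02: nearly disjoint

Even after computing such a number, you still have to decide how
much overlap counts as "enough", which is part of why none of
these schemes is simple.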

If at all possible, you are better off reformulating the
question so it doesn't look like a generalized comparison
between two distributions. In the pendulum example, the
reformulated question asks whether there is evidence for
systematic error. This question is nontrivial but not nearly
so bad as the open-ended comparison question.
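
As a concrete (and entirely hypothetical) version of the
reformulated question: given measured (length, period) pairs and
an assumed period scatter, compare the residuals against the model
T = 2*pi*sqrt(L/g). A reduced chi-square near 1 means the data
fits within the scatter; a much larger value is evidence for
systematic error (or an underestimated scatter).

  import math

  g = 9.8          # m/s^2
  sigma_T = 0.01   # assumed scatter of the period measurements, seconds

  # made-up (length in m, observed period in s) pairs, for illustration
  data = [(0.2, 0.91), (0.4, 1.30), (0.6, 1.58), (0.8, 1.83), (1.0, 2.03)]

  chi2 = sum(((T_obs - 2 * math.pi * math.sqrt(L / g)) / sigma_T) ** 2
             for L, T_obs in data)
  print(chi2 / len(data))   # reduced chi-square; here >> 1, which would
                            # point to a systematic error somewhere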