Re: [Phys-L] Question on data analysis



On 05/21/2015 10:07 AM, Philip Keller wrote:

my students did the lab where you
drop coffee filters to get terminal velocities.

Let's do a slightly different experiment. Rather
than dropping some number [1 .. 5] of filters,
let's drop a single filter from different heights
and measure the time.

We standardize on fixed heights [1 .. 5] decimeters.
At each height, everybody in the class makes a
measurement, and we average all the data for that
height. We then plot the class-average data (height
versus time) and do the fit.

Here is an example of what that looks like:
http://www.av8n.com/physics/img48/coffee-filter-drop-data-fit.png

As you can see, the two-parameter straight-line fit
is in excellent agreement with the data. In contrast,
the one-parameter straight line constrained to pass
through the origin is a lousy fit to these five
data points.

ON THE OTHER HAND ... let's think about the physics.

Suppose we are interested in the behavior at short
times. The two-parameter fit says ...
... If we drop the filter from some tiny but nonzero
height, it will just sit there for approximately
0.7 seconds before falling the rest of the way to
zero. This is bad.
... If we drop it at some given time and then look
at it an infinitesimal time later, the filter will
be about 7 cm higher than it was when we dropped it.
This is very, very bad.
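
The two statements above are just the two intercepts of
the fitted line. As a sketch, write the two-parameter fit
as h_fit(t) = v t + b, where h_fit is the distance fallen
and the symbols v, b and t_0 are names introduced here for
illustration (they are not from the plot):

```latex
% Two-parameter straight-line fit; symbols are illustrative.
h_{\mathrm{fit}}(t) = v\,t + b
% Time intercept: the fit predicts zero distance fallen until t_0.
h_{\mathrm{fit}}(t_0) = 0 \quad\Longrightarrow\quad t_0 = -\frac{b}{v}
% Height intercept: at the moment of release the fit predicts
h_{\mathrm{fit}}(0) = b
```

The "sit there" time quoted above is t_0, and the "higher
than it was when we dropped it" distance is -b. A fit with
b < 0 necessarily has t_0 > 0, so the two absurdities come
as a pair.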

The actual physics is nonlinear! It takes the filter
a while to accelerate to terminal velocity. The actual
trajectory is shown here:
http://www.av8n.com/physics/img48/coffee-filter-drop-trajectory.png

The blue curve is the logarithm of the hyperbolic
cosine; reference:
http://isites.harvard.edu/fs/docs/icb.topic1216311.files/Project%20resources/Air%20drag/coffee%20filter2.pdf
The asymptote is a straight line whose slope is the
terminal velocity (aka drift velocity).
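
For concreteness, here is a sketch of where a log-cosh
trajectory comes from, assuming the usual quadratic-drag
model for a filter released from rest. The symbols m, k,
g, v_t and y are my labels, and I am assuming the
reference uses the same model:

```latex
% Assumed model: quadratic drag, release from rest at t = 0.
m\,\frac{dv}{dt} = m g - k v^{2}, \qquad v_t = \sqrt{\frac{m g}{k}}
% Velocity and distance fallen:
v(t) = v_t \tanh\!\left(\frac{g t}{v_t}\right), \qquad
y(t) = \frac{v_t^{2}}{g}\,\ln\cosh\!\left(\frac{g t}{v_t}\right)
% Long-time asymptote, using \ln\cosh x \to x - \ln 2:
y(t) \approx v_t\left(t - \frac{v_t}{g}\ln 2\right)
% Short-time behavior, using \ln\cosh x \approx x^{2}/2:
y(t) \approx \tfrac{1}{2}\, g\, t^{2}
```

Under these assumptions the asymptote has slope v_t and a
negative intercept, which is why a straight line fitted to
long-time data misses the origin.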

You can test the nonlinear model by directly observing
the short-time behavior. Video motion-capture may
help with this.

You can see that the observed data at multiple-
decimeter heights is not wrong; it just doesn't
tell us much about the short-term behavior.

Here's how to think about this:
a) Assuming that the data can be represented by a
straight line through the origin introduces
some bias.
b) Assuming that the data can be represented by a
straight line /not/ through the origin introduces
a different bias. Still bias, just different.
c) At long times, (b) is better than (a).
d) At short times, (a) is better than (b). It's
not great, but at least it's in the right
ballpark. This suffices to prove that a
constrained model is not necessarily worse.
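
Points (c) and (d) can be checked numerically. The sketch
below is not the class data: it generates noise-free
"measurements" from a log-cosh trajectory with assumed
values g = 9.8 m/s^2 and v_t = 1 m/s (both hypothetical,
as are all the function names), fits the two straight
lines, and compares them over the fitted range and at a
short time.

```python
import numpy as np

# Assumed, hypothetical parameters; not taken from the class data.
G = 9.8    # gravitational acceleration, m/s^2
V_T = 1.0  # terminal velocity, m/s


def distance_fallen(t):
    """Log-cosh trajectory (quadratic drag, release from rest)."""
    x = G * np.asarray(t, dtype=float) / V_T
    # log(cosh(x)) computed as logaddexp(x, -x) - log(2) to avoid overflow
    return (V_T**2 / G) * (np.logaddexp(x, -x) - np.log(2.0))


def fall_time(h):
    """Inverse of distance_fallen: time needed to fall a distance h."""
    h = np.asarray(h, dtype=float)
    return (V_T / G) * np.arccosh(np.exp(G * h / V_T**2))


def rms(residuals):
    return float(np.sqrt(np.mean(np.square(residuals))))


# Synthetic data: drop heights of 1 .. 5 decimeters, in meters.
heights = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
times = fall_time(heights)

# (b) two-parameter fit: h = slope_b * t + intercept_b
slope_b, intercept_b = np.polyfit(times, heights, 1)

# (a) one-parameter fit through the origin: h = slope_a * t
slope_a = float(np.sum(times * heights) / np.sum(times**2))

# Over the fitted (long-time) range, compare RMS residuals.
rms_a = rms(heights - slope_a * times)
rms_b = rms(heights - (slope_b * times + intercept_b))

# At a short time, compare each fit against the model trajectory.
t_short = 0.01  # seconds
true_short = float(distance_fallen(t_short))
err_a_short = abs(slope_a * t_short - true_short)
err_b_short = abs(slope_b * t_short + intercept_b - true_short)

print("RMS residual, fit (a):", rms_a)
print("RMS residual, fit (b):", rms_b)
print("short-time error, fit (a):", err_a_short)
print("short-time error, fit (b):", err_b_short)
```

With these assumed parameters, fit (b) has the smaller
residual over the fitted heights and fit (a) has the
smaller error at the short time. Real data with scatter
would blur the comparison but should not reverse it.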

This proves that bias is not the same as misconduct.
There is *ALWAYS* bias. You *MUST* have a model, and
the model *MUST* incorporate lots of restrictions.
To say the same thing the other way, it is provably
impossible to model any finite amount of data using
an infinitely flexible model. This is discussed at
https://www.av8n.com/physics/data-analysis.htm#sec-20q
or equivalently
http://www.av8n.com/physics/data-analysis.htm#sec-20q

The proof is based on fundamental principles: entropy
and things like that.