
Re: [Phys-l] data analysis (was independent variables)



On 10/31/2006 02:32 PM, David Strasburger-fac wrote:

(1) Is there still an important role for log-log paper, or logarithmic
graphs?

Yes. Absolutely. All day, every day.

I want my students to have appropriate tools to examine and interpret data
from experiments in which they have an idea of how things should turn out,
as well as situations in which they don't know what sort of relationship
to expect. One of the things I was taught to do was to throw data onto
log-log paper and see if it makes a line. If so, find slope.

That's true, but definitely not the whole story; see below.

My students ask me: Why do this, when I can just test out a curve fit of
form y=Ax^B?

That's a good question, and indeed the latter is often worth
doing instead of -- or in addition to -- the log plot.

And there are additional possibilities.

One key issue is visibility. Exponential functions can extend over
many orders of magnitude, so in many cases if you don't use a
log plot or something similar, much of the data will be off-scale
and/or indistinguishable from zero (even if the difference from
zero is highly meaningful).
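To make that concrete, here is a minimal sketch (synthetic numbers,
nothing to do with any particular experiment) comparing plain axes to
a semilog plot:

  import numpy as np
  import matplotlib.pyplot as plt

  # hypothetical exponential decay spanning many orders of magnitude
  t = np.linspace(0, 10, 200)
  y = np.exp(-2.0 * t)

  fig, (ax1, ax2) = plt.subplots(1, 2)
  ax1.plot(t, y)        # plain axes: most of the data looks like zero
  ax2.semilogy(t, y)    # log axis: the whole decay shows up as a straight line
  plt.show()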

Another issue has to do with the form of the noise. Once upon a
time I had some data where the pressure was an exponential function
of time, and the raw data included quite a bit of /additive/ noise
intrinsic to the pressure gauge I had built. (The max pressure
was 10^-9 atmospheres, and the noise was less than 10^-13 atm per
root Hertz, so I'm not gonna apologize for the noise.) Because
of the noise, late in the decay the raw data could easily include
negative values, so taking the logarithm was obviously a non-starter.
More to the point, taking logarithms, even when the data was
positive, distorted the error bars -- exaggerating the downward
errors while de-emphasizing the upward error bars -- which would
have introduced serious errors into any attempt to fit to the
logified data.

To make a long story short, we curve-fit to the raw data. We
then plotted the data (and the fit) on log axes in the early
region where that made sense, so the many orders of magnitude
would fit on the paper, and we also plotted the data (and the
fit) on plain axes in the later region. We also subtracted
the fit from the data and plotted the /residuals/ on plain
axes.
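Here is roughly what that procedure looks like in code. This is only
a sketch with made-up numbers (the model, noise level, and time scale
are invented for illustration, not the original data):

  import numpy as np
  from scipy.optimize import curve_fit
  import matplotlib.pyplot as plt

  def model(t, p0, tau):
      # simple exponential decay in the pressure
      return p0 * np.exp(-t / tau)

  # synthetic "raw" data: decay plus additive gauge noise, so the
  # late-time points can wander below zero
  rng = np.random.default_rng(0)
  t = np.linspace(0, 100, 400)
  p = model(t, 1e-9, 8.0) + rng.normal(0, 2e-13, t.size)

  popt, pcov = curve_fit(model, t, p, p0=[1e-9, 10.0])

  # residuals on plain axes, as described above
  resid = p - model(t, *popt)
  plt.plot(t, resid, '.')
  plt.show()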


(2) How is linearizing data different from, or more useful than, curve-fitting it?

I notice that the AP test is big on linearizing data. I only mention this
technique in class so that students won't be surprised by this if it comes
up on the test. Am I doing wrong by my kids?

Curve-fitting is quantitative.

Linearizing is qualitative. Humans can eyeball a deviation from a
straight line more easily than a deviation from some weird curve.

Linearization is not the objective; it is sometimes a useful means
to the objective, which is to make the structure of the data easily
perceptible. For example, I once had a program that mis-calculated
the electron density in an s-orbital. I plotted out the density as
a scatter plot, and the error was obvious because the plot did not
have the expected rotational symmetry; it had "horns" with fourfold
symmetry. This has nothing to do with linearization and everything
to do with making the structure perceptible.


Suppose we divided Bob's students into groups and asked them to look at
the sand crater experiment he describes. Here's what they're going to do:
a. graph diameter^4 against height.
b. graph log(d) vs. log(h).
c. graph d vs. h and do a fit to Bh^A and ask for best values of B and A.

I would start with (b). It will tell you what exponent to use in (a).
Then I would do (a), because it is more sensitive than (b); that is,
if it's not a simple power law this will become more apparent with (a)
than with (b). I would then do (c) and plot the data (with the fit)
in various ways, including ways other than (a) or (b).
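For what it's worth, that sequence might look something like this in
code (a sketch only; sand_h and sand_d are placeholders for the
measured heights and crater diameters, and the fit form d = B*h^A is
assumed):

  import numpy as np
  from scipy.optimize import curve_fit

  sand_h = np.array([0.1, 0.2, 0.4, 0.8, 1.6])           # placeholder heights
  sand_d = np.array([0.031, 0.037, 0.044, 0.052, 0.062]) # placeholder diameters

  # (b) log-log: the slope estimates the exponent
  A_est, logB = np.polyfit(np.log(sand_h), np.log(sand_d), 1)

  # (a) with that exponent in hand, plot d**(1/A_est) (e.g. d**4 if the
  #     slope comes out near 1/4) against h and look for a straight line

  # (c) direct fit to d = B * h**A on the raw data
  def power_law(h, B, A):
      return B * h**A
  (B_fit, A_fit), _ = curve_fit(power_law, sand_h, sand_d,
                                p0=[np.exp(logB), A_est])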

Will they come to different conclusions about crater-formation? Will they
learn different lessons about data analysis?

Any conclusion based on an incomplete analysis is likely to be
wrong. The key lesson is that a proper analysis uses many
methods. All the students should learn this.


(3) Is there a significant advantage (or disadvantage) to designing an
experiment so that the value of interest may be extracted from a contrived
linear relationship?

Measuring a linear relationship is nice work if you can get it.
More usually, the raw data is grossly nonlinear, and there is
some combination of additive noise, multiplicative noise, and
who-knows-what else. That's why people work hard at data analysis.


example:
I wanted to teach a lesson on uncertainty last week. I walked into class
and said: "This is a speed lab. The school has been moved to another
planet where rents are lower. In order for physics research to continue,
we need to know the value of g in our new home. The Provisional Planetary
Governor wants an answer in ten minutes." No other instructions.
I set out Atwood machine parts, meter sticks, stopwatches. Ten minutes
gave them about enough time to get set up and drop weights maybe eight or
ten times. Later we discussed why not everyone's results agreed, and what
level of accuracy the Provisional Planetary Council should ascribe to these
values.

Here's the question:

Most students measured the distance a weight dropped from rest, the time
for the fall, and the two masses. In this "speed lab" setting, everyone
decided to compute g based on the data for each run, then take the
average, excluding values they "didn't like."

What if instead we had tinkered with the mass values in the lab, then made
a graph of (acceleration) vs. (m1-m2)/(m1+m2) and looked at the slope? Is this
value for g somehow a better way of aggregating the data? worse? the same?
I suspect that this linear regression makes the extreme values more
significant than the other values...

That is not a sufficiently detailed description. I suspect this
is a curve-fitting situation where there is considerable noise
on both the ordinate *and* the abscissa. Doing this right is
kinda tricky.
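One standard tool for noise on both axes is orthogonal distance
regression. A minimal sketch, with placeholder numbers rather than
real Atwood-machine data:

  import numpy as np
  from scipy import odr

  # x = (m1 - m2)/(m1 + m2), y = measured acceleration (placeholders)
  x = np.array([0.02, 0.05, 0.10, 0.15, 0.20])
  y = np.array([0.21, 0.48, 1.02, 1.44, 1.95])
  sx = np.full_like(x, 0.005)   # uncertainty in the mass ratio
  sy = np.full_like(y, 0.05)    # uncertainty in the acceleration

  def line(beta, x):
      return beta[0] * x        # a = g * (m1-m2)/(m1+m2); the slope is g

  out = odr.ODR(odr.RealData(x, y, sx=sx, sy=sy),
                odr.Model(line), beta0=[10.0]).run()
  print("g =", out.beta[0], "+/-", out.sd_beta[0])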

(4) What IS an appropriate sequence to suggest for students faced with
data when they have no initial hypothesis about how to model it?

The following answer sounds smart-alecky, but is actually provably
correct and meaningful in a practical sense: If they really have
*no* idea about how to limit the hypotheses to be considered, there
are provably infinitely many hypotheses that will fit the data with
perfect accuracy ... most of which have no physical significance
and no predictive power. Simply put, it is absolutely necessary
to have /some/ idea what you're looking for.

For example, if the data came from simple physics-lab apparatus,
it is likely that the ordinate will be some fairly smooth function
of the abscissa.

Left to their own devices, my students will go through all the suggested
curve fits in LoggerPro looking for the one with the lowest RMSE. I tell
them this is silly - that they should have some other criteria for what's
a good model.
-Does model predict behavior at extremes? (eg does your model tell you
that a cart has zero acceleration on a flat ramp and a=g vertically?)

Anything you can do to constrain the model (at the extremes or otherwise)
will improve the quality of the results. (As an example of a non-extreme
constraint, consider the application of a /sum rule/.)
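For instance, in the cart-on-a-ramp example above, you can build the
end-point constraints directly into the model so that only one
parameter is left to fit. This is a hypothetical sketch with made-up
numbers:

  import numpy as np
  from scipy.optimize import curve_fit

  theta = np.radians([5, 10, 20, 30, 45, 60])    # ramp angles (placeholders)
  a = np.array([0.9, 1.7, 3.3, 4.9, 6.9, 8.5])   # measured accelerations

  # a = g*sin(theta) already satisfies a(0) = 0 and a(90 deg) = g,
  # so the constraints cost nothing and only g remains adjustable
  def model(theta, g):
      return g * np.sin(theta)

  (g_fit,), _ = curve_fit(model, theta, a, p0=[10.0])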

Understanding the physics involved helps a lot ... but sometimes you
simply don't understand the physics, so you are stuck with a wacky
empirical model. For example, the resistance of an Allen-Bradley
carbon-comp resistor at low temperatures is exponential in the square
root of the inverse temperature. That's surprisingly accurate over many
orders of magnitude. Maybe somebody understands where that comes from,
but I sure don't, and I've never been sufficiently motivated to find out.
The point is that I needed some functional form to use for calibrating,
extrapolating, and interpolating the resistors I was using as thermometers,
and exp(sqrt(1/t)) served just fine for that purpose.
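Calibrating against that form is straightforward: take the log of the
measured resistance and fit a straight line against sqrt(1/T). The
numbers below are made up for illustration, not real calibration data:

  import numpy as np

  T = np.array([4.2, 3.0, 2.0, 1.5, 1.0, 0.5])               # temperatures, K
  R = np.array([1.2e3, 1.6e3, 2.4e3, 3.3e3, 5.9e3, 2.6e4])   # resistances, ohms

  # R(T) = R0 * exp(b * sqrt(1/T))  ==>  ln(R) is linear in sqrt(1/T)
  b, lnR0 = np.polyfit(np.sqrt(1.0 / T), np.log(R), 1)

  def R_of_T(T):
      # interpolate/extrapolate the calibration curve
      return np.exp(lnR0 + b * np.sqrt(1.0 / T))

  def T_of_R(R):
      # invert it to use the resistor as a thermometer
      return 1.0 / ((np.log(R) - lnR0) / b) ** 2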

The question arises, how wacky is too wacky? Students are not born
knowing where to draw the line.

Theory says it's a tradeoff involving how much data you have, how
precise the data is, and how many wacky hypotheses you need to consider.
If there is too little data and/or too many hypotheses, the predictive
power of the model will be, on average, provably terrible. In the other
extreme, where you have lots of good data and relatively few candidate
hypotheses, you can reliably identify the best hypothesis, and the
predictive power will be, on average, very good. All this can be
quantified using Vapnik-Chervonenkis dimensionality and related ideas.
One name for this general line of inquiry is PAC learning (where PAC
stands for Probably Approximately Correct).

You may find it reassuring that the fancy theory agrees, in the limit,
with things you already know. In particular, if all the hypotheses
are polynomials with degree N (so there are at most N+1 adjustable
parameters), then the VC dimensionality is just N+1. Also, if there
is a collection of M hypotheses, with no adjustable parameters other
than a "selector" that chooses which hypothesis, then the VC
dimensionality is at most log2(M).

"Overfitting" the data is bad. Overfitting the data by looking through
a too-long list of hypotheses is just as bad as overfitting by using
a too-high-degree polynomial.

Related important concepts include training-set error and testing-set
error. The latter quantifies the predictive power of the model.
Overfitting makes the training-set error smaller (which superficially
/seems/ good), but makes the testing-set error larger (which is bad,
really and truly bad).
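A toy demonstration, using synthetic data and ordinary polynomial
fits (nothing fancier is needed to see the effect):

  import numpy as np

  rng = np.random.default_rng(1)
  true = lambda x: np.sin(2 * np.pi * x)
  x_train = np.linspace(0, 1, 10)
  x_test = np.linspace(0.05, 0.95, 10)
  y_train = true(x_train) + rng.normal(0, 0.1, 10)
  y_test = true(x_test) + rng.normal(0, 0.1, 10)

  for degree in (1, 3, 5, 9):
      fit = np.poly1d(np.polyfit(x_train, y_train, degree))
      train_err = np.sqrt(np.mean((y_train - fit(x_train))**2))
      test_err = np.sqrt(np.mean((y_test - fit(x_test))**2))
      print(degree, train_err, test_err)

  # degree 9 passes through all ten training points essentially exactly
  # (tiny training error), which superficially looks great; compare the
  # testing error to see the overfit.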

Overfitting shifts the so-called bias/variance tradeoff in the direction
of more bias and less variance (training-set variance, that is).