
Re: LSF method vs. averaging



I like these comments posted by John and I appreciate the mini-review of
resources. Perhaps I'm reading too much into the original question, but this
has launched a batch file in my head that outputs a homily I frequently
deliver to my lab students.

There is another aspect of this situation that I think has not yet been
addressed, and it is one that comes up frequently in my introductory physics
labs.

First, consider an abstract set of numbers x, y, and y/x:

      x        y       y/x
    1.000    3.000    3.000
    2.000    4.000    2.000
    3.000    5.000    1.667
    4.000    6.000    1.500
    5.000    7.000    1.400

If we average the values for y/x given above, we obtain 1.913. But if we fit
the data with a curve y = mx + b, we get a slope of 1.000. The reason for the
discrepancy has nothing to do with error scatter or statistics, per se ---
especially since the data I provided is "perfect". It is due to the fact that
the averaging method has assumed an intercept of zero, whereas "truth" here
has an intercept of 2.000. In effect, it is as if we drew a straight line
from each data point to the origin and then averaged the slopes of those
straight lines. In other words, we are saying that we are willing to believe
that there is some error in our data (hence our noble inclinations to fix
that problem by doing something sexy, like taking an average) but that there
is some divine point at the origin which has no error and through which our
experimental truth must pass.
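
Here is a minimal numerical sketch of that comparison (just an illustration,
using NumPy's polyfit for the unconstrained least-squares line; any LSF
routine would do the same thing):

    # Averaging y/x versus fitting y = m*x + b to the "perfect" data above.
    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([3.0, 4.0, 5.0, 6.0, 7.0])   # exactly y = x + 2, no scatter

    ratio_average = np.mean(y / x)          # each point treated as a line through the origin
    slope, intercept = np.polyfit(x, y, 1)  # unconstrained least-squares line

    print("average of y/x  :", round(ratio_average, 3))   # 1.913
    print("fitted slope    :", round(slope, 3))           # 1.000
    print("fitted intercept:", round(intercept, 3))       # 2.000

The averaging procedure silently pins the line to the origin; the
two-parameter fit does not, and it recovers the intercept of 2.000 that the
averaging throws away.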

Now, statisticians will groan at the mere thought of extrapolating a fit
beyond the domain of the abscissa data, but we practical scientists often
feel the need to do so. (E.g., what is 0 K?) On the more practical side, the
extrapolated intercept can tell us some things about our experiment, such as
a bias or "offset".

Here are some cases where the biases have real, physical meanings. The dry
cell's internal resistance affects the ohmic response of a resistor network.
The mass of a spring affects the study of period as a function of m^-0.5 in a
bobbing mass-spring oscillator. If we don't teach students to pay as much
attention to the intercepts of their trendlines "handed down from Excel the
almighty" as they do to the slopes, then we've taught them only half of the
physics --- or perhaps less than half. I see too many lab reports where folks
get wrapped up in deciding whether their experimental slopes match the
theoretical value while totally ignoring the extrapolated intercept. It's
especially grating when their experimental data happens to provide a
believable slope (never mind the variance of that slope value!) and yet has
an enormous intercept that gets totally ignored when claiming success.
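
To make those two cases concrete (one common linearization for each, which is
my own illustration rather than anything spelled out above):

    V = EMF - I*r                   (terminal voltage vs. current: slope = -r,
                                     intercept = the EMF itself)
    T^2 = (4*pi^2/k)*(m + m_eff)    (period squared vs. hanging mass: the
                                     intercept gives the spring's effective
                                     mass, roughly one third of its actual
                                     mass for a uniform spring)

Whichever linearization you choose, the internal resistance and the spring's
mass show up in the intercept; it is a piece of physics, not a nuisance
parameter.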

And of course, it's often the case that the intercept, on one axis or the
other, tells us something that would be neat to know. If I had a nickel for
every Photoelectric Effect lab report that discussed the students' personally
determined value for Planck's constant and totally disregarded the work
function value that fell into their laps....
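
(For concreteness, and filling in the standard stopping-potential analysis
here for the record, writing W for the work function:

    e*V_s = h*f - W

so a plot of stopping voltage V_s against frequency f has slope h/e and
intercept -W/e. The work function is sitting right there in the intercept.)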

Jim



On Wednesday 2004 March 10 20:59, you wrote:
Quoting Chuck Britton <britton@NCSSM.EDU>:
At 4:12 PM -0500 3/10/04, Bob Sciamanda wrote:
For a summary of the LSF method and the rationale for using squares vs
absolute values, go to:

http://mathworld.wolfram.com/LeastSquaresFitting.html

Interesting comments from Wolfram here.
They say that squares of the 'offsets' work better than absolute
values because the squared function is differentiable.

I am *not* impressed by that argument.

A more thoughtful rationale for using the squares (as opposed
to the absolute values) can be summarized in two words:
log probability.

The following set of assumptions suffices (and there are other
sets of assumptions that lead to similar conclusions):
-- assuming errors are IID (independent and identically distributed)
-- assuming the errors in the ordinate are additive Gaussian
white noise
-- assuming the errors in the abscissa are negligible

... then the probability of error on each point is exp(-error^2),
and the overall probability for a set of points is the product of
terms like that, so the overall log probability is the sum of
squares. Minimizing the sum of squares is maximizing the
probability. That is, you are finding parameters such that the
model, with those parameters, would have been maximally likely
to generate the observed data. So far so good. Most non-experts
are satisfied with this explanation.
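
[In symbols, spelling out that step: writing each residual as
r_i = y_i - (m*x_i + b) and sigma for the common noise width (only implicit
above), each error has density

    p(r_i) proportional to exp( -r_i^2 / (2*sigma^2) ),

so for N independent points

    ln P = -(1/(2*sigma^2)) * sum over i of (y_i - m*x_i - b)^2 + const,

and maximizing ln P over (m, b) is exactly minimizing the sum of squared
residuals.]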

There is, however, a fly in the ointment. The probability in
question is, alas, a likelihood, i.e. an _a priori_ probability,
and if you really care about the data you should almost certainly
be doing MAP (maximum _a posteriori_) rather than maximum
likelihood. That is, you want to maximize the probability of the
model *given* the data, not the probability of the data *given*
the model. Still, for typical high-school exercises maximum
likelihood should be good enough.
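
[For the record, the two are related by Bayes' theorem:

    P(model | data) proportional to P(data | model) * P(model),

so maximum likelihood is the special case of MAP in which the prior P(model)
is taken to be flat.]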

I would also like to remind people to avoid the book by
Bevington. The programs in the book are macabre object
lessons in how not to write software. And the other parts
of the book are not particularly good, either. The
_Numerical Recipes_ book by Press et al. has reasonable
programs. Gonick's book _The Cartoon Guide to Statistics_
is a very reasonable introduction to the subject; don't be
put off by the title.

--
James R. Frysinger
Lifetime Certified Advanced Metrication Specialist
Senior Member, IEEE

http://www.cofc.edu/~frysingj
frysingerj@cofc.edu
j.frysinger@ieee.org

Office:
Physics Lab Manager, Lecturer
Dept. of Physics and Astronomy
University/College of Charleston
66 George Street
Charleston, SC 29424
843.953.7644 (phone)
843.953.4824 (FAX)

Home:
10 Captiva Row
Charleston, SC 29407
843.225.0805