
Re: [Phys-L] LSF Slope Versus Average of Single Point Slopes



On 05/07/2012 01:04 PM, Bill Nettles asked:
> Is the LSF method developed based on the distribution of errors being
> a Gaussian distribution?

Yes.

The thing you are minimizing is the _log improbability_. It works like this:

As always, the joint probability is the product of the pointwise probabilities
(provided they are independent ... which is not always a safe assumption, but
usually gets assumed anyway).

Hence the joint log improbability is the sum of the pointwise log improbabilities.

Hence the joint log improbability for a bunch of gaussians is a sum of squares.

Hence maximizing the joint probability corresponds to least squares.
That's all there is to it.
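Here is a minimal numerical check of that chain of reasoning (my sketch, in
numpy/scipy, with made-up numbers): for independent gaussian errors, the joint
negative log probability is the weighted sum of squares plus a constant that
does not depend on the model, so maximizing the probability and minimizing the
sum of squares pick out the same fit.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
model = np.array([1.0, 2.0, 3.0, 4.0])           # model predictions at each point
sigma = np.array([0.1, 0.5, 0.2, 1.0])           # per-point error bars
y = model + sigma * rng.standard_normal(4)       # simulated data

# joint negative log probability, assuming independent gaussian errors
neg_log_prob = -np.sum(norm.logpdf(y, loc=model, scale=sigma))

# weighted sum of squares, plus a model-independent constant
weighted_ssq = 0.5 * np.sum(((y - model) / sigma) ** 2)
constant = np.sum(np.log(sigma * np.sqrt(2.0 * np.pi)))

print(np.isclose(neg_log_prob, weighted_ssq + constant))    # True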

Note that the sum must be a /weighted/ sum. Each of the "squares" must be
weighted in accordance with that point's error bars, i.e. the width of that
point's gaussian; specifically, the weight goes as one over the square of
that width. If the error bars are not all the same and you fail to take this
into account, the result is nonsense. You would think this would be obvious
based on the idea of maximizing the probability ... but amazingly enough,
I've seen grad students who think they can get away with an unweighted sum
of squares (even after I've warned them multiple times) ...
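To make the point concrete, here is a small sketch (my example, not anything
from real data): a straight-line fit where the error bars differ wildly from
point to point. The weight on each squared residual is one over the square of
that point's error bar; leaving the weights out lets the noisiest points drag
the slope around.

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.1, 1.0, 2.1, 2.0, 8.0])
sigma = np.array([0.1, 0.1, 0.1, 2.0, 5.0])      # last two points are very uncertain

A = np.column_stack([x, np.ones_like(x)])        # design matrix for y = m*x + b
W = np.diag(1.0 / sigma**2)                      # weights: 1 / sigma_i^2

# weighted least squares via the normal equations
m_w, b_w = np.linalg.solve(A.T @ W @ A, A.T @ W @ y)

# unweighted fit, i.e. pretending all the error bars are equal
m_u, b_u = np.linalg.lstsq(A, y, rcond=None)[0]

print("weighted slope:  ", m_w)
print("unweighted slope:", m_u)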

It must be emphasized that the aforementioned probability is a likelihood,
i.e. the a_priori probability, i.e. the probability of the data given
the model ...... whereas what you almost certainly want is the a_posteriori
probability, i.e. the probability of the model given the data.
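For what it's worth, here is a sketch of that distinction on a toy problem
(the data, the model y = m*x, and the prior on the slope are all assumptions
of mine, chosen only to illustrate Bayes' rule): the likelihood is the
probability of the data given each candidate slope, and the posterior folds
in the prior.

import numpy as np
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
sigma = 0.3                                      # common error bar, for simplicity

slopes = np.linspace(1.0, 3.0, 401)              # grid of candidate models y = m*x

# log likelihood: log P(data | slope), independent gaussian errors
log_like = np.array([norm.logpdf(y, loc=m * x, scale=sigma).sum() for m in slopes])

# a deliberately informative prior on the slope, so the two answers visibly differ
log_prior = norm.logpdf(slopes, loc=1.5, scale=0.1)
log_post = log_like + log_prior                  # log posterior, up to a constant

print("max-likelihood slope:", slopes[np.argmax(log_like)])
print("max-posterior  slope:", slopes[np.argmax(log_post)])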

If the points are vectors, the notion of "square" needs to be made more
sophisticated, as does the notion of "weight". Let the points be X(1),
X(2), ... X(i), et cetera.

You might hope that the weighted sum would look like

E = Σ W(i) X(i) • X(i) ??? [1]

but no such luck. What you really need is the bilinear form

E = Σ X(i)_j W(i)_jk X(i)_k [2]

where W(i) is a symmetric matrix. It is /inversely/ related to the error bars,
in the sense that long error bars correspond to small eigenvalues in the W matrix.
Depending on what subfield you are working in, expression [2] is called the
-- objective function
-- loss
-- cost
-- regret
-- Mahalanobis distance
-- et cetera

Hypothetically speaking, if the W matrix happens to be a multiple of the identity
matrix, then [2] can be rewritten as [1] ... but I've never seen this happen in
the real world, not even close.

You can think of the bilinear form [2] as a "generalized square" if you want,
but at some point it is better to think of it as the negative log of a multi-
dimensional gaussian, keeping in mind that the gaussian is very likely to be
squashed and rotated relative to the axes of your vector space.
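A quick numerical check of that last statement (again my sketch, with made-up
numbers): take a covariance matrix C that is squashed and rotated relative to
the axes, set W = inverse(C) as in expression [2], and the bilinear form
reproduces the negative log of the multivariate gaussian up to a
model-independent constant.

import numpy as np
from scipy.stats import multivariate_normal

C = np.array([[2.0, 0.8],
              [0.8, 0.5]])                       # covariance: squashed and rotated
W = np.linalg.inv(C)                             # the weight matrix in expression [2]

mu = np.array([1.0, -1.0])                       # model prediction
x = np.array([1.7, -0.4])                        # observed point
r = x - mu                                       # residual

quadratic = 0.5 * r @ W @ r                      # the "generalized square"
constant = 0.5 * np.log(np.linalg.det(2.0 * np.pi * C))
neg_logpdf = -multivariate_normal.logpdf(x, mean=mu, cov=C)

print(np.isclose(neg_logpdf, quadratic + constant))    # True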