
[Phys-L] maximum likelihood --> maximum a posteriori



On 05/04/2012 05:05 PM, I wrote:
> I thought that the LSF technique would produce the optimum result.
>
> Which it does.

Gaaaack!!!! I can't believe I wrote that.

Least-squares is better than simple averaging ... but it's by no
means "optimal". It almost *never* produces the optimal result
in the context of data reduction (or pattern recognition).

Yeah, I know it's what "everybody" gets taught ... but that
doesn't make it right. I own dozens and dozens of scholarly
books that tout maximum likelihood as the "obviously" correct
procedure ... but still it's not right.

Specifically: Least-squares fitting is a _maximum likelihood_
technique. In this context, likelihood is a very specific,
highly technical term. Basically it denotes the probability
of the data, given the model:
   P(data | model) == likelihood
                   == a priori probability
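To see why least-squares fitting maximizes this likelihood (a
standard derivation, filled in here for completeness): if each data
point y_i carries independent Gaussian noise of width sigma around
the model prediction f(x_i), then

   -log P(data | model) = sum_i [y_i - f(x_i)]^2 / (2 sigma^2) + const

so maximizing the likelihood is exactly minimizing the sum of
squared residuals.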

What you almost certainly want is the probability of the model
given the data, since you are in fact given the data:
   P(model | data) == a posteriori probability

The two are related by Bayes' rule:
   P(model | data) = P(data | model) P(model) / P(data)
so maximizing the posterior differs from maximizing the likelihood
whenever the prior P(model) carries real information.
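Here is a minimal numerical sketch of the difference, in Python (my
illustration, not anything from the post; the noise level sigma and
the prior width tau are assumptions of the toy setup). For a
straight-line fit with independent Gaussian noise, maximum likelihood
is ordinary least squares, while maximum a posteriori with a Gaussian
prior on the parameters reduces to ridge-regularized least squares:

import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 2x + 1 plus Gaussian noise, with only a few points.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
sigma = 0.5                                  # assumed known noise level
y = 2.0 * x + 1.0 + sigma * rng.standard_normal(x.size)

A = np.column_stack([x, np.ones_like(x)])    # design matrix [x, 1]

# Maximum likelihood == ordinary least squares:
#     minimize ||A w - y||^2
w_ml, *_ = np.linalg.lstsq(A, y, rcond=None)

# Maximum a posteriori with prior w ~ N(0, tau^2 I):
#     minimize ||A w - y||^2 / sigma^2 + ||w||^2 / tau^2
# i.e. ridge-regularized least squares with lambda = (sigma/tau)^2.
tau = 1.0                                    # assumed prior width
lam = (sigma / tau) ** 2
w_map = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ y)

print("ML  estimate (slope, intercept):", w_ml)
print("MAP estimate (slope, intercept):", w_map)

With an informative prior the two estimates differ; as tau grows
(an uninformative prior) the MAP answer approaches the ML answer.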

Once upon a time, there was a year for which the general consensus
was that the most important thing I had done all year was to tell
some guys to switch from maximum likelihood to maximum a posteriori.
It made a huge difference.

The fact that you can "sometimes" get away with maximum likelihood
methods makes it all the more deceptive. (If least squares fitting
never worked, nobody would use it, and nobody would ever get fooled.)

Roughly speaking, here are some conditions under which I would expect
to get away with least-squares fitting:
 -- the data points are all of the same general kind,
 -- each parameter of the model treats every data point on the same
    footing, and
 -- there is only one way in which the model can explain the data.

To say the same thing the other way: if, for example, there are
multiple minima in the objective function (i.e. multiple "leasts" in
the least-squares fit), that is one of the ways in which you can get
very badly fooled by maximum likelihood methods.
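To make the multiple-minima trap concrete, here is a small sketch in
Python (my illustration, not part of the original message; the sine
model and the numbers are arbitrary). The least-squares objective for
the frequency of y = sin(omega x) has many local "leasts", so a
standard local optimizer reports different answers depending on its
starting guess:

import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
omega_true = 1.7
y = np.sin(omega_true * x) + 0.1 * rng.standard_normal(x.size)

def residuals(omega):
    # Residuals of the model y = sin(omega * x); least_squares
    # minimizes the sum of their squares.
    return np.sin(omega[0] * x) - y

for guess in (0.5, 1.5, 3.0):
    fit = least_squares(residuals, x0=[guess])
    print("start omega = %.1f  ->  fitted omega = %.3f, cost = %.2f"
          % (guess, fit.x[0], fit.cost))

# Typically only the start near the true frequency recovers
# omega ~ 1.7; the other starts converge to spurious local minima
# with visibly larger cost.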

If you feel obliged to teach maximum-likelihood methods to your
students, please at least warn them that this is not the only
game in town, and it is absolutely *not* the gold standard.

http://www.google.com/search?q=%22maximum+a+posteriori%22+%22curve+fitting%22