
[Phys-L] maximum likelihood versus minimum cost



Consider the polynomial

P(x) = a_0 + a_1 x + a_2 x^2 + ... + a_N x^N [1]

For all N greater than 1, this is a nonlinear function of x ...
but for all N, it depends *linearly* on the coefficients a_i.
Therefore I call this an LCBF, i.e. a linear combination of
basis functions. Similarly, consider the Fourier sine series

F(t) = a_1 sin(ω t) + a_2 sin(2 ω t) + ... [2]

which is a nonlinear function of t but a *linear* function of the
coefficients a_i. This is an LCBF /provided/ the fundamental
frequency ω is considered known in advance. If ω is promoted
to being one of the parameters, then equation [2] is emphatically
not an LCBF.
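
To make this concrete, here is a minimal sketch in Python (synthetic
data, made-up coefficients) of fitting the a_i in equation [2] when
ω is known in advance. Because the model is an LCBF, the whole fit
reduces to ordinary linear least squares on a design matrix whose
columns are the basis functions:

  import numpy as np

  rng = np.random.default_rng(0)
  omega = 2.0                      # fundamental frequency, assumed known
  t = np.linspace(0.0, 3.0, 200)

  # synthetic "measurements" with a_1 = 1.5, a_2 = -0.7, plus noise
  y = (1.5*np.sin(omega*t) - 0.7*np.sin(2*omega*t)
       + 0.05*rng.standard_normal(t.size))

  # design matrix: one column per basis function sin(n ω t)
  A = np.column_stack([np.sin(omega*t), np.sin(2*omega*t)])

  # linear least squares for the coefficients a_i
  coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
  print(coeffs)                    # should come out close to [1.5, -0.7]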

As another example, exponential decay (as in radioactivity or
in a damped harmonic oscillator) or exponential growth (as in
bacteria) is not an LCBF, in situations where the point of the
experiment is to determine the rate-constant k:

A(t) = A(0) exp(k t) [3]
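
By contrast, here is a minimal sketch (again with synthetic numbers)
of fitting equation [3] when the rate constant k is itself a parameter.
That model is not an LCBF, so an iterative nonlinear routine is needed;
scipy.optimize.curve_fit is one illustrative choice:

  import numpy as np
  from scipy.optimize import curve_fit

  def model(t, A0, k):
      return A0 * np.exp(k * t)

  rng = np.random.default_rng(1)
  t = np.linspace(0.0, 5.0, 50)
  # synthetic decay data: A(0) = 3.0, k = -0.8, plus noise
  y = model(t, 3.0, -0.8) + 0.02*rng.standard_normal(t.size)

  # nonlinear least squares, starting from an initial guess p0
  popt, pcov = curve_fit(model, t, y, p0=(1.0, -0.1))
  print(popt)                      # should come out close to (3.0, -0.8)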

I mention this because on 05/08/2012 02:51 PM, Donald Polvani wrote:

1) No errors in the x_i.
2) The y errors are uncorrelated with the x_i.
3) The y errors are zero mean and have equal variances.

Given assumptions 1 - 3, the Gauss-Markov theorem says that the least
squares estimate will be the best linear unbiased estimator.

The point is that there is a fourth requirement, namely that your
fitting function must be an LCBF to begin with. Otherwise you
don't have a linear estimator at all, let alone a "best" linear
estimator.

As you can see from the three examples given above, in physics
we see lots of things that are LCBFs ... and also lots of things
that aren't. Some additional issues and examples are discussed
below.

==============

Back on 05/07/2012 02:28 PM, the question was:

Is the LSF method developed based on the distribution of errors being
a Gaussian distribution?

and alas I answered "yes". As others have politely pointed out, my
answer was not entirely correct. Assuming a Gaussian distribution
is /sufficient/ to justify the least-squares approach, but there
are other conditions that can also lead to least squares, notably
the special type of linear regression covered by the Gauss-Markov
theorem as mentioned above.

I mention this not to make excuses for being wrong, but to warn
people that assuming the problem is nonlinear is vastly safer than
assuming it is linear.

Linear regression was recently described as "usual" and "ordinary".

In some narrow sense I suppose it is "ordinary" ... in the sense
that polynomials are "ordinary" ... but "ordinary" does not mean
all-purpose. In general, it is not safe to assume that the
fitting function is an LCBF (linear combination of basis functions)
or that the error bars meet the conditions demanded by the Gauss-
Markov theorem.

For instance, in any situation involving counting events, where
Poisson statistics apply, the error bars will not be uniform.
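
Here is a minimal sketch (synthetic counting data, made-up decay
constants) of one way to handle that: pass per-point error bars,
sigma = sqrt(counts), to the fitting routine, which turns the fit
into weighted least squares:

  import numpy as np
  from scipy.optimize import curve_fit

  def decay(t, N0, k):
      return N0 * np.exp(-k * t)

  rng = np.random.default_rng(2)
  t = np.linspace(0.0, 10.0, 40)
  counts = rng.poisson(decay(t, 500.0, 0.4))   # simulated detector counts

  # Poisson error bars: variance ≈ the count itself (floor at 1 to
  # avoid dividing by zero on empty bins)
  sigma = np.sqrt(np.maximum(counts, 1))

  popt, pcov = curve_fit(decay, t, counts, p0=(float(counts[0]), 0.2),
                         sigma=sigma, absolute_sigma=True)
  print(popt)                      # should come out close to (500.0, 0.4)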

Here's another example, an important real-world example. Suppose
you are in charge of flying an airliner from Honolulu to San Francisco.
Because of variable headwinds and such, there is some /uncertainty/
about the amount of fuel required. The maximum-likelihood solution
makes it equally likely that you will underestimate the amount of
fuel or overestimate the amount of fuel.

If you carry one sigma too much fuel, the cost(*) is hundreds of
dollars per trip. If you carry one sigma too little fuel, the cost
is (at least) billions of dollars. Customers get very cranky if
you dump them in the ocean.

In fact, pilots routinely carry more than six sigma more fuel than
the maximum likelihood analysis would suggest. The situation is
summarized in the following graph:
http://www.av8n.com/physics/img48/fuel-uncertainty.png

Note:
(*) Carrying extra fuel costs money. I'm not talking about the
capital cost, which you get back at the end of the trip. I'm
talking about irreversible losses associated with carrying the
extra weight around.
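
To put some (entirely made-up) numbers on this: take the fuel actually
needed to be Gaussian, charge a few hundred dollars per sigma of extra
fuel carried, and charge billions for coming up short. Minimizing the
expected cost then puts the optimum several sigma above the
maximum-likelihood estimate:

  import numpy as np
  from scipy.stats import norm

  mu, sigma = 20_000.0, 1_000.0    # kg of fuel needed: mean, spread (made up)
  cost_per_kg_carried = 0.5        # dollars per kg of extra fuel (made up)
  cost_of_shortfall = 2e9          # dollars if you come up short (made up)

  loads = np.linspace(mu, mu + 10*sigma, 2001)
  z = (loads - mu) / sigma

  # expected amount of extra fuel carried, E[(load - needed)+], for a
  # Gaussian "needed", plus the probability of running short
  expected_excess = sigma * (z*norm.cdf(z) + norm.pdf(z))
  expected_cost = (cost_per_kg_carried * expected_excess
                   + cost_of_shortfall * norm.sf(z))

  best = loads[np.argmin(expected_cost)]
  print((best - mu) / sigma)       # optimum sits several sigma above the mean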

This should convince you that there are lots of important situations
where you do *not* want anything resembling the least-squares solution,
or indeed any kind of maximum-likelihood solution. Maximum likelihood
minimizes the /probability/ of making mistakes, but you really ought
to be minimizing the /cost/ of making mistakes. It is common for the
cost function to be exceedingly lopsided.

To repeat: Maximum likelihood minimizes the /probability/ of
making mistakes, but you really ought to be minimizing the /cost/
of making mistakes.

Here's yet another real-world example: A couple of weeks ago we
had a big discussion of data communication and modulation schemes
including phase modulation and QAM codeword constellations. Decoding
such things is basically a curve-fitting parameter-adjusting task.
It is wildly nonlinear.
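
As a toy illustration (not the scheme discussed in that thread), even
the final slicing step, picking the nearest constellation point, is a
nonlinear, discontinuous function of the received sample:

  import numpy as np

  # 16-QAM constellation: real and imaginary parts in {-3, -1, +1, +3}
  levels = np.array([-3.0, -1.0, 1.0, 3.0])
  constellation = np.array([complex(i, q) for i in levels for q in levels])

  def decode(sample):
      """Return the constellation point nearest to the received sample."""
      return constellation[np.argmin(np.abs(constellation - sample))]

  rng = np.random.default_rng(3)
  sent = constellation[rng.integers(constellation.size)]
  received = sent + 0.3*(rng.standard_normal() + 1j*rng.standard_normal())
  print(sent, decode(received))    # the decision snaps to a lattice point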

Also: Fitting a spectrum to any kind of theoretical lineshape is
nonlinear.


In situations where linear regression works, it works just fine ...
but please do not become too enamored of it. There are plenty of
real-world situations where linear regression would be spectacularly
wrong. Even if you do it accurately, it's still wrong, because it
is the answer to the wrong question.

Perhaps the strongest argument is this: The software routines
that can handle nonlinear curve fitting can also handle the
linear case with ease. Therefore IMHO the simplest approach is
to learn how to use the nonlinear routine, and use it for everything.
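
As a quick sanity check of that advice, the same nonlinear routine used
in the sketches above (curve_fit, as one illustrative choice) reproduces
the straight-line answer that direct linear algebra gives:

  import numpy as np
  from scipy.optimize import curve_fit

  def line(x, a0, a1):
      return a0 + a1*x

  rng = np.random.default_rng(4)
  x = np.linspace(0.0, 1.0, 30)
  y = line(x, 2.0, -1.0) + 0.05*rng.standard_normal(x.size)

  # nonlinear routine applied to a linear (LCBF) model ...
  popt, _ = curve_fit(line, x, y, p0=(0.0, 0.0))
  # ... versus the direct linear-algebra solution
  direct, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y,
                               rcond=None)
  print(popt, direct)              # the two agree to within round-off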