Re: [Phys-L] [**External**] Re: triangular induction puzzle



On 10/12/22 11:55 AM, Prof. Keith S. Taber via Phys-l wrote:
I thought the point being made was that it can be seen to represent
the basic nature of inductive sciences, in that there are always
multiple ways to explain any data set (although we might prefer those
which we think show simplicity, elegance, etc.)

Yes.

Consider a graph of a sine function where you have only the points
representing the maximum amplitude, or any set of points that are
multiples of the wavelength apart - the best guess would be a
straight line.

That's true. And it gets even worse than that. Sine waves
are the poster child for worst-case overfitting.
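
As a minimal sketch of the straight-line case just described: sample a
sine only at its crests (points a whole number of wavelengths apart) and
fit a line. The wavelength, the number of samples, and all variable
names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical setup: wavelength and sample count are arbitrary choices.
wavelength = 2.0
k = np.arange(8)

# Crest positions: x = wavelength * (k + 1/4), so the phase is 2*pi*k + pi/2.
x = wavelength * (k + 0.25)
y = np.sin(2 * np.pi * x / wavelength)

# Ordinary least-squares straight line through the sampled points.
slope, intercept = np.polyfit(x, y, 1)
max_residual = np.max(np.abs(y - (slope * x + intercept)))

print("slope =", slope)
print("intercept =", intercept)
print("largest residual =", max_residual)
```

Up to rounding, the fitted line is horizontal and passes through every
sample, so nothing in these points hints at the oscillation between
them.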

Consider the data shown here:
https://www.av8n.com/physics/img48/vapnik-sin-pts.png

You can fit a sine to the data in the obvious way:
https://www.av8n.com/physics/img48/vapnik-sin-1.png

Or (!) you can do it in the less obvious way:
https://www.av8n.com/physics/img48/vapnik-sin-2.png

The latter is a tighter fit, in the least-squares sense.
That is to say, the residuals are much smaller. However,
it is almost certainly worse for any practical purpose.
Mostly it's just fitting to the noise. It is useless for
predicting the next data point.
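
Here is a rough sketch of how that comparison can be set up. It does
not use the data from the plots above; the synthetic points, the noise
level, the seed, and the frequency grid are all assumptions. For each
trial frequency, amplitude and phase are found by linear least squares
(writing the sine as a*sin + b*cos), and the frequency is scanned over
a grid.

```python
import numpy as np

def sine_sse(x, y, freq):
    """Sum of squared residuals for the best a*sin + b*cos at a fixed frequency."""
    phase = 2 * np.pi * freq * x
    design = np.column_stack([np.sin(phase), np.cos(phase)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def best_frequency(x, y, freqs):
    """Scan a frequency grid; return (frequency, sse) of the tightest fit."""
    sse = np.array([sine_sse(x, y, f) for f in freqs])
    i = int(np.argmin(sse))
    return float(freqs[i]), float(sse[i])

# Synthetic data (an assumption): a slow sine plus noise at 12 irregular points.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 3.0, size=12))
y = np.sin(2 * np.pi * 1.0 * x) + rng.normal(0.0, 0.3, size=12)

# The wide grid contains the low-frequency grid, so it can never fit worse.
freqs_wide = np.linspace(0.1, 50.0, 5000)
freqs_low = freqs_wide[freqs_wide <= 2.0]

f_low, sse_low = best_frequency(x, y, freqs_low)
f_wide, sse_wide = best_frequency(x, y, freqs_wide)

print("low-frequency search: f =", f_low, " sse =", sse_low)
print("wide search:          f =", f_wide, " sse =", sse_wide)
```

Letting the frequency roam can only shrink the residuals, and with a
handful of noisy points it usually shrinks them by landing on a
frequency that has nothing to do with the underlying signal.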

This is because a sine wave has an infinite Vapnik-Chervonenkis
dimensionality, even though it has only three parameters
(amplitude, frequency, and phase). This stands in contrast
to a polynomial, where the VC dimensionality is equal to
the number of parameters. Your intuition about limiting the
number of parameters (Ockham's razor) fails miserably for
sine waves.
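
To make the infinite-VC claim concrete, here is a sketch of the
standard textbook shattering construction. It is not necessarily the
one used in the discussion linked below. It uses frequency alone, with
amplitude and phase held fixed: place n points at x_i = 10^-i, and for
any assignment of +/- labels pick omega = pi * (1 + sum of 10^i over
the points labelled negative). Then the sign of sin(omega * x_i)
reproduces the labels. The function name and the choice n = 5 are
assumptions for illustration.

```python
import itertools
import numpy as np

def shattering_frequency(labels):
    """Frequency omega such that sign(sin(omega * 10**-i)) equals labels[i-1].

    labels is a sequence of +1/-1 values for the points x_i = 10**-i, i = 1..n.
    """
    return np.pi * (1 + sum(10**i for i, lab in enumerate(labels, start=1) if lab < 0))

n = 5  # small enough that double precision is not an issue
points = np.array([10.0**-i for i in range(1, n + 1)])

# Try every one of the 2**n possible labelings of the n points.
all_ok = True
for labels in itertools.product([+1, -1], repeat=n):
    omega = shattering_frequency(labels)
    predicted = np.sign(np.sin(omega * points))
    all_ok = all_ok and np.array_equal(predicted, np.array(labels))

print("every labeling reproduced:", all_ok)
```

The same recipe works for any n (given enough numerical precision), so
no finite number of points is safe from being fitted exactly by a
single adjustable frequency.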

See discussion (and references) here:
https://www.av8n.com/physics/thinking.htm#sec-omit