Re: [Phys-L] Question on data analysis



This topic -- fitting versus overfitting, bias versus
variance -- is reeeeally important. IMHO, for most
people this is more important than F=ma, insofar as
it is more directly relevant to ordinary life.

For example, it shows up in anything having to do
with teaching or learning: How many examples are
required for learning from examples?

As a related point, there have been huge arguments
involving intellectual heavyweights, e.g. Chomsky
and Minsky, about whether there is some innate,
unlearned, universal structure to human language.
I say that obviously a great deal of language is
learned ... but if there were not also a tremendous
amount of innate structure, nothing would be learnable.

At a more mundane level, this affects advertising
and marketing. The ads you see on e.g. Netflix
are based on a model that learns from examples.
It uses enormous amounts of training data ... and
also a lot of hand-built structure; otherwise
nothing would be learnable.

Some folks look down their noses at advertising
and other mundane applications ... but keep in
mind that overall there is a huge amount of money
in mundane applications ... far more than the
entire NSF budget.

The same issue can be re-expressed in lawyer language:
How much circumstantial evidence constitutes a proof?
Most lawyers act like they don't believe there is a
definite answer to that question. I say it's a bit
hard to quantify, but in principle there is a solid
answer. It depends on the quality of the data *and*
on your a priori assumptions, aka biases. If you make
no assumptions, nothing is learnable and nothing is
provable.
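
One standard way to make "quality of the data *and* prior
assumptions" quantitative is the odds form of Bayes' rule. The
sketch below is only an illustration of that idea, not
necessarily the formalism intended above. The function name, the
likelihood ratios, and the priors are all hypothetical, and the
pieces of evidence are assumed to be independent.

```python
import math

def posterior_probability(prior, likelihood_ratios):
    """Odds-form Bayes update.

    prior: probability assigned to the claim before seeing evidence
           (must be strictly between 0 and 1).
    likelihood_ratios: for each piece of evidence,
           P(evidence | claim true) / P(evidence | claim false).
    Assumes the pieces of evidence are independent.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Five hypothetical clues, each 3x more likely if the claim is true.
evidence = [3.0] * 5

skeptical = posterior_probability(0.001, evidence)  # strong prior against
neutral = posterior_probability(0.5, evidence)      # even prior odds

print("skeptical prior:", skeptical)
print("neutral prior:  ", neutral)
```

With the same five clues, the even-odds prior ends up near
certainty, while the skeptical prior ends up at only about 0.2.
The evidence is identical; the conclusion differs because the
prior differs.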

Actually it turns out that more than a few lawyers
do know this ... they just don't let on. They have
a whole bag of tricks for manipulating jurors and
other people, based on the well-founded assumption
that most people are not very logical ... and lawyers
like it that way. They could give prospective jurors
a short course in how to be logical, but they don't.

Similarly, politicians like being able to manipulate
voters.

This is serious business, far more serious than some
of the other stuff that gets discussed in physics
class, such as monkey shooting. If I ever wanted
to shoot a monkey, I would shoot before he let go
of the branch, not after. When a student asks what
physics class is good for, a good teacher should
have a long list of answers. Curve fitting, as an
example of learning from examples, and of inference
in general, should be high on the list. There is
a bit of a leap from trend-line fitting to industrial-
strength statistical inference, but the connection
is real.

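No sane person ever fits to an unrestricted polynomial.
Even if it could be done, it wouldn't make sense: with
enough terms, the polynomial goes through every data
point, noise and all, and tells you nothing in between.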
In contrast, it might make sense to fit to some
/low-order/ polynomial. That means you are implicitly
making a /smoothness/ assumption. By the same token,
fitting to an unrestricted Fourier series is grotesquely
underdetermined. If instead you choose a /low-order/
Fourier series, you are implicitly making another
smoothness assumption.
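
The low-order versus high-order contrast can be seen in a small
numerical sketch. Everything specific here is an assumption made
for illustration: the underlying straight line, the ten sample
points, and the fixed alternating perturbation that stands in for
measurement noise (chosen deterministic so the run is repeatable).

```python
import numpy as np
from numpy.polynomial import Polynomial

def truth(x):
    """Hypothetical noise-free relationship."""
    return 1.0 + 2.0 * x

# Ten equally spaced training points with a +/-0.3 alternating
# perturbation standing in for noise.
x_train = np.linspace(-1.0, 1.0, 10)
perturbation = np.where(np.arange(x_train.size) % 2 == 0, 0.3, -0.3)
y_train = truth(x_train) + perturbation

# Held-out points: midpoints between neighboring training points.
x_test = 0.5 * (x_train[:-1] + x_train[1:])

def rms(values):
    return float(np.sqrt(np.mean(np.square(values))))

results = {}
for degree in (1, 9):
    # Least-squares fit; degree 9 has as many coefficients as there
    # are data points, so it can pass through every point.
    fit = Polynomial.fit(x_train, y_train, degree)
    results[degree] = {
        "train_rms": rms(fit(x_train) - y_train),
        "test_rms": rms(fit(x_test) - truth(x_test)),
    }
    print(degree, results[degree])
```

The degree-9 fit reproduces the training data essentially
exactly, yet does much worse than the straight line at the
in-between points. The straight line "fails" to match the
training data, and that failure is the smoothness assumption
doing its job.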

If you want more flexible -- and more explicit --
control over your assumptions, there are sophisticated
techniques for that, such as Tikhonov regularization.
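
As a minimal sketch of the idea, the simplest form of Tikhonov
regularization (also known as ridge regression) keeps the
high-order polynomial but adds a penalty on the size of the
coefficients, minimizing ||A w - y||^2 + lam ||w||^2. The data,
the monomial basis, the identity penalty, and the value
lam = 0.1 are all assumptions chosen for illustration; the helper
name tikhonov_fit is hypothetical.

```python
import numpy as np

# Hypothetical data: straight line plus a fixed alternating
# perturbation standing in for noise.
x_train = np.linspace(-1.0, 1.0, 10)
perturbation = np.where(np.arange(x_train.size) % 2 == 0, 0.3, -0.3)
y_train = 1.0 + 2.0 * x_train + perturbation

# Design matrix for a degree-9 polynomial: columns 1, x, x^2, ..., x^9.
degree = 9
A = np.vander(x_train, degree + 1, increasing=True)

def tikhonov_fit(A, y, lam):
    """Minimize ||A w - y||^2 + lam * ||w||^2.

    Solved as an ordinary least-squares problem on an augmented
    system: extra rows sqrt(lam) * I with zero targets.
    """
    n_coef = A.shape[1]
    A_aug = np.vstack([A, np.sqrt(lam) * np.eye(n_coef)])
    y_aug = np.concatenate([y, np.zeros(n_coef)])
    w, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return w

w_plain = tikhonov_fit(A, y_train, 0.0)  # no penalty: plain least squares
w_reg = tikhonov_fit(A, y_train, 0.1)    # penalized fit

print("coefficient norm, unregularized:", np.linalg.norm(w_plain))
print("coefficient norm, regularized:  ", np.linalg.norm(w_reg))
print("training residual, unregularized:", np.linalg.norm(A @ w_plain - y_train))
print("training residual, regularized:  ", np.linalg.norm(A @ w_reg - y_train))
```

The penalty trades a worse match to the training data for smaller
coefficients. The parameter lam makes the strength of the
assumption explicit and adjustable, rather than burying it in the
choice of polynomial order.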

============

The IB program has a specific course called Theory
of Knowledge. Studying this topic sounds great in
theory, but the devil is in the details, and many
of the IB ToK details are messed up. First of all,
it doesn't make sense to study ToK as a separate
course. In any sensible program, ToK would be
"baked into" all the courses, from Day One ... like
the oatmeal in oatmeal cookies. It's not something
you can sprinkle on afterwards. Offering it to
seniors, at the end of the program, is particularly
weird. If it were good for anything it would be
even better earlier, to help students learn as
they move through the program. Furthermore, the
typical IB ToK course is nearly 100% vaporware.
It's like going to an art museum and looking at
paintings; it's not going to make you an artist.
You need to spend some quality time working with
real paint; just talking about it doesn't suffice.