
[Phys-L] Re: model vs. truth



My previous allusion to the "machine learning" field
may have seemed like a wild tangent ... but let me
explain why it is more relevant and useful than it
might have appeared.

There is a whole set of simple, powerful results
going by the name of "PAC learning", where PAC
stands for probably approximately correct, and the key
idea is this: With probability (1-delta), for
delta as small as you like but greater than zero,
your predictions will be off by no more than
+- epsilon, for epsilon as small as you like but
not zero. So you see there are _two_ parameters
that describe the goodness of fit:
-- epsilon describes how much you can miss by, and
-- delta describes how improbable such a miss is.
In the name probably approximately correct, delta quantifies
the "probably" and epsilon quantifies the "approximately".

There are a few mild, reasonable restrictions on
this; for instance it is required that you have a
"training set" of data, so you know what to predict,
and then you can see how well those predictions do
against a "testing set" ... with the restriction that
the statistical process that generates the two sets
must be the same; you can't train on apples and test
on oranges.
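
For instance (a toy sketch of my own, with a made-up
threshold concept, not anything from the tutorial): draw
both sets from one and the same process, fit on one, and
score on the other:

  import random

  def draw_sample(n):
      """One fixed process: x uniform on [0,1], label = (x > 0.6)."""
      return [(x, x > 0.6) for x in (random.random() for _ in range(n))]

  train = draw_sample(200)   # same process for both sets --
  test  = draw_sample(200)   # no apples-vs-oranges mismatch

  # "Learning": the smallest positively-labeled x in the training
  # set serves as the predicted threshold.
  threshold = min((x for x, y in train if y), default=1.0)

  # "Testing": error rate on fresh data from the same process.
  errors = sum((x > threshold) != y for x, y in test)
  print(f"test error: {errors / len(test):.3f}")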

You can google all sorts of information on this; I
found a tutorial at
http://www-2.cs.cmu.edu/~awm/tutorials/pac.html

The key result can be found therein on slide 17:
The amount of training data you need is
*) linear in the complexity of your model, and
*) inversely proportional to your miss-margin (epsilon)
... then once you have "enough" training data,
additional data drives the failure probability (delta)
down _exponentially_.
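
For a finite hypothesis class H and a consistent learner,
the standard bound behind results of this kind is
  m >= (1/epsilon) * (ln|H| + ln(1/delta)).
Here is a sketch (my paraphrase of the usual textbook
form, not copied from the slides):

  import math

  def samples_needed(h_size, epsilon, delta):
      """Training examples sufficient so that, with probability
      at least 1-delta, the learned hypothesis errs at most epsilon."""
      return math.ceil((math.log(h_size) + math.log(1 / delta)) / epsilon)

  print(samples_needed(2**10, 0.05, 0.01))   # 231: modest model
  print(samples_needed(2**20, 0.05, 0.01))   # 370: linear in ln|H|
  print(samples_needed(2**10, 0.05, 1e-9))   # 554: delta enters only
                                             # logarithmically, so extra data
                                             # buys exponential confidence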

Being able to say correct things with exponentially
high confidence is a pretty good way to approach The
Truth.

These results are consistent with your experience and
intuition about needing "enough" data points in order
to fit a function with a given number of adjustable
parameters ... but these results formalize, quantify,
and very greatly extend the previous ways of looking
at the problem.
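
As a familiar special case (my illustration, not the
tutorial's): a quadratic has three adjustable parameters,
so three points pin it down exactly -- noise and all --
while "enough" points let the noise average out:

  import numpy as np

  rng = np.random.default_rng(0)
  true_coeffs = [1.0, -2.0, 0.5]             # a quadratic: 3 parameters

  def noisy_samples(n):
      x = rng.uniform(-1, 1, n)
      return x, np.polyval(true_coeffs, x) + rng.normal(0, 0.05, n)

  x3, y3 = noisy_samples(3)                  # bare minimum: interpolation
  x50, y50 = noisy_samples(50)               # "enough": noise averages out
  print(np.polyfit(x3, y3, 2))               # hits every point, noise included
  print(np.polyfit(x50, y50, 2))             # close to the true coefficients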

IMHO this is a tremendous advance ... you don't need
to ask the philosopher "what is Truth" and you don't
need to ask the lawyer "how much circumstantial evidence
constitutes a proof" ... you can make a prediction and
know in advance how good the prediction will be.

Note: The tutorial cited above discusses the case of
binary data and binary models, but the PAC learning
ideas can be further generalized to handle the analog
regime. The key idea here is the "Vapnik-Chervonenkis
(VC) dimension", which generalizes and replaces your
intuitive idea of "number of fitting parameters".
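
To give a taste of the VC idea (a brute-force toy of my
own devising): the class of 1-D threshold classifiers
h_t(x) = [x > t] can realize every labeling of one point
but not of two, so its VC dimension is 1 -- matching its
one adjustable parameter:

  from itertools import product

  def shatters(points, thresholds):
      """True if the thresholds realize every possible labeling."""
      achievable = {tuple(x > t for x in points) for t in thresholds}
      labelings = product([False, True], repeat=len(points))
      return all(lab in achievable for lab in labelings)

  ts = [i / 100 for i in range(-100, 201)]   # dense grid of thresholds
  print(shatters([0.5], ts))                 # True: one point is shattered
  print(shatters([0.3, 0.7], ts))            # False: (True, False) is
                                             # impossible, so VC dimension = 1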