
Re: [Phys-L] Natural laws from experimental data



On 8/15/21 12:18 AM, Antti Savinainen via Phys-l wrote:

I suppose this is common knowledge in some circles, but I just found out
about this when preparing a lesson on technology, science & knowledge:
https://youtu.be/MSo6eeDsFlE

I find it astonishing that deep laws of physics (such as Hamiltonians)
could be found from data using symbolic regression algorithms, with no
predetermined physics laws. The paper was published in Science (2009).

Your astonishment and skepticism are well placed. There are
layers to the work. The prosaic layers are OK, namely finding
equations by regression. However (!) the flashier claims are
quite wrong; in particular the repeated claim that the system
learned with "no prior knowledge". This is provably wrong.
Anybody in the machine learning business should know it's
wrong, and know quantitatively why it's wrong.

Let's start with some basic notions of information and entropy,
as illustrated by the game of 20 questions, discussed here:
https://www.av8n.com/physics/twenty-questions.htm

Of particular relevance today is this section:
https://www.av8n.com/physics/twenty-questions.htm#sec-mutations

which illustrates what happens if you try to search in a search
space that's too big.

A search with "no prior knowledge" corresponds to a completely
unconstrained search, i.e. infinite entropy. That's a big problem,
and no finite amount of incoming information will solve the problem.
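
To make the bookkeeping concrete, here is a back-of-the-envelope sketch
in Python (the specific numbers are purely illustrative). Each yes/no
answer delivers at most one bit, so 20 questions can single out one
candidate from at most 2^20; if the candidate space is bigger than that
(let alone unbounded), no amount of cleverness in choosing the
questions will save you.

import math

# Information budget for "20 questions".
# Each yes/no answer delivers at most 1 bit, so N questions can
# distinguish at most 2**N candidates.  Numbers are illustrative.

questions = 20
budget_bits = questions                  # 1 bit per yes/no answer
print(f"budget: {budget_bits} bits -> at most {2**questions:,} candidates")

for candidates in (10**6, 10**9, 10**12):
    need_bits = math.log2(candidates)
    verdict = "feasible" if need_bits <= budget_bits else "hopeless"
    print(f"{candidates:>15,} candidates need {need_bits:5.1f} bits: {verdict}")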

Closely related to entropy is the Vapnik-Chervonenkis ("VC")
dimensionality. A simple example of this, applied to curve-fitting,
is presented here:
https://www.av8n.com/physics/thinking.htm#sec-omit

You may think you can tell at a glance how to best fit a sine wave
to the data, but you might be in for a surprise. This is because
a sine wave (unlike, say, a polynomial) has infinite VC dimensionality.
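
If you want to see the infinite VC dimensionality with your own eyes,
here is a small numerical sketch (my own illustration, using the
textbook construction, not anything from the paper). A one-parameter
family of classifiers, sign(sin(omega*x)), can reproduce *any*
assignment of labels to the points x_i = 10^-i, just by choosing
omega appropriately. One knob, yet it shatters arbitrarily many points.

import numpy as np

# Textbook construction: the one-parameter family
#   f_omega(x) = sign(sin(omega * x))
# has infinite VC dimension.  It reproduces ANY labeling of the points
# x_i = 10**(-i).  Convention: label 1  <->  sin(omega * x_i) < 0.

rng = np.random.default_rng(0)
n = 6                                    # keep modest for float accuracy
x = np.array([10.0**(-i) for i in range(1, n + 1)])

for trial in range(5):
    y = rng.integers(0, 2, size=n)       # an arbitrary labeling
    omega = np.pi * (1 + sum(int(y[i - 1]) * 10**i for i in range(1, n + 1)))
    predicted = (np.sin(omega * x) < 0).astype(int)
    assert np.array_equal(predicted, y)
    print(f"labels {y} reproduced with omega = {omega:.6g}")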

See also the references to "PAC learning" cited therein. This is
a particularly nifty and relevant application of VC dimensionality.
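
For the simplest case, a finite hypothesis class H in the realizable
setting, the PAC sample-size bound fits in a few lines. What follows
is my own hedged sketch, not anything taken from the paper: with
probability at least 1 - delta, every hypothesis consistent with m
examples has true error below epsilon once
m >= (ln|H| + ln(1/delta)) / epsilon.
The point for this thread is the ln|H| term, which is essentially the
entropy of the hypothesis space.

import math

# Simplest PAC bound (finite hypothesis class, realizable case):
#   m >= (ln|H| + ln(1/delta)) / epsilon
# examples suffice.  Numbers below are illustrative.

def pac_sample_size(num_hypotheses, epsilon=0.05, delta=0.05):
    return math.ceil((math.log(num_hypotheses) + math.log(1.0 / delta)) / epsilon)

for H in (10**3, 10**6, 10**12):
    print(f"|H| = {H:>15,}  ->  m >= {pac_sample_size(H):,} examples")

# Let |H| grow without bound ("no prior knowledge") and no finite m suffices.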

Another example of entropy/information balance is the 12 coins puzzle.
https://www.av8n.com/physics/twelve-coins.htm

Yet again the ironclad conclusion is that an unconstrained search
corresponds to infinite entropy. No finite amount of incoming
information is going to solve the problem.
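
Here is the arithmetic as a sketch (the "unconstrained" variant is my
own strawman, just to make the point). The classic puzzle promises
exactly one counterfeit coin, heavier or lighter, so there are only
24 possibilities, and three weighings, each with three outcomes, carry
just enough information. Drop the promise and the possibilities, hence
the entropy, blow past any fixed budget of weighings.

import math

# Information budget for the 12-coins puzzle.
# Constrained (the actual puzzle): exactly one coin is fake, either
# heavy or light  ->  12 * 2 = 24 possibilities.
# Each weighing has 3 outcomes (left / right / balance) -> log2(3) bits.

entropy_bits = math.log2(12 * 2)          # ~4.58 bits needed
budget_bits  = 3 * math.log2(3)           # ~4.75 bits from 3 weighings
print(f"constrained:   need {entropy_bits:.2f} bits, "
      f"3 weighings supply {budget_bits:.2f} bits")

# Strawman "unconstrained" variant: any coin may independently be
# genuine, heavy, or light -> 3**12 possibilities, far past the budget.
print(f"unconstrained: need {math.log2(3**12):.2f} bits, "
      f"still only {budget_bits:.2f} bits available")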

I guarantee you that *every* machine-learning effort starts by
constraining the search space. I've seen plenty of charlatans
and mountebanks who conceal the constraints, and claim they can
learn with "no prior knowledge" ... but anybody who knows anything
about the topic just rolls their eyes at that.
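
To see what "constraining the search space" means in practice, here is
a toy sketch (the operator set and the depth limit are made up for
illustration, not taken from the paper). Even a "generic" symbolic
regression run fixes a grammar up front: operators, variables, and a
size limit. That grammar is the prior knowledge, and it is exactly
what makes the entropy of the search finite.

import math

# Toy "symbolic regression" search space.  The grammar below (primitive
# operators, leaves, maximum tree depth) is invented for illustration;
# the point is that fixing ANY such grammar is prior knowledge, and it
# is what makes the number of candidates, hence the entropy, finite.

UNARY  = ["sin", "cos", "exp"]
BINARY = ["+", "-", "*", "/"]
LEAVES = ["x", "v", "1"]                 # variables plus one constant

def count_expressions(max_depth):
    """Count expression trees of at most the given depth."""
    if max_depth == 0:
        return len(LEAVES)
    smaller = count_expressions(max_depth - 1)
    return len(LEAVES) + len(UNARY) * smaller + len(BINARY) * smaller**2

for d in range(1, 5):
    n = count_expressions(d)
    print(f"depth <= {d}: {n:,} candidates, search entropy ~ {math.log2(n):.1f} bits")

# Remove the depth limit (or open up the operator set) and the count
# diverges: no finite amount of data can pick one candidate out of
# infinitely many.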

Machine learning can do some impressive things. Been there, done
that. However, learning with "no prior knowledge" is not one of
them.