
*From*: John Denker <jsd@av8n.com>
*Date*: Mon, 02 Sep 2013 18:54:24 -0700

On 09/02/2013 12:07 PM, Brian Blais wrote:

> ... wondering what you think of E. T. Jaynes' approach to Bayesian
> inference. He does not make use of set-theoretic definitions, but
> in my reading of him, he does seem to admit that these have
> identical consequences in applications.
>
> 1) do you agree?

In general I don't like terms like "Bayesian" or "Darwinian". By way of analogy: I make good use of Newton's laws, but does that make me a "Newtonian"? I hope not. Am I required to accept everything that has been said by Newton, or about Newton? I hope not.

As previously mentioned: "Bayesian" means different things to different people. Some people consider Jaynes to be an orthodox Bayesian, and some don't ... and I don't care. That's because I like to pick and choose, on an idea-by-idea basis, from any school of thought. I latch onto the good ideas and ignore the rest.

Set theory is powerful enough to include everything you want to do. Conventional statistics can be understood as a subset, as a corollary. Interestingly, the converse is not true, AFAICT. That is to say, the set-theory approach is *not* a subset of the other approaches. This is covered by the following questions:

> 2) do you find some *quantitative* improvement using the
> set-theoretic definitions? I mean, is there an actual problem where
> one method works and the other does not?
>
> 3) is there some *practical* improvement using the set-theoretic
> definitions? I mean, are there problems that are much easier to
> solve, even if both methods yield the same result in the end?

The answer to (2) and (3) is the same.
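Before the story, it may help to make the set-theoretic formulation concrete. The following sketch is my illustration, not anything from the original post: a probability measure assigns mass to subsets (events) of a sample space, and unions, intersections, and conditional probability all fall out of plain set operations.

```python
# Minimal sketch (my example, not Denker's) of the set-theoretic view:
# a probability measure assigns mass to events, i.e. subsets of a
# sample space, and conditioning is just set intersection.

from fractions import Fraction

# Sample space for one roll of a fair die.
omega = frozenset(range(1, 7))
mass = {outcome: Fraction(1, 6) for outcome in omega}

def P(event):
    """Probability measure: total mass of an event (a subset of omega)."""
    return sum(mass[w] for w in event & omega)

def P_given(a, b):
    """Conditional probability P(a|b) = P(a & b) / P(b)."""
    return P(a & b) / P(b)

even = frozenset({2, 4, 6})
big = frozenset({4, 5, 6})

print(P(even))             # 1/2
print(P(even | big))       # union of events: 2/3
print(P_given(even, big))  # 2/3
```

The point of the exercise: the measure `P` is an explicit object you can hold in your hand (here, a dictionary of masses), rather than something left implicit behind a formula.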

Once upon a time, I was at Bell Labs, working on a pattern-recognition problem, using a machine-learning approach. Note that speech synthesis and speech recognition were a core activity of Bell Labs, and had been all along, ever since the days of Aleck Bell. There were dozens of guys working on this. Seriously smart guys. They knew literally every trick in the book, and indeed several of them had /written/ books on the subject. My boss's boss's boss was one of the leaders in this field.

The effort included very fundamental research and very applied development. The development guys had a huuuge software system that was doing OCR with a 2% error rate, which was the same as people could do on the same data set, so this was considered quite an achievement.

There were also a bunch of seriously smart guys at various other institutions, working on similar problems.

I don't want to go into details, but one symptom was expressed by John von Neumann, who was not the village idiot: "With four adjustable parameters I can fit an elephant, and with five I can wiggle his tail."

Then word got out that my buddies and I were fitting 100,000 adjustable parameters, with good results.
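Von Neumann's quip has a concrete core: with n adjustable parameters, a polynomial can pass exactly through n arbitrary points, fitting anything and predicting nothing. Here is a small illustration (my sketch, not anything from the original post) using Lagrange interpolation:

```python
# With n adjustable parameters (polynomial coefficients) you can fit
# n arbitrary data points exactly -- von Neumann's warning in miniature.
# Illustrative sketch, not from the original post.

def lagrange_fit(points):
    """Return the unique degree-(n-1) polynomial through n given points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = float(yi)
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Four "adjustable parameters" (a cubic) fit any four points exactly...
data = [(0.0, 1.0), (1.0, 5.0), (2.0, 2.0), (3.0, 8.0)]
f = lagrange_fit(data)
for x, y in data:
    assert abs(f(x) - y) < 1e-9
# ...while saying nothing about points in between. That is why fitting
# 100,000 parameters sounded absurd without something extra to tame it.
```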

As a separate matter, there was an issue with "maximum likelihood" learning in general, including curve fitting in particular. Many standard texts assumed this was the right thing to do ... sometimes tacitly, sometimes in a brief footnote. Those who suspected it was not the right thing to do did it anyway! That's because the alternatives were considered impossible or ridiculously impractical.

Then word got out that I had a scheme to learn maximum a posteriori (MAP), not maximum likelihood. This is P(a|b) instead of P(b|a). The statistics research guys did not believe this was possible. The development guys were skeptical, but after much inveigling and cajoling they tried my idea, and the error rate went down from 2% to 0.2%.
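The ML-versus-MAP distinction shows up even in the simplest possible model. The sketch below is my illustration (the coin, the Beta(2,2) prior, and the numbers are mine, not Denker's): maximum likelihood maximizes P(data | parameter), while MAP maximizes P(parameter | data), which folds in a prior.

```python
# Maximum likelihood vs. MAP on the simplest possible model: a coin.
# Illustrative sketch; the Beta(2,2) prior and the numbers are my choices.

def ml_estimate(heads, flips):
    """Maximum likelihood: argmax over p of P(data | p), which is heads/flips."""
    return heads / flips

def map_estimate(heads, flips, alpha=2.0, beta=2.0):
    """MAP: argmax over p of P(p | data) under a Beta(alpha, beta) prior.
    Closed form: (heads + alpha - 1) / (flips + alpha + beta - 2)."""
    return (heads + alpha - 1) / (flips + alpha + beta - 2)

# Three flips, three heads: ML says the coin is certain to land heads.
print(ml_estimate(3, 3))   # 1.0
# MAP with a mild Beta(2,2) prior pulls the estimate toward 1/2.
print(map_estimate(3, 3))  # 0.8
```

With lots of data the two estimates converge; the difference matters exactly where it mattered at Bell Labs, in the sparsely sampled corners of a huge parameter space.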

It may be that these solutions "could" have been found using old-school Bayesian and/or frequentist methods ... or maybe not. I don't know. I know for sure that a bunch of smart, highly motivated people tried for many years without success.

Also, there is a good reason why you would /expect/ the set-theoretic approach to work better. The punch line is that machine learning should be considered a search through the space of all possible probability measures ... so it really helps to use an approach where the measure itself is a central focus of attention. To say the same thing the other way: if you have an approach that focuses attention on "the" truth and/or "the" state of nature, you are never going to discover the solution, not in a million years. Indeed, I have told this story to world-famous statisticians who not only didn't discover the solution, but didn't even believe it after I told them about it.
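The "search through the space of probability measures" can be made literal in miniature. The sketch below is my illustration, not Denker's scheme: the candidate space is a grid of Bernoulli measures, each candidate is scored by its (log) posterior, and learning is nothing more than keeping the best-scoring measure.

```python
import math

# "Machine learning as a search through the space of probability
# measures," in miniature. Illustrative sketch; the data, the grid,
# and the Beta(2,2) prior are my choices, not from the original post.

data = [1, 1, 0, 1, 1, 0, 1, 1]  # coin flips: 6 heads, 2 tails

def log_likelihood(p, data):
    return sum(math.log(p if x == 1 else 1.0 - p) for x in data)

def log_prior(p):
    # Beta(2,2) prior, up to an additive constant: favors p near 1/2.
    return math.log(p) + math.log(1.0 - p)

# The "space of measures" here is a grid of Bernoulli(p) distributions.
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=lambda p: log_prior(p) + log_likelihood(p, data))
print(best)  # MAP over the grid: (6+1)/(8+2) = 0.7
```

The measure is the object being searched over, which is the structural point: a framework that treats "the" true parameter as the only object of interest has no natural place for this search.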

Note: It's always good for your career when your boss's boss's boss is wildly impressed. He knew exactly how hard these problems were, because it was his area of expertise.

**Follow-Ups**:
- **Re: [Phys-L] Bayesian statistics** *From:* Brian Blais <bblais@bryant.edu>
- **Re: [Phys-L] Bayesian statistics** *From:* brian whatcott <betwys1@sbcglobal.net>

**References**:
- **[Phys-L] Bayesian statistics** *From:* Dan Crowe <Dan.Crowe@lcps.org>
- **Re: [Phys-L] Bayesian statistics** *From:* John Denker <jsd@av8n.com>
- **Re: [Phys-L] Bayesian statistics** *From:* Brian Blais <bblais@bryant.edu>
