
Re: [Phys-l] Basic statistics



Ludwik wrote:

What should be called a population when students measure
gravitational acceleration? ...

That's quite a deep question.

On 11/10/2006 07:38 AM, Folkerts, Timothy J replied:

A statistician would, I believe, say that the "population" would be the
set of all _potential_ experiments done under a particular set of
conditions. If you could indefinitely repeat the same experiment, then
you would uncover the population and know its true parameters. Since we
don't have patience to repeat indefinitely (and eventually equipment or
other conditions will change), we instead perform a limited number of
experiments and treat these as a representative sample of all possible
repetitions.

This doesn't mean that more repetitions will necessarily get you closer
to the "true" value - just closer to the expected results from the
particular equipment used in a particular situation.

I agree. I was about to write something similar.

I would add that there are several different ways of thinking about
probability. Asking about "the population" is not the most general way
of phrasing the question. The "statistician's approach" is not the
most general approach.
-- I strongly feel that *probability measure* is the most fundamental
and most powerful approach. Whenever somebody asks a deep question,
I immediately reformulate it in terms of measure theory. That is
usually easy, and usually clears things up considerably.
-- Representing the probability measure as a "population" is a
reasonably decent approach. If you start with this approach and
generalize it the right way, it winds up almost equivalent to the
measure-theory approach.
-- Sometimes the probability can be represented by a histogram, or
by a probability distribution function.
-- Any probability measure can be represented by a disk, as in
http://www.av8n.com/physics/img48/gaussian-disk.png
which shows the example of a Gaussian. In general in such a disk
/area/ is the measure: imagine areas on a dart-board. However,
in simple cases like this we divide the disk into sectors, whereupon
circumference represents the same thing as area, and is perhaps
easier to visualize. The disk as a whole, obviously, represents 100%
probability. You can immediately see that about 2/3rds of the
measure lies between -sigma and +sigma. To draw samples from such
a probability, you can convert the disk into a "wheel of fortune"
by adding a spinner, as shown in blue; a small code sketch of this
idea appears just after this list.
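
Here is a minimal sketch of the spinner idea in code (my illustration,
not part of the original example): the spin picks a uniform fraction of
the circumference, and the inverse CDF maps that fraction back to a
value of the variable. The function name is made up.

  import random
  from statistics import NormalDist

  def spin_gaussian(mu=0.0, sigma=1.0):
      """One spin of the wheel for a Gaussian probability measure."""
      u = random.random()                      # spinner: uniform fraction of the rim
      return NormalDist(mu, sigma).inv_cdf(u)  # sector boundary -> sampled value

  samples = [spin_gaussian() for _ in range(100_000)]
  inside = sum(abs(x) <= 1.0 for x in samples) / len(samples)
  print(inside)   # about 0.68, i.e. roughly 2/3 of the measure within one sigma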

You can see that I take the measure-theory approach quite seriously.

I claim the "disk" picture is slightly more general than the
"population" picture. That's because every population I've ever
seen is finite. In my mind's eye, I can leap from finite to
countably infinite, by imagining a reeeally big population. But
the disk easily represents the continuum, with /uncountably/ many
possibilities. With enough work, you can create a population by
drawing lots of samples using the disk+spinner, but why bother?
The disk itself is a better representation, without the extra
work.

Minor point: suppose in the "g" experiment there are 3 independent
contributions to the uncertainty: perhaps some uncertainty in
the mass, some uncertainty in the length, and some uncertainty
in the time. In theory you could represent all the uncertainty by
a single disk, but it is sometimes more convenient to factor the
distribution, and represent it by three disks. (If the contributions
had not been independent, factorization would not have been possible.)
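
To make the factorization concrete, here is a hedged Monte Carlo sketch
(mine, with made-up numbers and a free-fall relation g = 2L/t^2 chosen
purely for illustration): because the contributions are independent, a
draw from the joint distribution is just three separate one-dimensional
draws, one per "disk".

  import random

  def draw_mass():    return random.gauss(0.250, 0.001)    # kg (made-up)
  def draw_length():  return random.gauss(1.000, 0.002)    # m  (made-up)
  def draw_time():    return random.gauss(0.4515, 0.0010)  # s  (made-up)

  def draw_experiment():
      # joint sample = product of the three independent "disks"
      return (draw_mass(), draw_length(), draw_time())

  # propagate through the illustrative relation g = 2 L / t^2;
  # the mass happens not to enter this particular formula
  gs = [2 * L / t**2 for (_, L, t) in (draw_experiment() for _ in range(10_000))]
  mean = sum(gs) / len(gs)
  sd = (sum((g - mean)**2 for g in gs) / (len(gs) - 1)) ** 0.5
  print(mean, sd)    # Monte Carlo estimate of the value and spread of g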

Discrete probabilities can be easily represented by coloring in
finite-sized sectors of the disk.
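
For completeness, a small sketch (again mine, not from the post) of the
discrete wheel of fortune: each outcome owns a sector whose size is its
weight, the spinner is a uniform draw over the total, and a binary
search finds which sector it landed in.

  import bisect
  import itertools
  import random

  outcomes = ["A", "B", "C"]
  weights  = [1.0, 3.0, 6.0]                    # need not be normalized
  edges = list(itertools.accumulate(weights))   # sector boundaries: 1, 4, 10

  def spin():
      u = random.random() * edges[-1]           # spinner position along the rim
      return outcomes[bisect.bisect(edges, u)]  # which colored sector is it in?

  draws = [spin() for _ in range(100_000)]
  print({x: draws.count(x) / len(draws) for x in outcomes})  # ~0.1, 0.3, 0.6

(The standard library's random.choices(outcomes, weights) does the same
job; the point here is just how directly the picture maps onto code.)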

I guess the point of this msg is that if you're going to ask deep
questions, the deepest question is not exactly "what is the population"
but rather "what is the probability measure" that best describes a
given situation.


==================

We are at a point where three worlds collide.

Physicists have their own way of understanding probability, developed
within the physics community over the course of hundreds of years.

Meanwhile the mathematicians have developed their own way of thinking
about probability. Mathematicians /define/ probability in terms of
measure theory.

Meanwhile the statistics community has developed their own techniques,
largely separate from the physics approach and from the math approach,
with their own terminology and jargon.


               physics
             probability
               /    \
              /      \
             /        \
            /          \
        math            \
     probability ________ statistics



A physicist can get to the mathematical foundations by going clockwise
through statistics, but that is the long way around. The typical
intro-level statistics book is mostly about practical "engineering"
techniques, i.e. standard solutions to standard problems; it is not
a good place to look for answers to deep, fundamental questions.


====================

Loosely speaking a /measure/ is:
-- a mapping from sets to numbers;
-- non-negative;
-- additive on the countable union of disjoint sets.
http://mathworld.wolfram.com/Measure.html

A /probability measure/ is a measure with the further property that
it is
-- bounded above.
http://mathworld.wolfram.com/ProbabilityMeasure.html

That's it. I'm not kidding. That's all you need to formalize the
foundations of probability and statistics.
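
For reference, here is the same thing in standard textbook notation (my
restatement, using the usual sample space \Omega and sigma-algebra
\Sigma, which the loose description above leaves implicit):

  \mu : \Sigma \to [0,\infty], \qquad \mu(\emptyset) = 0,
  \qquad \mu\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr)
        = \sum_{i=1}^{\infty} \mu(A_i)
  \quad \text{for pairwise disjoint } A_i \in \Sigma,

and a probability measure additionally satisfies \mu(\Omega) < \infty,
conventionally rescaled so that \mu(\Omega) = 1.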

Note: It is /required/ that the probability measure be bounded,
but it is merely /conventional/ to arrange that the probability
measure be bounded by _unity_. This convention is not always
helpful. One of the niftiest things I ever invented did billions
of calculations on unnormalized probabilities.

There is no chance I would have been able to invent this using
anything other than the measure-theory approach, and no chance
of learning this approach from the ordinary statistics literature.
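
As a hedged sketch of what "unnormalized" buys you (an illustration of
mine, not the actual invention alluded to above, with made-up coin-flip
data): carry unnormalized weights through the entire calculation and
divide by the total only at the very end, when a bona fide probability
is actually needed.

  # grid of candidate coin biases, flat unnormalized prior
  biases = [i / 100 for i in range(1, 100)]
  weight = {p: 1.0 for p in biases}

  data = [1, 1, 0, 1, 1, 1, 0, 1]              # made-up flips (1 = heads)
  for flip in data:
      for p in biases:
          weight[p] *= p if flip else (1 - p)  # multiply in the likelihood

  total = sum(weight.values())                 # normalize once, at the end
  posterior_mean = sum(p * w for p, w in weight.items()) / total
  print(posterior_mean)   # about 0.7, matching (6+1)/(8+2) for 6 heads in 8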

Other approaches, including the "Bayesian" approach, the "frequentist"
approach, et cetera, can be understood and classified in terms of
the measure-theory approach.

=================

When teaching this, you have to plan ahead, because of the principle
that learning proceeds from the known to the unknown ... and because
typical students arrive knowing even less about measure theory
than they know about probability. However, IMHO, a little bit of
time spent introducing measure theory is time well spent. It's not
that complicated, and it has lots of applications in probability
and elsewhere.

To recap, a probability measure is:
-- a mapping from sets to numbers;
-- non-negative;
-- additive on the countable union of disjoint sets;
-- bounded above.

Certainly, by the time people start asking foundational questions
such as "what is the population", it is high time to crank up the
measure theory.

As for resources, I don't have the books in front of me, but I think
Apostol's freshman-level _Calculus_ touches on this while keeping it
simple, and his _Analysis_ revisits it with even more formality and
shows some of the many applications.