
Re: Order being born from disorder?



At 09:05 AM 5/18/97 EDT, LUDWIK KOWALSKI wrote:
... suppose the probability distribution of a loaded die is known to be:

outcome #(r):      1    2    3    4    5    6
probability P(r):  0.1  0.1  0.2  0.3  0.3  0.0

In a particular test this die was thrown ten times and the distribution
of outcomes was:

outcome #(r):  1  2  3  4  5  6
frequency:     1  4  4  0  1  0

Then the test was repeated and the new casting distribution was:

outcome #(r):  1  2  3  4  5  6
frequency:     1  1  2  3  2  1

In the spirit of his explanation, as I interpret it, the outcome of the
last test is highly ordered (its frequencies lie close to what we would
expect as the average of millions of realizations), while the outcome of
the first test is highly disordered (it is very unusual).

How can I express this numerically in terms of Shannon's entropy? Which
logarithmic base is most appropriate for this problem?
Ludwik Kowalski
************************************************************
Here are some other fragments from Dave's highly informative message:

The entropy of a distribution objectively measures (nonparametrically)
how "random" or "uncertain" the outcomes are (on average) for samples
that are drawn from the distribution. The entropy of a distribution is
the average (minimal) amount of information necessary to determine
exactly which outcome obtains when a sample is drawn from the
distribution, given only the information contained in the specification
of the distribution itself.

Let {P_r} represent a probability distribution where P_r means the
probability that the r_th outcome obtains when a sample is drawn. Here
r is a label that runs over each distinct (disjoint) possible outcome
and: SUM_r{P_r} = 1. Here SUM_r{...} means sum the quantity ... over all
values of the parameter r. The entropy of a distribution {P_r} is:
S = SUM_r{P_r * log(1/P_r)}. It is the expectation over the set of
outcomes of the logarithm of the reciprocal of the probability of each
outcome. In the special case that there are N outcomes and each outcome is
equally likely (i.e. P_r = 1/N for r = 1, 2, ..., N), then S = log(N).
...............................
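
For concreteness, here is a small Python sketch of that formula (the
sketch is mine, not Dave's; the function name is arbitrary). Terms with
P_r = 0 are dropped, since p*log(1/p) -> 0 as p -> 0:

  import math

  def entropy(probs, base=10.0):
      # S = SUM_r{P_r * log(1/P_r)}; zero-probability outcomes add nothing
      return sum(p * math.log(1.0 / p, base) for p in probs if p > 0.0)

  # Special case: N equally likely outcomes gives S = log(N)
  print(entropy([0.25, 0.25, 0.25, 0.25], base=2))   # -> 2.0 bits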

On occasion it may be useful to make a distinction between the disorder
seen in a system (at the *same* level of description as is used in some
generic entropy measure on some generic probability distribution) and
the entropy of the distribution of outcomes for that (generic) system.
For such a generic system the entropy is the average (minimal) information
needed to exactly specify which outcome occurs when a sample is drawn from
the distribution; whereas the disorder of a given outcome can be defined as
the minimal information necessary to uniquely characterize or describe that
outcome. With this distinction, the entropy is a property of the
distribution, and the disorder is a property of the individual
realizations (outcomes).
...................................
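
That distinction can be made concrete with the same sort of Python
sketch (again mine, not from either message): the entropy S is computed
once from the probabilities P_r, while a disorder D can be computed from
the relative frequencies of each particular realization:

  import math

  def entropy(probs, base=2):
      return sum(p * math.log(1 / p, base) for p in probs if p > 0)

  P = [0.1, 0.1, 0.2, 0.3, 0.3, 0.0]     # the distribution: fixed
  series = [1, 4, 4, 0, 1, 0]            # one particular 10-throw series
  S = entropy(P)                         # property of the distribution
  D = entropy([c / 10 for c in series])  # property of this realization
  print(S, D)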


I will begin in Bowman's spirit, as before, by computing
S = SUM_r{P_r* log(1/P_r)} for the loaded die probability distribution
given by Ludwik:

S = 0.1*log(10) + 0.1*log(10) + 0.2*log(5) + 0.3*log(10/3) + 0.3*log(10/3)
  = 0.1 + 0.1 + 0.2*0.70 + 0.3*0.52 + 0.3*0.52 = 0.65 (decimal digits)
which is 0.65/0.301 = 2.2 bits (dividing by log(2) converts decimal
digits to bits). The sixth outcome, with P(6) = 0, contributes nothing.
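
As a check on the arithmetic, here is the same sum evaluated to more
places (a Python fragment of my own):

  import math

  P = [0.1, 0.1, 0.2, 0.3, 0.3, 0.0]
  S_digits = sum(p * math.log10(1 / p) for p in P if p > 0)
  print(S_digits)                   # 0.6535... decimal digits
  print(S_digits / math.log10(2))   # 2.171... bits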

In order to describe the outcomes of the two experimental series, Ludwik
uses six numbers, of which five are sufficient to characterize his
results (the six frequencies must sum to the ten throws). Bowman tells
us to concentrate on the MINIMAL description of Ludwik's results.

I will take the liberty of using a similar measure of disorder (D) for
these two series of results, replacing each P_r in the formula with the
relative frequency (count/10) observed in the series:

D1 = 0.1*log(10) + 0.4*log(2.5) + 0.4*log(2.5) + 0.1*log(10)
   = 0.1 + 0.16 + 0.16 + 0.1 = 0.52 (decimal digits)
which is 0.52/0.301 = 1.7 bits.

D2 = 0.1*log(10) + 0.1*log(10) + 0.2*log(5) + 0.3*log(10/3)
   + 0.2*log(5) + 0.1*log(10)
   = 0.1 + 0.1 + 0.14 + 0.16 + 0.14 + 0.1 = 0.74 (decimal digits)
which is 0.74/0.301 = 2.5 bits.
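
The same kind of check for D1 and D2, computed straight from the raw
frequency counts (again a fragment of my own):

  import math

  def D(counts, base=10):
      n = sum(counts)
      return sum((c / n) * math.log(n / c, base) for c in counts if c > 0)

  D1 = D([1, 4, 4, 0, 1, 0])   # 0.518... decimal digits
  D2 = D([1, 1, 2, 3, 2, 1])   # 0.736... decimal digits
  print(D1 / math.log10(2))    # 1.72 bits
  print(D2 / math.log10(2))    # 2.45 bits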

I may be in error with this computation of D1 and D2, for it seems to
indicate that Ludwik's first series is MORE ordered than the S
computation leads us to expect (D1 = 1.7 bits against S = 2.2 bits),
whereas the second series is LESS ordered (D2 = 2.5 bits against
S = 2.2 bits).
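
One possible reason, which a quick (and only suggestive) simulation
sketch of mine seems to support: with only ten throws, the D of a
typical series comes out BELOW S on average, so D1 < S need not signal
an arithmetic mistake at all:

  import math, random

  P = [0.1, 0.1, 0.2, 0.3, 0.3, 0.0]

  def D_of_sample(n=10):
      # throw the loaded die n times, then compute D of the frequencies
      counts = [0] * 6
      for r in random.choices(range(6), weights=P, k=n):
          counts[r] += 1
      return sum((c / n) * math.log(n / c, 2) for c in counts if c > 0)

  trials = 10000
  mean_D = sum(D_of_sample() for _ in range(trials)) / trials
  S = sum(p * math.log(1 / p, 2) for p in P if p > 0)
  print(mean_D, S)   # mean D falls noticeably below S = 2.17 bits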

Regards
brian whatcott <inet@intellisys.net>
Altus OK