
Re: [Phys-L] From a Math Prof (physics BS major) at my institution ( math challenge)



On 02/22/2014 09:53 AM, Rauber, Joel wrote:

I just chatted briefly with the math prof, and he commented that
one of the student sets (I haven't personally checked this) had one
sequence that was all prime numbers, for which the odds are
exceedingly low in a truly random sample.

What's your definition of "exceedingly low"?

My friend Monte Carlo thinks that an all-prime row will
occur in about 3% of the sets in a random ensemble. So
I reckon that seeing such a thing is very weak evidence,
weaker even than some of the previous observations and
suggestions.
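
For anyone who wants to ask Monte Carlo themselves, here is
a minimal sketch. The game parameters (POOL, PICKS, ROWS) are
hypothetical placeholders of mine, not the actual format of
the assignment, so plug in the real values before trusting
any number that comes out. The closed-form version is there
only as a cross-check on the simulation.

```python
import random
from math import comb

# Hypothetical game parameters (assumptions, not from the thread):
POOL = 35    # numbers are drawn from 1..POOL without replacement
PICKS = 5    # numbers per row
ROWS = 20    # rows per set

def is_prime(n):
    """Trial division; plenty fast for lottery-sized numbers."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def all_prime_row_fraction(trials, seed=0):
    """Monte Carlo estimate of the fraction of sets that contain
    at least one row consisting entirely of primes."""
    rng = random.Random(seed)
    numbers = range(1, POOL + 1)
    hits = 0
    for _ in range(trials):
        if any(all(is_prime(x) for x in rng.sample(numbers, PICKS))
               for _ in range(ROWS)):
            hits += 1
    return hits / trials

def exact_fraction():
    """Same quantity in closed form, assuming independent rows."""
    n_primes = sum(is_prime(x) for x in range(1, POOL + 1))
    p_row = comb(n_primes, PICKS) / comb(POOL, PICKS)
    return 1 - (1 - p_row) ** ROWS

if __name__ == "__main__":
    print("simulated:", all_prime_row_fraction(10_000))
    print("exact:    ", exact_fraction())
```

The point is not the particular number but that the question
gets settled by measurement rather than by intuition.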

Given that people have looked at this data in N different
ways, it is a virtual certainty that there will be /some/
anomaly found, at this level of significance, just on the
basis of random fluctuations.
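
To put a rough number on that: if each way of looking at the
data has probability alpha of turning up an anomaly by chance,
and if the looks are independent (a simplifying assumption of
mine; real looks at the same data are correlated), then the
chance of at least one false alarm is 1 - (1 - alpha)^N. A
sketch, borrowing alpha = 0.03 from the estimate above purely
for illustration:

```python
def prob_some_anomaly(alpha, n_looks):
    """Chance that at least one of n_looks independent tests
    flags an anomaly when nothing is actually wrong."""
    return 1.0 - (1.0 - alpha) ** n_looks

if __name__ == "__main__":
    # n_looks values are arbitrary illustrations, not a count
    # of what people on the list actually tried.
    for n_looks in (1, 10, 50, 100):
        print(n_looks, round(prob_some_anomaly(0.03, n_looks), 3))
```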

He thinks that student misunderstood the assignment.

I reckon the only thing that makes the identification task
possible is the hope that one or more of the students is
going to screw up ... perhaps through misunderstanding the
assignment, or intentional sabotage, or whatever.

I thought the whole point is that the state lottery's
hardware RNG was rather less likely to screw up, less
likely to not carry out the assignment.

=================

My advice remains the same:

a) You really ought to test each hypothesis against some
sort of control. As the proverb says: When all else
fails, measure it. It doesn't pay to guess about whether
the probability is "exceedingly low" or not.

b) It is /sometimes/ possible to draw reliable statistical
inferences, but it requires better evidence than this.
Either there needs to be a more blatant screw-up, or we
need vastly more data.

To say the same thing another way: In this data, the
signal-to-noise ratio is really lousy. We need either
a bigger signal or less noise.

c) There's no such thing as a random number. You can have
a random /distribution/ over numbers, but then the randomness
is in the distribution, not in any particular number that
might have been drawn from the distribution.

A RNG is a random generator of numbers, not a generator of
random numbers.

As a corollary: If you want to validate a RNG, you don't
do it by running statistical tests on the output. That's
guaranteed to be a fool's errand. Instead you look at
the /mechanism/ by which the numbers are generated, and
validate the randomness of the mechanism.

If a RNG is really, really broken you can detect that
by looking at the outputs, but the converse does not
hold. You cannot tell a good RNG from a mediocre RNG
from a subtly-subverted RNG by looking at the outputs.
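
Here is a toy illustration of that last point. It is my own
construction, not anything from the lottery: one stream comes
from the operating system's entropy source, the other from a
generator whose seed (the hypothetical KNOWN_SEED) is known to
an adversary. A simple chi-square frequency test, chosen only
as a representative output test, has no way to tell them
apart, yet the second stream is 100% predictable to anyone
holding the seed.

```python
import random
import secrets

N_DRAWS = 10_000
N_BINS = 10
KNOWN_SEED = 12345   # hypothetical seed the adversary also knows

def chi_square_uniform(draws, n_bins):
    """Chi-square statistic against a uniform distribution over
    the integers 0..n_bins-1 (n_bins - 1 degrees of freedom)."""
    counts = [0] * n_bins
    for x in draws:
        counts[x] += 1
    expected = len(draws) / n_bins
    return sum((c - expected) ** 2 / expected for c in counts)

# Stream drawn from the OS entropy source.
honest = [secrets.randbelow(N_BINS) for _ in range(N_DRAWS)]

# "Subverted" stream: a fine algorithm, but the seed has leaked.
subverted_rng = random.Random(KNOWN_SEED)
subverted = [subverted_rng.randrange(N_BINS) for _ in range(N_DRAWS)]

# The adversary replays the same mechanism with the same seed.
adversary_rng = random.Random(KNOWN_SEED)
predicted = [adversary_rng.randrange(N_BINS) for _ in range(N_DRAWS)]

if __name__ == "__main__":
    print("chi2 honest:   ", chi_square_uniform(honest, N_BINS))
    print("chi2 subverted:", chi_square_uniform(subverted, N_BINS))
    print("adversary predicts every draw:", predicted == subverted)
```

The only way to catch the problem is to inspect how the seed
was chosen, i.e. to examine the mechanism.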

In other words, this whole thread has very dubious pedagogical
value. Mostly it encourages people to leap to conclusions
where angels fear to tread. Specifically, the procedure of
looking and looking until you find an anomaly is practically
the definition of a witch hunt. It is infamously unreliable.