
Re: [Phys-L] From a Math Prof (physics BS major) at my institution ( math challenge)



On 02/17/2014 11:09 AM, Richard Tarara wrote:
Second list is the real one. There are a number of consecutive
number strings (or close to consecutive) there which people will
religiously avoid when making up random numbers.

We agree that's the first thing one should check for.

However, I'm not seeing much of that. The religion seems to
have faded quite a bit.

Specifically, I calculated the differences between successive
entries on each row, and histogrammed the results for the two
data sets. They just don't look all that different, by this
measure:

http://www.av8n.com/physics/img48/human-poisson.png

I would have expected it to look "mostly" like a Poisson distribution.
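
For concreteness, here is a minimal sketch of the neighboring-difference
calculation in Python. The rows are made-up stand-ins, not the data from
this thread. The row length (5), the range 1..35, and the function names
are all assumptions for illustration only.

```python
import random
from collections import Counter

def neighbor_differences(row):
    """Differences between successive entries of one row, in the order given."""
    return [b - a for a, b in zip(row, row[1:])]

def difference_histogram(rows):
    """Tally the neighboring differences over all rows."""
    counts = Counter()
    for row in rows:
        counts.update(neighbor_differences(row))
    return counts

# Hypothetical stand-in data (NOT the lists discussed in this thread):
# 50 rows, each 5 distinct numbers drawn from 1..35, then sorted.
rng = random.Random(0)
rows = [sorted(rng.sample(range(1, 36), 5)) for _ in range(50)]

hist = difference_histogram(rows)
for d in sorted(hist):
    print(f"{d:3d} {'#' * hist[d]}")
```

Because each stand-in row is drawn without replacement, a difference of
zero never shows up in this histogram.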

Tangent: It's not really a Poisson distribution, because the zero-
difference case is verrrry under-represented in both data sets.
Presumably there was a random draw /without/ replacement. This
is not terrible, but it complicates the math.

================

I went on to calculate all the differences (not just neighboring
differences) for each row, modulo 35, and histogrammed the results.
This has the advantage of generating more numbers, so the statistics
are better. Also it is independent of any sorting. Also it has the
advantage that the expected answer is a horizontal straight line, so
the interpretation is simpler.

http://www.av8n.com/physics/img48/human-random.png
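
One way to sketch the all-differences calculation is shown here. It
assumes that both orderings of every pair are counted, which is what
makes the result independent of sorting. The stand-in rows, the range
1..35, and the function names are again hypothetical.

```python
import random
from collections import Counter
from itertools import permutations

MODULUS = 35  # taken from the text; the 1..35 range below is an assumption

def all_differences_mod(row, modulus=MODULUS):
    """(a - b) mod modulus for every ordered pair of distinct positions in a row."""
    return [(a - b) % modulus for a, b in permutations(row, 2)]

def mod_difference_histogram(rows, modulus=MODULUS):
    """Tally the modular differences over all rows."""
    counts = Counter()
    for row in rows:
        counts.update(all_differences_mod(row, modulus))
    return counts

# Hypothetical stand-in data: 200 unsorted rows of 5 distinct numbers from 1..35.
rng = random.Random(1)
rows = [rng.sample(range(1, 36), 5) for _ in range(200)]

hist = mod_difference_histogram(rows)
total = sum(hist.values())
expected = total / (MODULUS - 1)  # flat over 1..34; zero cannot occur
for d in range(1, MODULUS):
    print(f"{d:3d} {hist[d]:4d}  (expected about {expected:.1f})")
```

Counting both orderings makes the histogram symmetric under d -> 35 - d,
so only half of the bins carry independent information.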

The first data set seems to deviate more from expectations ... but
not enormously so. If the students are avoiding something, I would
say they are avoiding /long runs/ of consecutive small differences.

Still, though, the avoidance is pretty weak. Not much of a religion.

If I felt like pursuing this further, I would do a row-by-row
analysis, on the theory that even if some students are random,
others are not.

On 02/17/2014 10:27 AM, Paul Nord wrote:

In the second set, the student-generated numbers, numbers in the 20’s are much preferred.

But are you sure it's not just a fluctuation? By way of
contrast, in the first data set, multiples of 5 are seriously
under-represented.
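
A rough way to put a number on "just a fluctuation" is a binomial tail
probability. The sketch below assumes each entry is independently uniform
on 1..35, so that multiples of 5 have probability 7/35. That is only an
approximation if rows are drawn without replacement. The data here are
made-up stand-ins, and the function names are hypothetical.

```python
import random
from math import comb

def binomial_tail_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def count_multiples(values, m=5):
    """Number of entries that are multiples of m."""
    return sum(1 for v in values if v % m == 0)

# Hypothetical stand-in data (NOT the lists discussed in this thread).
rng = random.Random(2)
values = [rng.randint(1, 35) for _ in range(250)]

n = len(values)
k = count_multiples(values)
p = 7 / 35  # seven multiples of 5 in 1..35, assuming that range
print(f"{k} multiples of 5 out of {n}; expected about {n * p:.0f}")
print(f"P(count <= {k}) = {binomial_tail_at_most(k, n, p):.3f}")
```

A small tail probability is less impressive than it looks if the feature
was picked out after staring at the data, since many other features could
have been picked instead.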

========================

More to the point, statistical testing like this is a one-sided
proposition. As Dijkstra was fond of saying: Testing can demonstrate
the presence of bugs. It can never prove the absence of bugs.

So, it may be that one (or both) of these data sets is grotesquely
non-random. It could be a stacked deck ... but we haven't been
clever enough to detect the pattern, yet.

This is related to the exceedingly fundamental point that there is
no such thing as a random number.
-- If it's a number, it's not random.
-- If it's random, it's not a number.
-- You can have a random distribution over numbers, but then
the randomness is in the distribution, not in any particular
number that may have been drawn from the distribution.

Again: If the distribution is sufficiently non-random, you can
detect that ... but the converse does not hold. Failure to detect
a pattern does *not* mean that there is no pattern. Maybe you
just didn't find it yet.
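
As a toy illustration of the one-sidedness, here is a sequence that is
completely predictable yet passes a frequency test perfectly. The
sequence and the test statistic are chosen for illustration only; nothing
here comes from the data sets in this thread.

```python
from collections import Counter

def chi_square_uniform(values, categories):
    """Pearson chi-square statistic against a flat distribution over categories."""
    counts = Counter(values)
    expected = len(values) / len(categories)
    return sum((counts[c] - expected) ** 2 / expected for c in categories)

categories = range(1, 36)

# A blatantly patterned sequence: 1, 2, ..., 35, 1, 2, ..., 35, ...
patterned = [1 + (i % 35) for i in range(350)]

# The frequency test sees a perfectly flat histogram ...
print("chi-square:", chi_square_uniform(patterned, categories))

# ... while a test on successive differences (mod 35) exposes the pattern.
diffs = {(b - a) % 35 for a, b in zip(patterned, patterned[1:])}
print("distinct successive differences mod 35:", diffs)
```

The frequency test finds nothing wrong, and only the second test, which
happens to look in the right place, catches the pattern.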

The patterns that have been found so far are not strong enough for
me to confidently declare either distribution to be non-random.
People are really good at imagining patterns when none really
exists. It could be that we're being punked and both distributions
are random, and all we are seeing is fluctuations. Or maybe there
is some pattern that we haven't been able to capture.