
Re: [Phys-L] From a Math Prof (physics BS major) at my institution (math challenge)



What surprised me was the difference in triple sequences, e.g., 20-21-22, etc. There were three in the 2nd set and none in the first (unless my eyes crossed at the wrong time). When I ran some random-number examples (using Python's randint() function to select a number from a list), I routinely found triple sequences after sorting the five values. OTOH, two out of ten sets of 21 quintets had no triple sequences, so that's not even a fool-proof test. My guess is that non-statistically-astute students would avoid triple sequences at all costs if they were attempting to be "random."
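A minimal sketch of that kind of check (my reconstruction: random.sample stands in for raw randint() calls so the five picks are distinct, and a 1-35 pool is assumed):

    import random

    def has_triple(quintet):
        # True if the sorted quintet contains three consecutive
        # integers, e.g. 20-21-22.
        q = sorted(quintet)
        return any(q[i] + 1 == q[i + 1] and q[i + 1] + 1 == q[i + 2]
                   for i in range(len(q) - 2))

    # Draw ten sets of 21 quintets and count triples in each set.
    for s in range(10):
        triples = sum(has_triple(random.sample(range(1, 36), 5))
                      for _ in range(21))
        print("set", s + 1, "->", triples, "quintets with a triple")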

Anybody want to calculate the probability of getting at least 3 in a sequence, choosing 5 from 35? Doing at least 2 in sequence is enough for me right now (and I agree with the 48%).
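FWIW, the 48% figure checks out in closed form, and the at-least-3-in-a-sequence question can be brute-forced over all C(35,5) = 324,632 combinations (a sketch, assuming the 1-35 pool):

    from math import comb
    from itertools import combinations

    # Of the C(35,5) ways to choose 5 distinct numbers from 1..35,
    # C(31,5) contain NO consecutive pair (stars-and-bars), so:
    p_pair = 1 - comb(31, 5) / comb(35, 5)
    print(p_pair)   # ~0.4766, i.e. the quoted 48%

    # Brute force the "at least 3 in a sequence" question:
    def has_run3(combo):   # combinations() yields sorted tuples
        return any(combo[i] + 1 == combo[i + 1] and
                   combo[i + 1] + 1 == combo[i + 2]
                   for i in range(3))

    runs = sum(has_run3(c) for c in combinations(range(1, 36), 5))
    print(runs / comb(35, 5))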

-----Original Message-----
From: Phys-l [mailto:phys-l-bounces@phys-l.org] On Behalf Of John Denker
Sent: Tuesday, February 18, 2014 5:16 PM
To: Phys-L@Phys-L.org
Subject: Re: [Phys-L] From a Math Prof (physics BS major) at my institution (math challenge)

On 02/18/2014 08:46 AM, Rauber, Joel wrote:
> I didn't personally calculate it, but the Math Prof. told me that the
> probability of consecutive numbers appearing on a truly random list is
> 48%, much higher than most people would guess.
>
> I looked at two factors: the number of times consecutive numbers
> appear (-> leads to "the 2nd list is random"), and the number of times
> numbers in the range [30-35] appeared compared to the other decade
> ranges, which also lends evidence that the second list was the random one.

That's very weak evidence.

In the first data set, the prevalence of consecutive pairs is about what one
would expect. In the second set, the prevalence of consecutive pairs is
/more/ than one would expect ... but this does not make the first set any
less random. It just tells you that fluctuations are huge ...
which is not surprising given the small size of the data sets.

As for the 30--35 range, the first data set has exactly the expected number of
hits in this range.
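Concretely (assuming each data set is 21 quintets drawn from 1-35, the sizes mentioned elsewhere in the thread):

    # Each drawn number is marginally uniform over 1..35 even though
    # the five picks in a quintet are distinct, so per set of 21
    # quintets the expected count in 30..35 is:
    expected = 21 * 5 * (6 / 35)
    print(expected)   # 18.0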

OTOH, this range is of course under-represented relative to the "other" decade
ranges ... but that's not surprising, since 30--35 is *not* a full decade
range ... and more importantly, in the second data set this range is even
/more/ under-represented. So by this criterion, the second set is less random.

So far, the only statistic that looks out of whack to me is the scarcity of
numbers ending in 0 (i.e. numbers equal to 0 mod 10) in the first set. OTOH
people have looked at a lot of statistics, and if you look at enough, sooner or
later you will find /something/ that is out of whack ...
even if the data is truly random.
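That multiple-comparisons trap is easy to demonstrate by Monte Carlo (a sketch; the five "statistics" below are arbitrary choices of mine, and the set sizes are assumed as above):

    import random

    def draw_batch():
        # One data set: 21 quintets of 5 distinct numbers from 1..35.
        return [sorted(random.sample(range(1, 36), 5)) for _ in range(21)]

    def statistics(batch):
        nums = [n for q in batch for n in q]
        return (
            sum(n % 10 == 0 for n in nums),      # numbers ending in 0
            sum(30 <= n <= 35 for n in nums),    # hits in 30..35
            sum(b - a == 1 for q in batch
                for a, b in zip(q, q[1:])),      # consecutive pairs
            sum(n % 2 == 0 for n in nums),       # even numbers
            sum(n <= 7 for n in nums),           # small numbers
        )

    # Pass 1: empirical ~95% intervals for each statistic, taken from
    # known-random data.
    samples = [statistics(draw_batch()) for _ in range(10_000)]
    cols = list(zip(*samples))
    lo = [sorted(c)[250] for c in cols]
    hi = [sorted(c)[9750] for c in cols]

    # Pass 2: how often does a fresh, truly random set look "out of
    # whack" on at least one statistic?
    def flagged(batch):
        return any(not (l <= s <= h)
                   for s, l, h in zip(statistics(batch), lo, hi))

    trials = 10_000
    print(sum(flagged(draw_batch()) for _ in range(trials)) / trials)
    # noticeably above the ~5% a single test would give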

=======================================

The lesson I take from this is the importance of proper experimental
controls. That includes double-blinding.
Anybody who knows the "right" answer here can easily find "factors" to
support that conclusion ... but doing it blind is not so easy.

Another type of essential control is what I call "closing the loop" i.e. cobbling
up some Monte Carlo data and feeding it through the analysis process, to see
how often the analysis screws up.

I haven't done the experiment, but I suspect that the two "factors"
suggested above, when applied to this student data and an /ensemble/ of
random data sets, would get the wrong answer about 50% of the time ...
quite possibly even worse than that, depending on implementation details ...
in other words, worse than random guessing.
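Closing the loop on just the consecutive-pair "factor" takes only a few lines (a sketch, again assuming 5 distinct picks from 1-35). A rule that declares a quintet "random" only when a consecutive pair appears misses truly random quintets whenever the pair happens to be absent:

    import random

    def has_consecutive_pair(quintet):
        q = sorted(quintet)
        return any(b - a == 1 for a, b in zip(q, q[1:]))

    # Feed known-random Monte Carlo quintets through the "rule" and
    # see how often it fails to call them random.
    trials = 100_000
    misses = sum(not has_consecutive_pair(random.sample(range(1, 36), 5))
                 for _ in range(trials))
    print(misses / trials)   # ~0.52: wrong on random data about half the time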

Doing experiments on human subjects is very, very hard. It demands
fastidious, elaborate controls. Think of all the trouble that drug companies go
to when conducting field trials of a new drug ... double blinding, placebos,
randomized controls, long-term longitudinal studies, et cetera.

There's a simple reason why they bother with all that:
If they didn't, the results would not be reliable.

There is a lesson in this for the PER community and the education
bureaucracy in general. Testing a new book or a new teaching method is an
experiment on human subjects.
It is like a drug trial ... only much harder, because proper blinding is usually
impossible. Yet again and again, the effort that goes into such a test is less
than the effort that goes into a drug trial. Less effort applied to a harder
problem. You know in your bones that the results cannot possibly be
reliable.
_______________________________________________
Forum for Physics Educators
Phys-l@phys-l.org
http://www.phys-l.org/mailman/listinfo/phys-l