
Re: [Phys-L] From a Math Prof (physics BS major) at my institution ( math challenge)



On 02/20/2014 06:49 AM, Jeffrey Schnick wrote:
I wouldn't call it a law, but I was operating on the hypothesis that
in trying to cook up a random distribution of numbers, people tend to
shy away from nice round numbers like 10, 20, and 30.

That might be a good rule, or it might not. I did some googling:
https://www.google.com/search?q=%22digital+analysis%22+%22round+numbers%22

and found things like this:
http://www.theiia.org/ITAuditarchive/index.cfm?act=itaudit.archive&fid=95

which says: "Auditors generally direct their attention to [...] round
numbers that have occurred abnormally often [...]"

I also found this:
http://faculty.som.yale.edu/jakethomas/papers/unusual.pdf

which says that profit data tends to become more rounded and loss
data tends to become less rounded, for unsurprising reasons. This is
not directly relevant, but it still serves as a warning that the true
behavior might be more complex than always-this or never-that.

If you want to see some actual data that looks like it was cooked
up, and the associated forensic analysis, try this:
http://arxiv.org/pdf/1311.5517.pdf

=================

In situations like this, I would strongly recommend comparing the
data to the null hypothesis.

In our case, the null hypothesis is that the data set is random,
either because it is the "official" random set, or because the
students did a good job of randomizing their contributions.

I wrote a few lines of brainless code to do the Monte Carlo ...
a million data sets, 105 million data points. It tells me that
in a random data set, one expects to see numbers equal to zero
mod ten (i.e., multiples of ten) occurring 8 or 9 times in each
set of 105 points. (No, it's not 10.5 times; can you see why?)

At the next level of detail, in an ensemble of random sets, one
expects to see exactly 2 occurrences in just over 0.27% of the sets.
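Here is a minimal sketch of that sort of Monte Carlo, in Python
(not necessarily the code I ran). The range LO..HI is a stand-in,
chosen as 1 to 99 purely for illustration, and the variable names
are likewise just for the sketch; the two printed numbers depend
on that assumed range, so the structure of the calculation is the
point, not the particular values.

  import random

  LO, HI = 1, 99       # assumed range of each data point (a stand-in)
  SET_SIZE = 105       # points per data set
  N_SETS = 1_000_000   # a million sets = 105 million points
                       # (reduce this if you are impatient)

  total = 0            # running sum of multiples-of-ten counts
  exactly_two = 0      # number of sets with exactly 2 multiples of ten

  rng = random.Random(0)
  for _ in range(N_SETS):
      count = sum(rng.randint(LO, HI) % 10 == 0 for _ in range(SET_SIZE))
      total += count
      exactly_two += (count == 2)

  print("mean multiples of ten per set:", total / N_SETS)
  print("fraction of sets with exactly 2:", exactly_two / N_SETS)

With the assumed 1..99 range this prints a mean near 9.5 and a
fraction near 0.25%; with a different range you get different
numbers.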

In a courtroom, I would not call this proof. I would not call it
probable cause. I would not even call it reasonable suspicion.
Since people on this list have looked at data set #1 in lots of
different ways, it is a near certainty that somebody will have
found /something/ at this level of significance, due to fluctuations
alone.
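This is the usual look-elsewhere effect, and it is easy to
quantify under a crude independence assumption. If the list
collectively makes N independent looks at the data, each of which
would flag at the 0.27% level by fluctuation alone, the chance of
at least one spurious flag is 1 - (1 - 0.0027)^N. The N values
below are hypothetical:

  for n_looks in (10, 100, 1000):
      p_any = 1 - (1 - 0.0027) ** n_looks
      print(n_looks, "looks:", round(p_any, 3))
  # prints roughly 0.027, 0.237, 0.933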

If you look at a random collection of stars long enough, you
will see a crab, a lion, a virgin, and lots of other stuff.

If we had more data, I would be singing a different song. Note
that poisson(2,9) is not as convincing as poisson(20,90) ... by
about 16 orders of magnitude. (Here poisson(k, mu) denotes the
Poisson probability of observing exactly k events when mu are
expected.)
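To spell out that arithmetic, a short sketch using the same
poisson(k, mu) convention:

  from math import exp, factorial

  def poisson_pmf(k, mu):
      # probability of observing exactly k events when mu are expected
      return exp(-mu) * mu**k / factorial(k)

  print(poisson_pmf(2, 9))                        # ~5.0e-03
  print(poisson_pmf(20, 90))                      # ~4.1e-19
  print(poisson_pmf(2, 9) / poisson_pmf(20, 90))  # ~1.2e+16

The ratio is about 1.2e16, i.e. roughly 16 orders of magnitude.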