
[Phys-l] fake data (teachable moment)



On 04/03/2008 08:44 AM, Rick Tarara wrote:

> http://www.clemson.edu/scies/wind/Poster-Schmidt.pdf
>
> Slide 11 above shows some typical power curves

Typical, but fake.

> many others can be found on-line.

Yeah, but almost all of them are fake data.

You can use this in your class to explain why there is a
policy of near-zero tolerance for fake data.

If you turn in something to me with fake data, it had better
be *unmistakably* labeled as "artist's impression" or some
such, and there had better be a darn good reason why you
couldn't use real data.

I've heard all the lame excuses. Take a number for faster service.
1) I was in a hurry.
2) The fake data /looks/ better. It will sell better in the
marketplace.
3) I intended to emphasize just this one aspect of the data.
4) What's the harm?
5) Everybody does it.
6) PowerPoint won't plot data, and Excel won't let me annotate
the graph with circles and arrows.

My rejoinders include:

4) It causes a great deal of harm. It's like a smash-and-grab
burglary at the jewelry store. It profits you by the value of
the necklace you grabbed, but it's less than a zero-sum game,
because the jeweler is out the value of the necklace, *and* out
several thousand dollars for replacing the smashed window.

2) Your used car will /look/ better in the marketplace if you
subtract 50,000 miles from the odometer. But again this is
less than a zero-sum game, because your unfair profit is the
buyer's unfair loss ... and the whole process is destructive
to commerce in general. It throws sand in the gears. It
makes it harder for anybody to buy anything.

2,4) Similarly, the guys at WorldCom and MCI decided that their
annual reports would /look/ better if they published fake data
about assets, liabilities, et cetera. This was like a
smash-and-grab burglary on a grand scale, because for every hundred
million dollars they stole, they caused a billion dollars in
collateral damage, screwing up the entire industry.

1) So your time is so much more valuable than the time of
everybody who's going to look at this report?

5) I tried that line on my mother when I was five years old.
It didn't work. I never tried it again.

6) As the ancient proverb says, it is a poor workman who blames
his tools. I don't have any trouble annotating /my/ plots.

3) Your intended use for the data is not the only possible use
for the data. It is entirely foreseeable that somebody, sooner
or later, will be interested in some other aspect of the data
... for instance checking if wind power really is cubic in
airspeed. If they trust your data, they will come to the wrong
conclusion.

The victims are going to be really, really mad at /you/ for
foisting fake data on them.
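Here is a minimal sketch of the check mentioned in rejoinder 3. It
assumes the data are (wind speed, power) pairs taken below the
turbine's rated speed, where a real power curve flattens out. The
function name `loglog_slope` is made up for this illustration. If
P = k v^n, a least-squares line through (ln v, ln P) has slope n, so
a cubic law shows up as a slope near 3.

```python
import math

def loglog_slope(speeds, powers):
    """Least-squares slope of ln(power) versus ln(speed).

    If power = k * speed**n, the slope recovers n, so a cubic law gives 3.
    Hypothetical helper for illustration; all inputs must be positive.
    """
    if len(speeds) != len(powers) or len(speeds) < 2:
        raise ValueError("need at least two matching (speed, power) pairs")
    xs = [math.log(v) for v in speeds]
    ys = [math.log(p) for p in powers]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("speeds must not all be equal")
    return sxy / sxx

# Model values from the power carried by the wind, P = 0.5 * rho * A * v**3.
# These are NOT measurements; they only show that the fit returns the
# exponent it was fed. Air density and rotor area are assumed values.
RHO = 1.225    # kg/m^3, sea-level air density (assumed)
AREA = 100.0   # m^2, hypothetical rotor swept area
model_speeds = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]                   # m/s
model_powers = [0.5 * RHO * AREA * v**3 for v in model_speeds]  # W
print(f"{loglog_slope(model_speeds, model_powers):.3f}")        # prints 3.000
```

Run on real measured curves, the slope is the interesting number.
Run on a hand-drawn "typical" curve, it tells you only what the
artist assumed.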

===========================

As a constructive example of what scientific integrity looks like:
in _The Feynman Lectures on Physics_, Feynman works out a theoretical
prediction for the magnetization of iron, using the mean-field
model. He plotted the model curve.

My point is that he plotted the model on top of some actual data.
Actual data! The model agrees over most of the range, but there
is a perceptible discrepancy close to the critical point. He even
pointed out the discrepancy! He published this ten years before
anybody had a clue about the physics behind the discrepancy
(renormalization group).
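For readers who want to see what the mean-field model predicts,
here is a minimal sketch. It assumes the standard spin-1/2 Weiss
form of the self-consistency equation, m = tanh(m/t), where
m = M/M_s is the reduced magnetization and t = T/T_c is the reduced
temperature. The function name is made up for this illustration.
Close to the critical point the model gives m ~ sqrt(3(1 - t)), a
critical exponent of 1/2. Real magnets show a smaller exponent,
which is the kind of discrepancy visible near T_c.

```python
import math

def mean_field_magnetization(t, tol=1e-12):
    """Reduced magnetization m = M/M_s solving m = tanh(m / t), t = T/T_c.

    Spin-1/2 Weiss mean-field form (an assumption of this sketch).
    Returns 0.0 for t >= 1, where m = 0 is the only solution.
    """
    if t <= 0:
        raise ValueError("reduced temperature t must be positive")
    if t >= 1.0:
        return 0.0

    def residual(m):
        return m - math.tanh(m / t)

    # For t < 1, residual < 0 just above m = 0 and residual(1) > 0,
    # so bisection on [lo, hi] brackets the nonzero root.
    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Compare the full solution with the near-T_c approximation sqrt(3 (1 - t)).
for t in (0.5, 0.9, 0.99, 0.999):
    m = mean_field_magnetization(t)
    print(f"t={t:<6} m={m:.4f}  sqrt(3(1-t))={math.sqrt(3 * (1 - t)):.4f}")
```

Bisection is used rather than simple fixed-point iteration because
the iteration slows to a crawl near T_c, exactly where the
interesting comparison lives.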