
Re: [Phys-L] peculiar data; was: uncertainty



I've not been able to check the list for a while. Just saw this thread.

-- Fudging the data to remove peculiar-looking points ... that's serious misconduct.

I think that may depend on the situation. Remember what was being done.

We had ~75 data points. Of them, 74 were between 185.xx g and 187.xx g. One value was 164.xx g. We discussed the range of values, what might have produced such a range, and what the proper procedure is for recording the data (with all students measuring the same mass on the same triple beam balance). When we analyzed our data, we acknowledged the 164.xx value but did not include it in the range used to determine the precision of measurements with a triple beam balance.

If a student submits 34,851.4 g, do you suggest we keep that point too? In this case? Measuring a washer on a triple beam balance? I find that to be an extreme choice. You can take this molehill and make it a mountain, but I disagree with you on this one.
In this activity, if we include the 164.xx value, then our 3-crank method for measuring the mass of an object in lab would have an uncertainty of roughly 20 grams. Really? If a student enters 16,414 g, then the uncertainty in every value measured on the triple beam balance becomes +/- 16,229 g because someone slid the decimal the wrong way? Really? "From this point forward, when we collect data and do experiments, all mass values measured on a triple beam balance must be recorded as precisely as possible, and the uncertainty associated with each measurement must be +/- 16,229 g. Was momentum conserved in your lab? Will your object float in water?" The results become silly. The 16,229 g figure is an extreme chosen to make a point. But even 20 grams? A 3-crank method with an uncertainty of 20 g is still rather large, in my opinion.
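A quick sketch of the arithmetic (with made-up numbers standing in for the class data) shows how a single outlier inflates a max-minus-min precision estimate:

```python
# Illustrative values standing in for the class data: 74 readings
# clustered between 185.xx g and 187.xx g, plus one outlier at 164.xx g.
cluster = [185.4, 185.9, 186.1, 186.3, 186.5, 186.8, 187.2]
readings = cluster * 10 + cluster[:4] + [164.3]   # 74 clustered + 1 outlier

def half_range(data):
    """Half the max-minus-min spread: a crude range-based precision estimate."""
    return (max(data) - min(data)) / 2

print(f"with outlier:    +/- {half_range(readings):.2f} g")       # roughly 11 g
print(f"without outlier: +/- {half_range(readings[:-1]):.2f} g")  # under 1 g
```

One reading out of 75 changes the claimed precision of the instrument by an order of magnitude, which is the silliness being described above.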
Separately, we always talk about keeping our data. If we get something unexpected, we keep it and take more data. We often plot all of our data on our graphs and get the best-fit line and relationship from all of it (not always; sometimes what we would like to plot runs up against what our software will let us plot). We discuss how the cardinal sin of science is fudging data. No number of Hail Marys can solve that problem.
We keep all data. But I don't think the 164 mark is helpful here.

Have a good one.
Paul.




-----Original Message-----
From: Phys-l [mailto:phys-l-bounces@www.phys-l.org] On Behalf Of Bill Nettles
Sent: Monday, January 19, 2015 10:01 AM
To: Phys-L@Phys-L.org
Subject: Re: [Phys-L] peculiar data; was: uncertainty

An experiment yields anomalous data, but a careful review identifies problems in the experiment (or in some of the data collection) which can be matched to the anomalous data. Do you formally report the anomalous data (I know you MUST keep it in the log book), then give justification for excluding it, or do you ignore it (and the procedural/instrumentation errors) in the formal report?

As a case study, consider the OPERA neutrino measurements. If they had found the loose/disconnected cables before writing their paper, would they include the mistake or exclude it in the journal submission? Are there different rules for HS and college experimental reports and peer-reviewed articles?

In college I'm looking for techniques and thought process, so I want to see all the data with the students' justifications, mistakes, and conclusions put into writing. I guess that would be the "internal discussion memo" rather than the journal submission.
________________________________________
From: Phys-l [phys-l-bounces@www.phys-l.org] on behalf of John Denker [jsd@av8n.com]
Sent: Monday, January 19, 2015 8:35 AM
To: Phys-L@Phys-L.org
Subject: [Phys-L] peculiar data; was: uncertainty

On 01/12/2015 12:40 PM, Paul Lulai wrote:
What I normally do:
.have every kid measure the mass of the same thing; I enter it into a spreadsheet.
.we find the average, max & min
.throw out any unreasonable values.

Sometimes throwing out "unreasonable" values is necessary, but it is exceedingly difficult to do this properly. In an introductory course, it may be easier and safer to just keep all the data.

For example: Suppose you are running an assay office. A prospector brings you 1000 samples. The first one has a negligible amount of gold. Ditto for the second one, and the third one, et cetera. Oddly enough, one of the samples has 5 sigma more gold than the average. Should you throw out that reading on the grounds that it is "unreasonable"?
You could argue that based on simple Gaussian statistics, any given sample has only about one chance in a few million of coming in 5 sigma high.

However (!) if you do that, you have completely defeated the purpose of your job, and the prospector's job. It turns out that in this world, a lot of things (including gold) are not Gaussian distributed.
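For reference, the Gaussian tail probabilities can be computed directly with the standard complementary error function; under a strict Gaussian assumption, a 5-sigma sample is roughly a one-in-3.5-million event, and one-in-10^12 odds correspond to about 7 sigma. The point stands either way: if the gold grades are not actually Gaussian, these odds are meaningless.

```python
import math

def gaussian_tail(sigma):
    """One-sided probability that a Gaussian draw exceeds `sigma`
    standard deviations above the mean."""
    return 0.5 * math.erfc(sigma / math.sqrt(2))

for s in (3, 5, 7):
    print(f"{s} sigma: about 1 in {1 / gaussian_tail(s):.2g}")
```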

Please do not teach students to throw out data just because it "looks unreasonable".

As another example: Suppose you have a long-term project to measure and re-measure the brightness of the stars.
You get around to measuring Cassiopeia every two years, in mid-November of even-numbered years. The magnitude of the brightest star is:

Year Apparent
Magnitude
1552 2.24
1554 2.24
1556 2.24
1558 2.24
1560 2.24
1562 2.24
1564 2.24
1566 2.24
1568 2.24
1570 2.24
1572 -4.3
1574 2.24
1576 2.24
1578 2.24
1580 2.24
1582 2.24
1584 2.24
1586 2.24
1588 2.24
1590 2.24
1592 2.24

Are you going to throw out the 1572 result as being "unreasonable"? If you do that, you are trashing one of the most important and interesting observations in the history of astronomy.

Please do not teach students to throw out data just because it "looks unreasonable".

One more example: If you veto intelligence reports that say those aluminum tubes are /not/ suitable for making uranium centrifuges, and veto other reports that do not support your preconceived agenda, you will make a mistake that costs trillions of dollars and kills hundreds of thousands of people in the short run, and creates problems that are likely to linger for decades.

It is possible to design experiments with internal controls ("measure twice, cut once") but doing this properly requires some rather sophisticated skills.
Doing it properly is AFAICT beyond the scope of the introductory course ... although if somebody knows of a simple way of doing it properly I would very much like to hear about it.

In particular, it is *not* acceptable to identify a "suspicious" result on purely statistical grounds and then go remeasure it. In such a case you are quite likely to see what's known as regression to the mean.
This will seemingly confirm your suspicion, but in reality such a process is horribly fallacious and invalid. This is a very common mistake.
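A small simulation (invented numbers, no actual blunders in the data) shows how the re-measurement almost always seems to "confirm" the suspicion, purely by regression to the mean:

```python
import random

# Every reading here is legitimate noise around the same true value;
# there are no mistakes to find.
random.seed(1)
TRUE_VALUE, NOISE = 100.0, 2.0

def measure():
    return random.gauss(TRUE_VALUE, NOISE)

confirmed, trials = 0, 2000
for _ in range(trials):
    batch = [measure() for _ in range(20)]
    mean = sum(batch) / len(batch)
    suspect = max(batch, key=lambda x: abs(x - mean))  # flag the farthest point
    redo = measure()                                   # "re-measure" it
    if abs(redo - mean) < abs(suspect - mean):
        confirmed += 1                                 # the redo looks "better"

print(f"re-measurement 'confirmed' the suspicion {100 * confirmed / trials:.0f}% of the time")
```

The flagged point is, by construction, the farthest from the mean, so a fresh draw is very likely to land closer, even though every reading was equally valid.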

If you are going to veto some measurements, you need to work very hard to make sure that the veto is not correlated with the thing you are trying to measure.
In particular, if you are measuring the size of eggs, it might make sense to skip any eggs that got dropped and broken ... but beware! It may be that the egg-measuring machine has a tendency to drop the smallest eggs. In that case, vetoing the broken ones leads to a gross systematic overestimate of the size.
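A sketch of that egg scenario, with made-up numbers, makes the systematic bias visible:

```python
import random

# Hypothetical egg masses, Gaussian around 60 g with 5 g spread.
random.seed(0)
eggs = [random.gauss(60, 5) for _ in range(50_000)]

# Suppose the handling machine tends to drop the smallest eggs:
# any egg under 55 g has a 50% chance of breaking and being vetoed.
kept = [m for m in eggs if not (m < 55 and random.random() < 0.5)]

true_mean = sum(eggs) / len(eggs)
veto_mean = sum(kept) / len(kept)
print(f"all eggs:   {true_mean:.2f} g")
print(f"after veto: {veto_mean:.2f} g   (systematically high)")
```

Because the veto is correlated with the quantity being measured, the surviving sample overestimates the mean; no amount of extra data fixes a biased veto.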

Possibly constructive suggestion: Here is an example of an experiment with internal controls. Assign each student to measure a square. Measure all four sides /and both diagonals/. This gives a set of six measurements. Each set has some obvious internal consistency checks. If it fails these checks, you can throw out the entire set. Do not simply throw out the one measurement that looks fishy. Then average over all surviving sets to get an estimate of the size of the square. This is not a perfect example, but it is better than just throwing out readings on a whim.
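The square check described above can be sketched in a few lines; the tolerance and example numbers here are invented for illustration:

```python
import math

def consistent(six_measurements, tol=0.02):
    """Internal consistency check for a measured square: the first four
    entries are sides, the last two are diagonals.  Sides should agree
    with their own mean, and each diagonal should be about sqrt(2) times
    the mean side, all within fractional tolerance `tol`."""
    *sides, d1, d2 = six_measurements
    mean_side = sum(sides) / len(sides)
    sides_ok = all(abs(s - mean_side) / mean_side < tol for s in sides)
    expected_diag = mean_side * math.sqrt(2)
    diags_ok = all(abs(d - expected_diag) / expected_diag < tol for d in (d1, d2))
    return sides_ok and diags_ok

good = [10.0, 10.1, 9.9, 10.0, 14.15, 14.14]
bad  = [10.0, 10.1, 9.9, 12.0, 14.15, 14.14]  # one fishy side measurement
print(consistent(good), consistent(bad))  # True False
```

Note the check vetoes the whole set of six, not the one suspicious reading, exactly as the text prescribes.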

Any process that minimizes one kind of error will exacerbate some other kind of error ... so you have to be really, really careful with this.

Things that are not tolerated in the real world should not be encouraged in high school.
-- Publishing data with a few peculiar, possibly erroneous points ... that's mildly embarrassing.
-- Fudging the data to remove peculiar-looking points ... that's serious misconduct.

Please impress on students that vetoing data is a nasty business. It easy to do wrong, and hard to do right. In most cases, it is better to keep all the data, including the anomalous data.
_______________________________________________
Forum for Physics Educators
Phys-l@www.phys-l.org
http://www.phys-l.org/mailman/listinfo/phys-l