
[Phys-L] Re: Human Error?



On 04/29/05 19:02, Michael Edmiston wrote:

> [snipping a lot of reasonable, nuanced arguments]
>
> In the end, human errors are not tolerable. That doesn't mean they
> don't occur... it just means they are supposed to get fixed before
> you publish your paper. Having said that, I liked John Denker's
> response that students are sometimes in situations where they don't
> have the opportunity or time to fix human errors. I agree with that.
> Of course, if you ask my students, especially near the end of the
> term, they don't have time even to do the lab once, let alone repeat
> it.

I'm being unfair to take one sentence out of context,
but still I think it is dangerous to say "human errors
are not tolerable" ... if only because of the risk
that it will be taken out of context. A punchy
statement like that is apt to be remembered long
after the context is forgotten.


We agree there will always be some human errors.
For several reasons, I don't agree that it will
always be possible to fix them "before publication".
-- Sometimes there are schedule constraints, budget
constraints, or whatever ... so at some point you
have to go with the data "as is". For example,
suppose you are observing a once-in-a-lifetime
eclipse or supernova or something: retaking the
data is just not an option. If you said you were
going to take data with three instruments, and
one of them was mis-aligned, that's human error,
and there's no other name for it. You can
publish the remaining 2/3rds of the
data, but at some point you'll have to explain
to the funding agency that the rest of it was
lost due to human error.
-- Even if you can "fix" the data, how can you
do that except by taking more data? That is a
process which is, by Murphy's law, also subject
to error! All you can hope to do is to take
enough data so that the physically-interesting
parameters are well determined (i.e. overdetermined)
_despite_ some irreducible incidence of error
(see the sketch just below).
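
To make that concrete, here is a minimal sketch
(the numbers and the numpy code are mine, invented
for illustration): fit a straight line to 50
points, one of which has been ruined by a blunder.
Because the two fit parameters are heavily
overdetermined, the least-squares fit still comes
out close to the truth:

  import numpy as np

  rng = np.random.default_rng(0)
  x = np.linspace(0.0, 10.0, 50)                  # 50 points, 2 parameters
  y = 3.0*x + 1.0 + rng.normal(0.0, 0.1, x.size)  # true slope 3, intercept 1
  y[17] += 5.0                                    # one point ruined by human error

  slope, intercept = np.polyfit(x, y, 1)          # overdetermined least-squares
  print(slope, intercept)                         # close to 3 and 1 despite the glitch

Ordinary least squares is not robust in general;
the point is only that redundancy buys you some
tolerance for error.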

Actually, I have deep objections to the notion
that data errors can be "fixed" at all. Data is
data ... and to me, as a card-carrying or at
least scar-carrying experimentalist, the data is
sacred. "Fixing" the data sounds to me like you
are going to surreptitiously discard some data
points that you "don't like". What basis do
you have for "liking" or "not liking" the data?
Maybe you don't like outliers? Aaaarrghhh!!!
What if the outliers are right? What if the
outliers are telling you something very
important? (Example: Suppose you know the
mean and standard deviation for the distribution
of gold in the earth's crust. Now suppose you
make a measurement that is about 20 sigma from
the mean. Are you going to throw it out on the
grounds that it is an outlier? [Hint: People
don't mine for gold in places where there is
an "average" amount of gold.])

> A statement I often repeat to the students goes like this. If you
> determine it is a problem that you are only making measurements to
> the nearest millimeter and you need measurements to the nearest
> tenth-millimeter... you don't need a different human... you need a
> different instrument.

Again I point out: the error-proof instrument might
not be available within the schedule and budget
constraints.

For example: There is a non-zero rate of car-crashes
due to driver error. It is easy to say "you don't
need a different human ... you need a different car"
but I don't know where to buy a car that is immune
to driver error.

Let me explain a little more where I'm coming from.
One of my hobbies is teaching people to fly airplanes.
The training has two main parts:
a) Flying the airplane when everything is going
right, and
b) Flying the airplane when stuff starts going
wrong.

Part (a) is almost trivial. The proverbial nine-year-old
girl next door can fly the plane when everything is going
right, which is 99+% of the time.

Part (b) is very complicated. It is predicated on
the idea that since there *will* be human error,
we simply must make the errors tolerable. Saying
"human error is not tolerable" in this context is
diametrically the wrong way to think about the task.

I can go into detail if anybody is interested, but
basically the right approach involves always putting
yourself in a situation where you can tolerate not
just one human error, but several errors on top of
errors, and still complete the mission safely.

========

BTW I agree with others who pointed out that often
it is better to think in terms of "uncertainty"
rather than "error". The two terms are not
synonymous, and usually uncertainty is the main
thing you care about. Human error is one
contribution to the uncertainty.

Similarly I approve of the terms "data analysis"
and "propagation of uncertainty" and try to
avoid the term "error analysis".
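
For readers who haven't seen it, here is the
standard first-order propagation-of-uncertainty
calculation (the density measurement is an
invented example, not anything from this thread):
for a quotient q = m/V, the relative uncertainties
add in quadrature:

  import math

  m, sigma_m = 12.4, 0.1    # mass in grams, with its uncertainty
  V, sigma_V = 4.6, 0.2     # volume in cm^3, with its uncertainty

  rho = m / V
  # First-order propagation for a quotient:
  # (sigma_rho/rho)^2 = (sigma_m/m)^2 + (sigma_V/V)^2
  sigma_rho = rho * math.sqrt((sigma_m/m)**2 + (sigma_V/V)**2)
  print(rho, sigma_rho)     # about 2.70 +/- 0.12 g/cm^3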

========

Just to show I'm not an extremist, I do sympathize
with those who are frustrated when they see "human
error" on lab reports. I just think the essence
of the frustration is being misdiagnosed. IMHO
the problem is not that "human error" is a wrong
description of the error ... rather, it's an
insufficient description, i.e. it is an unduly
vague catch-all. I have no objection to a
paragraph that starts out vague, so long as
the vagueness is promptly removed. If somebody
wants to talk about human error, they have to
spell out:
-- the detailed nature of the error, and
-- how we know that's what happened.

Seeing an outlier is *not* sufficient grounds
for deciding that an observation is bogus.
One must distinguish between insignificant
outliers and significant outliers ... which
can be very tricky, usually beyond the ability
of intro-level students.
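
For the record, the usual textbook heuristic here
is Chauvenet's criterion: reject a point only if,
among your N measurements, you would expect fewer
than half an observation that extreme by chance.
A rough sketch (my own, and note that it *assumes*
the scatter is Gaussian, which is precisely the
assumption a significant outlier may be violating):

  import numpy as np
  from scipy.stats import norm

  def chauvenet_suspects(data):
      """Flag points that Chauvenet's criterion would reject,
      assuming Gaussian scatter about the mean."""
      data = np.asarray(data, dtype=float)
      n = data.size
      z = np.abs(data - data.mean()) / data.std(ddof=1)
      # Expected number of readings at least this far out, in n trials.
      expected = n * 2.0 * norm.sf(z)
      return expected < 0.5

  readings = [9.8, 9.9, 10.1, 10.0, 9.7, 14.2]
  print(chauvenet_suspects(readings))   # flags only the 14.2 reading

And that is just the mechanical part; deciding
whether the flagged point is a blunder or a
discovery is the part that takes judgment.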

So I sympathize with the students as well, for
a different reason: They're being asked to do
something that they cannot possibly do:
-- Nobody guarantees they can take error-free data
within the usual schedule and budget constraints,
and yet
-- they don't have the skills needed to properly
deal with error-ridden data.

We shouldn't get frustrated, and we shouldn't
assume the students are lazy or stupid (although
the lazy and stupid ones will have more problems
than the others). Data analysis is hard hard
hard. I spent most of last Thursday talking to
a very smart professional scientist about how to
do the data analysis for his experiment. Both
of us have read all the standard undergrad-level
books on data analysis, and a lot more besides,
and I can assure you that all that put together
is maybe 10% of what you need to know to really
analyze real data.

One constructive (but not simple) suggestion:
We really ought to design lab courses -- all
lab courses, including the most elementary
ones -- around a "spiral" structure, so that
people take some data, do some analysis, and
then take some more data. It sounds like
Michael E. has done something along these
lines. (If this means only doing half as
many experiments during the year, so be it.
Better N/2 done right than N done wrong.)

It's a good thing I don't easily get frustrated,
or I would long ago have torn out all my hair
on account of post-docs who don't know the
first thing about doing an experiment. These
are folks with 20+ years of education from
brand-name schools, but somehow most of them
think they can take all the data and then do
all the analysis. It's a recipe for disaster.