
sig digs versus signals extracted from noise



Hi Folks --

This is to continue the explanation of why there cannot possibly be a
"good" set of rules for rounding off numbers. To illustrate the
discussion, there is an example at
http://www.monmouth.com/~jsd/physics/roundoff.htm

(I put the data there because I needed better control of the formatting
than plain email would allow.)

The leftmost column is a label giving the row number. The next column is
the raw data. You can see that the raw data consists of numbers like
0.048 ± 0.018
and you already see that this violates the usual "sig digs" rules. There
is considerable uncertainty in the second decimal place, so why am I
recording the data to three decimal places?

Answer: as will become clearer as we go along, it is vitally important to
keep that third decimal place. We are going to calculate the average of
100 such numbers, and the average will be known tenfold more accurately
than any of the raw inputs.
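
To make that concrete, here is a minimal sketch in Python -- not the
actual spreadsheet; the "true" value 0.044 and the noise level 0.018 are
just stand-ins chosen to resemble the raw data above:

  import numpy as np

  rng = np.random.default_rng(0)

  true_value = 0.044      # hypothetical signal hiding in the third decimal place
  noise_sigma = 0.018     # per-sample noise, comparable to the quoted uncertainty

  # 100 noisy samples, recorded to three decimal places like the raw data:
  samples = np.round(true_value + noise_sigma * rng.normal(size=100), 3)

  # Averaging N independent samples shrinks the random error by sqrt(N),
  # i.e. tenfold here, so the mean pins down the third decimal place even
  # though no single sample does.
  print("one sample :", samples[0], "+/-", noise_sigma)
  print("mean of 100:", samples.mean(), "+/-", noise_sigma / np.sqrt(100))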

To say it in slightly different terms: suppose we know there is an
important signal in that third decimal place. The signal is obscured by
noise; that is, there is a poor signal-to-noise ratio. Your mission,
should you decide to accept it, is to recover that signal.

This sort of signal-recovery is at the core of many activities in real
research labs, and in industry. The second thing I ever did in a real
physics lab was to build a communications circuit that picked up a signal
that was ten million times less powerful than the noise (SNR = -70
dB). Your typical GPS receiver deals with even worse SNRs -- and the stuff
that JPL puts in the Deep Space Network is mind-boggling. Throwing away
the signal at the first step by "rounding" the raw data would be a Bad Idea.

*** Take-home message #1: Signals can be dug out from the
noise. Uncertainty is not the same as insignificance, because a digit that
is uncertain (and many digits to the right of that!) can be dug out by
techniques such as signal-averaging. Given just a number and its
uncertainty level, without knowing the context, you cannot say whether the
uncertain digits are significant or not.

*** Take-home message #2: An expression such as 0.048 ± 0.018 expresses
_two_ quantities: the value of the signal, and an estimate of the
noise. Combining these two quantities into a single numeral by rounding is
a Bad Idea. In cases like this, if you round to express the noise, you
destroy the signal.

------

Now, returning to the numerical example: I assigned three students (Alice,
Bob, and Carol) to analyze this data. Alice didn't round any of the raw
data or intermediate results. She got an average of
0.0435 ± 0.0018
and the main value (0.0435) is the best that could be done given the
samples that were drawn from the ensemble. (The error-estimate is a
worst-case error; the probable error is somewhat smaller.)

Now Bob was doing fine until he got to row 31. At that point he decided it
was ridiculous to carry four figures (three decimal places) when the
estimated error was more than 100 counts in the last decimal place. He
figured that if he rounded off one digit, there would still be at least ten
counts of uncertainty in the last place. He figured that would give him not
only "enough" accuracy but even a guard digit for good luck.

Alas, Bob was not lucky. Part of his problem is that he assumed that
roundoff errors would be random and would add in quadrature. In this case,
they aren't and they don't. There are actually two different reasons why
every roundoff that Bob performs biases the result in the "up"
direction. These errors accumulate linearly (not in quadrature) and cause
Bob's answer to be systematically high. The offset in the answer in this
case is slightly less than the error bars, but if we had averaged a couple
hundred more points the error would have accumulated to disastrous levels.

Carol was even more unlucky. She rounded off her intermediate results so
that every number on the page reflected its own uncertainty (one count,
possibly more, in the last digit). In this case, her roundoff errors
accumulate in the "down" direction, with spectacularly bad effects.

Note that Alice, Bob, and Carol are all analyzing the same raw data; the
discrepancies between their answers are entirely due to the analysis, not
due to the randomness with which the data was drawn from the ensemble.
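
If you want to see the mechanism without the spreadsheet, here is a rough
Python simulation of Bob's procedure. It rests on simplifying assumptions
(synthetic Gaussian data, plain half-up rounding from row 31 onward) and it
captures only the half-up part of the bias, but it shows the same kind of
systematic offset:

  import numpy as np

  rng = np.random.default_rng(1)

  def round_half_up(x, places):
      # Half-up rounding: a trailing 5 rounds toward +infinity.
      f = 10.0 ** places
      return np.floor(x * f + 0.5) / f

  offsets = []
  for _ in range(2000):                  # repeat the whole experiment many times
      # synthetic raw data, recorded to three decimal places
      raw = np.round(0.044 + 0.018 * rng.normal(size=100), 3)
      alice = raw.mean()                 # keeps all three decimal places
      # Bob rounds rows 31..100 to two decimal places before averaging:
      bob = np.concatenate([raw[:30], round_half_up(raw[30:], 2)]).mean()
      offsets.append(bob - alice)

  # The discrepancy does not average away: half-up rounding of data quantized
  # to 0.001 is biased upward by about +0.0005 per rounded value, so Bob ends
  # up roughly 0.0005 * 70/100 = 0.00035 too high, trial after trial.
  print("mean(Bob - Alice) =", np.mean(offsets))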

*** Take-home message #3: Do not assume that roundoff errors are
random. Do not assume that they add in quadrature. It is waaaay too easy
to run into situations where they accumulate nonrandomly, introducing a
bias into the result. Sometimes the bias is obvious, sometimes it's not.

*** Important note: with rare exceptions, computer programs round off the
data internally and automatically. Just because the computer has "lots" of
significant digits doesn't mean that it has "enough" for your
purpose. Homebrew numerical integration routines are particularly
vulnerable to serious errors arising from accumulation of roundoff errors.
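
To see the accumulation effect in isolation, here is a small numpy
demonstration. Single precision is used just to make the damage visible in
a short run; double precision pushes the same problem out to longer sums,
it does not eliminate it. (cumsum is used because it is forced to form
every partial sum sequentially, exactly like a running total in an
integration loop.)

  import numpy as np

  # Ten million terms of 0.1 -- the kind of long running sum a homebrew
  # integration routine performs.
  x = np.full(10_000_000, 0.1, dtype=np.float32)

  running32 = np.cumsum(x)[-1]                     # accumulate in single precision
  running64 = np.cumsum(x, dtype=np.float64)[-1]   # accumulate in double precision

  print("single precision:", running32)   # noticeably wrong
  print("double precision:", running64)   # very close to the correct 1,000,000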

----

One of the things that contributes to Bob's systematic error can be traced to
the following anomaly: Consider the number 0.448. If we round it off, all
at once, to one decimal place, we get 0.4. But if we round it off in two
steps, we get 0.45 (correct to two places) which we then round off to
0.5. This can be roughly summarized by saying that the roundoff rules do
not have the associative property. You will have significantly less
trouble if you use the following Denkerized rule: round the fives toward
even digits. That is, 0.75 rounds up to 0.8, but 0.65 rounds down to
0.6. There are cases where this is imperfect (e.g. 0.454) but it's better
overall. A quick-and-dirty check for this problem is to do the calculation
twice, rounding all the fives up one time and down the next time. Or flip
a coin each time you need to round a five.
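
If you want to experiment with the two rules without binary-float
surprises, Python's decimal module implements both of them exactly; the
round-the-fives-toward-even rule is what it calls ROUND_HALF_EVEN (a.k.a.
banker's rounding). A small sketch:

  from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

  def rnd(x, places, mode):
      return Decimal(x).quantize(Decimal(1).scaleb(-places), rounding=mode)

  # Two-step half-up rounding drifts upward:
  print(rnd("0.448", 1, ROUND_HALF_UP))                          # 0.4 in one step
  print(rnd(rnd("0.448", 2, ROUND_HALF_UP), 1, ROUND_HALF_UP))   # 0.45, then 0.5

  # Rounding the fives toward even digits:
  print(rnd("0.75", 1, ROUND_HALF_EVEN))   # 0.8
  print(rnd("0.65", 1, ROUND_HALF_EVEN))   # 0.6

  # Averaged over many values, half-up is biased; half-even is not:
  vals = [Decimal(k) / 1000 for k in range(1000)]                # 0.000 ... 0.999
  bias = lambda mode: sum(rnd(v, 2, mode) - v for v in vals) / len(vals)
  print(bias(ROUND_HALF_UP), bias(ROUND_HALF_EVEN))              # +0.0005 versus 0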

------

You can play with the spreadsheet yourself. For fun, see if you can fiddle
the formulas so that Bob's bias is downward rather than upward. For Excel
version 9 (Office 2000) you can read in the aforementioned .htm file; just
save it to disk and open it using Excel.
http://www.monmouth.com/~jsd/physics/roundoff.htm
while people with other versions and/or other brands of software might be
better off with
http://www.monmouth.com/~jsd/physics/roundoff.xls
Note: I've got automatic recalculation turned off; you can either turn it
back on, or push your spreadsheet's "recalculate" button (F9 or some such)
when necessary.

Also BTW, hiding in columns R, S, and T is a Box-Muller transformation to
create Gaussian random numbers. Hint: this is a good trick to know.
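
For anyone who wants the trick outside the spreadsheet, here is a
bare-bones Python version (the standard Box-Muller formula, not a copy of
the spreadsheet's columns):

  import math, random

  def box_muller():
      # Turn two independent uniform deviates into two independent
      # standard-normal (mean 0, sigma 1) deviates.
      u1 = 1.0 - random.random()          # in (0, 1], keeps the log finite
      u2 = random.random()
      r = math.sqrt(-2.0 * math.log(u1))
      return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

  # e.g. fake raw data resembling the example: mean 0.044, sigma 0.018
  z, _ = box_muller()
  print(0.044 + 0.018 * z)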

------

Additional constructive suggestions and rules of thumb:

*) Remember, uncertainty is not the same as insignificance.

*) It is always better to have too many digits than too few.

*) State both the main value and the uncertainty, either as
0.048 ± 0.018
or as 0.048(18)

*) Before you decide to round off, you must do some sort of theoretical
and/or operational check to ensure that rounding doesn't introduce a
serious error.

*) When talking with non-experts, be prepared for the possibility that they
might have no clue about signal recovery and might think Moses brought the
sig digs rules down from Mt. Sinai.