
[Phys-L] numerical methods (was: sig figs)



On 08/20/05 01:40, Bernard Cleyet wrote:

data: [number of iterations / result]
  1     / 1.09
  10    / 1.98
  100   / 1.9920
  1k    / 1.9921568
  10k   / 1.99215768
  100k  / 1.9921576907
  1e6   / 1.9921576931
  4e6   / 1.992157043    (begins oscillation)
  6e6   / 7349
  7e6   / 6066
  7.5e6 / 7287
  7.8e6 / 7141
  7.9e6 / 7380
  8e6   / 5894 (!)
  9e6   / 6096
  1e7   / 6066           (about eight minutes for this one)

Welcome to the wonderful world of numerical methods.
Ugly results like this are par for the course.

Unless the computer truncates, I would think roundoff error would
be "somewhat random". Is truncation standard?

1) There is some control over rounding versus truncation,
but most computer languages put the floating-point processor
into "roundoff" mode (round to nearest, ties to even) and
leave it there. And yes, there is an industry standard
(IEEE 754) for precisely how it does the roundoff.
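
If you want to see the two modes side by side, Python's decimal
module lets you pick the rounding rule explicitly (a small sketch
of the idea; note this is software decimal arithmetic, not the
binary hardware floats discussed above):

from decimal import Decimal, localcontext, ROUND_HALF_EVEN, ROUND_DOWN

x = Decimal("2.675")
with localcontext() as ctx:
    ctx.prec = 3                    # keep only 3 significant digits
    ctx.rounding = ROUND_HALF_EVEN  # the usual "roundoff" mode
    print(+x)                       # 2.68  (unary + applies the context)
    ctx.rounding = ROUND_DOWN       # truncation
    print(+x)                       # 2.67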

Bizarre historical note: When the standard was being
developed, there were fights between guys who wanted
things to be done as accurately as possible, and others
who had pet algorithms that were easier to implement but
less accurate.

2) The scare quotes around "somewhat random" are appropriate.
It is super-easy to get into a situation where roundoff is
indistinguishable from truncation.

By way of example, imagine a computer where numbers are encoded
in such a way that the following numbers can be represented, but
nothing in between can be:
1.00000000
1.00000001
1.00000002
1.00000003
1.00000004
etc.

Now, suppose you have added a whole bunch of small numbers, such that
the sum is
s = 1.00000002
and suppose the next 100 addends are all very close to
ds = 0.000000001

Well, if you add s + ds (and then round off, as you must) you get
just s. If you do that 100 times, you still get just s. So
rounding is indistinguishable from truncation, as advertised.

This is ugly, because [s + (100 ds)] is significantly different
from s, namely
s' = 1.00000012
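
The same absorption happens with real double-precision floats.
Here is a minimal sketch (the numbers are mine, chosen so that
ds falls below half an ulp of s):

s = 1.0
ds = 1e-17             # below half an ulp of 1.0 (about 1.1e-16)
for _ in range(100):
    s = s + ds         # each addition rounds right back to 1.0
print(s == 1.0)        # True: all 100 addends vanished
print(1.0 + 100 * ds)  # 1.000000000000001 is what we should have gotten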

You can't improve the situation by fiddling with the roundoff
rules. If you're going to use ordinary floating point, you're
going to have to round off ... it's just a question of whether
you have enough guard digits.
*) The question of how many guard digits is "enough" depends
on details.
*) Alternatives to ordinary floating point include
-- arbitrary-precision floating point (can be adjusted to
give 100 or 1000 digits of resolution if you want)
-- interval arithmetic (i.e. keeping upper and lower bounds
on all numbers, including intermediate results)
-- rational arithmetic (i.e. ratios of integers) (includes
fixed-point as a special case) ==> no roundoff
-- ratios of bignums ==> neither roundoff nor overflow

Alas none of the alternatives are very well supported, although
there have been some dramatic improvements recently.
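
For what it's worth, Python's standard library ships two of the
alternatives listed above: arbitrary-precision decimal floating
point, and exact rationals built on bignum integers. A quick
sketch:

from decimal import Decimal, getcontext
from fractions import Fraction

getcontext().prec = 50                        # 50-digit floating point
print(sum([Decimal(1) / Decimal(3)] * 3))     # 0.99999...9, 50 digits

print(sum(0.1 for _ in range(10)) == 1.0)     # False: binary roundoff
print(sum(Fraction(1, 10) for _ in range(10)) == 1)  # True: no roundoff

Since Python integers are bignums, Fraction gives you the
ratios-of-bignums case: neither roundoff nor overflow, at the
cost of speed and ever-growing denominators.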

============================

Another trick that I have found useful goes something like this:
Suppose you are stepping a differential equation, where each
step's increment is computed by the function diffeq(). Then, roughly:

s = initial_value()       // some big number
ds = 0
repeat (N times) {
    repeat (....) {
        s_local = s + ds  // there may be some roundoff here, but
                          // the errors are *not* cumulative
        dds = diffeq(s_local)
        ds = ds + dds     // accumulate dds into ds
    }
    // here when ds is "big enough"
    s = s + ds            // accumulate ds into s
    ds = 0                // keep ds from getting too big
}

The point is that we keep the large common-mode part in s,
and keep it out of ds. The loop-within-loop structure is
needed to keep ds from getting too big.

Arrange the termination of the inner loop so that a typical
|ds| value is on the order of the geometric mean between a
typical |s| value and a typical |dds| value. If you can't
think of anything cleverer, run the inner loop for M=10^5 or
M=10^6 iterations (assuming all numbers are represented using
double precision, i.e. about 15 decimal digits of precision).
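
Here is the skeleton above as runnable Python. The diffeq() is a
stand-in that just returns a constant tiny increment, and M is far
smaller than suggested above, purely to keep the demo quick:

def diffeq(s_local):
    return 7e-9        # below half an ulp of 1e8 (ulp is about 1.5e-8)

# naive accumulation: every single addition gets rounded away
s = 1e8
for _ in range(1000000):
    s = s + diffeq(s)
print(s)               # 100000000.0 exactly: nothing accumulated

# two-level accumulation: dump ds into s every M iterations
s, ds, M = 1e8, 0.0, 1000
for i in range(1000000):
    ds = ds + diffeq(s + ds)
    if (i + 1) % M == 0:
        s = s + ds     # ds is now big enough to survive the roundoff
        ds = 0.0
print(s)               # about 100000000.007, the right answer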

Another way of understanding the principle involved here is
to view it as re-arranging the order of summation. Add up
the terms in groups of M, then add up the groups. It's a tree
structure. You can imagine generalizing it to a deeper tree
(loop within loop within loop, ddds --> dds --> ds --> s)
but I've never needed to do this.
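
Taken to its logical conclusion, the deep-tree version is what
the numerical literature calls pairwise summation. A sketch:

def pairwise_sum(xs):
    # split-in-half tree: roundoff grows like log(N) instead of N
    if len(xs) <= 8:
        return sum(xs)   # small base case: plain left-to-right sum
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

(Python's math.fsum() goes further still, tracking the rounding
errors exactly; it is more accurate yet, at some cost in speed.)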

The two-level trick is easy to implement and very efficient. It
is useless if there are wild variations in the magnitude of
dds, but it works fine in the much-more-common case where the
dds values are generated in some logical order, and "nearby"
dds values are all about the same size.

Also: try to arrange that all (or most) of the dds values
are of the same sign. This may be counterintuitive at first,
perhaps because you learned in high school that alternating
series often converge even in cases where the corresponding
non-alternating series doesn't. But we're not worried about
convergence here; we're worried about roundoff, and taking
the proverbial "small difference between large numbers" is
bad news. If you don't believe me, try summing the series
for exp(-10). You are much better off summing the series for
exp(10) and then taking the reciprocal.
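
You can watch this happen. The terms of the series for exp(-10)
alternate in sign and peak near 2756 in magnitude, while the
answer is about 4.5e-5, so many digits get ground off in the
cancellation. A sketch:

from math import exp, factorial

direct = sum((-10.0) ** k / factorial(k) for k in range(100))
via_reciprocal = 1.0 / sum(10.0 ** k / factorial(k) for k in range(100))

print(direct)          # loses roughly half its digits to cancellation
print(via_reciprocal)  # good to nearly full double precision
print(exp(-10))        # about 4.54e-05, for comparison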

Beloved reference: Forman S. Acton _Numerical Methods that Work_.
Don't be put off by the publication date. Computers have changed
a lot in the last 35 years, but the vast majority of the ideas in
this book are still valid (and the exceptions are obvious).