
Re: Design of Experiment



Larry Woolf wrote:
...
I have seen it used and misused.

:-)

all of us who were active in the field knew what
the important processing variables already were

Indeed!
There is no substitute for understanding what's going on.

Tim Folkerts wrote:
...
APPROACH 3: Pick a small number (often two) of different choices for each
variable. You can then run all possible trials (2^6 = 64 in our case) -
known as a "full factorial design". Or better, run a carefully chosen
subset - a "partial factorial" - as illustrated below. The subset of 16
listed below is not unique, but it does have special properties. Each
factor is high 8 times and low 8 times. Furthermore, in the set of 8
trials where any one variable is high, each other variable is high 4 times
and low 4 times. You lose some info by cutting the trials, but you save
time. You discover which variables have large effects and which have small
effects. You also get a good idea about which of the variables are
interacting with others. And note the random order, which cuts down on
unidentified systematic changes.

Run A B C D E F
1 - + - - + +
2 + + - - - +
3 + - - - + -
4 - + + - + -
5 + + + - - -
6 - - - + - -
7 - + - + + +
8 - + + + + -
9 + + - + - +
10 + - + + + +
11 + - - + + -
12 - - - - - -
13 + + + + - -
14 - - + - - +
15 - - + + - +
16 + - + - + +
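The balance properties claimed for this 16-run subset are easy to check by
machine. Here is a short sketch (Python; the rows are my transcription of
the table above, with "+" = high and "-" = low):

```python
# Transcription of the 16-run design above: one string per run,
# columns A..F left to right; '+' = high, '-' = low.
design = [
    "-+--++", "++---+", "+---+-", "-++-+-",
    "+++---", "---+--", "-+-+++", "-++++-",
    "++-+-+", "+-++++", "+--++-", "------",
    "++++--", "--+--+", "--++-+", "+-+-++",
]

factors = range(6)

# Each factor is high in exactly 8 of the 16 runs:
for j in factors:
    assert sum(row[j] == "+" for row in design) == 8

# Within the 8 runs where factor i is high, every other factor is
# high exactly 4 times (pairwise balance, a.k.a. orthogonality):
for i in factors:
    subset = [row for row in design if row[i] == "+"]
    for j in factors:
        if j != i:
            assert sum(row[j] == "+" for row in subset) == 4

print("balanced")
```

Both assertions pass, so the subset really does have the stated balance.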

This is an informative example. Some people consider this
approach (#3) to be "the" big idea that you get from DoE.
I'm not sure I agree; I attack with both edges of a two-edged
sword:
-- there are other good ideas in DoE, and
-- the "factorial analysis" (partial or otherwise) idea
isn't nearly as useful as some partisans seem to suggest.

In particular, the problem that factorial analysis purports
to "solve" is provably impossible to solve in the general
case. The problem is solvable if
*) you've got an easy case, in which practically any approach
will work, and the factorial-analysis approach doesn't add
much, or
*) you have a hard case to which you can bring to bear some
understanding of the physics of the problem. (We're talking
about partial understanding; call it intuition if you like.
Perfect understanding is not required.) To the extent
that DoE suggests you can attack hard problems without any
understanding, it is a Bad Idea.

To illustrate an easy problem, suppose you want to find the
ocean. Starting from most (but not all) places in the world,
you can take a step in a random direction. If this step takes
you to a lower elevation, keep it; otherwise undo it and
choose another direction. It doesn't matter if your first
step is in the N/S direction or the E/W direction or anything
else. This approach is called "local optimization" or "local
search". The geography in most places is sufficiently uncomplicated
that local optimization will take you to your goal. There is
some chance of getting caught in a local minimum ("sand trap")
but you can modify the algorithm to add provisions for handling
such cases, so long as the topography isn't very complicated.
Also note that this easy problem has a low dimensionality: the
domain is two-dimensional.
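The walk-downhill procedure above can be sketched in a few lines. This is
my own toy version: a smooth quadratic bowl stands in for the
"uncomplicated geography", and the names and step size are illustrative
choices, nothing more:

```python
import random

def elevation(x, y):
    # A smooth bowl with its minimum at (3, -1): no sand traps here.
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

def local_search(x, y, steps=20000, step_size=0.1, seed=0):
    rng = random.Random(seed)
    for _ in range(steps):
        dx = rng.uniform(-step_size, step_size)
        dy = rng.uniform(-step_size, step_size)
        if elevation(x + dx, y + dy) < elevation(x, y):
            x, y = x + dx, y + dy   # downhill: keep the step
        # otherwise: undo it and try another random direction
    return x, y

x, y = local_search(0.0, 0.0)
print("final elevation:", elevation(x, y))
```

On terrain this simple, the random walk reliably reaches low ground; the
whole difficulty of real problems is that their terrain is nothing like
this.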

To illustrate a hard problem, consider trying to apply this
"local optimization" idea to writing a computer program. Let
each character in the source code be adjustable separately, so
that even a short ten-line program (say 200 characters) can be
adjusted in a huge number of separate directions. If each character
can usefully take on 40 different values (a-z, 0-9, and punctuation),
the number of possible short programs is something like 40^200, which is a
rather large number. The number of medium-long or long programs
is a reeeeally large number. Now my point is that local search
is not going to do much good. If you take a non-working program
and randomly change the spelling, it is very unlikely that you'll
get something that works better. You can't afford to explore all
40^200 possibilities, or even a useful subset thereof.
-- The problem is highly nonlinear.
-- The problem is high-dimensional.
-- There are no smoothness properties you can rely on,
even though there is sometimes a teeny bit of smoothness:
if the code contains a number like "123.456" you might be
able to change it to "123.457" without drastic effect. In
contrast, think what happens in the more-common case where
you have a keyword like "while" and you change it to "whilf".
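To make the "randomly change the spelling" point concrete, here is a small
experiment of my own devising: take a tiny working program, change one
character at random, and count how often the mutant still behaves
correctly. The source string and alphabet are illustrative choices, not
anything from the original post:

```python
import random
import string

SOURCE = "def f(x):\n    return x * x + 1\n"
ALPHABET = string.ascii_lowercase + string.digits + " +-*/():=\n"

def mutate(src, rng):
    # Replace one randomly chosen character with a random one.
    i = rng.randrange(len(src))
    return src[:i] + rng.choice(ALPHABET) + src[i + 1:]

def still_correct(src):
    # A mutant "works" if it still compiles, still defines f,
    # and still computes the original answer f(7) == 50.
    env = {}
    try:
        exec(compile(src, "<mutant>", "exec"), env)
        return env["f"](7) == 50
    except Exception:
        return False

rng = random.Random(0)
trials = 1000
survivors = sum(still_correct(mutate(SOURCE, rng)) for _ in range(trials))
print(survivors, "of", trials, "one-character mutants still work")
```

Almost every surviving mutant is one where the "mutation" happened to pick
the same character again; genuine spelling changes nearly always break the
program outright.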

To illustrate a slightly-hard problem, consider the
squealing, chirping noise that modems make when they
first connect. They are adjusting a digital filter
with hundreds or even thousands of taps, so they can
do echo cancellation. For a high-dimensional problem
like this, factorial analysis (partial or otherwise)
would be a disaster. The only reason the problem is
tractable at all is that the echo process is linear.
Therefore the topography to be searched is relatively
simple, and local optimization works, if you are clever
enough to organize the search properly.
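The reason linearity rescues the search can be sketched with the standard
LMS adaptive-filter update. This is my own miniature version (8 taps
instead of hundreds, and a made-up echo path); because the echo is linear,
the squared-error surface in the tap weights is a quadratic bowl, and the
simple gradient step is a local search that cannot get trapped:

```python
import random

random.seed(1)
TAPS = 8
true_echo = [0.5, -0.3, 0.2, 0.1, -0.05, 0.02, 0.01, -0.01]  # hypothetical path
w = [0.0] * TAPS     # adaptive filter taps, start knowing nothing
mu = 0.05            # LMS step size (must be small enough for stability)

x_hist = [0.0] * TAPS
last_errors = []
for n in range(5000):
    x = random.uniform(-1, 1)          # reference signal sent down the line
    x_hist = [x] + x_hist[:-1]         # newest sample first
    echo = sum(h * xi for h, xi in zip(true_echo, x_hist))
    estimate = sum(wi * xi for wi, xi in zip(w, x_hist))
    err = echo - estimate              # residual echo after cancellation
    # LMS: nudge each tap downhill on the squared error.
    w = [wi + mu * err * xi for wi, xi in zip(w, x_hist)]
    if n >= 4900:
        last_errors.append(abs(err))

print("mean residual echo:", sum(last_errors) / len(last_errors))
```

The residual echo decays to essentially nothing; try replacing the linear
echo with something nonlinear and the same local search fares far worse.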

MOST PROBLEMS IN THE REAL WORLD ARE HARD PROBLEMS. Just
because your textbook is full of easy problems (with low
dimensionality and simple topography) doesn't mean those are
typical of the real world.

There !!are!! ways of making incremental improvements to
hard problems. Returning to the example of the computer
program: You might take several lines of code, move them
away from their original location and put a subroutine
declaration header on it, and then call it as a subroutine.
But this is a far cry from randomly changing the spelling!
It is predicated on knowing what's going on, knowing what
the symbols mean.
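For concreteness, here is a trivial illustration of mine of that move in
Python: the loop is lifted out behind a subroutine header and called by
name, a meaning-preserving edit that is nothing like respelling it:

```python
# Before: the averaging logic sits inline.
def report_before(scores):
    total = 0
    for s in scores:
        total += s
    avg = total / len(scores)
    return f"average: {avg:.2f}"

# After: the same lines moved out, given a subroutine header,
# and called by name. Behavior is unchanged.
def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)

def report_after(scores):
    return f"average: {mean(scores):.2f}"

assert report_before([1, 2, 3]) == report_after([1, 2, 3])
```

The edit only makes sense because we know which lines form a coherent unit
and what the symbols mean.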

Partial factorial analysis is certainly not a panacea. It
might help in the sense that it lowers the work-factor from
40^200 to 40^195 -- but who cares? The latter is utterly
intractable, so the fact that it is "better" by a seemingly
large factor (40^5) is academic in the worst sense of the
word. Someday you might be lucky enough to be presented
with a problem that is "just barely intractable" so that
a trick like partial factorial analysis makes a real difference,
but I wouldn't plan on it if I were you.
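The arithmetic behind "who cares?" is easy to check with
arbitrary-precision integers (the numbers are the ones from the text):

```python
full = 40 ** 200
reduced = 40 ** 195

# The "speedup" is 40**5 -- about 10**8, huge in absolute terms...
print("speedup factor:", full // reduced)

# ...but what remains is still a number with over 300 digits.
print("digits remaining:", len(str(reduced)))
```

A factor-of-10^8 improvement on a 10^312-sized search space leaves you
exactly as stuck as before.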

There's a huge literature on combinatorial optimization and
search -- maybe 50,000 papers -- and maybe another 50,000
papers on pattern recognition, which suffers from many of
the same problems, slightly disguised. Obviously few of
these papers have any great value -- otherwise there'd be
no reason to write all the others. Mostly they just
advocate applying such-and-such heuristic to this-or-that
specific task. There's very little discussion of general
principles.

My view of the principles goes like this:
-- At one extreme, if somebody has a complete understanding
of a problem, we have ways of capturing that: just
write a computer program with if-then-else statements
expressing what is known. This is nice work if you
can get it. (Sometimes you can.)
-- At the other extreme, we can imagine solving the problem
starting with !!no!! understanding, just by enumerating
the possibilities, or otherwise searching the parameter-
space. This is, alas, purely imaginary. It provably
does not exist, except in trivial cases. The simplest
proof involves the Vapnik-Chervonenkis dimension,
and provides strict upper and lower (!) bounds on the
error rates resulting from such a search.
++ The future lies in between: what we need are principles
and systematic methods for capturing _partial_ understanding,
and then doing a search guided by this partial understanding.
This is a topic that is ripe for good research -- research
into the principles, as opposed to just adding to the
bestiary of ad-hoc heuristics.

This posting is the position of the writer, not that of Samuel Tilden or
Al Gore.

This posting is the position of the writer, not that of SUNY-BSC, NAU or the AAPT.