Chronology Current Month Current Thread Current Date
[Year List] [Month List (current year)] [Date Index] [Thread Index] [Thread Prev] [Thread Next] [Date Prev] [Date Next]

Re: Complete the sequence problems



At 02:23 PM 7/17/99 -0800, John Mallinckrodt wrote:

As Phys-L seems a little, shall we say, listless this weekend

Har har har har :-)



Hi John --

1) You raise some important, indeed profound points.

2) Essentially every word that you wrote is correct in some sense.

3) OTOH there is much that is left unsaid, and it seems the big picture is
in danger of getting lost.

Usually a complete-the-sequence problem should be stated "find the *best*
extrapolation..." not "find the unique extrapolation..." because as you
point out there is obviously never a unique extrapolation.

To make progress, we should see if we can forge an agreement on what sort
of solutions should be considered objectively "better" than others.

In the pattern recognition field, there is a formalism that seems to work
pretty well: You imagine a whole ensemble of sequences, and you throw them
at a some "learning machines" and see which machine most often predicts the
end of the sequence (the "test data") given the beginning (the "training
data"). I'm leaving out a lot of details here for obvious reasons.

Your example could be taken to imply that all functions that accurately go
through the training data are equally good predictors of the test data.
This is certainly not the case. We know the saying from physics-lab data
analysis: "given enough parameters you can fit an elephant, and with a few
more you can wiggle his tail". Extra parameters are just "noise
amplifiers" and result in worse predictions, not better.

There is every reason to believe that the six-character predictor "months"
works better than the six-line function you presented. Here "better" does
not refer to elegance or other subjective properties; it refers to the
probability of making correct predictions.

More quantitatively, suppose you have a hierarchy of families of
predictors, each of which is contained in the next (for example
polynomials, where the set of 5th-order polynomials is strictly contained
in the set of 6th-order polynomials [if we allow the natural special case
of the leading coefficient sometimes being zero]). Then given two
predictors that fit the training data equally well, there are theorems that
strongly advise you to prefer the lower-order predictor. Indeed, sometimes
you should prefer a lower-order predictor even if it does a
less-than-optimal job of fitting the training data. These are quite strict
theorems, giving not only upper bounds on the error rates (guarantees of
success) but also lower bounds (guaranteeing that you can't do better). In
any case, the 11th-order polynomial you presented is not a contender.

Most of the literature in this field is notoriously hard to read. However,
one classic, relatively readable reference is:
Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M. K. (1989).
Learnability and the Vapnik-Chervonenkis dimension.
Journal of the Association for Computing Machinery, 36(4):929-965.

At 02:23 PM 7/17/99 -0800, John Mallinckrodt wrote:

Marilyn,

Once again you have fallen into the trap, as you often do, of
supposing that the first n members of a sequence can determine the
(n+1)st member or beyond. Recently, a reader sent you the
sequence beginning

Y, Y, H, L, Y, E, Y, T, R

and asked for the next three members of the sequence. You
interpreted these values as the final letters of the first nine
months of the year and supposed that the next three members of the
sequence would be the final letters of the next three months--
"R,R,R"

But, however compelling that interpretation might be, it is simply
not dictated by the values themselves and the next three members
of the sequence could just as well be "A,A,A." For instance, if I
interpret the given sequence in terms of the associated numerical
positions in the alphabet, it becomes

25, 25, 8, 12, 25, 5, 25, 20, 18.

These numbers can, in turn, be interpreted as the values of some
function f(x) evaluated at x =

1, 2, 3, 4, 5, 6, 7, 8, 9

Of course there are an infinite number of such functions one of
which is

f(x) = 23455 - (1912504277/27720) x
+ (200368663/2400) x^2 - (1446068941/25920) x^3
+ (2802040471/120960) x^4 - (4598563451/725760) x^5
+ (33809683/28800) x^6 - (35865479/241920) x^7
+ (72409/5760) x^8 - (70979/103680) x^9
+ (13081/604800) x^10 - (481/1596672) x^11

and it "turns out" that this particular function has the
"surprising" property that f(10) = f(11) = f(12) = 1 which would
all be interpreted as "A"s.

Obviously, by suitable choice of the numerical coefficients I can
make the next three members of the sequence be whatever I choose.
Furthermore, by altering the rules of interpretation I can
introduce numerical elements, punctuation marks, or whatever I
want into the sequence.

I'm sure you understand all of this, but I fear that you may enjoy
the game of "solving" reader supplied "complete the sequence
puzzles" just a little bit too much to explain to your readership
why they simply have no unique solutions.