
Re: not physics, not linguistics, and not statistics



Hi --

At 11:43 PM 5/29/99 -0400, Greg Darakjian wrote:

A student of mine asked me to help her out with the following problem.
Would somebody care to provide an answer ?

The question is so defective that "helping out" and "answering the
question" are two different things entirely.

"Linguists study the changes in word usage in a language over time.

OK.

They find that words disappear from a language

OK.

according to an exponential model
W(x) = e raised to the power -0.75 x where W(x) represents the percent of
words remaining in a language after 'x' thousand years

Horsefeathers!

The given model predicts that after 600 years, 36% of words have died out.
So let's take a look at some text that is 600+ years old, e.g.

http://etext.lib.virginia.edu/etcbin/toccer-new?id=Cha2Can&images=images/modeng&data=/lv1/Archive/mideng-parsed&tag=public&part=14&division=div1

a) If you count changes in spelling, then nearly 100% of words are changed,
very much inconsistent with the model.

b) If you don't count changes in spelling, it appears that 3 or 4 of
Chaucer's first 73 words are pretty much dead. That's 4 or 5%, very much
inconsistent with the model. (See fine point below.)
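
The comparison is easy to reproduce. The following is a minimal Python sketch, not a careful study: the function name is made up for illustration, and the 3-or-4-out-of-73 tally is just the rough count mentioned above.

```python
import math

RATE_PER_KYR = 0.75  # decay constant from the problem statement, per thousand years

def fraction_lost(years):
    """Fraction of words the model says have died out after `years` years."""
    return 1.0 - math.exp(-RATE_PER_KYR * years / 1000.0)

# Model prediction for a text roughly 600 years old.
predicted = fraction_lost(600)

# Rough tally from the Chaucer sample: 3 or 4 dead words out of the first 73.
observed_low, observed_high = 3 / 73, 4 / 73

print(f"model, 600 years: {predicted:.1%} lost")
print(f"Chaucer sample:   {observed_low:.1%} to {observed_high:.1%} lost")
```

Even taking the high end of the tally, the model overshoots by a large factor.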

Formulate a visual presentation to estimate the dates for the disappearance
of 25%, 50%, 75%,and 100% of today's language.

The 100% mark would be quite difficult to obtain. If we start out with
100,000 words, it would take something like 16,000 years before the last
surviving word had a 50/50 chance of extinction (solve
(1 - e^(-0.75 x))^100000 = 1/2 for x). Something tells me that
even if the model were initially accurate, it would have broken down before
this point. We don't have written records of *any* language over a 3,000
year period, let alone 16,000 years.
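
The 25%, 50%, and 75% marks, by contrast, follow from inverting the model: set 1 - e^(-0.75 x) equal to the fraction lost and solve for x. Here is a minimal Python sketch, taking the model at face value purely for the sake of argument; the function name is hypothetical.

```python
import math

RATE_PER_KYR = 0.75  # decay constant from the problem statement, per thousand years

def years_until_lost(fraction):
    """Years until `fraction` of the words are gone, according to the model.

    A pure exponential never reaches zero, so fraction = 1 returns infinity.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 1.0:
        return math.inf
    # Invert 1 - exp(-k x) = fraction, then convert thousands of years to years.
    return -math.log(1.0 - fraction) / RATE_PER_KYR * 1000.0

for f in (0.25, 0.50, 0.75, 1.00):
    print(f"{f:.0%} lost after {years_until_lost(f):,.0f} years")
```

This gives roughly 380, 920, and 1,850 years for the first three marks, and no finite answer for the fourth.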

Suppose a word fell into disuse this year.
In what year would it have originated ?

The formulation of this question betrays a nonunderstanding of statistics
(independent of the aforementioned nonunderstanding of linguistics). That
is, even if we pretend for a moment that the model correctly describes
something, the year-of-origin question is open to at least three different
correct answers. None of the possible correct answers is useful in a
practical sense or even in a pedagogical sense.

a) Any year that might be mentioned as the year of origin is unlikely to be
the right year. In fact there is at most a 0.075% chance that a given word
arose in a given year (0.75 per thousand years is 0.00075 per year). The
largest chance occurs in the most recent year. That's the modal year.
Relatively fewer 1000-year-old words expired this year, simply because
there are fewer of them in the mix right now.

b) The median year is a little over 900 years ago. That is, 50% of the
words are older than that, 50% younger.

c) The mean age is longer: a little over 1300 years. That's because among
words in the old half (older than the median), some of them are *very* old.
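
The three answers can be computed directly. This minimal Python sketch assumes that word ages follow an exponential distribution with the model's rate, which is what pretending the model is correct amounts to; the variable and function names are made up for illustration.

```python
import math

RATE_PER_YEAR = 0.75 / 1000.0  # model's decay constant, converted to per year

def age_density(age_years):
    """Chance per year that a word is `age_years` old (exponential density)."""
    return RATE_PER_YEAR * math.exp(-RATE_PER_YEAR * age_years)

# (a) Mode: the density only decreases with age, so the most recent year wins.
mode_age = 0.0
peak_chance = age_density(mode_age)

# (b) Median: the age at which half of the distribution lies on each side.
median_age = math.log(2.0) / RATE_PER_YEAR

# (c) Mean: pulled above the median by the long tail of very old words.
mean_age = 1.0 / RATE_PER_YEAR

print(f"mode:   {mode_age:.0f} years (chance per year {peak_chance:.3%})")
print(f"median: {median_age:.0f} years")
print(f"mean:   {mean_age:.0f} years")
```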


Furthermore, the question is needlessly complex. Under an exponential
model, the age of a word is independent of whether it did or didn't fall
into disuse this year.

List any words
that have disappeared from our language in the last 100 years.

Grab a 100-year-old book and see what dead words you can find. You'll find
more if you grab something with a lot of dialog, such as a play; the
spoken language changes much faster than the written language.

A fine point: Remember that the normal reading vocabulary is larger than
the normal writing vocabulary. Therefore don't just ask whether a word is
readable, but ask whether it is writable. It's only fair to compare
100-year-old writing with today's writing, not today's reading.