
listserv follies: character coding etc.



Ah, the listserv software strikes again.


Presumably most of you have already surmised that the garbled
word in my last msg was "theta-epsilon-omega-rho-iota-alpha".


That msg was sent in the ISO-8859-7 (Greek-enabled)
character set, as you can tell from the Content-type line
shown in the archives:
http://lists.nau.edu/cgi-bin/wa?A2=ind0406&L=phys-l&F=&S=&P=12800
But the listserv software brutally rewrote the mime-headers
and distributed the msg as text/plain without specifying
any character-set at all, as you can tell by looking at
the headers of the msg as it sits in your inbox.
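
For what it's worth, the garbling is easy to reproduce. Here is a
minimal Python sketch. The variable names are only for illustration,
and the Latin-1 fallback is an assumption: a mail reader that is told
nothing about the charset may guess something else.

```python
# Illustrative sketch: what happens when the charset label is lost.
word = "θεωρια"  # theta-epsilon-omega-rho-iota-alpha

# Bytes as sent: ISO-8859-7 uses one byte per Greek letter.
raw = word.encode("iso-8859-7")

# Assumption: the receiving mail reader, given no charset, falls back to Latin-1.
garbled = raw.decode("latin-1")

# With the correct label, the very same bytes decode cleanly.
restored = raw.decode("iso-8859-7")

print(garbled)   # accented Latin letters instead of Greek
print(restored)  # the original word
```

The bytes are never damaged; only the label telling the reader how
to interpret them goes missing.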

Actually, the listserv faces some interesting challenges.
Digest-mode would be a nightmare if each msg had a different
character-set. Similarly a search of the archives would be
a nightmare when returning snippets from many different
msgs.

A sensible person might think that the solution to all these
problems would be to rely on utf-8.
-- Most MUAs nowadays can handle utf-8.
-- Utf-8 will make it straightforward to include Greek letters
and mathematical symbols in our msgs.
-- It might even get rid of some of the =20 nonsense.

This situation is pretty much the raison d'etre for utf-8.
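
To make that concrete, here is a small Python sketch. The sample
string and the helper name fits() are made up for illustration. It
checks which charsets can hold Greek letters and math symbols in one
and the same msg, which is what a digest or an archive search would
need.

```python
# Illustrative sketch: one msg mixing Greek letters and math symbols.
text = "θεωρια ∫ ≥ √"

def fits(charset):
    """Return True if every character of text is representable in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# The Greek and Western 8-bit sets each lack some of these characters;
# utf-8 covers all of them.
results = {cs: fits(cs) for cs in ("iso-8859-1", "iso-8859-7", "utf-8")}
print(results)

# utf-8 round-trips the whole string without loss.
roundtrip = text.encode("utf-8").decode("utf-8")
```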

But alas the listserv software is known not to handle utf-8
very well. In particular, when Microsoft sees the utf-8
charset, it forces base64 transfer-encoding ... and listserv
isn't smart enough to deal with base64, so the digests and
the archives go to pot. Aaaarrrrrrggghhhhh!
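
Dealing with base64 is not hard, for what it's worth. The sketch
below uses Python's standard email package purely as a stand-in; it
says nothing about how listserv or any particular mail client is
implemented. Python's MIMEText happens to pick base64 for utf-8 text
by default, so it gives a convenient sample msg to decode.

```python
from email import message_from_string
from email.mime.text import MIMEText

# Build a utf-8 msg; Python's MIMEText defaults to base64 for this charset.
msg = MIMEText("θεωρια", "plain", "utf-8")
cte = msg["Content-Transfer-Encoding"]

# Serialize it, then parse it back as a list server would on receipt.
wire = msg.as_string()
parsed = message_from_string(wire)

# decode=True undoes the transfer-encoding (base64 here), giving raw bytes;
# the charset from the Content-Type header then turns bytes into text.
body_bytes = parsed.get_payload(decode=True)
body = body_bytes.decode(parsed.get_content_charset())

print(cte)
print(body)
```

Once the body is back in text form, building a digest or an archive
snippet no longer depends on how the msg was encoded in transit.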

============

At some point, the right choice is to throw out listserv(tm)
and replace it with mailman(tm) or something like that.
Mailman is
++ Free (unlike listserv).
Check out http://www.list.org/
++ Easy to install and configure (been there, done that).
++ Efficient enough to deliver on the order of a million
msgs per day if need be, using modest hardware.
++ Non-invasive: it doesn't mess with the body of messages (unlike
listserv, which changes the indentation of some lines,
messing up our attempts to typeset diagrams and equations).
++ Under more active development; latest release dated May 2004
(in contrast to listserv's latest release, dated Nov 2002).
++ Open source (and cleanly structured) so if you find a bug
you can fix it.
++ Reasonably smart about character-sets and encodings.
++ Fully-featured, including digests and nicely-formatted
archives...
-- although _searching_ the archives was an issue last
time I checked. Usually people just let Google index
the archives, while the recent msgs that Google hasn't
yet got can be searched by hand.