
Re: [Phys-l] test : fancy symbols, utf-8 encoding



For those of you whose mailers can't handle utf-8 encoding, try
pointing your browser at the archive, namely:
https://carnot.physics.buffalo.edu/archives/2006/02_2006/msg00240.html

Browsing the archives is *not* a test of utf-8 encoding, since
evidently the /mailman/ server re-encodes each fancy character
as an html numbered entity, and then writes the archive page
using the prosaic iso-8859-1 encoding. I consider this quite
sensible.
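
As an illustration of that kind of re-encoding, here is a minimal Python
sketch. It is not taken from the /mailman/ source; the function name is
hypothetical, and it assumes that characters which already exist in
iso-8859-1 are written as plain bytes while everything else becomes a
numbered entity. The archive software may well make a different choice.

```python
def to_latin1_with_entities(text: str) -> bytes:
    """Encode text as iso-8859-1; characters outside that charset
    are written as HTML numeric entities such as &#8711;."""
    # "xmlcharrefreplace" is a standard Python error handler for encoding.
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


sample = "∇ × B = 0, Schrödinger"
page_bytes = to_latin1_with_entities(sample)
print(page_bytes)
```

A browser that reads the page as iso-8859-1 can then display the nabla
from its entity, without the page itself needing to be utf-8.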

The archive version looks fine on my system under Firefox and
under plain old Mozilla.

Under the KDE browser (Konqueror) the Greek, French, and Cyrillic
letters look fine, but the math operators are blocked out. I
hypothesized that Konqueror was using some lame font that lacked
glyphs at these codepoints. I was unable to test this hypothesis,
because I couldn't get Konqueror to switch fonts! Apparently
Konqueror is just broken. My recommendation: Install Firefox
and forget about Konqueror.

=========

For those of you who receive the list in digest mode, I'm aware
that most of the fancy characters got trashed. Maybe some day
somebody will upgrade that part of /mailman/, but in the meantime
you'll have to look at the archive to see the fancy characters.
-- One solution would be to make the digest MIME multipart, with
different parts having their own encoding.
-- Another solution would be to make the digest 100% unicode,
and recode any messages that arrived in another encoding (which
is pretty much the purpose for which unicode was invented).
This might be doable just from the /mailman/ configuration files,
without even touching the code.
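
The second idea (recode everything to unicode) can be sketched in a few
lines of Python. This is only an illustration of the principle, not
/mailman/ code or configuration: the function name, the sample messages,
and their declared charsets are all made up for the example.

```python
def recode_to_utf8(raw: bytes, declared_charset: str) -> bytes:
    """Decode a message body using its declared charset, then
    re-encode it as utf-8. Undecodable bytes become U+FFFD."""
    text = raw.decode(declared_charset, errors="replace")
    return text.encode("utf-8")


# Hypothetical messages arriving in three different encodings.
messages = [
    ("Schrödinger".encode("iso-8859-1"), "iso-8859-1"),
    ("Жуковский".encode("koi8-r"), "koi8-r"),
    ("∇ ⋅ B = 0".encode("utf-8"), "utf-8"),
]

# The digest is a single utf-8 byte string.
digest = b"\n".join(recode_to_utf8(raw, cs) for raw, cs in messages)
print(digest.decode("utf-8"))
```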

=========

FWIW it appears that /mailman/ is now setting the Reply-To: field
to point to the list, in accord with long phys-l tradition.
(For a day or two, it wasn't.)

===========================

Marc "Zeke" Kossover wrote:

>>James Clerk Maxwell :
>> ∇ ⋅ E = 4π �
>> ∇ × E + 1/c ∂B/∂t = 0
>> ∇ ⋅ B = 0
>> ∇ × B - 1/c ∂E/∂t = 4π c j

This is what you expect to see whenever a webmail system naively
relabels utf-8 text as iso-8859-1 *without* recoding the utf-8
characters. You should complain to the ISP. It is the
easiest thing in the world for them to notice the utf encoding
on the incoming mail and recode the characters. (Recoding is a
one-liner in perl: there's a built-in function to do it. It's not
much harder in any other language.) Evidently Yahoo doesn't do it
right; does anybody know if gmail is smarter?
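
To make the difference concrete, here is a small Python sketch (Python
rather than perl, purely for illustration). It assumes nothing about how
any particular webmail system is implemented; the variable names are
hypothetical. The "naive" path reinterprets the utf-8 bytes as
iso-8859-1, and the "recoded" path decodes them properly first.

```python
original = "∇ ⋅ B = 0"
wire_bytes = original.encode("utf-8")  # what arrives in the mail

# Naive: treat each utf-8 byte as if it were one iso-8859-1 character.
# Every fancy character turns into several junk characters.
garbled = wire_bytes.decode("iso-8859-1")

# Correct: decode using the declared charset, then re-encode for an
# iso-8859-1 page, writing unrepresentable characters as entities.
recoded = wire_bytes.decode("utf-8").encode(
    "iso-8859-1", errors="xmlcharrefreplace"
)

print(ascii(garbled))  # ascii() makes the junk characters visible
print(recoded)
```

The garbling is reversible as long as nothing else has touched the
bytes: encoding the junk back to iso-8859-1 and decoding as utf-8
recovers the original text.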

=========================

Several people have asked how I typed in those names and equations.
The answer is that my mailer doesn't (so far as I know) provide
useful support for constructing arbitrary characters from the
keyboard ... so I have to cut-and-paste them from elsewhere.
1) I got Zhukovsky's name from some Russian site, and I got
Archimedes's name from some Greek site ... cut and paste.
2) I can typeset pretty much anything in LaTeX and turn it into
html using HeVeA. Then I can cut-and-paste from that html page
into the email I'm composing.
3) If I just need a character or two, I can cut-and-paste them
from a big table such as:
http://www.av8n.com/computer/utf/font-chart.html
(There are other such tables available on the web.)
4) Over the years I've compiled a small glossary of names and
phrases, especially those with funny diacritical marks in them.
5) Now that the Maxwell equations and Schrödinger equation are
in the phys-l archives, you can cut-and-paste them from there.
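
For a character or two, something equivalent to the table lookup in
item (3) can also be done programmatically. The following Python sketch
is not one of the methods listed above; it is just one possible
shortcut, using the standard unicodedata module to fetch characters by
their official Unicode names. The particular list of names is chosen
only as an example.

```python
import unicodedata

# Official Unicode character names for a few math symbols.
names = [
    "NABLA",
    "DOT OPERATOR",
    "MULTIPLICATION SIGN",
    "PARTIAL DIFFERENTIAL",
    "GREEK SMALL LETTER PI",
]

# unicodedata.lookup raises KeyError if a name is not recognized.
chars = {name: unicodedata.lookup(name) for name in names}

for name, ch in chars.items():
    print(f"U+{ord(ch):04X}  {ch}  {name}")
```

The printed characters can then be cut-and-pasted into a message like
any others.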


==================================

I imagine the "march of progress" cartoon will look something like
this:

0) Plain ascii text.
1) Text with a few fancy characters, encoded as utf-8 and/or as
html character-entities.
2) Unrestricted HTML (with formatting directives, not just
individual character entities).
3) Attachments.

Level (2) is open to abuse: For example, somebody might choose
to post messages using very large characters (ten inches high
or larger) ... perhaps because they look better on his screen,
or perhaps just to make a point. I don't want to read such
things. Also there's lots of potential nastiness associated
with cookies, javascript, included images, et cetera.

Level (3) is quite beyond the pale for the foreseeable future.
It carries infinite risk of abuse, including viruses, worms,
and other wildlife.

For that matter, it is an open question as to whether we are
ready for level (1) ... hence the recent test messages.

In *any* case, one way to deal with things that can't be
represented in plain text is to put them on the web in some
standard, portable format (such as html, png, or pdf) and
put a link in a message to the list. This is far from
perfect, but better than nothing, i.e. better than being
rigorously restricted to plain text only.