
[Phys-L] fancy characters in email (on phys-l and otherwise)



Hi Folks --

1) There exists such a thing as a /character encoding/.
Each email has some sort of character encoding. (Similar
words apply to web pages, but that's not the main topic
for today.)

2) I stronnnnngly recommend that you encode everything using
Unicode in general and UTF-8 in particular. It is supported
on every platform anybody is likely to use for email, and
has been for years. It allows encoding of Greek letters,
math operators, and thousands of other things.
Ἀρχιμήδης
Anders Jonas Ångström
Maria Salomea Skłodowska
Lórand von Eötvös
Журнал Экспериментальной и Теоретической Физики
杨辉三角
+ − ± ∧ × \ | / ⁄ ÷ ⋅ · • • ∝ ≡ ∼ ≈ ≠ < > ≤ ≥ ≪ ≫ ∥ ⊥
√ Δ ∇ ∫ ∮ ∑ ⟨ ⟩ ½ ℏ ∞ ° ← → ↔ ⇆ ↑ ↓ ↕ ⇅ « »

If some of those don't look right on your system, it's
probably a /font/ issue rather than an encoding issue,
but that's a topic for another day.
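
To make the idea of an encoding concrete, here is a minimal
Python sketch. The helper name utf8_summary is made up for
illustration. It encodes a few of the sample strings as UTF-8,
checks that decoding gives back the original text, and reports
how many characters and how many bytes each one takes.

```python
# A few strings taken from the samples above.
samples = ["杨辉三角", "ℏ", "≤", "abc"]

def utf8_summary(text):
    """Return (number of characters, number of UTF-8 bytes) for text."""
    data = text.encode("utf-8")
    # Round trip: decoding the bytes must give back the original string.
    assert data.decode("utf-8") == text
    return len(text), len(data)

for s in samples:
    chars, nbytes = utf8_summary(s)
    print(f"{s!r}: {chars} characters, {nbytes} bytes in UTF-8")
```

Plain ASCII letters take one byte each, while the fancier
characters take two or three; that is all an encoding is, a
rule for turning characters into bytes and back.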

3) If you set the /default/ character encoding on your
mailer, that isn't always the encoding you will get.
Sometimes when replying to a message, the mailer will
use the same encoding as the previous message. This
can be hard to notice, but once you do notice, you can
fix it by setting the encoding manually.

As a related point: I've seen bad things happen when
somebody does a cut-and-paste from a message with one
encoding into a message with another. You'd think
any decent mailer would handle this case, but some of
them don't. Note that if everybody standardized on
UTF-8, this problem would disappear.
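
For the curious, the kind of damage a mismatch causes can be
reproduced in a few lines of Python. This is only a sketch of
one common failure mode (bytes written as UTF-8 but read as
Latin-1); I am not claiming it is what any particular mailer
does internally. The function name misread is made up.

```python
def misread(text, actual="utf-8", assumed="latin-1"):
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(actual).decode(assumed)

original = "Eötvös"
garbled = misread(original)
print(garbled)  # each ö shows up as two unrelated characters

# If no bytes were lost, the damage can be undone by reversing the steps.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)
```

If both sides agree on the encoding, the round trip is exact
and nothing needs repairing.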

4) When phys-l forwards your message, it uses the same
encoding you used when submitting the message. If you
are getting one result from home and another from work,
it almost certainly means your two mailers are using
different settings.

5) Email has /email headers/ including /MIME headers/.
Basic headers include To: From: Date: Subject: et
cetera. The NSA calls headers "metadata". Note:
a) Metadata is data.
b) Stealing metadata is stealing.
c) A cryptosystem that leaks metadata
is a cryptosystem that leaks.

Email (and web pages) use the Content-Type: header to
specify the character encoding, e.g.
Content-Type: text/plain; charset=UTF-8

So if you have questions about encoding, send yourself
a message and look at the Content-Type. Every mailer I
know of has a way of looking at headers. If all else
fails you can save the email as a file and look at the
file, but usually there are easier ways.

Another option is to send a message to somebody else
and ask them to look at the headers.
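
If you do save a message to a file, Python's standard email
module can read the headers for you. The sketch below uses a
made-up message (the addresses and the helper name
declared_charset are placeholders) rather than a real saved
file, so that it runs as-is.

```python
from email import message_from_string

# A made-up message; in practice you would read a saved email file.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "\n"
    "hello\n"
)

def declared_charset(message):
    """Return the charset named in Content-Type (lowercased), or None."""
    return message.get_content_charset()

msg = message_from_string(raw)
print(msg["Content-Type"])        # the header exactly as sent
print(msg.get_content_type())     # the media type
print(declared_charset(msg))      # the declared character encoding
```

A message with no charset parameter gives None, which is a
hint that the receiving mailer will have to guess.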

6) Documents that use Unicode "may" include a byte-order
mark (BOM) at the beginning of the file. Any decent mailer
will hide this from you, but if you save the email
to a file you might see it. If you see a BOM it is
overwhelmingly likely that what follows is Unicode,
but the converse is not true: plenty of Unicode text
has no BOM. In any case, a BOM is not a substitute
for an appropriate Content-Type: MIME header.
https://en.wikipedia.org/wiki/Byte_order_mark
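
As a rough sketch of what "seeing a BOM" means, the following
Python checks the first few bytes of some data against the
BOM constants in the standard codecs module. The function
name sniff_bom is made up. The UTF-32 marks are tested before
the UTF-16 ones because the UTF-32 little-endian BOM begins
with the same two bytes as the UTF-16 little-endian BOM.

```python
import codecs

# Longer BOMs first, so UTF-32-LE is not mistaken for UTF-16-LE.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data):
    """Return the encoding suggested by a leading BOM, or None if absent."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

print(sniff_bom("hello".encode("utf-8-sig")))  # written with a BOM
print(sniff_bom("hello".encode("utf-8")))      # same text, no BOM
```

The second call returns None even though the bytes are
perfectly good UTF-8, which is the "converse is not true"
point in item 6.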

7) As for composing messages with fancy characters:
Usually there are ways of constructing "some" fancy
characters by typing arcane sequences on the keyboard.
I've never bothered to learn the sequences for more
than a handful of characters, partly because not
all symbols I care about can be composed that way.
For the sort of things I do, it's easier to find (or
construct) a web page that contains what I want, and
then cut-and-paste into email. Hint:
https://www.av8n.com/jsd/play.html
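
If you want to build your own palette of characters to
cut-and-paste from, one possible approach (just a sketch, not
how the page above was made) is to look characters up by
their official Unicode names using Python's unicodedata
module. The helper name by_name and the choice of characters
are mine.

```python
import unicodedata

def by_name(*names):
    """Return a string made of the characters with the given Unicode names."""
    return "".join(unicodedata.lookup(n) for n in names)

palette = by_name(
    "GREEK CAPITAL LETTER DELTA",
    "NABLA",
    "SQUARE ROOT",
    "MUCH LESS-THAN",
    "PLANCK CONSTANT OVER TWO PI",
)
print(palette)

# Going the other way: find out what an unfamiliar character is called.
for ch in palette:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

Print the palette to a terminal or a file saved as UTF-8,
and paste from there.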