Dan's Mail Format Site:

Body: HTML Mail

E-mail has traditionally been a plain-text medium, ever since it was introduced on the ARPAnet in the 1970s (and possibly even earlier than that on individual time-sharing mainframe computers). However, some people wanted a way to use fancier formatting in their messages. Various proprietary formats were tried, but HTML ended up as the "standard" manner of doing this. How does HTML e-mail work, and is it a good idea or a bad one? This article discusses the hows, whys, and why-nots.

HTML E-mail: The Basics

HTML (HyperText Markup Language) is, of course, the format used for Web pages. It was invented in 1990 by Tim Berners-Lee, the creator of the Web. E-mail had existed for well over a decade before that, so HTML e-mail is obviously a latecomer. Using HTML for the main body of an e-mail message only became possible once MIME headers were introduced. These headers (which I discuss more elsewhere) can specify what data format is being used, so the receiving program knows whether the message is in plain text or HTML. Starting in the late 1990s, mail programs began to support the sending and receiving of HTML messages, using the MIME type text/html.

Obviously, when the first HTML-format mail started going out, it faced a problem: most users at the time were still using programs that couldn't read HTML. This was resolved by using a multipart message containing both plain-text and HTML versions. The content type header of the message as a whole is multipart/alternative, indicating that it is composed of several parts, of which the reader should choose only one to display. (The MIME standard, RFC 2046, says the alternatives go in order of increasing richness, and a program should display the last one it can handle.) That way, a mail program that understands MIME and HTML can display the HTML version; a program that understands MIME but not HTML can display the plain text version; and a program that doesn't understand MIME at all will display the raw code, which is another reason the plain text version comes first: it can be read normally, with a bunch of messy code beneath it. Some mail programs even let the viewer choose, as a configuration setting, whether to display the "plain" or "fancy" version of such messages.
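For anyone writing a script to generate such mail, here's a minimal sketch using Python's standard `email` package (my choice for illustration; nothing in the format depends on it). The addresses and subject are made-up placeholders. `set_content()` creates the plain-text body, and `add_alternative()` turns the message into multipart/alternative with the HTML version appended after the text, matching the text-first order described above.

```python
from email.message import EmailMessage


def build_alternative(text_body: str, html_body: str) -> EmailMessage:
    """Build a multipart/alternative message: plain text first, HTML last."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"      # placeholder addresses
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Meeting notes"
    # The first body set becomes the text/plain part...
    msg.set_content(text_body)
    # ...and add_alternative() converts the message to multipart/alternative,
    # moving that text into the first part and appending the HTML after it.
    msg.add_alternative(html_body, subtype="html")
    return msg


msg = build_alternative(
    "Hello,\n\nThe meeting is on *Tuesday*.\n",
    "<p>Hello,</p><p>The meeting is on <b>Tuesday</b>.</p>",
)
print(msg.get_content_type())                                  # multipart/alternative
print([part.get_content_type() for part in msg.iter_parts()])  # ['text/plain', 'text/html']
```

Printing `msg.as_string()` shows the boundary lines and part headers separating the two versions; that, plus the HTML, is the "messy code" a non-MIME reader would show beneath the plain text.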

Mail programs that support the sending of HTML e-mail generally have a configuration setting to determine whether to send outbound messages in plain text or HTML form. Many default to HTML form (as discussed and criticized below) but can be configured to send only plain text at the user's option. A few, however, send in HTML form and are difficult or impossible to configure any other way. Some smarter programs decide which format to use based on what is needed for the current message; their message editor has the ability to add special formatting (bold, italics, headers, etc.) and hyperlinks, and uses HTML format if any of these features are used, but plain text if they are not.
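Here's a rough sketch of that "smarter" behavior, again in Python. It's my assumption that such a program decides by looking at the HTML its editor produced; the `FORMATTING_TAGS` set and the `uses_formatting()` / `compose()` helpers are hypothetical, not taken from any real mail program. Plain paragraph, division, and line-break tags don't count, since nearly every editor wraps text in those whether or not the writer formatted anything.

```python
from email.message import EmailMessage
from html.parser import HTMLParser

# Tags treated as "real" formatting. Plain <p>, <div> and <br> don't count,
# since nearly every editor emits them anyway. This list is an assumption.
FORMATTING_TAGS = {
    "b", "strong", "i", "em", "u", "h1", "h2", "h3", "h4", "h5", "h6",
    "ul", "ol", "li", "a", "img", "table", "font",
}


class _TagCollector(HTMLParser):
    """Records every start tag name seen in an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: set[str] = set()

    def handle_starttag(self, tag, attrs):
        self.tags.add(tag)  # HTMLParser already lower-cases tag names


def uses_formatting(html_body: str) -> bool:
    collector = _TagCollector()
    collector.feed(html_body)
    collector.close()
    return bool(collector.tags & FORMATTING_TAGS)


def compose(text_body: str, html_body: str) -> EmailMessage:
    """Send plain text alone unless the HTML actually carries formatting."""
    msg = EmailMessage()
    msg.set_content(text_body)
    if uses_formatting(html_body):
        msg.add_alternative(html_body, subtype="html")
    return msg


plain = compose("See you at noon.\n", "<div><p>See you at noon.</p></div>")
fancy = compose("See you at *noon*.\n", "<p>See you at <b>noon</b>.</p>")
print(plain.get_content_type())  # text/plain
print(fancy.get_content_type())  # multipart/alternative
```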

What is HTML e-mail good for?

Some will reply "Absolutely nothing!" The next section below gives some reasons for this position. However, HTML e-mail wouldn't have become as popular as it now is if it had no advantages at all, and it can serve useful purposes. A long message with a complex structure can be more readable and understandable if there are headers, emphasized passages, italicized citations, bulleted lists, and other structural elements made possible by HTML. True hyperlinks in HTML messages may work better than inserted URLs in a plain text message (which might get broken in the middle if they are too long to fit on a line). Charts, graphs, and illustrations added via inline images may be an essential part of the information content of an article or report. Even the more exotic things one can do in HTML, such as the embedding of sound or video data, can have their uses; an electronic greeting card just wouldn't be the same in plain text.

Then why do so many people hate it?

Unfortunately, for every message that uses HTML effectively, there are hundreds that use it in a useless or counterproductive way. While the use of well-structured, valid HTML can enhance the readability and understandability of a message, few e-mail writers have any interest in taking the time to do this; e-mail is generally a medium of quick comments tossed off without much effort. Usually, the writer will just type in some text and hit "Send", without any attempt at special formatting or structural elements such as headers. If the writer's program defaults to sending mail in HTML form, the resulting message will just consist of plain text with some pointless HTML tags wrapped around it. Often, such messages will actually be less readable than normal non-HTML text; the reader's mail program will be configured to display plain text in a sensible font, while HTML e-mail contains font tags that try to force the display into a font face, size, or color that is harder to make out. For this reason, many people who use mail programs that give them the option to see the plain or fancy versions of a multipart/alternative message opt to see the plain version.

There is one category of e-mail senders that actually does take the time and effort to carefully craft an HTML message that takes advantage of the strengths of this medium -- but it's likely you don't want to see the results of their work. These are the advertisers and marketers who clog your inbox with spam promoting the junk they're selling. Just as TV commercials are among the most slickly and expensively produced things on the air, and junk paper mail is much slicker and more colorful than ordinary personal letters, junk e-mail squeezes more fancy formatting out of today's mail reader programs than any other sort of e-mail does. This, in fact, is probably a major factor driving the development of "enhanced" e-mail, and the reason vendors like Microsoft turn HTML e-mail on by default: the better for advertisers to make their pitches more intrusive and annoying. After all, MS and their big-business friends have their own marketing mail they want to send you if they can con you into "opting in". That other marketers with fewer scruples follow suit by deluging everybody (whether opted in, out, or none-of-the-above) with tons of HTML-formatted pitches for herbal remedies, porn, gambling, and hot investments isn't their problem.

Also, a multipart text-and-HTML message is likely to be at least three times the size of the same message as plain text; after all, it includes the plain text version, plus an HTML version that repeats all the same text along with a whole mess of code like this:

[Screen Shot]

Hence, HTML messages are wasteful of bandwidth and disk space. If they used clean, logical, valid HTML, they'd be nowhere near as wasteful, but in practice many mail programs generate incredibly messy and standards-noncompliant code. And in some cases, if you turn on HTML mail, even the alternative plain text version that accompanies it is malformatted; several programs screw up the line length of messages when HTML is enabled.
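To see the overhead for yourself, this Python sketch builds the same short note both ways and compares the serialized sizes. The HTML is an invented sample in the style of what some mail programs generate (font tags, class attributes, an embedded stylesheet), so the exact ratio depends entirely on that sample; no particular figure is claimed here.

```python
from email.message import EmailMessage

text = "Hi Bob,\n\nThe report is on the shared drive. Thanks!\n\nAlice\n"

# Invented sample in the style of machine-generated HTML mail.
html = (
    '<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '<style>p.MsoNormal {margin:0cm; font-size:10.0pt; font-family:Arial}</style>'
    '</head><body lang="EN-US"><div class="Section1">'
    '<p class="MsoNormal"><font face="Arial" size="2">Hi Bob,</font></p>'
    '<p class="MsoNormal"><font face="Arial" size="2">&nbsp;</font></p>'
    '<p class="MsoNormal"><font face="Arial" size="2">The report is on the shared '
    'drive. Thanks!</font></p>'
    '<p class="MsoNormal"><font face="Arial" size="2">&nbsp;</font></p>'
    '<p class="MsoNormal"><font face="Arial" size="2">Alice</font></p>'
    '</div></body></html>'
)

# Plain text only.
plain_msg = EmailMessage()
plain_msg.set_content(text)

# Same text plus the HTML alternative.
multi_msg = EmailMessage()
multi_msg.set_content(text)
multi_msg.add_alternative(html, subtype="html")

plain_size = len(plain_msg.as_bytes())
multi_size = len(multi_msg.as_bytes())
print(f"plain text only: {plain_size} bytes")
print(f"text + HTML:     {multi_size} bytes ({multi_size / plain_size:.1f}x)")
```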

But on the other hand...

...there are online newsletters that go out to willing subscribers (in some cases they even pay to subscribe!), some of which use carefully crafted HTML to present useful things like headers, emphasis, and illustrations. It's just like paper mail, where you might subscribe to magazines that come out on slick paper with fancy layouts, like junk mail, but that you actually want to receive. So HTML e-mail isn't always evil. Still, if you're publishing an e-mail newsletter, you should give your recipients a choice of whether to get it in text or HTML form; some may prefer plain text or have a mail program that doesn't deal well with HTML. And for your normal non-newsletter correspondence, stick to plain text (configuring your mail program not to default to HTML, if it does so) unless you actually use the enhancements of HTML for something that helps your message (putting the whole thing in a cutesy script-style font or on a background image that looks like notepaper probably doesn't qualify).

Including Images

There are two ways to include images in HTML e-mail. One way is to include the images as file attachments associated with the HTML message. To give some more technical detail, this calls for the message to have content type multipart/related, with the first sub-part within it being multipart/alternative (containing the nested combination of the plain text and HTML versions of the message), and the subsequent parts having the appropriate MIME type for the images, like image/jpeg. Each image has a Content-ID header giving a unique content ID string for referring to it (I describe these more in the page on MIME headers), so that the HTML can refer to it in IMG tags using cid: URLs (as described in RFC 2111, since superseded by RFC 2392).

Whew... quite a bit of technical stuff, but fortunately you don't generally have to know it unless you're creating a program or script to generate this sort of mail (something I've actually done myself)... as an end user, you probably just have to drag the image into the message you're composing and the program does it all for you... hopefully correctly (though you never know, especially when it's a program from Microsoft).
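For those who do need to generate it, here's a sketch using Python's standard `email.mime` classes that builds the structure described above: a multipart/related container whose first part is the multipart/alternative text-and-HTML pair, followed by an image part carrying a Content-ID. The subject, the `chart1@example.com` content ID, and the image bytes (just a PNG signature, not a real picture) are placeholders. Note that the Content-ID header wraps the ID in angle brackets, while the cid: URL in the IMG tag leaves them off. The `type` parameter on multipart/related names the media type of the first (root) part, which RFC 2387 requires.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_related(text_body, html_body, image_bytes, image_subtype, content_id):
    """multipart/related -> [multipart/alternative(text, html), image]."""
    # Outer container; the "type" parameter names the root (first) part's type.
    related = MIMEMultipart("related", type="multipart/alternative")
    related["Subject"] = "Quarterly chart"

    # First sub-part: the plain text and HTML versions, plain text first.
    alternative = MIMEMultipart("alternative")
    alternative.attach(MIMEText(text_body, "plain"))
    alternative.attach(MIMEText(html_body, "html"))
    related.attach(alternative)

    # Subsequent sub-part: the image, identified by a Content-ID in <angle brackets>.
    image = MIMEImage(image_bytes, _subtype=image_subtype)
    image.add_header("Content-ID", f"<{content_id}>")
    image.add_header("Content-Disposition", "inline")
    related.attach(image)
    return related


CID = "chart1@example.com"              # placeholder content ID
placeholder_png = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a real image

msg = build_related(
    "Sales went up this quarter (chart in the HTML version).\n",
    f'<p>Sales went up this quarter:</p><img src="cid:{CID}" alt="Sales chart">',
    placeholder_png,
    "png",
    CID,
)
print([part.get_content_type() for part in msg.walk()])
# ['multipart/related', 'multipart/alternative', 'text/plain', 'text/html', 'image/png']
```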

The other way, sometimes termed "Lazy HTML", is not to attach the images to the message, but instead include references to images on the Web with normal http: URLs within IMG tags. There are a number of advantages and disadvantages to each of the methods:

When the images are referenced on the Web, they don't take up bandwidth when the user downloads the message, or disk space in the user's mailbox.
They do, however, take up bandwidth every time the user reads the message, when the image needs to be downloaded from the Web.
If the same image is referenced in a number of e-mail messages using the same URL, however, the user's program will probably cache it and it won't have to be downloaded and stored repeatedly; attaching the image would take up space and time in every message in which it appears.
If the user is offline while reading mail (as often happens when a dialup connection is used; the user downloads the mail then disconnects to save connection charges and avoid tying up the phone line), the images won't be displayed if they're on the Web rather than attached.
On the other hand, if a mail program only displays plain text messages, but can send HTML e-mail to a separate Web browser to be displayed, then attached images probably won't work there, but images from the Web will display correctly.
A sender without access to a Web server on which to post files has no way to put images on the Web, but can still attach images directly.
Spammers sometimes embed specially-named images called "Web Bugs", whose names encode the specific recipient of the message; when these images are requested from the Web, this sends a signal that the message was read so that they know your address is a "live prospect" who can be spammed further. Because of this, some mail programs don't display remote images from the Web by default; the user has to specifically tell the program to show images in a particular message, or they'll show up blank.
Senders of bulk messages (which includes newsletters going to willing subscribers; not all bulk mailers are spammers!) can generally get out their mailings more quickly and efficiently with images on the Web rather than attached; the messages are then smaller in size and transmit faster, while the server load to send images from the Web server is spread out over hours or days as the messages are read.
Still, if it's a very large bulk mailing, load on the Web server to serve the images may be heavy; you'd better have a server that's up to this load (much of which will come all at once within moments of sending the message, as the more attentive readers open it immediately).
Users might keep messages archived in their mail program's folders for a long time (I've got some archived messages from years ago), but images on Web servers might go away eventually. If they do, the old archived messages will no longer display correctly.
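The distinction matters to the receiving program too. Here's a Python sketch of how a reader might sort the images in an HTML body before deciding what to display: cid: references point at parts attached to the message, while http: and https: references would be fetched from the Web (and so could be Web bugs). The `ImageSourceScanner` class and the sample tracking-style URL are made up for illustration, and a real reader would also have to handle CSS background images and other ways of pulling in remote content.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ImageSourceScanner(HTMLParser):
    """Collect the src of every <img> tag, sorted by how it would be loaded."""

    def __init__(self) -> None:
        super().__init__()
        self.attached: list[str] = []  # cid: references to parts inside the message
        self.remote: list[str] = []    # http(s): references fetched from the Web
        self.other: list[str] = []     # anything else (data:, outbind:, relative...)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if not src:
            return
        scheme = urlsplit(src).scheme.lower()
        if scheme == "cid":
            self.attached.append(src)
        elif scheme in ("http", "https"):
            self.remote.append(src)
        else:
            self.other.append(src)


def scan_images(html_body: str) -> ImageSourceScanner:
    scanner = ImageSourceScanner()
    scanner.feed(html_body)
    scanner.close()
    return scanner


html_body = (
    "<p>Our newsletter</p>"
    '<img src="cid:logo@example.com">'
    '<img src="http://www.somesite.example/track/8f3a.gif" width="1" height="1">'
)
found = scan_images(html_body)
# A privacy-minded reader could render found.attached right away and hold
# found.remote back until the user asks it to show images.
print(found.attached)  # ['cid:logo@example.com']
print(found.remote)    # ['http://www.somesite.example/track/8f3a.gif']
```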

As you can see, there are arguments to be made for both approaches, but on the whole, attached images usually work better than remote ones.

As usual, Microsoft mail clients have their own nonstandard ways of doing things, providing yet another way images get attached: the "outbind:" URI scheme. I haven't been able to find any actual documentation of this (apparently unregistered) scheme, but it seems to be prefixed to the URI of an image on the Web, like outbind://14/http://www.somesite.example/test.jpg. The image itself is then actually attached to the message, but instead of a Content-ID header, it has a Content-Location header with the image URI. This apparently tells the mail viewer that it can either view the image from the attachment or fetch it from the Web. Surprisingly, I've actually observed this to work in non-Microsoft mail clients as well. I don't know what the number (14 in the above example) means, and the double slash before it seems bogus, given that in URI syntax a double slash is supposed to precede a hostname or other authority.
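Since there's no documentation, the following is purely a guess at how a viewer might interpret such a reference, sketched in Python: strip the `outbind://<number>/` prefix (ignoring the number, whose meaning is unknown), look for an attached part whose Content-Location matches the remaining URI, and fall back to fetching it from the Web if none does. The `OUTBIND` pattern and the `resolve_outbind()` function are hypothetical, not Microsoft's actual logic.

```python
import re
from email.message import EmailMessage

# Guessed shape of the reference: "outbind://<number>/<inner URI>".
# The number's meaning is unknown, so this sketch just ignores it.
OUTBIND = re.compile(r"^outbind://\d+/(.+)$", re.IGNORECASE)


def resolve_outbind(src: str, msg: EmailMessage) -> tuple[str, object]:
    """Hypothetical resolver: prefer an attached part, else the Web URI."""
    match = OUTBIND.match(src)
    if match is None:
        return ("unchanged", src)
    inner_uri = match.group(1)
    for part in msg.walk():
        # The attached copy is labelled with Content-Location rather than Content-ID.
        if part.get("Content-Location") == inner_uri:
            return ("attached", part.get_content())  # decoded image bytes
    return ("web", inner_uri)  # no matching attachment: fetch it from the Web


# Test message with an image part carrying a Content-Location header.
msg = EmailMessage()
msg.set_content("See the picture.\n")
msg.add_attachment(
    b"\xff\xd8\xff\xe0 placeholder, not a real JPEG",
    maintype="image",
    subtype="jpeg",
    headers=["Content-Location: http://www.somesite.example/test.jpg"],
)

kind, _ = resolve_outbind("outbind://14/http://www.somesite.example/test.jpg", msg)
print(kind)  # attached
kind, value = resolve_outbind("outbind://14/http://www.somesite.example/other.jpg", msg)
print(kind, value)  # web http://www.somesite.example/other.jpg
```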

Single-Part HTML-Only Messages

There are a few mail programs (Hotmail seems to be the main offender) that send HTML mail as a single part, not a multipart message with both text and HTML versions. Their creators probably justified this on the grounds that hardly any mail program these days doesn't support HTML, so there's no need to waste space attaching a text version too. However, doing this is a bad idea for a number of reasons:

Believe it or not, there are still some people reading mail in non-HTML-supporting readers. This includes some grizzled system administrator types, set in their ways of reading mail in text mode from a Unix prompt like they've been doing for the last 20 years or so. You don't want to anger these people... they're the ones who keep your servers running!
Some users have mail programs perfectly capable of HTML, but choose to display the plain text version instead, which they find more readable without the graphical "fluff".
Some users even spam-filter HTML-only messages, because the vast majority of them are spam. Spammers tend to send HTML messages with no plain text version (they're so in love with their snazzy graphical ads that they wouldn't dream of trying to duplicate their content in something as dull as plain text), while most regular HTML e-mail is multipart. So your single-part HTML message might not be seen by its recipient. (Unfortunately, there are some spammers that get past these filters by doing multipart messages where the plain text version is something rude like "Your mail program doesn't support HTML, so you can't read this." Like I should get a different mail program just to read their spam!)
It'll probably screw up in AOL, too... see the next article.
If you write on mailing lists, you may find that some of them reject non-plain-text messages. If your HTML e-mail is multipart, the list software will probably just strip the HTML portion and use the plain text one, maybe adding a line like [Non-text portions of this message have been removed] (which should clue you in that maybe you ought to switch to sending text only, so you don't get this added every time), but at least your message will get sent. If you use no-alternative HTML, it will be rejected altogether.
Even in mailing lists that accept HTML mail, there may be digest or archive versions that use only the plain text versions of messages. Your HTML-only message may get removed altogether there, and replaced with a note like [This message is not in displayable format]. If you want everybody to be able to read your writing, avoid this!
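Here's a Python sketch of the kind of check a plain-text-only list (or a filter that distrusts HTML-only mail) might perform: keep the text/plain alternative if there is one, noting that other parts were dropped, and reject the message if there's nothing but HTML. The `list_filter()` function and its note are illustrative only; real list software applies many more rules. `get_body()` with a preference list containing only `"plain"` returns `None` when no plain-text body exists.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

STRIPPED_NOTE = "\n[Non-text portions of this message have been removed]\n"


def list_filter(raw: bytes):
    """Return the text a plain-text-only list would send out, or None to reject."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    if msg.get_content_type() == "text/plain":
        return msg.get_content()              # already plain text: pass through
    text_part = msg.get_body(preferencelist=("plain",))
    if text_part is None:
        return None                           # HTML-only: nothing to fall back on
    return text_part.get_content() + STRIPPED_NOTE


multipart = EmailMessage()
multipart.set_content("Plain version.\n")
multipart.add_alternative("<p>HTML version.</p>", subtype="html")

html_only = EmailMessage()
html_only.set_content("<p>HTML only.</p>", subtype="html")

print(list_filter(multipart.as_bytes()))  # the plain text plus the removal note
print(list_filter(html_only.as_bytes()))  # None (rejected)
```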

Thus, you should avoid this format. If your mail program only sends HTML mail this way, that's all the more reason to switch to plain text.

Unfortunately, some mail programs that send multipart messages with a plain-text version along with an HTML version do the plain-text one badly, and you'll never notice if your own mail program shows you only the HTML version. Sometimes the plain-text version has no clear separation between quoted material and responses, because that distinction was made in the HTML version through things like colors and fonts, which go away when the HTML tags are stripped. Other bizarre things sometimes show up in the plain-text version, like the word "Message" added awkwardly at the beginning of the text because that was the TITLE element of the HTML version, and the part of the mail program that creates the text version stupidly grabs it as part of the text.

But, even worse, there are some messages (usually part of bulk mailings, but this doesn't mean it's just spam; it happens in legitimate bulk mailings such as subscribed-to newsletters) that have a completely empty plain-text version, so that if your mail program is configured to show plain text in preference to HTML, you see nothing at all. This is apparently the result of a program that's set up to include both formats but requires the sender to supply the contents of each version separately (not a bad idea for bulk mailings, as it allows the sender to create well-formatted versions of each instead of having the text version created automatically, and often badly, from the HTML version), but the sender failed to supply any plain text, so that part ended up empty. If you're not going to supply plain text, you shouldn't include a plain-text part at all: some mail programs cope with a missing plain-text version better than an empty one. When you choose to display plain text in preference to HTML, they still display the HTML if that's all there is, but display the plain-text version (even an empty one) if one is present.
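A display routine could go one step further than the programs just described and treat a plain-text part containing nothing but whitespace as if it were missing, so a reader set to prefer plain text still shows the HTML. This is my suggestion, sketched in Python, not the documented behavior of any particular mail program; `choose_display_part()` is a hypothetical name.

```python
from email.message import EmailMessage


def choose_display_part(msg: EmailMessage, prefer_plain: bool = True):
    """Pick the part to display; a whitespace-only text/plain part counts as absent."""
    order = ("plain", "html") if prefer_plain else ("html", "plain")
    for subtype in order:
        part = msg.get_body(preferencelist=(subtype,))
        if part is None:
            continue
        if subtype == "plain" and not part.get_content().strip():
            continue  # empty plain-text alternative: fall through to the HTML
        return part
    return None


# A bulk mailing whose sender never filled in the plain-text version.
newsletter = EmailMessage()
newsletter.set_content("\n")  # plain-text part containing only a blank line
newsletter.add_alternative("<h1>Spring newsletter</h1><p>News...</p>", subtype="html")

print(choose_display_part(newsletter).get_content_type())  # text/html
```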

Recipients who Complain about Plain Text?

I thought I'd heard everything, pro and con, about text vs. HTML e-mail, but in this forum somebody actually said that recipients of his business-related mails got "disgusted" by plain-text e-mails he sent. That's an experience I've never had. He later clarified that he had recipients who experienced difficulties when they replied to plain-text messages; their HTML-format signature blocks came out as a mess of ugly code. This sounds to me like either a broken mail reader or a misconfigured one; perhaps it has the ability to create and specify separate signature blocks for plain text and HTML messages but the user foolishly configured the HTML version of his signature for plain-text use. Anyway, that shouldn't be the sender's problem.

Email Rejection: An Amusing Example

As I've noted, some recipients won't accept HTML-formatted e-mail or other mail with non-text attachments, because it triggers filters designed to keep out spam or viruses. Among those who bounce non-text messages are some companies' technical support and customer service departments, which will send back messages with attachments and tell you to resend them as plain text. One amusing example is Bonzi, which has a free download that supposedly "enhances" your PC experience (I don't recommend you install it; it's reputed to be annoying adware, and maybe "spyware" too; once it gets into your system, it won't go away, and it might also be sending your personal info to its manufacturer). Anyway, the automated response they send to anybody who e-mails them in non-text form (which I found out about because some virus apparently e-mailed itself to them with my address forged in the "From" line, triggering this response to me even though I had never e-mailed them in my life) includes this passage:

BonziMAIL messages include attachments and are not accepted by our mail system. Please open your regular e-mail program and write us a message using only plain text.

So, apparently, their own e-mail program, that's included as part of the software you download from them, produces mail of a format that they, themselves, reject!

And Now for Something Even More Amusing...

Even worse than ordinary HTML e-mail are the heaps of somewhat HTML-like garbage spewed out by some mail programs, especially the ones from Microsoft. A particularly bizarre part of this is the peculiar, pointless series of proprietary conditional comments such as <![if !supportEmptyParas]>&nbsp;<![endif]>, which apparently tell the browser to insert a nonbreaking space within a paragraph if it doesn't support empty paragraphs (and just why this monstrosity of code is actually better than just inserting the damn nonbreaking space whether it's necessary or not, I have no idea). A number of mail readers have a problem with this weird code, and might spit out a heap of raw junk instead of the well-formatted message the sender intended. Now, here's the weirdest instance of this I've seen yet: somebody apparently ordered a custom decorated cake by e-mail, and what they got was... follow this link (or this one) to find out.
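A mail reader (or a filter in front of one) that wanted to cope with this could remove the conditional markers and keep whatever they wrap, which amounts to inserting the nonbreaking space unconditionally. Here's a minimal Python sketch, assuming the markers take the `<![if ...]>` / `<![endif]>` form seen in `<![if !supportEmptyParas]>&nbsp;<![endif]>`; the regular expression handles only that form, not the `<!--[if ...]>` variety, and the sample fragment is made up.

```python
import re

# Matches "<![if ...]>" and "<![endif]>" markers (case-insensitive).
CONDITIONAL_MARKER = re.compile(r"<!\[(?:if\s[^\]]*|endif)\]>", re.IGNORECASE)


def strip_conditional_markers(html_body: str) -> str:
    """Drop the markers but keep the content they wrap (here, the &nbsp;)."""
    return CONDITIONAL_MARKER.sub("", html_body)


word_html = "<p class=MsoNormal><![if !supportEmptyParas]>&nbsp;<![endif]></p>"
print(strip_conditional_markers(word_html))  # <p class=MsoNormal>&nbsp;</p>
```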

And, speaking of Microsoft's mail programs, just when you think they couldn't get any worse, they've "downgraded" the HTML support in the newest version of Outlook (in 2007) so that it renders things less compatibly with normal browsers (even Microsoft's own one); instead, it uses the crippled MS Word HTML renderer. This has the effect of breaking a lot of fancy-formatted HTML messages for users of that crappy program, and is one more reason for senders to stick to plain text. (More info; still more here and here.)

Links

Why You Should Use Plaintext Email
HTML Email is Evil
HTML Email -- Still Evil?
HTML Email Isn't Rich
The Dying Art of Plain Text Email
E-mail is not a platform for design
Why you shouldn't make newsgroup postings in HTML
Configuring your e-mail program to use plain text
Email Style and Formats -- a (somewhat outdated) discussion of the interoperability problems caused by various mail programs' "enhancements" to mail format
RFC 2111 -- cid: and mid: URLs
This official page on the Microsoft site actually suggests reading mail as plain-text only as a workaround for a security problem with HTML e-mail.

Next: What do you call a mail reader that doesn't handle real HTML, but tries to render a limited subset of HTML tags -- even in plain text messages? AOL calls it "HTML Lite", but "Half-Assed HTML" is a better name for it.
