Dan's Mail Format Site: Headers: MIME

In the context of Internet mail format, MIME does not refer to a silent entertainer; rather, Multipurpose Internet Mail Extensions are the method by which e-mail was transformed from plain text in the ASCII character set into something much more versatile. Here are the facts about these new features.

E-Mail from Plain to Fancy

Traditionally, e-mail could only contain plain text. Though some of us still prefer it this way, traditional plain-text e-mail could be limiting even for those who disdain multimedia fanciness: messages were limited to the ASCII character set, which is fine for English but not so great for the other languages of the world. And even the traditionalists who stuck to plain text most of the time would sometimes want to use e-mail to send other sorts of data, like family pictures for Mom or a spreadsheet for the boss.

To make these things possible, the MIME standard was adopted, superseding a patchwork of earlier half-baked techniques that allowed some clunky attempts to send non-text data by e-mail. It was a great success, and almost all mail programs now support it. MIME has enabled all sorts of things, ranging from useful to entertaining to pointless to harmful... maybe you'll catch an e-mail virus from Mom and get a pointlessly bloated MS-Word attachment from your boss containing a memo that could and should have been done in plain text... but, as you engage in an e-mail exchange in Chinese with the Hong Kong branch office, aren't you glad that international character sets are now supported via MIME e-mail?

What is MIME?

The MIME standard consists of the definition of a few new message headers, which indicate what sort of content is in a message: what content type (plain text, HTML, graphics, etc.) it is and how it's encoded.
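To see these headers in practice, here is a minimal sketch using Python's standard email package, which is just one illustrative tool and not something the MIME standards prescribe. It builds a one-part message and lets set_content() fill in the MIME headers; the sample text, the ISO-8859-1 charset, and the quoted-printable encoding are arbitrary choices made for this example.

```python
from email.message import EmailMessage

# Build a one-part message. The text, charset and transfer encoding are
# arbitrary choices for this sketch.
msg = EmailMessage()
msg["Subject"] = "Test"
msg.set_content("L'été est arrivé.\n", charset="iso-8859-1",
                cte="quoted-printable")

# set_content() has filled in the MIME headers discussed on this page.
print("MIME-Version:", msg["MIME-Version"])
print("Content-Type:", msg["Content-Type"])
print("Content-Transfer-Encoding:", msg["Content-Transfer-Encoding"])

# The whole message, with the body in quoted-printable form.
print(msg.as_string())
```

In the printed message, the headers come first, then a blank line, then the body, in which each é has been turned into =E9.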
Some of the content types are "multipart", meaning that they define a complex message structure with more than one part, each of which has headers of its own. These parts can be nested in arbitrarily complex ways, allowing an enormous degree of versatility in expressing structured data within an e-mail message. As we'll see later when multipart messages are described, each part has its own set of MIME headers, in addition to the headers at the beginning of the message.

MIME not only revolutionized e-mail; it was also adopted as a major part of the World Wide Web, where the HTTP protocol uses many of the same MIME headers as e-mail messages.

The MIME Headers

Here are the headers used in a MIME message.

MIME-Version
This must always be present in any MIME message, and the only recognized value for it is 1.0. Since parenthesized comments are permitted in message headers, the following is a valid MIME-Version header:

    MIME-Version: 1.0 (produced by FooSoft 4.5)

Content-Type
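This header defines what type of data is being sent, using what are known as "MIME types". A MIME type is a string that identifies a data format. MIME types always have a slash in them, separating a major type from a subtype. For instance, in text/plain the major type is text and the subtype is plain; other examples that appear on this page include text/html, image/gif, and multipart/alternative.

Text-based types, like text/plain, can take a charset parameter, separated from the type by a semicolon, which specifies the character set:

    Content-type: text/plain; charset=iso-8859-1

In this case, the ISO-8859-1 (Latin-1) character set is specified, giving a character range that includes many accented letters in addition to the normal ASCII characters.

Content-Transfer-Encoding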
Since e-mail messages (including MIME messages) are still supposed to be limited to the characters in the ASCII set, to ensure compatibility with programs that might not be able to handle anything else, any non-ASCII content (including text with other characters, and binary files) needs to be encoded in a form that can be transmitted in plain ASCII.
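As an illustration, Python's standard quopri and base64 modules implement the two ASCII-safe encodings MIME uses for this purpose. The sketch below converts a short sample phrase (made up for the example) to ISO-8859-1 bytes, encodes it both ways, and checks that each encoding decodes back to the original bytes.

```python
import base64
import quopri

# A short phrase with accented letters (sample text for illustration).
text = "l'été est arrivé"
raw = text.encode("iso-8859-1")  # bytes in the Latin-1 character set

# Quoted-printable: plain ASCII stays readable; other bytes become =XX escapes.
qp = quopri.encodestring(raw)
print(qp.decode("ascii"))

# Base64: every 3 input bytes become 4 characters from a 64-character alphabet.
b64 = base64.b64encode(raw)
print(b64.decode("ascii"))

# Both encodings are reversible.
assert quopri.decodestring(qp) == raw
assert base64.b64decode(b64) == raw
```

The quoted-printable output keeps the ordinary letters and turns each é into =E9, so it stays legible for mostly-ASCII text; the base64 output is unreadable but grows by a fixed one-third or so regardless of content, which makes it the better fit for binary data.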
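Two encodings are defined in the MIME standards: quoted-printable and base64. Quoted-printable leaves most ASCII characters readable and represents other bytes as an equals sign followed by two hexadecimal digits, which suits text that is mostly ASCII; base64 packs every three bytes into four ASCII characters, which suits binary data. The header looks like this:

    Content-Transfer-Encoding: quoted-printable

Content-ID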
The Content-ID header gives a body part a unique identifier, so that it can be referred to from elsewhere (for instance, an HTML part can display an image carried in another part of the same message). It looks like this:

    Content-ID: <5.31.32252.1057009685@server01.example.net>

The standards don't really have a lot to say about exactly what is in a Content-ID, other than that it takes the same form as a message ID (an address-like string in angle brackets) and must be unique. The one above is just an example of how a unique content ID can be generated; different programs do it differently. Uniqueness is what matters: it ensures that even if a bunch of different messages are joined together as parts of a bigger multipart message (as happens when a message is forwarded as an attachment, or assembled into a MIME-format digest), no two parts end up with the same content ID, which would be likely to confuse mail programs greatly.
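Python's standard library offers one way to produce such identifiers: email.utils.make_msgid() returns a string in angle brackets built from a timestamp, the process ID, and random bits. This is just one possible scheme, not one the standards require, and the domain below is a placeholder borrowed from this page's examples. The last step strips the angle brackets to get the cid: URI form defined in RFC 2392.

```python
from email.utils import make_msgid

# make_msgid() builds "<timestamp.pid.random@domain>"; the domain here is a
# placeholder taken from this page's examples.
cid = make_msgid(domain="server01.example.net")
other = make_msgid(domain="server01.example.net")

print("Content-ID:", cid)
print("unique:", cid != other)  # random bits make collisions vanishingly unlikely

# The cid: URI form is the same string without the angle brackets.
cid_uri = "cid:" + cid[1:-1]
print(cid_uri)
```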
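There's a similar header called Message-ID, which gives a unique identifier to an entire message rather than to one part of it.

When referenced in the form of a Web URI (the term "URL" is being deprecated by the newest proposed Web standards in favor of "URI"), content IDs and message IDs are placed within the URI schemes cid: and mid: respectively, with the angle brackets removed:

    cid:5.31.32252.1057009685@server01.example.net

Content-Description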
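This header gives a human-readable description of the part's contents, such as "Graph" or "Message in plain-text form" in the examples further down this page.

Content-Disposition

The Content-Disposition header suggests how a part should be presented: inline, meaning it is to be displayed as part of the message, or attachment, meaning it is a separate file for the user to save or open.

Parameters can be appended, separated from the main value by a semicolon; the most common parameter is filename, which suggests a name to use when saving the part as a file.

A sample Content-Disposition header:

    Content-Disposition: attachment; filename="test.jpg"

Multipart MIME Message Bodies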
As has been mentioned, the MIME types starting with multipart/ (such as multipart/mixed, multipart/alternative, and multipart/related) describe a body made up of several parts, each with its own headers and content.

A multipart unit has some special parameters in its Content-Type header:

    Content-Type: multipart/related;
        boundary="----=_NextPart_32252.1057009685.31.001";
        type="multipart/alternative"

The boundary parameter gives the string used to mark the divisions between parts; it must not occur anywhere inside the parts themselves, which is why programs generate long, odd-looking strings for it.

The type parameter, used with multipart/related, gives the content type of the main (root) part that the other parts relate to; here, that main part is a multipart/alternative unit.

After the message headers, and the blank line that terminates the headers, the multipart message continues like this:

    This is a multi-part message in MIME format.

    ------=_NextPart_32252.1057009685.31.001
    Content-Type: multipart/alternative;
        boundary="----=_NextPart_32252.1057009685.31.002"
    Content-Description: Message in alternative text and HTML forms

    ------=_NextPart_32252.1057009685.31.002
    Content-Type: text/plain; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable
    Content-Description: Message in plain-text form

    Some plain text goes here.

First comes the line that says "This is a multi-part message in MIME format.", for the benefit of any non-MIME-capable reader who might be wondering what the message is. Then comes the boundary marker, as defined in the headers; you may note that it begins here with six dashes while the definition had only four. That's because the standards call for the boundary to be preceded by two dashes.
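The boundary rules can be illustrated with a small parser sketch. The function split_multipart() below is hypothetical (real programs would normally rely on a full MIME library, such as Python's email package); it prefixes the boundary parameter with two dashes to recognize part delimiters, treats that string plus two more dashes as the closing delimiter, and ignores the preamble before the first delimiter and any epilogue after the last. It handles a single level only, so nested multipart parts come back as raw text, and the sample body is a simplified one-level version of the example above.

```python
from typing import List


def split_multipart(body: str, boundary: str) -> List[str]:
    """Return the raw text (headers, blank line, content) of each part.

    Hypothetical helper: handles one level only and does not recurse into
    nested multipart parts.
    """
    delimiter = "--" + boundary         # each part starts after "--" + boundary
    close_delimiter = delimiter + "--"  # the final boundary has two more dashes
    parts: List[str] = []
    current = None  # None while still reading the preamble
    for line in body.splitlines():
        marker = line.rstrip()  # trailing whitespace after a boundary is allowed
        if marker in (delimiter, close_delimiter):
            if current is not None:
                parts.append("\n".join(current))
            if marker == close_delimiter:
                break  # anything after this is the epilogue, which is ignored
            current = []
            continue
        if current is not None:
            current.append(line)
    return parts


boundary = "----=_NextPart_32252.1057009685.31.002"
sample = (
    "This is a multi-part message in MIME format.\n"
    "\n"
    "------=_NextPart_32252.1057009685.31.002\n"
    "Content-Type: text/plain; charset=\"iso-8859-1\"\n"
    "\n"
    "Some plain text goes here.\n"
    "------=_NextPart_32252.1057009685.31.002\n"
    "Content-Type: text/html; charset=\"iso-8859-1\"\n"
    "\n"
    "<html><!-- HTML code goes here --></html>\n"
    "------=_NextPart_32252.1057009685.31.002--\n"
)

parts = split_multipart(sample, boundary)
print(len(parts))
for part in parts:
    print(part.splitlines()[0])  # first header line of each part
```

The line break just before each boundary line is treated as part of the boundary, which is why the sketch does not keep a trailing newline on each part's content.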
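The boundary marker is followed by the MIME headers for the next part; it's unnecessary to include the MIME-Version header here, since it is needed only once, at the top of the message. After the part's headers comes a blank line, and then the content of the part.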
If a part is itself a multipart entity, it has its own boundary marker, which must be different from the outer one (note here that the inner boundary ends in "002" while the outer one ended in "001"). The parts of the inner multipart unit follow; here, we see the beginning of a plain text portion. The end of that part and the beginning of the next one look like this:

    More plain text goes here.
    ------=_NextPart_32252.1057009685.31.002
    Content-Type: text/html; charset="iso-8859-1"
    Content-Transfer-Encoding: quoted-printable
    Content-Description: Message in HTML form

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html40/strict.dtd">
    <html>
    <!-- HTML code goes here -->

Once all parts are complete, the end is marked with a boundary marker followed immediately by two more dashes, at which point the outer multipart group resumes with its next part:

    </html>
    ------=_NextPart_32252.1057009685.31.002--
    ------=_NextPart_32252.1057009685.31.001
    Content-Type: image/gif
    Content-Transfer-Encoding: base64
    Content-Description: Graph
    Content-Disposition: inline; filename="image.gif"
    Content-ID: <1.31.32252.1057009685@server01.example.net>
The end of the message is signalled by a top-level boundary marker followed by two dashes; any text after that (here, the "-- End --" line) is an epilogue, which MIME-capable programs ignore:

    ------=_NextPart_32252.1057009685.31.001--
    -- End --

Non-ASCII Characters in Headers

The MIME standards also provide a way to get characters outside the ASCII set into the headers themselves, which is useful for people whose names include accented letters, for instance. Unfortunately, such headers look like a mess when viewed in raw mode:

    =?iso-8859-1?Q?l'=E9t=E9_c'est_arriv=E9!?=

By the standard, an "encoded word" is a sequence of characters that begins with "=?", ends with "?=", and has two "?"s in between. (That means that this sequence of characters had better not occur accidentally in a header, or else it'll be interpreted as an encoded word by MIME-compatible programs.) After the first question mark is the name of the character set being used; after the second question mark is the manner in which it's being encoded into plain ASCII (Q for a variant of quoted-printable in which an underscore stands for a space, B for base64); and after the third question mark is the text itself. The example above decodes to "l'été c'est arrivé!".

The above format for encoding special characters is the one specified in RFC 2047. However, it has drawbacks, such as the lack of any means of specifying encoding information for the parameters, such as filenames, that are appended to MIME headers. For this, a newer and more versatile format was specified later in RFC 2231, which supports giving character set and language information for each parameter, and breaking such parameter values up over multiple lines (often necessary when special characters are used that require lengthy encoding sequences). This looks like:
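    Content-Type: application/x-stuff;
        title*0*=us-ascii'en'This%20is%20even%20more%20;
        title*1*=%2A%2A%2Afun%2A%2A%2A%20;
        title*2="isn't it!"

This format puts an asterisk and a sequential number (0, 1, 2) after the parameter name to indicate that the lines are sections of the same parameter. A further asterisk after the number marks a section as encoded; the first encoded section begins with a character set (us-ascii) and a language code (en for English), separated from each other and from the value by single quotes, and the value uses percent escapes such as %20 for a space. Put together, the parameter above reads "This is even more ***fun*** isn't it!".

Unfortunately, many mail programs still fail to support this "new" format (even though it is about a decade old by now), and will fail to parse such filenames correctly. This has led some developers to make their programs output filenames, when special characters are needed, in formats that do not comply with the standards but nevertheless work in commonly used mail programs.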
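Python's email package can decode both kinds of encoding; the following is a sketch, not the only way to do it. It uses decode_header() and make_header() for an RFC 2047 encoded word (a sample string for illustration), and the legacy message API (message_from_string() with its default compat32 policy, get_param(), get_filename(), and email.utils.collapse_rfc2231_value()) for RFC 2231 parameters. The Content-Disposition line with a non-ASCII filename is a made-up example.

```python
from email import message_from_string
from email.header import decode_header, make_header
from email.utils import collapse_rfc2231_value

# RFC 2047: an encoded word in a header (sample string for illustration).
encoded_word = "=?iso-8859-1?Q?l'=E9t=E9_c'est_arriv=E9!?="
subject = str(make_header(decode_header(encoded_word)))
print(subject)

# RFC 2231: a parameter split into numbered sections, with the charset and
# language given in the first section, plus a percent-encoded filename.
raw = (
    "MIME-Version: 1.0\n"
    "Content-Type: application/x-stuff;\n"
    " title*0*=us-ascii'en'This%20is%20even%20more%20;\n"
    " title*1*=%2A%2A%2Afun%2A%2A%2A%20;\n"
    " title*2=\"isn't it!\"\n"
    "Content-Disposition: attachment; filename*=iso-8859-1''%E9t%E9.txt\n"
    "\n"
    "(body omitted)\n"
)
msg = message_from_string(raw)  # legacy compat32 policy by default

# get_param() returns a (charset, language, value) tuple for RFC 2231 values;
# collapse_rfc2231_value() turns that into a plain string.
title = collapse_rfc2231_value(msg.get_param("title"))
print(title)

# get_filename() reads Content-Disposition and collapses the value itself.
print(msg.get_filename())
```

A program that only ever calls get_filename() gets the decoded name whether or not the sender used RFC 2231, which is one way to cope with the mix of standard and nonstandard filename formats in circulation.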
Next: There are other headers besides the ones I've discussed so far. Here is information about some of them.
This page was first created 01 Jul 2003, and was last modified 21 Jan 2007.