
Dan's Mail Format Site:

Body: Line Length


Most e-mail users don't give much thought to the length of lines in their messages; they just let their mail programs wrap the lines for them in both incoming and outgoing messages, so they don't even know how long the lines actually are in the message as transmitted. However, the standards give definite rules about e-mail line length, and problems can occur when the standards are not followed. Unfortunately, the default configuration of some popular mail programs does not conform to the standards.

How Long Should Lines Be?

The standards document for the format of e-mail messages, RFC 2822 (the successor to RFC 822, the classic document that established the standards that have been followed ever since), says this about the length of lines in an e-mail message:

There are two limits that this standard places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF.

Update: As of 2008, a new document, RFC 5322, has been released to update the standard; however, it retains the above wording regarding line length.

There are good reasons for these rules. E-mail is read by a variety of programs, on a variety of systems; the "lowest common denominator" is considered to be an 80-column text-mode display. Limiting lines to 78 characters ensures that they fit on such a display without running off the right edge or wrapping awkwardly. That is the reason for the "SHOULD" clause in the standard, which describes what ought to be done whenever possible (though it can be ignored in special cases, for instance to include a long URL without breaking it in the middle). The "MUST" clause gives a "hard" upper limit of 998 characters, beyond which you stand a chance of overflowing the input buffer in some programs and causing serious problems.
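The two limits are easy to check mechanically. Here is a minimal sketch in Python, assuming the message body is already in hand as a string with CRLF line endings; the function name classify_lines and the verdict strings are made up for illustration. It counts Python characters, which matches the byte count on the wire only for plain ASCII text.

```python
SOFT_LIMIT = 78   # the "SHOULD" limit quoted above
HARD_LIMIT = 998  # the "MUST" limit quoted above

def classify_lines(message_text):
    """Return (line_number, length, verdict) for every line over a limit.

    Hypothetical helper: assumes CRLF line endings, and counts characters
    rather than encoded bytes.
    """
    problems = []
    # Splitting on CRLF leaves the terminator out of each line's length,
    # matching the "excluding the CRLF" wording of the standard.
    for number, line in enumerate(message_text.split("\r\n"), start=1):
        length = len(line)
        if length > HARD_LIMIT:
            problems.append((number, length, "violates MUST"))
        elif length > SOFT_LIMIT:
            problems.append((number, length, "exceeds SHOULD"))
    return problems
```

A line of exactly 78 characters passes silently; 79 characters draws the milder complaint, and 999 the serious one.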

For practical purposes, lines should be broken at even less than 78 characters, since when a message is quoted back in a reply it might have angle brackets prefixed to it. The netiquette guidelines in RFC 1855 suggest limiting lines to 65 characters. This is a very conservative value; some users don't go this far, and use 70 or 75 instead. The higher the number, the sooner repeated quoting will push the lines past 78 characters.
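The arithmetic behind that trade-off can be sketched in a few lines of Python. This assumes each level of quoting adds a two-character "> " prefix, which is only one common convention (some programs add a bare ">"); the function name max_quote_depth is invented for this example.

```python
def max_quote_depth(wrap_width, prefix="> ", limit=78):
    """How many times a full-width line can be quoted before passing limit.

    Assumption: every quoting level prepends the same fixed prefix.
    """
    return max(0, (limit - wrap_width) // len(prefix))

for width in (65, 70, 75):
    print(width, max_quote_depth(width))
```

Under that assumption, wrapping at 65 leaves room for six rounds of quoting, 70 leaves room for four, and 75 for just one.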

Line Length of Outgoing Messages

In the beginning (back in the Stone Age), senders of e-mail achieved their desired line length by hitting ENTER at the appropriate point on each line. This seemed natural at the time, since you have to do the same thing on a typewriter. Also, programmers (who were in the majority in that geek-ruled era) are used to using text editors to edit program code, where lines must be broken manually. A big disadvantage of this, aside from having to keep noticing how far along you are in the current line so you know when to hit ENTER, is that if you go back and edit a message by adding and removing words, the line lengths will become uneven and force you to do lots of adding and dropping of line-break characters to get them back where they belong.

To solve this problem, mail programs began to automatically wrap lines when you reached the end, scanning back to the beginning of the current word to move it over to the next line. This is known as "word wrapping", and it ensures consistent line length without your having to add the line breaks yourself. You could still do a manual line break if you wanted, to mark a paragraph end by inserting a blank line, or to cause lines to break at particular points (e.g., for poetry or tabular data). The program might then re-wrap a paragraph if you edited it, though in this case it might accidentally remove line breaks the author put in manually and wanted to keep.

The next stage in the evolution was for mail programs to do what word processors already did (in contrast to text editors): not put in any line breaks except for "hard" line breaks typed by the writer. While a message was being typed or edited, its text (as stored in the computer's memory) would only have carriage returns or linefeeds in between paragraphs (or at other places they were explicitly inserted, as in poetry or song lyrics). Effectively, each paragraph would be one long line, but it would be shown on-screen with word wraps as appropriate, changing as you edit it. Then, when you hit the "Send" button, the appropriate line breaks would be added to produce a standards-compliant outbound message.
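That last step, adding the breaks at "Send" time, might look something like the following sketch, which uses Python's standard textwrap module. It is an illustration under simple assumptions, not how any particular mail program does it: the function name wrap_for_sending is made up, the editor is assumed to store hard breaks as "\n", and the width of 70 is just one reasonable choice.

```python
import textwrap

def wrap_for_sending(body, width=70):
    """Wrap each hard-broken line of body to at most width columns.

    Hypothetical send-time step: hard breaks typed by the writer are kept,
    and only the long "one paragraph per line" runs get soft-wrapped.
    """
    out = []
    for paragraph in body.split("\n"):
        if paragraph.strip() == "":
            out.append("")  # keep blank lines between paragraphs
            continue
        # break_long_words=False leaves an overlong URL in one piece
        # instead of cutting it in the middle.
        out.extend(textwrap.wrap(paragraph, width=width,
                                 break_long_words=False,
                                 break_on_hyphens=False))
    # Outbound mail uses CRLF as the line terminator.
    return "\r\n".join(out)
```

Short lines that the writer broke by hand (poetry, lyrics) pass through untouched, since each one already fits within the width.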

This worked fine, in general, but some programs started skipping the last step and sending the messages with long lines. This may have seemed to make sense to the programmers (after all, word processors save their text this way too), but it was in violation of the standards for e-mail messages. Lines would be well over 78 characters long, and sometimes even over the "hard limit" of 998 characters. But what's the problem? The mail reader at the other end can just word-wrap the long lines, can't it? It's not always that simple, as we'll soon see.

Line Length of Incoming Messages

If the sending and receiving mail programs were identical, and made the exact same assumptions regarding how to deal with line length and re-wrapping of text, then everything would work well for everybody. However, that is not the case. A wide variety of mail programs are in use, with a wide variety of presentation styles, and the only common ground they have is the set of standards the Internet community has adopted, including the ones regarding line length. While many mail programs do re-wrap long lines, they don't necessarily do it in the same exact manner. Other programs (including ones that put e-mail messages onto Web pages for archiving) don't wrap long lines at all, so they end up scrolling off endlessly to the right. Some programs, additionally, will truncate lines longer than the "hard limit" of 998 characters, so that parts of the message will be missing even if the lines are re-wrapped -- the truncation takes place before the re-wrapping begins.

Here is a screenshot showing what a message violating line-length standards might look like to a reader:

[Screenshot 1]

Actually, in that mail program, I have the option to get it to reformat paragraphs (though that still fails if lines are over 1000 characters long). Unfortunately, setting that option causes malformatting in some other cases:

[Screenshot 2]

Note the "quote symbols", which belong at the left edge of the message but got re-wrapped in a really ugly way into the middle of the text. The result is that I need to continually go back and forth between the two modes (especially difficult while reading a mailing list digest, where the reader jumps back to the top of the digest whenever the mode is changed). A standards-conformant message, however, will come out fine regardless of which setting is used.

In most cases (though not in the one above), Pegasus will at least insert quote symbols at the left edge of each line of a rewrapped quote, so it will still look like a quote; this works as long as there is one properly positioned quote symbol at the start of the line. Many other mail programs fail to do this: a rewrapped quoted long line still has just one ">" sign at the very beginning, so it's hard to see at a glance which part of the message is a quote. The quote might even get re-wrapped into a following or preceding paragraph that isn't a quote, if a completely blank line isn't placed between them.
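The better behavior, repeating the quote symbol on every rewrapped line, can be sketched as follows. This is a guess at the general technique, not a description of how Pegasus or any other program actually implements it; the name rewrap_quoted and the width of 72 are assumptions for the example.

```python
import re
import textwrap

# One or more ">" symbols at the start of the line, each optionally
# followed by a single whitespace character.
QUOTE_PREFIX = re.compile(r"^((?:>\s?)+)")

def rewrap_quoted(line, width=72):
    """Re-wrap one long line, repeating its quote marker on every output line."""
    match = QUOTE_PREFIX.match(line)
    prefix = match.group(1) if match else ""
    text = line[len(prefix):]
    depth = prefix.count(">")
    # Normalize "> > " and ">> " alike to ">> " (a choice made for this sketch).
    marker = ">" * depth + " " if depth else ""
    return textwrap.wrap(text, width=width,
                         initial_indent=marker,
                         subsequent_indent=marker,
                         break_long_words=False)
```

Note that this only works when the quote symbol really is at the start of the line, which is exactly the condition described above.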

Among the sorts of messages that get messed up badly by automatic reformatting are those that include pieces of raw program code, tabular data, log file dumps, and ASCII art; anything where the line breaks have logical, structural significance that will be damaged by rewrapping. Technical users are more likely than others to need to send and receive data of this sort, which is one reason why we are more sensitive to this issue than average people. However, some things used even by non-techies are damaged by rewrapping, including long URLs. The RFC document makes the 78-character limit a guideline rather than a hard limit precisely because some things occasionally need to be sent with longer lines. Unfortunately, when mail programs get into the habit of re-wrapping things, such things get damaged in the process.

Here's an example of how a message with long lines might come out in a Web archive, where the <PRE> element is used to display plain-text messages exactly as sent:

[Screenshot 3]

Format Flowed Text

The standards makers have actually come up with a solution to the hard-breaks vs. rewrapping problem, though unfortunately not many mail programs have yet adopted it. RFC 2646 (since obsoleted by RFC 3676) introduces the format=flowed parameter to the text/plain MIME type. When a mail program includes this in the content-type header of a message, it signals that paragraphs can be re-wrapped according to the rules laid out in the RFC document.
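For the curious, here is roughly what setting that parameter looks like using Python's standard email package. The function name build_flowed_message is made up, and this sketch only labels the message; it is up to the caller to make sure the body really follows the flowed-format rules.

```python
from email.message import EmailMessage

def build_flowed_message(body):
    """Build a text/plain message labeled with the format=flowed parameter.

    Illustrative only: the body is assumed to be flowed-format text already.
    """
    msg = EmailMessage()
    msg.set_content(body)              # sets Content-Type to text/plain
    msg.set_param("format", "flowed")  # adds the parameter to Content-Type
    return msg

message = build_flowed_message("Hello there.\n")
print(message["Content-Type"])
```

The printed header is an ordinary text/plain content type with the extra format parameter tacked on after the character set.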

There are a number of rules about how flowed-format messages should be generated and displayed (see the RFC document for details), but the main thing is that if a line ends in a space, that signals that the carriage return / linefeed at the end of the line is just a "soft break" which can be removed in order to reformat the paragraph to the reader's window size. Without a trailing space, the CR/LF is a "hard break" which should be preserved. Thus, the sending program can generate a message that complies with the traditional standard, with lines no longer than 78 characters, but still indicate which parts of the message can be reformatted as if the line breaks weren't there.
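The trailing-space rule can be illustrated with a deliberately simplified sketch. It leaves out several things the RFC covers (quoted lines, the extra "space-stuffing" of certain lines, and the signature separator), and it assumes each paragraph is a single string with single spaces between words; the names encode_flowed and decode_flowed are invented here.

```python
import textwrap

def encode_flowed(paragraphs, width=72):
    """Turn a list of paragraphs (one long string each) into flowed lines."""
    lines = []
    for paragraph in paragraphs:
        # Reserve one column for the trailing space that marks a soft break.
        wrapped = textwrap.wrap(paragraph, width=width - 1) or [""]
        # Every line but the last ends in a space: a "soft break".
        lines.extend(piece + " " for piece in wrapped[:-1])
        # The last line has no trailing space: a "hard break".
        lines.append(wrapped[-1])
    return lines

def decode_flowed(lines):
    """Rejoin flowed lines into paragraphs for display at any width."""
    paragraphs, current = [], ""
    for line in lines:
        current += line
        if not line.endswith(" "):  # hard break ends the paragraph
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs
```

A reader that knows nothing of the format just sees the output of encode_flowed as normal short lines; one that does can call something like decode_flowed and re-wrap each paragraph to fit its window, while short hard-broken lines such as verses of a poem stay separate.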

To a mail reader that doesn't understand the flowed format, the message is in standard plain text with appropriate line breaks; it looks perfectly natural. Thus, this format "degrades gracefully" for older mail readers, always an important thing for any new standard (and often ignored by those who design and implement "improvements"). To a mail reader that does understand it, it allows for flexible reformatting, while letting the sender put in hard breaks in poetry, lyrics, or formatted reports which won't be messed up. It's a "win" for everybody.

...Well, almost everybody... there are some who still find that the minor changes this format makes to plain text are intolerable when viewed in nonsupporting mail readers. In particular, format-flowed messages have extra spaces added at the beginning of lines in some situations, which are then stripped off at the other end -- if the receiving mail program supports format-flowed. If it doesn't, they stay there, and perhaps things that were intended to line up don't. (Of course, in any mail reader that uses proportionally-spaced fonts instead of fixed-width fonts, things won't line up either... one should never count on it!) See this discussion among developers of Mozilla, one of the programs that supports this format.

Quoted Printable Encoding

Some people think that the use of Quoted Printable encoding (mentioned in the MIME, character sets, and attachments sections) is a "solution" to this line-length problem, because it will put in line breaks to bring the message in line with the standards, with an equal sign (=) at the end of each such broken line to indicate that it is a "soft" line break.

True, a quoted-printable message does comply with the standards for message transmission, and will avoid the possibility of long lines being truncated or otherwise messed up somewhere in between the sender and the recipient. However, once the receiving mail program decodes the encoding (assuming it supports Quoted Printable; if not, you end up with a somewhat messy, though readable, message with lots of equal signs in it), the soft breaks are taken back out again, and you're left with the standards-noncompliant long-line message format.

The Quoted Printable encoding is applied (and decoded) at a different level of the protocol than the above-referenced "format flowed". It's merely a transmission encoding, decoded before the message is sent on to the part of the mail program that must display it; if the display routines don't handle long lines well, that won't change just because they're encoded with temporary breaks. On the other hand, "format flowed" is a parameter to the content type, intended to make a display suggestion which the rendering program can accept or ignore. A program that doesn't want to deal with re-wrapping of lines can display it as plain text with the breaks intact.

So don't rely on Quoted Printable encoding to solve a line length problem; make sure the lines are of a reasonable length before the encoding.
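You can watch the soft breaks come and go using Python's standard quopri module. This is just a demonstration of the round trip, with a made-up function name (qp_round_trip) and the assumption that the text is plain ASCII.

```python
import quopri

def qp_round_trip(text):
    """Encode text as Quoted Printable, then decode it again.

    Returns (encoded, decoded) as bytes. Assumes ASCII-only input.
    """
    raw = text.encode("ascii")
    encoded = quopri.encodestring(raw)      # adds "=" soft breaks to long lines
    decoded = quopri.decodestring(encoded)  # takes the soft breaks back out
    return encoded, decoded

long_line = " ".join(["word"] * 60)  # one line of about 300 characters
encoded, decoded = qp_round_trip(long_line)
print(max(len(line) for line in encoded.split(b"\n")))
print(len(decoded))
```

The encoded form has short lines ending in "=", but the decoded form is the same single long line you started with, which is what the display code then has to cope with.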

What Should I Do?

To ensure that your outbound mail follows the proper line length conventions, take a look at the configuration settings of your mail program. Often, there will be an item somewhere in there that says how lines are to be wrapped; be sure it's set to wrap at 70 characters or fewer.

Unfortunately, there are a few mail programs that ignore this setting in some cases. Notably, Yahoo Mail always uses infinitely-long lines (well, to be mathematically correct, the lines are finite, but they can be indefinitely long!) if you choose to send messages in HTML form. The apparent rationale is that HTML ignores line breaks anyway (paragraphs get word-wrapped and are delineated with tags, not carriage returns or linefeeds). The trouble is that even the plain text version of such messages gets sent without line breaks.

The solution is to send only plain-text e-mail from these programs. Depending on what browser you're using (the mail editor options differ depending on user-agent version), make sure to select the plain-text editor rather than the "rich text" one, and/or not to check the "Use HTML tags" box when composing a message. There are other reasons why HTML e-mail is usually a bad idea, as shown in an earlier article; this adds yet another reason.

I used to think Outlook Express was the same way, using infinitely-long lines whenever HTML format was selected. Further experimentation eventually determined that line breaks were suppressed when Quoted Printable encoding was enabled (which was done by default in HTML messages but not plain text messages); by disabling this encoding, you can get proper line breaks even if sending in HTML form.

Links

Next: How can you use accented letters from foreign languages in your e-mail? Can you use "curly" or "smart" quotes? Why do other people's messages sometimes look like somebody swearing in a comic strip? Find out this and more in the article on character sets.


This page was first created 11 May 2003, and was last modified 12 Jul 2009.
Copyright © 2003-2018 by Daniel R. Tobias. All rights reserved.

webmaster@mailformat.dan.info