RFC2047: (MIME) Part Three: Message Header Extensions for Non-ASCII Text

  • Obsoletes: RFC 1521, RFC 1522, RFC 1590

  • Updated by: RFC 2184, RFC 2231

  • Category: Standards Track

  • November 1996

1. Introduction

  • Generally, an "encoded-word" is a sequence of printable ASCII characters that begins with “=?”, ends with “?=”, and has two “?”s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.

  • A mail composer that implements this specification will provide a means of inputting non-ASCII text in header fields, but will translate these fields (or appropriate portions of these fields) into encoded-words before inserting them into the message header.

  • A mail reader that implements this specification will recognize encoded-words when they appear in certain portions of the message header. Instead of displaying the encoded-word “as is”, it will reverse the encoding and display the original text in the designated character set.

  • This memo relies heavily on notation and terms defined in RFC 822 and RFC 2045.
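The composer and reader roles described above can be sketched with Python's standard `email.header` module. This is only an illustration, not something the memo prescribes; the sample name and the helper names `compose_display_name` and `read_header_value` are chosen here for the example.

```python
from email.header import Header, decode_header, make_header


def compose_display_name(text: str, charset: str = "iso-8859-1") -> str:
    """Composer side: turn non-ASCII text into one or more encoded-words."""
    return Header(text, charset).encode()


def read_header_value(raw: str) -> str:
    """Reader side: reverse the encoding and return the original text."""
    return str(make_header(decode_header(raw)))


encoded = compose_display_name("Keld Jørn Simonsen")
print(encoded)                     # ASCII-only text, safe for a header field
print(read_header_value(encoded))  # the original text, restored
```

The round trip works because the encoded-word itself names the character set and the encoding, so the reader needs no outside information.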

2. Syntax of encoded-words

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

charset = token    ; see section 3

encoding = token   ; see section 4

token = 1*<Any CHAR except SPACE, CTLs, and especials>

especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "\" /
            <"> / "/" / "[" / "]" / "?" / "." / "="

encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
               ; (but see "Use of encoded-words in message headers", section 5)
  • An ‘encoded-word’ may not be more than 75 characters long, including ‘charset’, ‘encoding’, ‘encoded-text’, and delimiters.

  • If it is desirable to encode more text than will fit in an ‘encoded-word’ of 75 characters, multiple ‘encoded-word’s (separated by CRLF SPACE) may be used.

  • ‘encoded-word’s are designed to be recognized as ‘atom’s by an RFC 822 parser.

  • As a consequence, unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an ‘encoded-word’.


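    For example, the character sequence

            =?iso-8859-1?q?this is some text?=

    would be parsed as four 'atom's, rather than as a single 'atom' (by an RFC 822 parser)
    or 'encoded-word' (by a parser which understands 'encoded-word's).

    The intent here is to encode "this is some text" in the ISO-8859-1 character set, with
    "q" selecting the Q encoding (similar to quoted-printable). The correct form encodes the
    SPACE characters as well:

            =?iso-8859-1?q?this=20is=20some=20text?=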
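The grammar and the two constraints in this section (the 75-character limit and the ban on unencoded white space) can be checked with a regular expression. The following is a rough sketch rather than a complete parser. The function name `parse_encoded_word` is hypothetical, and treating the backslash as one of the 'especials' is an assumption carried over from the 'specials' list in RFC 822.

```python
import re
from typing import Optional, Tuple

# Characters excluded from 'token'. Including the backslash is an assumption
# carried over from the RFC 822 'specials' list.
ESPECIALS = '()<>@,;:\\"/[]?.='

# 'token' characters: printable ASCII (33..126) minus the especials.
TOKEN_CHARS = "".join(chr(c) for c in range(33, 127) if chr(c) not in ESPECIALS)
TOKEN = "[" + re.escape(TOKEN_CHARS) + "]+"

# 'encoded-text': printable ASCII other than "?" (63) and SPACE (32).
ENCODED_TEXT = r"[!->@-~]+"

ENCODED_WORD_RE = re.compile(
    r"=\?(" + TOKEN + r")\?(" + TOKEN + r")\?(" + ENCODED_TEXT + r")\?="
)

MAX_LENGTH = 75  # includes charset, encoding, encoded-text, and delimiters


def parse_encoded_word(word: str) -> Optional[Tuple[str, str, str]]:
    """Return (charset, encoding, encoded_text), or None if 'word' is invalid."""
    if len(word) > MAX_LENGTH:
        return None
    match = ENCODED_WORD_RE.fullmatch(word)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)


print(parse_encoded_word("=?iso-8859-1?q?this=20is=20some=20text?="))
print(parse_encoded_word("=?iso-8859-1?q?this is some text?="))  # None: contains SPACE
```

This sketch checks syntax only; it does not verify that the charset is known or that the encoded-text is valid for the named encoding.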

3. Character sets

  • The ‘charset’ portion of an ‘encoded-word’ specifies the character set associated with the unencoded text.

  • Some character sets use code-switching techniques to switch between “ASCII mode” and other modes.

4. Encodings

  • Initially, the legal values for “encoding” are “Q” and “B”.

  • The “B” encoding is identical to the “BASE64” encoding defined by RFC 2045.

  • The “Q” encoding is similar to the “Quoted-Printable” content-transfer-encoding defined in RFC 2045.
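A hand-written decoder makes the difference between the two encodings concrete. This is a sketch under stated assumptions: the function names `decode_q` and `decode_word` are hypothetical, and the two Q rules used here ("_" stands for SPACE, "=XX" is an octet written as two hexadecimal digits) come from the full text of the RFC rather than from the summary above.

```python
import base64


def decode_q(encoded_text: str) -> bytes:
    """Decode Q-encoded text into raw octets."""
    out = bytearray()
    i = 0
    while i < len(encoded_text):
        ch = encoded_text[i]
        if ch == "_":
            # "_" always represents octet 0x20 (SPACE).
            out.append(0x20)
            i += 1
        elif ch == "=":
            # "=" is followed by two hex digits giving the octet value.
            # A truncated or malformed escape raises ValueError.
            out.append(int(encoded_text[i + 1:i + 3], 16))
            i += 3
        else:
            # Any other character stands for itself.
            out.append(ord(ch))
            i += 1
    return bytes(out)


def decode_word(charset: str, encoding: str, encoded_text: str) -> str:
    """Decode the three parts of an encoded-word into a Python string."""
    kind = encoding.upper()  # treat "q"/"Q" and "b"/"B" alike
    if kind == "B":
        octets = base64.b64decode(encoded_text)
    elif kind == "Q":
        octets = decode_q(encoded_text)
    else:
        raise ValueError("unsupported encoding: " + encoding)
    return octets.decode(charset)


print(decode_word("ISO-8859-1", "Q", "Keld_J=F8rn_Simonsen"))
print(decode_word("ISO-8859-1", "B", "SWYgeW91IGNhbiByZWFkIHRoaXMgeW8="))
```

Q tends to suit text that is mostly ASCII, since unencoded characters stay readable, while B is more compact when most octets would need escaping.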

5. Use of encoded-words in message headers

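  • Definition: an ‘encoded-word’ is a way of representing non-ASCII characters in email. It uses a special encoding format to convert non-ASCII characters into ASCII characters so that they can be transmitted and displayed in email.

  • An ‘encoded-word’ may appear only in certain specific places in message headers and body part headers, such as the subject and the sender and recipient names.

  • ‘encoded-word’s are one of the key techniques for internationalizing email, allowing it to carry text in the character sets of different languages.

  • Format of an ‘encoded-word’: see the grammar in section 2.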


6. Support of ‘encoded-word’s by mail readers

  • A mail reader must parse the message and body part headers according to the rules in RFC 822 to correctly recognize 'encoded-word's.

7. Conformance

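  • A conforming mail composer must ensure that any string of non-white-space printable ASCII characters that begins with “=?” and ends with “?=” is a valid ‘encoded-word’.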

8. Examples


From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
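To see how a reader might display such headers, the sketch below parses a small message with Python's standard `email` package and decodes each header value. It is one possible approach, not the behavior the memo mandates; the message text reuses three of the sample headers, and the empty body and the variable names are made up for the example.

```python
import email
from email.header import decode_header, make_header

# Three of the sample headers, followed by the blank line that ends the header.
RAW_MESSAGE = (
    "To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>\n"
    "CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>\n"
    "Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n"
    "\n"
)

message = email.message_from_string(RAW_MESSAGE)

# Map each header name to its decoded, human-readable value.
decoded = {
    name: str(make_header(decode_header(value)))
    for name, value in message.items()
}

for name, value in decoded.items():
    print(name + ": " + value)
```

Only the encoded-words are transformed; the address parts in angle brackets are plain ASCII and pass through unchanged.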