RFC2047: (MIME) Part Three: Message Header Extensions for Non-ASCII Text¶
Obsoletes: rfc1521, rfc1522, rfc1590
Updated by: rfc2184, rfc2231
Category: Standards Track
November 1996
1. Introduction¶
Generally, an
"encoded-word"
is a sequence of printable ASCII characters that begins with “=?”, ends with “?=”, and has two “?”s in between. It specifies a character set and an encoding method, and also includes the original text encoded as graphic ASCII characters, according to the rules for that encoding method.A mail composer that implements this specification will provide a means of inputting non-ASCII text in header fields, but will translate these fields (or appropriate portions of these fields) into encoded-words before inserting them into the message header.
A mail reader that implements this specification will recognize encoded-words when they appear in certain portions of the message header. Instead of displaying the encoded-word “as is”, it will reverse the encoding and display the original text in the designated character set.
This memo relies heavily on notation and terms defined RFC 822 and RFC 2045.
2. Syntax of encoded-words¶
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
charset = token ; see section 3
encoding = token ; see section 4
token = 1*<Any CHAR except SPACE, CTLs, and especials>
especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
<"> / "/" / "[" / "]" / "?" / "." / "="
encoded-text = 1*<Any printable ASCII character other than "?" or SPACE>
; (but see "Use of encoded-words in message headers", section 5)
An ‘encoded-word’ may not be more than 75 characters long, including ‘charset’, ‘encoding’, ‘encoded-text’, and delimiters.
If it is desirable to encode more text than will fit in an ‘encoded-word’ of 75 characters, multiple ‘encoded-word’s (separated by CRLF SPACE) may be used.
‘encoded-word’s are designed to be recognized as ‘atom’s by an RFC 822 parser.
As a consequence, unencoded white space characters (such as SPACE and HTAB) are FORBIDDEN within an ‘encoded-word’.
示例:
错误的示例:
=?iso-8859-1?q?this is some text?=
would be parsed as four 'atom's, rather than as a single 'atom' (by an RFC 822 parser)
or 'encoded-word' (by a parser which understands 'encoded-words').
正确的示例:
=?iso-8859-1?q?this=20is=20some=20text?=
解析结果:
"this is some text" in the ISO-8859-1 character set.
说明:
q: indicates that the quoted-printable encoding method is used
3. Character sets¶
The ‘charset’ portion of an ‘encoded-word’ specifies the character set associated with the unencoded text.
Some character sets use code-switching techniques to switch between “ASCII mode” and other modes.
4. Encodings¶
Initially, the legal values for “encoding” are “Q” and “B”.
The “B” encoding is identical to the “BASE64” encoding defined by RFC 2045.
The “Q” encoding is similar to the “Quoted-Printable” content-transfer-encoding defined in RFC 2045.
5. Use of encoded-words in message headers¶
【定义】Encoded-Word是一种在电子邮件中用于表示非ASCII字符的方式。它使用一种特殊的编码格式将非ASCII字符转换成ASCII字符,以便在电子邮件中传输和显示。
Encoded-Word可以在电子邮件的头部和正文部分的某些特定位置出现,例如主题、发送者和接收者名称等位置。
Encoded-Word是电子邮件国际化的关键技术之一,可以让电子邮件支持不同语言的字符集。
Encoded-Word的格式:
=?charset?encoding?encoded-text?= 说明: charset表示字符集, encoding表示编码格式, encoded-text表示经过编码后的文本 常用的编码格式有: Quoted-Printable Base64
6. Support of ‘encoded-word’s by mail readers¶
A mail reader must parse the message and body part headers according to the rules in RFC 822 to correctly recognize
'encoded-word's
.
7. Conformance¶
以 “=?” 开头并以 “?=” 结尾的非空格可打印 ASCII 字符串都是一个有效的 ‘encoded-word’
8. Examples¶
示例:
From: =?US-ASCII?Q?Keith_Moorex<moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
=?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=