RFC2045: (MIME) Part One: Format of Internet Message Bodies¶
Obsoletes: rfc1521, rfc1522, rfc1590
Updated by: rfc2184, rfc2231, rfc5335, rfc6532
Category: Standards Track
November 1996
RFC 2045 Multipurpose Internet Mail Extensions (MIME)
“Internet media type”,也被称为”Multipurpose Internet Mail Extensions (MIME) type”
是一种标识数据类型的格式,最初用于电子邮件,现在广泛应用于Web上的HTTP和其他Internet协议中。
这种格式通常由两个部分组成:
1. 大类别(例如,文本或图像) 2. 特定于该类别的子类别(例如,纯文本或PNG图像)
一些常见的Internet media type:
text/plain:纯文本格式,不包含任何样式或格式化
text/html:HTML文档,用于在Web浏览器中呈现网页
application/json:JSON(JavaScript对象表示)格式的数据,用于在Web应用程序之间传递数据
application/xml:XML(可扩展标记语言)格式的数据,用于在Web应用程序之间传递数据
image/jpeg:JPEG图像格式,常用于存储和传输照片和其他图像
image/png:PNG图像格式,常用于图像透明度和高品质的图像存储
audio/mpeg:MP3音频格式,常用于音乐和其他音频文件
video/mp4:MP4视频格式,常用于存储和传输视频文件
1. Introduction¶
讨论了 RFC 822 在文本邮件格式上的限制和其它邮件格式的限制,以及为了克服这些限制所需的补充规范。
介绍了 MIME(Multipurpose Internet Mail Extensions)协议的一些主要机制,如 MIME-Version 头字段、Content-Type 头字段、Content-Transfer-Encoding 头字段、Content-ID 头字段和 Content-Description 头字段,并指出了这些机制的重要性。
兼容性和实用性在 MIME 开发过程中的重要性,并指出了 MIME 协议的标准化状态。
2. Definitions, Conventions, and Generic BNF Grammar¶
The section provides definitions, conventions, and a generic BNF grammar for understanding the set of documents specified in the rest of the document.
It highlights the importance of familiarity with the augmented BNF notation of RFC 822 for implementers.
The section also defines terms such as CRLF, character set, message, entity, body part, and body, which are used throughout the set of documents.
Additionally, the section explains that notes provide additional nonessential information that can be skipped by readers focused on implementing conformant solutions.
3. MIME Header Fields¶
header fields is as follows:
entity-headers := [ content CRLF ]
[ encoding CRLF ]
[ id CRLF ]
[ description CRLF ]
*( MIME-extension-field CRLF )
MIME-message-headers := entity-headers
fields
version CRLF
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.
MIME-part-headers := entity-headers
[ fields ]
; Any field not beginning with
; "content-" can have no defined
; meaning and may be ignored.
; The ordering of the header
; fields implied by this BNF
; definition should be ignored.
4. MIME-Version Header Field¶
This document is an independent specification that complements RFC 822.
a formal BNF is given for the content of the MIME-Version field:
version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
只有以下版本才兼容本spec:
MIME-Version: 1.0
When checking MIME-Version values any RFC 822 comment strings that are present must be ignored. In particular, the following four MIME-Version fields are equivalent:
MIME-Version: 1.0
MIME-Version: 1.0 (produced by MetaSend Vx.x)
MIME-Version: (produced by MetaSend Vx.x) 1.0
MIME-Version: 1.(produced by MetaSend Vx.x)0
示例¶
有一封电子邮件,包含一个HTML文档和一张JPG图片:
From: sender@example.com
To: recipient@example.com
Subject: MIME-Version Example
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=boundary-example
--boundary-example
Content-Type: text/html
<html>
<body>
<h1>Hello, World!</h1>
<p>This is an example of a MIME message.</p>
</body>
</html>
--boundary-example
Content-Type: image/jpeg
Content-Transfer-Encoding: base64
/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEP
E...
备注
在这个示例中,MIME-Version被设置为1.0,表明该邮件遵循MIME规范。邮件的Content-Type头字段设置为multipart/mixed,表明该邮件包含多个部分。邮件的第一个部分是一个HTML文档,Content-Type设置为text/html。邮件的第二个部分是一张JPG图片,Content-Type设置为image/jpeg,并使用base64编码方式进行传输。邮件的各个部分之间使用boundary-example分隔符进行分隔。
5. Content-Type Header Field¶
备注
The Content-Type header field was first defined in RFC 1049. RFC 1049 used a simpler and less powerful syntax, but one that is largely compatible with the mechanism given here.
The set of meaningful parameters depends on the media type and subtype:
There are NO globally-meaningful parameters that apply to all media types.
Most parameters are associated with a single specific subtype.
However, a given top-level media type may define parameters which are applicable to any subtype of that type.
For example:
the "charset" parameter is applicable to any subtype of "text"
the "boundary" parameter is required for any subtype of the "multipart" media type
An initial set of seven top-level media types is defined in RFC 2046. Five of these are discrete types whose content is essentially opaque as far as MIME processing is concerned. The remaining two are composite types whose contents require additional handling by MIME processors.
5.1 Syntax of the Content-Type Header Field¶
a Content-Type header field value is defined as follows:
content := "Content-Type" ":" type "/" subtype
*(";" parameter)
; Matching of media type and subtype
; is ALWAYS case-insensitive.
type := discrete-type / composite-type
discrete-type := "text" / "image" / "audio" / "video" /
"application" / extension-token
composite-type := "message" / "multipart" / extension-token
extension-token := ietf-token / x-token
ietf-token := <An extension token defined by a standards-track RFC and registered with IANA.>
x-token := <The two characters "X-" or "x-" followed, with no intervening white space, by any token>
subtype := extension-token / iana-token
iana-token := <A publicly-defined extension token.
Tokens of this form must be registered with IANA as specified in RFC 2048.>
parameter := attribute "=" value
attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.
value := token / quoted-string
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
tspecials := "(" / ")" / "<" / ">" / "@" /
"," / ";" / ":" / "\" / <">
"/" / "[" / "]" / "?" / "="
; Must be in quoted-string,
; to use within parameter values
The type, subtype, and parameter names are not case sensitive.
comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms are completely equivalent:
Content-type: text/plain; charset=us-ascii (Plain text) Content-type: text/plain; charset="us-ascii"
Two acceptable mechanisms for defining new media subtypes:
1. Private values (starting with "X-") can be defined private.
Such values cannot be registered or standardized.
2. New standard values should be registered with IANA as described in `RFC 2048`.
5.2 Content-Type Defaults¶
This default is assumed if no Content-Type header field is specified:
Content-type: text/plain; charset=us-ascii
6. Content-Transfer-Encoding Header Field¶
Many media types which could be usefully transported via email are represented, in their “natural” format, as 8bit character or binary data.
Such data cannot be transmitted over some transfer protocols.
For example, RFC 821 (SMTP) restricts mail messages to 7bit US-ASCII data with lines no longer than 1000 characters including any trailing CRLF line separator.
备注
So, “Content-Transfer-Encoding” header field is be used, to define a standard mechanism for encoding such data into a 7bit short line format
6.1. Content-Transfer-Encoding Syntax¶
encoding := "Content-Transfer-Encoding" ":" mechanism
mechanism := "7bit"(default) / "8bit" / "binary" /
"quoted-printable" / "base64" /
ietf-token / x-token
6.2. Content-Transfer-Encodings Semantics¶
Three transformations are currently defined:
1. identity,
2. the "quoted-printable" encoding(相对易读的编码(quoted-printable))
3. the "base64" encoding("密集" 或 "均匀" 的编码)
The domains are "binary", "8bit" and "7bit".
The identity encoding is just NO encoding
The quoted-printable and base64 encodings transform their input from an arbitrary domain into material in the “7bit” range, thus making it safe to carry over restricted transports.
6.3. New Content-Transfer-Encodings¶
private Content-Transfer-Encoding values is a name prefixed by “X-”, to indicate its non-standard status:
例:
"Content-Transfer-Encoding: x-my-new-encoding"
备注
Unlike media types and subtypes, the creation of new Content-Transfer-Encoding values is STRONGLY discouraged, as it seems likely to hinder interoperability with little potential benefit
6.4. Interpretation and Use¶
Content-Transfer-Encoding是一种MIME协议定义的编码机制,用于将二进制数据编码为可在Internet上传输的文本格式。
如果一个实体的类型是“multipart”,则Content-Transfer-Encoding只能是“7bit”、“8bit”或“binary”
示例:
Content-Type: text/plain; charset=ISO-8859-1
Content-transfer-encoding: base64
6.5. Translating Encodings¶
The
quoted-printable
andbase64
encodings are designed so that conversion between them is possible.
6.6. Canonical Encoding Model¶
A canonical model for encoding is presented in RFC 2049 to fix the confusion of previous version of this RFC(especially in encoding CRLF)
6.7. Quoted-Printable Content-Transfer-Encoding¶
这是一种编码(与下面的Base64对比的编码)
使用 Quoted-Printable 编码的原因是某些字符(例如非 ASCII 字符、空格等)可能会在传输过程中被错误地解释或丢失。
每个非 ASCII 字符都被转换成了 “=XX” 的形式,其中 XX 是该字符的十六进制编码。
Quoted-Printable 编码可以确保文本数据的完整性和准确性,并使其可以通过网络传输。
示例:
Hello, 你好! => Hello, =E4=BD=A0=E5=A5=BD=EF=BC=81 说明: 数字表示该字符的 Unicode 编码(UTF-8)
编码过程如下:
1. 按照 UTF-8 编码将原文转换成字节序列。
2. 对于每个字节,如果其 ASCII 码在 33 到 60 之间或 62 到 126 之间,那么直接使用该字节,否则进行下一步操作。
3. 将该字节转换为 16 进制表示,并在前面加上"=",得到该字节的 Quoted-Printable 编码。
4. 最终编码结果为原始字符和编码字符的拼接。
6.8. Base64 Content-Transfer-Encoding¶
The Base64 Content-Transfer-Encoding is a method of encoding arbitrary sequences of octets to be represented in a form that is not necessarily human-readable.
This encoding is designed to be simple and result in a data increase of approximately 33 percent.
The Base64 Alphabet:
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y
图片示例:
原始:
01110100 01101000 01101001 01110011 00100000 01101001 01110011 ....
对应的字符串:
"this is"
Base64:
dGhpcyBpcw==
Base64编码的规则如下:
1. 将要编码的数据每3个字节一组,即每24个比特一组
2. 将这24个比特划分为4个6位的小组
3. 对每个6位的小组,将其转换成一个Base64字符
4. 如果数据的字节数不是3的倍数,则会进行填充,将剩余的字节以0填充,使得剩余的字节可以凑成3个字节。填充的字符为“=”,每个缺少的字节都会用一个“=”字符来表示
7. Content-ID Header Field¶
The “Content-ID” header field is used to label bodies in order to allow one body to make reference to another in a high-level user agent.
The syntax of “Content-ID” is identical to that of “Message-ID”:
id := "Content-ID" ":" msg-id
“Content-ID” header is generally optional, But In MIME media type “message/external-body”, “Content-ID” header is MANDATORY
8. Content-Description Header Field¶
The ability to associate some descriptive information with a given body is often desirable.
For example, it may be useful to mark an “image” body as “a picture of the Space Shuttle Endeavor.”
description := "Content-Description" ":" *text
9. Additional MIME Header Fields¶
MIME-extension-field := <Any RFC 822 header field which
begins with the string "Content-">
Appendix A – Collected Grammar¶
总:
MIME-message-headers := entity-headers
fields
version CRLF
; The ordering of the header fields implied by this BNF definition should be ignored.
MIME-part-headers := entity-headers
[fields]
; Any field not beginning with "content-" can have no defined meaning and may be ignored.
; The ordering of the header fields implied by this BNF definition should be ignored.
entity-headers := [ content CRLF ]
[ encoding CRLF ]
[ id CRLF ]
[ description CRLF ]
*( MIME-extension-field CRLF )
content¶
content := "Content-Type" ":" type "/" subtype
*(";" parameter)
; Matching of media type and subtype is ALWAYS case-insensitive.
type¶
type := discrete-type / composite-type
discrete-type := "text" / "image" / "audio" / "video" /
"application" / extension-token
composite-type := "message" / "multipart" / extension-token
subtype¶
subtype := extension-token / iana-token
extension-token := ietf-token / x-token
iana-token := <A publicly-defined extension token.
Tokens of this form must be registered with IANA as specified in RFC 2048.>
ietf-token := <An extension token defined by a standards-track RFC and registered with IANA.>
parameter¶
parameter := attribute "=" value
attribute := token
; Matching of attributes
; is ALWAYS case-insensitive.
value := token / quoted-string
encoding¶
encoding := "Content-Transfer-Encoding" ":" mechanism
mechanism := "7bit" / "8bit" / "binary" /
"quoted-printable" / "base64" /
ietf-token / x-token
id¶
id := "Content-ID" ":" msg-id
description¶
description := "Content-Description" ":" *text
MIME-extension-field¶
MIME-extension-field := <Any RFC 822 header field which
begins with the string "Content-">
通用¶
quoted-printable¶
quoted-printable := qp-line *(CRLF qp-line)
qp-line := *(qp-segment transport-padding CRLF)
qp-part transport-padding
qp-segment := qp-section *(SPACE / TAB) "="
; Maximum length of 76 characters
transport-padding := *LWSP-char
; Composers MUST NOT generate non-zero length transport padding,
; but receivers MUST be able to handle padding added by message transports.
qp-part := qp-section
; Maximum length of 76 characters
qp-section := [*(ptext / SPACE / TAB) ptext]
ptext¶
ptext := hex-octet / safe-char
hex-octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
; Octet must be used for characters > 127, =, SPACEs or TABs at the ends of lines,
; and is recommended for any character not listed in RFC 2049 as "mail-safe".
safe-char := <any octet with decimal value of 33 through 60 inclusive, and 62 through 126>
; Characters not listed as "mail-safe" in
; RFC 2049 are also not recommended.
通用¶
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
or tspecials>
tspecials := "(" / ")" / "<" / ">" / "@" /
"," / ";" / ":" / "\" / <">
"/" / "[" / "]" / "?" / "="
; Must be in quoted-string,
; to use within parameter values
version := "MIME-Version" ":" 1*DIGIT "." 1*DIGIT
x-token := <The two characters "X-" or "x-" followed, with
no intervening white space, by any token>