RFC6184: RTP Payload Format for H.264 Video¶

Obsoletes: RFC 3984
Category: Standards Track
May 2011
Company: Huawei Technologies, WorldGate Communications

1. Introduction¶

This memo specifies an RTP payload specification for the video coding standard known as ITU-T Recommendation H.264 and ISO/IEC International Standard 14496-10 (both also known as Advanced Video Coding (AVC)).
In this memo, the name H.264 is used for the codec and the standard, but this memo is equally applicable to the ISO/IEC counterpart of the coding standard.

1.1. The H.264 Codec¶

The H.264 video codec has a very broad application range that covers all forms of digital compressed video, from low bitrate Internet streaming applications to HDTV broadcast and Digital Cinema applications with nearly lossless coding.
Compared to the current state of technology, the overall performance of H.264 is such that bitrate savings of 50% or more are reported. Digital Satellite TV quality, for example, was reported to be achievable at 1.5 Mbit/s, compared to the current operation point of MPEG-2 video at around 3.5 Mbit/s [10].
The codec specification [1] itself conceptually distinguishes between a Video Coding Layer (VCL) and a Network Abstraction Layer (NAL).
The VCL contains the signal processing functionality of the codec; mechanisms such as transform, quantization, and motion-compensated prediction; and a loop filter. It follows the general concept of most of today’s video codecs, a macroblock-based coder that uses inter picture prediction with motion compensation and transform coding of the residual signal.
The NAL encoder encapsulates the slice output of the VCL encoder into Network Abstraction Layer Units (NALUs), which are suitable for transmission over packet networks or for use in packet-oriented multiplex environments. A NAL unit consists of a one-byte header and the payload byte string. The header indicates the type of the NAL unit, the (potential) presence of bit errors or syntax violations in the NAL unit payload, and information regarding the relative importance of the NAL unit for the decoding process. This RTP payload specification is designed to be unaware of the bit string in the NAL unit payload.
One of the main properties of H.264 is the complete decoupling of the transmission time, the decoding time, and the sampling or presentation time of slices and pictures.

1.2. Parameter Set Concept¶

One very fundamental design concept of H.264 is to generate self-contained packets, to make mechanisms such as the header duplication of RFC 4629 or MPEG-4 Visual's Header Extension Code (HEC) unnecessary. This was achieved by decoupling information relevant to more than one slice from the media stream. This higher-layer meta information should be sent reliably, asynchronously, and in advance from the RTP packet stream that contains the slice packets.
The combination of the higher-level parameters is called a parameter set. The H.264 specification includes two types of parameter sets: sequence parameter sets and picture parameter sets.
The sequence and picture parameter set structures contain information such as picture size, optional coding modes employed, and macroblock to slice group map.

1.3. Network Abstraction Layer Unit Types¶

The NAL unit type octet has the following format:

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+

F: forbidden_zero_bit
  The H.264 specification declares a value of 1 as a syntax violation.

NRI: nal_ref_idc
  00: indicates that the content of the NAL unit is not used to reconstruct reference pictures for inter picture prediction.
  greater than 00: indicates that the decoding of the NAL unit is required to maintain the integrity of the reference pictures.

Type: nal_unit_type
  Detailed in `ITU-T Recommendation H.264, "Advanced video coding for generic audiovisual services", March 2010`.

The use of these header fields in this payload format is detailed in Section 5.3.
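
The one-byte layout above can be unpacked with a few shifts and masks. The following is a minimal sketch rather than a reference implementation; the names NalHeader, parse_nal_header, and build_nal_header are illustrative and do not come from the RFC.

```python
from typing import NamedTuple


class NalHeader(NamedTuple):
    f: int         # forbidden_zero_bit (1 bit, most significant)
    nri: int       # nal_ref_idc (2 bits)
    nal_type: int  # nal_unit_type (5 bits, least significant)


def parse_nal_header(octet: int) -> NalHeader:
    """Split the one-byte NAL unit header into its F, NRI, and Type fields."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("NAL unit header must be a single octet")
    return NalHeader(f=octet >> 7, nri=(octet >> 5) & 0x03, nal_type=octet & 0x1F)


def build_nal_header(f: int, nri: int, nal_type: int) -> int:
    """Pack F, NRI, and Type back into one octet."""
    return ((f & 0x01) << 7) | ((nri & 0x03) << 5) | (nal_type & 0x1F)
```

For instance, the octet 0x67 splits into F=0, NRI=3 (binary 11), and Type=7.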

2. Conventions¶

Omitted in this summary.

3. Scope¶

This payload specification can only be used to carry the “naked” H.264 NAL unit stream over RTP and not the bitstream format discussed in Annex B of H.264.
Likely, the first applications of this specification will be in the conversational multimedia field, video telephony or video conferencing, but the payload format also covers other applications, such as Internet streaming and TV over IP.

4. Definitions and Abbreviations¶

4.1. Definitions¶

access unit: A set of NAL units always containing a primary coded picture.
coded video sequence:
IDR access unit: An access unit in which the primary coded picture is an IDR picture.
IDR picture: A coded picture containing only slices with I or SI slice types that causes a “reset” in the decoding process.
primary coded picture:
redundant coded picture:
VCL NAL unit: A collective term used to refer to coded slice and coded data partition NAL units.
decoding order number (DON):
NAL unit decoding order:
NALU-time:
transmission order:
media-aware network element (MANE):
static macroblock:
default sub-profile:
default level:

4.2. Abbreviations¶

DON:        Decoding Order Number
DONB:       Decoding Order Number Base
DOND:       Decoding Order Number Difference
FEC:        Forward Error Correction
FU:         Fragmentation Unit
IDR:        Instantaneous Decoding Refresh
IEC:        International Electrotechnical Commission
ISO:        International Organization for Standardization
ITU-T:      International Telecommunication Union,
            Telecommunication Standardization Sector
MANE:       Media-Aware Network Element
MTAP:       Multi-Time Aggregation Packet
MTAP16:     MTAP with 16-bit timestamp offset
MTAP24:     MTAP with 24-bit timestamp offset
NAL:        Network Abstraction Layer
NALU:       NAL Unit
SAR:        Sample Aspect Ratio
SEI:        Supplemental Enhancement Information
STAP:       Single-Time Aggregation Packet
STAP-A:     STAP type A
STAP-B:     STAP type B
TS:         Timestamp
VCL:        Video Coding Layer
VUI:        Video Usability Information

5. RTP Payload Format¶

5.1. RTP Header Usage¶

Figure 1. RTP header according to RFC 3550:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            contributing source (CSRC) identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Field descriptions:

Marker bit (M): 1 bit
Payload type (PT): 7 bits
Sequence number (SN): 16 bits
Timestamp: 32 bits
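
To make the field widths in Figure 1 concrete, here is a hedged sketch that parses the 12-byte fixed header and the CSRC list with Python's struct module. The names RtpHeader and parse_rtp_header are illustrative. The sketch assumes the caller handles any RTP header extension (X bit) and padding (P bit) separately.

```python
import struct
from typing import NamedTuple, Tuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: Tuple[int, ...]


def parse_rtp_header(packet: bytes) -> Tuple[RtpHeader, int]:
    """Return the parsed header and the offset just past the CSRC list."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    # Network byte order: two octets, 16-bit SN, 32-bit timestamp, 32-bit SSRC.
    first, second, seq, timestamp, ssrc = struct.unpack_from("!BBHII", packet)
    csrc_count = first & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("truncated CSRC list")
    csrcs = struct.unpack_from("!%dI" % csrc_count, packet, 12)
    header = RtpHeader(
        version=first >> 6,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        csrcs=csrcs,
    )
    return header, end
```

If the X bit is clear, the returned offset marks where the H.264 payload (Section 5.2) begins.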

5.2. Payload Structures¶

The payload format defines three different basic payload structures:

1. Single NAL Unit Packet:
        Contains only a single NAL unit in the payload.
2. Aggregation Packet:
        Packet type used to aggregate multiple NAL units into a single RTP payload.
3. Fragmentation Unit:
        Used to fragment a single NAL unit over multiple RTP packets.

Table 1. Summary of NAL unit types and the corresponding packet types:

NAL Unit  Packet    Packet Type Name               Section
Type      Type
-------------------------------------------------------------
0        reserved                                     -
1-23     NAL unit  Single NAL unit packet             5.6
24       STAP-A    Single-time aggregation packet     5.7.1
25       STAP-B    Single-time aggregation packet     5.7.1
26       MTAP16    Multi-time aggregation packet      5.7.2
27       MTAP24    Multi-time aggregation packet      5.7.2
28       FU-A      Fragmentation unit                 5.8
29       FU-B      Fragmentation unit                 5.8
30-31    reserved                                     -

5.3. NAL Unit Header Usage¶

The structure and semantics of the NAL unit header were introduced in Section 1.3.

NRI Values:

NAL Unit Type     Content of NAL Unit              NRI (binary)
----------------------------------------------------------------
 1              non-IDR coded slice                         10
 2              Coded slice data partition A                10
 3              Coded slice data partition B                01
 4              Coded slice data partition C                01

5.4. Packetization Modes¶

This memo specifies three cases of packetization modes:

o  Single NAL unit mode
o  Non-interleaved mode
o  Interleaved mode

Table 3. Summary of allowed NAL unit types for each packetization mode (yes = allowed, no = disallowed, ig = ignore):

Payload Packet    Single NAL    Non-Interleaved    Interleaved
Type    Type      Unit Mode           Mode             Mode
-------------------------------------------------------------
0      reserved      ig               ig               ig
1-23   NAL unit     yes              yes               no
24     STAP-A        no              yes               no
25     STAP-B        no               no              yes
26     MTAP16        no               no              yes
27     MTAP24        no               no              yes
28     FU-A          no              yes              yes
29     FU-B          no               no              yes
30-31  reserved      ig               ig               ig
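
Tables 1 and 3 can be encoded as lookup tables so that a receiver can check whether an incoming payload is legal for the negotiated mode. This is a sketch only. The function names are illustrative, and the mode numbers 0, 1, and 2 are the packetization-mode values used in the SDP examples of Section 8.3.

```python
# Packet type names from Table 1, keyed by the 5-bit Type value.
AGGREGATION_AND_FU = {
    24: "STAP-A", 25: "STAP-B", 26: "MTAP16",
    27: "MTAP24", 28: "FU-A", 29: "FU-B",
}

# Packet types marked "yes" in Table 3, keyed by packetization-mode value.
ALLOWED = {
    0: {"NAL unit"},                                    # single NAL unit mode
    1: {"NAL unit", "STAP-A", "FU-A"},                  # non-interleaved mode
    2: {"STAP-B", "MTAP16", "MTAP24", "FU-A", "FU-B"},  # interleaved mode
}


def packet_type_name(nal_type: int) -> str:
    """Map a Type value (0..31) to its packet type name from Table 1."""
    if not 0 <= nal_type <= 31:
        raise ValueError("Type is a 5-bit field")
    if 1 <= nal_type <= 23:
        return "NAL unit"
    return AGGREGATION_AND_FU.get(nal_type, "reserved")


def table3_entry(mode: int, nal_type: int) -> str:
    """Return 'yes', 'no', or 'ig' as listed in Table 3."""
    if mode not in ALLOWED:
        raise ValueError("unknown packetization mode")
    name = packet_type_name(nal_type)
    if name == "reserved":
        return "ig"
    return "yes" if name in ALLOWED[mode] else "no"
```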

5.5. Decoding Order Number (DON)¶

5.6. Single NAL Unit Packet¶

Figure 2. RTP payload format for single NAL unit packet:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|               Bytes 2..n of a single NAL unit                 |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.7. Aggregation Packets¶

Figure 3. RTP payload format for aggregation packets:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|             one or more aggregation units                     |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Table 4. Type field for STAPs and MTAPs:

Type   Packet    Timestamp offset   DON-related fields
                 field length       (DON, DONB, DOND)
                 (in bits)          present
--------------------------------------------------------
24     STAP-A       0                 no
25     STAP-B       0                 yes
26     MTAP16      16                 yes
27     MTAP24      24                 yes

5.7.1. Single-Time Aggregation Packet (STAP)¶

Figure 4. Payload format for STAP-A:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :                                               |
+-+-+-+-+-+-+-+-+                                               |
|                                                               |
|                single-time aggregation units                  |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 5. Payload format for STAP-B:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :  decoding order number (DON)  |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                single-time aggregation units                  |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 6. Structure for single-time aggregation unit:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :        NAL unit size          |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                           NAL unit                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 7. An example of an RTP packet including an STAP-A containing two single-time aggregation units:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-A NAL HDR |         NALU 1 Size           | NALU 1 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 1 Data                           |
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 Size                   | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         NALU 2 Data                           |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
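
The STAP-A layout in Figures 4, 6, and 7 can be walked with a simple loop: skip the STAP-A NAL header, then repeatedly read a 16-bit size followed by that many bytes of NAL unit. The sketch below assumes that any RTP padding has already been stripped from the payload; split_stap_a is an illustrative name.

```python
import struct
from typing import List

STAP_A = 24  # Type value from Table 1


def split_stap_a(payload: bytes) -> List[bytes]:
    """Return the NAL units (header byte included) carried in a STAP-A payload."""
    if not payload or payload[0] & 0x1F != STAP_A:
        raise ValueError("not a STAP-A payload")
    nal_units = []
    offset = 1  # skip the STAP-A NAL header octet
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated NAL unit size field")
        (size,) = struct.unpack_from("!H", payload, offset)  # network byte order
        offset += 2
        if size == 0 or offset + size > len(payload):
            raise ValueError("NAL unit size does not fit in the payload")
        nal_units.append(payload[offset:offset + size])
        offset += size
    return nal_units
```

A STAP-B payload (Figure 5) could be handled the same way after reading the 16-bit DON that follows the header octet.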

Figure 8. An example of an RTP packet including an STAP-B containing two single-time aggregation units:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|STAP-B NAL HDR | DON                           | NALU 1 Size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| NALU 1 Size   | NALU 1 HDR    | NALU 1 Data                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 Size                   | NALU 2 HDR    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       NALU 2 Data                             |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

5.7.2. Multi-Time Aggregation Packets (MTAPs)¶

Figure 9. NAL unit payload format for MTAPs:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                :  decoding order number base   |               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
|                                                               |
|                 multi-time aggregation units                  |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 10. Multi-time aggregation unit for MTAP16:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:        NAL unit size          |      DOND     |  TS offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  TS offset    |                                               |
+-+-+-+-+-+-+-+-+              NAL unit                         |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 11. Multi-time aggregation unit for MTAP24:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:        NAL unit size          |      DOND     |  TS offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         TS offset             |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                              NAL unit                         |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 12. An RTP packet including a multi-time aggregation packet of type MTAP16 containing two multi-time aggregation units:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|MTAP16 NAL HDR |  decoding order number base   | NALU 1 Size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offset        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 1 HDR   |  NALU 1 DATA                                  |
+-+-+-+-+-+-+-+-+                                               +
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 SIZE                   |  NALU 2 DOND  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       NALU 2 TS offset        |  NALU 2 HDR   |  NALU 2 DATA  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 13. An RTP packet including a multi-time aggregation packet of type MTAP24 containing two multi-time aggregation units:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          RTP Header                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|MTAP24 NAL HDR |  decoding order number base   | NALU 1 Size   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 1 Size  |  NALU 1 DOND  |       NALU 1 TS offs          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|NALU 1 TS offs |  NALU 1 HDR   |  NALU 1 DATA                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
:                                                               :
+               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               | NALU 2 SIZE                   |  NALU 2 DOND  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|       NALU 2 TS offset                        |  NALU 2 HDR   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  NALU 2 DATA                                                  |
:                                                               :
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
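
A hedged sketch of walking an MTAP16 or MTAP24 payload as laid out in Figures 9 through 13 follows. It rests on three assumptions that should be checked against the RFC text: the NAL unit size field counts only the NAL unit that follows the DOND and TS offset fields, each unit's DON is (DONB + DOND) modulo 65536, and its NALU-time is the RTP timestamp plus the TS offset modulo 2^32. MtapUnit and split_mtap are illustrative names.

```python
from typing import List, NamedTuple

MTAP16, MTAP24 = 26, 27  # Type values from Table 1


class MtapUnit(NamedTuple):
    don: int        # decoding order number of this NAL unit
    nalu_time: int  # RTP timestamp + TS offset, wrapped to 32 bits
    nal_unit: bytes


def split_mtap(payload: bytes, rtp_timestamp: int) -> List[MtapUnit]:
    """Split an MTAP payload into its multi-time aggregation units."""
    if len(payload) < 3:
        raise ValueError("payload too short for an MTAP")
    nal_type = payload[0] & 0x1F
    if nal_type not in (MTAP16, MTAP24):
        raise ValueError("not an MTAP payload")
    ts_len = 2 if nal_type == MTAP16 else 3  # TS offset is 16 or 24 bits
    donb = int.from_bytes(payload[1:3], "big")
    units = []
    offset = 3
    while offset < len(payload):
        fields_len = 2 + 1 + ts_len  # size + DOND + TS offset
        if offset + fields_len > len(payload):
            raise ValueError("truncated aggregation unit fields")
        size = int.from_bytes(payload[offset:offset + 2], "big")
        dond = payload[offset + 2]
        ts_offset = int.from_bytes(payload[offset + 3:offset + 3 + ts_len], "big")
        offset += fields_len
        if size == 0 or offset + size > len(payload):
            raise ValueError("NAL unit size does not fit in the payload")
        units.append(MtapUnit(
            don=(donb + dond) % 65536,
            nalu_time=(rtp_timestamp + ts_offset) % 2**32,
            nal_unit=payload[offset:offset + size],
        ))
        offset += size
    return units
```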

5.8. Fragmentation Units (FUs)¶

Figure 14. RTP payload format for FU-A:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 15. RTP payload format for FU-B:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| FU indicator  |   FU header   |               DON             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-|
|                                                               |
|                         FU payload                            |
|                                                               |
|                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                               :...OPTIONAL RTP padding        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

The FU indicator octet has the following format:

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|F|NRI|  Type   |
+---------------+

The FU header has the following format:

+---------------+
|0|1|2|3|4|5|6|7|
+-+-+-+-+-+-+-+-+
|S|E|R|  Type   |
+---------------+
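
The two octets above carry everything needed to rebuild the original NAL unit header: F and NRI come from the FU indicator, and the original Type comes from the FU header. The sketch below reassembles one NAL unit from FU-A payloads. It assumes the fragments are supplied in RTP sequence number order with none missing; reassemble_fu_a is an illustrative name.

```python
from typing import Iterable

FU_A = 28  # Type value from Table 1


def reassemble_fu_a(fragments: Iterable[bytes]) -> bytes:
    """Rebuild one NAL unit from its FU-A payloads, given in sequence order."""
    nal = bytearray()
    started = ended = False
    for payload in fragments:
        if len(payload) < 2 or payload[0] & 0x1F != FU_A:
            raise ValueError("not an FU-A payload")
        if ended:
            raise ValueError("fragment found after the end fragment")
        indicator, fu_header = payload[0], payload[1]
        start = bool(fu_header & 0x80)  # S bit
        end = bool(fu_header & 0x40)    # E bit
        if start:
            if started:
                raise ValueError("second start fragment in one NAL unit")
            started = True
            # F and NRI from the indicator, Type from the FU header.
            nal.append((indicator & 0xE0) | (fu_header & 0x1F))
        elif not started:
            raise ValueError("first fragment lacks the start bit")
        nal += payload[2:]
        ended = end
    if not ended:
        raise ValueError("end fragment missing")
    return bytes(nal)
```

For FU-B (Figure 15), the first fragment additionally carries a 16-bit DON after the FU header, which would have to be skipped before the FU payload.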

6. Packetization Rules¶

The packetization modes are introduced in Section 5.4.
The packetization rules common to more than one of the packetization modes are specified in Section 6.1.
The packetization rules for the single NAL unit mode, the non-interleaved mode, and the interleaved mode are specified in Sections 6.2, 6.3, and 6.4, respectively.
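This section describes the packetization rules for video transport: it specifies the rules for the single NAL unit mode, the non-interleaved mode, and the interleaved mode, as well as the rules that apply to more than one mode. Following these rules helps video data travel more reliably over the network. The section also covers related techniques and points to watch for.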

1. Common Packetization Rules
2. Single NAL Unit Mode
3. Non-Interleaved Mode
4. Interleaved Mode
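
As an illustration of how a sender might apply the payload structures of Section 5 in the non-interleaved mode, the sketch below sends a NAL unit as a single NAL unit packet when it fits and otherwise splits it into FU-A fragments. It does not reproduce the full rules of Sections 6.1 to 6.4, and it does not aggregate small NAL units into STAP-As. The max_payload parameter (bytes available after the RTP header) and the function name are assumptions made for this example.

```python
from typing import List

FU_A = 28  # Type value from Table 1


def packetize_non_interleaved(nal_unit: bytes, max_payload: int) -> List[bytes]:
    """Return RTP payloads for one NAL unit: a single packet or FU-A fragments."""
    if not nal_unit:
        raise ValueError("empty NAL unit")
    if max_payload < 3:
        raise ValueError("max_payload must fit FU indicator, FU header, and data")
    if len(nal_unit) <= max_payload:
        return [nal_unit]  # single NAL unit packet: the NAL unit is the payload

    header = nal_unit[0]
    indicator = (header & 0xE0) | FU_A  # keep F and NRI, set Type to FU-A
    nal_type = header & 0x1F
    body = nal_unit[1:]                 # the NAL header itself is not repeated
    chunk = max_payload - 2             # room left after the two FU octets
    pieces = [body[i:i + chunk] for i in range(0, len(body), chunk)]

    payloads = []
    for index, piece in enumerate(pieces):
        start = 0x80 if index == 0 else 0
        end = 0x40 if index == len(pieces) - 1 else 0
        payloads.append(bytes([indicator, start | end | nal_type]) + piece)
    return payloads
```

Because fragmentation happens only when the NAL unit exceeds max_payload, the body always spans at least two fragments, so the start and end bits never appear in the same FU header.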

7. De-Packetization Process¶

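This section describes the processes involved in carrying an H.264 video stream in an RTP session, in particular the de-packetization process. It gives the de-packetization process for the single NAL unit mode and the non-interleaved mode (when there are multiple NAL units, they are ordered as received). It also gives the de-packetization process for the interleaved mode, including how to compute the size of the deinterleaving buffer and how to arrange the received NAL units into decoding order. In addition, it provides further guidance, such as how to handle a lost slice data partition A (DPA) and fragmented NAL units.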

8. Payload Format Parameters¶

8.1. Media Type Registration¶

The media subtype for the ITU-T H.264 | ISO/IEC 14496-10 codec has been allocated from the IETF tree:

Media Type name:     video

Media subtype name:  H264

Required parameters: none

OPTIONAL parameters:
             profile-level-id:
             max-recv-level:
             max-mbps, max-smbps, max-fs, max-cpb, max-dpb, and max-br:
                     These parameters MAY be used to signal the capabilities of a receiver implementation.
                     max-mbps: (integer) the maximum macroblock processing rate in units of macroblocks per second.
                     max-smbps: (integer) the maximum static macroblock processing rate in units of static macroblocks per second.
                     max-fs: (integer) the maximum frame size in units of macroblocks.
                     max-cpb: (integer) the maximum coded picture buffer size
                             in units of 1000 bits for the VCL HRD parameters and
                             in units of 1200 bits for the NAL HRD parameters.
                     max-dpb: (integer) the maximum decoded picture buffer size in units of 8/3 macroblocks.
                     max-br: (integer) the maximum video bitrate
                             in units of 1000 bits per second for the VCL HRD parameters and
                             in units of 1200 bits per second for the NAL HRD parameters.

             redundant-pic-cap: This parameter signals the capabilities of a receiver implementation.
             sprop-parameter-sets:
             sprop-level-parameter-sets:
             use-level-src-parameter-sets:
             in-band-parameter-sets:
             level-asymmetry-allowed:
             packetization-mode:
             sprop-interleaving-depth:
             sprop-deint-buf-req:
             deint-buf-cap:
             sprop-init-buf-time:
             sprop-max-don-diff:
             max-rcmd-nalu-size:
             sar-understood:
             sar-supported:

8.2. SDP Parameters¶

8.2.1. Mapping of Payload Type Parameters to SDP¶

The media type video/H264 string is mapped to fields in the Session Description Protocol (SDP) as follows:

o  The media name in the "m=" line of SDP MUST be video.
o  The encoding name in the "a=rtpmap" line of SDP MUST be H264 (the media subtype).
o  The clock rate in the "a=rtpmap" line MUST be 90000.
o  The OPTIONAL parameters sprop-parameter-sets and sprop-level-parameter-sets, when present,
             MUST be included in the "a=fmtp" line of SDP or
             conveyed using the "fmtp" source attribute as specified in Section 6.3 of RFC 5576.
o  The remaining OPTIONAL parameters (listed in Section 8.1), when present, MUST be included in the "a=fmtp" line of SDP.

Example:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E;
          packetization-mode=1;
          sprop-parameter-sets=<parameter sets data>

8.2.2. Usage with the SDP Offer/Answer Model¶

When H.264 is offered over RTP using SDP in an Offer/Answer model for negotiation for unicast usage, the following limitations and rules apply.

Table 6 lists the interpretation of all the media type parameters that MUST be used for the different direction attributes:

                              sendonly --+
                           recvonly --+  |
                        sendrecv --+  |  |
                                   |  |  |
profile-level-id                   C  C  P
max-recv-level                     R  R  -
packetization-mode                 C  C  P
sprop-deint-buf-req                P  -  P
sprop-interleaving-depth           P  -  P
sprop-max-don-diff                 P  -  P
sprop-init-buf-time                P  -  P
max-mbps                           R  R  -
max-smbps                          R  R  -
max-fs                             R  R  -
max-cpb                            R  R  -
max-dpb                            R  R  -
max-br                             R  R  -
redundant-pic-cap                  R  R  -
deint-buf-cap                      R  R  -
max-rcmd-nalu-size                 R  R  -
sar-understood                     R  R  -
sar-supported                      R  R  -
in-band-parameter-sets             R  R  -
use-level-src-parameter-sets       R  R  -
level-asymmetry-allowed            O  -  -
sprop-parameter-sets               S  -  S
sprop-level-parameter-sets         S  -  S

 Legend:

 C: configuration for sending and receiving streams
 O: offer/answer mode
 P: properties of the stream to be sent
 R: receiver capabilities
 S: out-of-band parameter sets
 -: not usable (when present, SHOULD be ignored)

8.2.3. Usage in Declarative Session Descriptions¶

When H.264 over RTP is offered with SDP in a declarative style, as in RTSP or SAP, the following considerations are necessary:

o  All parameters capable of indicating both stream properties and receiver capabilities are used to indicate only stream properties.
o  A receiver of the SDP is required to support all parameters and values of the parameters provided;
        otherwise, the receiver MUST reject (RTSP) or not participate in (SAP) the session.

8.3. Examples¶

An SDP Offer/Answer exchange wherein both parties are expected to both send and receive could look like the following:

// Note 1: Only the media-codec-specific parts of the SDP are shown.
// Note 2: Some lines are wrapped due to text constraints.

Offerer -> Answerer SDP message¶

m=video 49170 RTP/AVP 100 99 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; packetization-mode=0;
  sprop-parameter-sets=<parameter sets data#0>
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#1>
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
  sprop-parameter-sets=<parameter sets data#2>;
  sprop-interleaving-depth=45; sprop-deint-buf-req=64000;
  sprop-init-buf-time=102478; deint-buf-cap=128000

The above offer presents the same codec configuration in three different packetization formats:

payload type 98 represents the single NAL unit mode (packetization-mode=0)
payload type 99 represents the non-interleaved mode (packetization-mode=1)
payload type 100 represents the interleaved mode (packetization-mode=2)

In the interleaved mode case (payload type 100 here), the offer also includes the interleaving parameters (sprop-interleaving-depth, etc.) that the offerer would use if the answer indicates support for payload type 100.
In all three cases, the parameter sprop-parameter-sets conveys the initial parameter sets that are required by the answerer when receiving a stream from the offerer when this configuration is accepted. Note that the value for sprop-parameter-sets could be different for each payload type.

Answerer -> Offerer SDP message¶

m=video 49170 RTP/AVP 100 99 97
a=rtpmap:97 H264/90000
a=fmtp:97 profile-level-id=42A01E; packetization-mode=0;
  sprop-parameter-sets=<parameter sets data#3>
a=rtpmap:99 H264/90000
a=fmtp:99 profile-level-id=42A01E; packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#4>;
  max-rcmd-nalu-size=3980
a=rtpmap:100 H264/90000
a=fmtp:100 profile-level-id=42A01E; packetization-mode=2;
  sprop-parameter-sets=<parameter sets data#5>;
  sprop-interleaving-depth=60;
  sprop-deint-buf-req=86000; sprop-init-buf-time=156320;
  deint-buf-cap=128000; max-rcmd-nalu-size=3980

As the Offer/Answer negotiation covers both sending and receiving streams, an offer indicates the exact parameters for what the offerer is willing to receive, whereas the answer indicates the same for what the answerer is willing to receive.
In this case, the offerer declared that it is willing to receive payload type 98 (see the Offerer -> Answerer SDP message). The answerer accepts this by declaring an equivalent payload type 97 (see the Answerer -> Offerer SDP message); that is, it has identical values for the two parameters profile-level-id and packetization-mode (since packetization-mode is equal to 0, sprop-deint-buf-req is not present).
The answerer also accepts the reception of the two configurations that payload types 99 and 100 represent.
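Note: with the two SDP messages above, the offerer and the answerer have each declared the packetization modes and related parameters they support; the negotiation then settles on the final choice.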
The max-rcmd-nalu-size indicates that the answerer can efficiently process NALUs up to the size of 3980 bytes. However, there is no guarantee that the network supports this size.
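
The equivalence check described above can be sketched in a few lines of Python. This is a minimal illustration, not a full SDP parser: the function names `parse_fmtp` and `same_configuration` are hypothetical, the input is assumed to be the parameter part of an a=fmtp line with folded lines already joined, and the fmtp string used for payload type 98 is an assumption built from the two values the text says are identical.

```python
def parse_fmtp(value):
    """Parse the parameter part of an a=fmtp line into a dict.

    `value` is everything after "a=fmtp:<pt> ", with folded lines joined.
    """
    params = {}
    for item in value.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        # Split on the first '=' only, so '=' inside a value is preserved.
        name, _, val = item.partition("=")
        params[name.strip().lower()] = val.strip()
    return params


def same_configuration(a, b):
    """Compare the two parameters the text names as defining equivalence."""
    return (
        a.get("profile-level-id", "").upper() == b.get("profile-level-id", "").upper()
        # An absent packetization-mode means single NAL unit mode (0).
        and a.get("packetization-mode", "0") == b.get("packetization-mode", "0")
    )


# Assumed offer line for payload type 98, plus two lines from the answer.
offer_98 = parse_fmtp("profile-level-id=42A01E; packetization-mode=0")
answer_97 = parse_fmtp(
    "profile-level-id=42A01E; packetization-mode=0; "
    "sprop-parameter-sets=<parameter sets data#3>"
)
answer_99 = parse_fmtp(
    "profile-level-id=42A01E; packetization-mode=1; "
    "sprop-parameter-sets=<parameter sets data#4>; max-rcmd-nalu-size=3980"
)
```

Here `same_configuration(offer_98, answer_97)` holds even though the payload type numbers differ, which is the sense in which payload type 97 is "equivalent" to payload type 98.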

Offer SDP/Answer SDP¶

In the following example, the offer is accepted without level downgrading (i.e., the default level, Level 3.0, is accepted), and both sprop-parameter-sets and sprop-level-parameter-sets are present in the offer.
The answerer must ignore sprop-level-parameter-sets=<parameter sets data#1> and store parameter sets in sprop-parameter-sets=<parameter sets data#0> for decoding the incoming NAL unit stream.
The offerer must store the parameter sets in sprop-parameter-sets=<parameter sets data#2> in the answer for decoding the incoming NAL unit stream.
Note that in this example, parameter sets in sprop-parameter-sets=<parameter sets data#2> must be associated with Level 3.0.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#0>;                   // parameter sets for the default level
  sprop-level-parameter-sets=<parameter sets data#1>              // parameter sets for other levels

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#2>

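Note

Overall, the offer proposes payload type 98 for Baseline profile, Level 3.0 H.264 video in non-interleaved mode and supplies two groups of parameter sets: sprop-parameter-sets for the default level and sprop-level-parameter-sets for other levels. The answer accepts the offer with the same profile, level, and packetization mode. Because the level is not downgraded, the answerer uses only sprop-parameter-sets from the offer and ignores sprop-level-parameter-sets. The sprop-parameter-sets in the answer carries the parameter sets for the stream sent in the opposite direction, from the answerer to the offerer.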

Offer SDP/Answer SDP (downgrading)¶

In the following example, the offer (Baseline profile, Level 1.1) is accepted with level downgrading (the accepted level is Level 1b), and both sprop-parameter-sets and sprop-level-parameter-sets are present in the offer.
The answerer must ignore sprop-parameter-sets=<parameter sets data#0> and all parameter sets not for the accepted level (Level 1b) in sprop-level-parameter-sets=<parameter sets data#1> and must store parameter sets for the accepted level (Level 1b) in sprop-level-parameter-sets=<parameter sets data#1> for decoding the incoming NAL unit stream.
The offerer must store the parameter sets in sprop-parameter-sets=<parameter sets data#2> in the answer for decoding the incoming NAL unit stream.
Note that in this example, parameter sets in sprop-parameter-sets=<parameter sets data#2> must be associated with Level 1b.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#0>;
  sprop-level-parameter-sets=<parameter sets data#1>

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#2>;
  use-level-src-parameter-sets=1
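
The profile-level-id values used in these examples can be unpacked with a short helper. This is a sketch under stated assumptions: the function name `decode_profile_level_id` is hypothetical, and the Level 1b rule is applied only to the Baseline, Main, and Extended profiles (profile_idc 66, 77, 88), where level_idc 11 together with constraint_set3_flag signals Level 1b.

```python
def decode_profile_level_id(plid):
    """Split a 6-hex-digit profile-level-id into profile, flags and level."""
    if len(plid) != 6:
        raise ValueError("profile-level-id must be 6 hex digits")
    raw = bytes.fromhex(plid)
    profile_idc, profile_iop, level_idc = raw[0], raw[1], raw[2]
    # constraint_set3_flag is the fourth-highest bit of the middle byte.
    constraint_set3 = bool(profile_iop & 0x10)
    # level_idc 11 with constraint_set3_flag set means Level 1b, not 1.1.
    if level_idc == 11 and constraint_set3 and profile_idc in (66, 77, 88):
        level = "1b"
    else:
        level = f"{level_idc // 10}.{level_idc % 10}"
    return profile_idc, profile_iop, level
```

With this helper, 42A00B from the offer decodes to Baseline (66) at Level 1.1, and 42B00B from the answer decodes to Baseline at Level 1b. The only difference between the two strings is the constraint_set3_flag bit.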

Offer SDP/Answer SDP (downgrading, legacy RFC 3984 implementation)¶

The answerer is a legacy RFC 3984 implementation and does not understand sprop-level-parameter-sets; hence, it does not include use-level-src-parameter-sets (which the answerer does not understand either) in the answer. Therefore, the answerer must ignore both sprop-parameter-sets=<parameter sets data#0> and sprop-level-parameter-sets=<parameter sets data#1>, and the offerer must transport parameter sets in-band.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A00B; //Baseline profile, Level 1.1
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#0>;
  sprop-level-parameter-sets=<parameter sets data#1>

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42B00B; //Baseline profile, Level 1b
  packetization-mode=1

Offer SDP/Answer SDP (without sprop-level-parameter-sets)¶

In the following example, the offer is accepted without level downgrading, and sprop-parameter-sets is present in the offer.
Parameter sets in sprop-parameter-sets=<parameter sets data#0> must be stored and used by the encoder of the offerer and the decoder of the answerer, and parameter sets in sprop-parameter-sets=<parameter sets data#1> must be used by the encoder of the answerer and the decoder of the offerer.
Note that sprop-parameter-sets=<parameter sets data#0> is basically independent of sprop-parameter-sets=<parameter sets data#1>.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#0>

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#1>

Offer SDP/Answer SDP (without both in offer)¶

In the following example, the offer is accepted without level downgrading, and neither sprop-parameter-sets nor sprop-level-parameter-sets is present in the offer, meaning that there is no out-of-band transmission of parameter sets, which then have to be transported in-band.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1

Offer SDP/Answer SDP (downgrading, without both in answer)¶

In the following example, the offer is accepted with level downgrading, and sprop-parameter-sets is present in the offer. As sprop-parameter-sets=<parameter sets data#0> contains level_idc indicating Level 3.0, it cannot be used because the answerer wants Level 2.0. It must therefore be ignored by the answerer, and in-band parameter sets must be used.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1;
  sprop-parameter-sets=<parameter sets data#0>

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
  packetization-mode=1

Offer SDP/Answer SDP (downgrading, without both in offer)¶

In the following example, the offer is also accepted with level downgrading, and neither sprop-parameter-sets nor sprop-level-parameter-sets is present in the offer, meaning that there is no out-of-band transmission of parameter sets, which then have to be transported in-band.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
  packetization-mode=1

Offer SDP/Answer SDP (upgrading, without both in offer)¶

In the following example, the offer is accepted with level upgrading, and neither sprop-parameter-sets nor sprop-level-parameter-sets is present in the offer or the answer, meaning that there is no out-of-band transmission of parameter sets, which then have to be transported in-band.
The level to use in the offerer-to-answerer direction is Level 3.0, and the level to use in the answerer-to-offerer direction is Level 2.0.
The answerer is allowed to send at any level up to and including Level 2.0, and the offerer is allowed to send at any level up to and including Level 3.0.

Offer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A014; //Baseline profile, Level 2.0
  packetization-mode=1; level-asymmetry-allowed=1

Answer SDP:

m=video 49170 RTP/AVP 98
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1; level-asymmetry-allowed=1
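
The per-direction levels in this example can be worked out with a small function. This is a sketch, not normative logic: the name `levels_per_direction` is hypothetical, levels are passed as level_idc integers (20 for Level 2.0, 30 for Level 3.0), Level 1b ordering is ignored, and the symmetric branch assumes the answer may only keep or lower the offered level.

```python
def levels_per_direction(offer_level, answer_level, offer_asym, answer_asym):
    """Return (offerer_to_answerer, answerer_to_offerer) maximum level_idc."""
    if offer_asym and answer_asym:
        # Each side's profile-level-id states what it is willing to receive,
        # so the sender in each direction is bounded by the receiver's level.
        return answer_level, offer_level
    # Without asymmetry a single level applies to both directions.
    if answer_level > offer_level:
        raise ValueError("answer raises the level but asymmetry is not allowed")
    return answer_level, answer_level


# The example above: offer at Level 2.0, answer at Level 3.0, both allow asymmetry.
upgrade = levels_per_direction(20, 30, True, True)
```

For the SDP above, `upgrade` is `(30, 20)`: Level 3.0 from offerer to answerer and Level 2.0 from answerer to offerer, matching the text.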

Offer SDP/Answer SDP (MCU)¶

In the following example, the offerer is a Multipoint Control Unit (MCU) in a topology like Topo-Video-switch-MCU, offering parameter sets received (using out-of-band transport) from three other participants (B, C, and D) and receiving parameter sets from the participant A, which is the answerer.
The participants are identified by their values of canonical name (CNAME), which are mapped to different SSRC values. The same codec configuration is used by all four participants.
The participant A stores and associates the parameter sets included in <parameter sets data#B>, <parameter sets data#C>, and <parameter sets data#D> to participants B, C, and D, respectively, and uses <parameter sets data#B> for decoding NAL units carried in RTP packets originating from participant B only, uses <parameter sets data#C> for decoding NAL units carried in RTP packets originating from participant C only, and uses <parameter sets data#D> for decoding NAL units carried in RTP packets originating from participant D only.

Offer SDP:

m=video 49170 RTP/AVP 98
a=ssrc:SSRC-B cname:CNAME-B
a=ssrc:SSRC-C cname:CNAME-C
a=ssrc:SSRC-D cname:CNAME-D
a=ssrc:SSRC-B fmtp:98
  sprop-parameter-sets=<parameter sets data#B>
a=ssrc:SSRC-C fmtp:98
  sprop-parameter-sets=<parameter sets data#C>
a=ssrc:SSRC-D fmtp:98
  sprop-parameter-sets=<parameter sets data#D>
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1

Answer SDP:

m=video 49170 RTP/AVP 98
a=ssrc:SSRC-A cname:CNAME-A
a=ssrc:SSRC-A fmtp:98
  sprop-parameter-sets=<parameter sets data#A>
a=rtpmap:98 H264/90000
a=fmtp:98 profile-level-id=42A01E; //Baseline profile, Level 3.0
  packetization-mode=1
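
One way for participant A to keep the parameter sets of B, C, and D apart is to key the storage by SSRC. The class below is a hypothetical sketch: the name `ParameterSetStore`, the `(kind, set_id)` key, and the placeholder byte strings are all assumptions for illustration.

```python
class ParameterSetStore:
    """Keep parameter sets separately for every RTP source (SSRC)."""

    def __init__(self):
        self._by_source = {}  # ssrc -> {(kind, set_id): payload}

    def store(self, ssrc, kind, set_id, payload):
        """Remember a parameter set as belonging to one source only."""
        self._by_source.setdefault(ssrc, {})[(kind, set_id)] = payload

    def lookup(self, ssrc, kind, set_id):
        """Return the set for this source, or None; other sources are never consulted."""
        return self._by_source.get(ssrc, {}).get((kind, set_id))


store = ParameterSetStore()
store.store("SSRC-B", "sps", 0, b"<parameter sets data#B>")
store.store("SSRC-C", "sps", 0, b"<parameter sets data#C>")
```

Both sources use identifier 0 here without colliding, which is why the association with the source matters: different senders pick their parameter set identifiers independently.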

8.4. Parameter Set Considerations¶

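Note

The H.264 parameter sets are a fundamental part of the video codec and vital to its operation (see Section 1.2). In short: parameter sets are essential for decoding, and an error in them causes decoding to fail. This section looks at when parameter sets can be corrupted or updated improperly and how to ensure that they are transmitted and updated correctly.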

Due to their characteristics and their importance for the decoding process, lost or erroneously transmitted parameter sets can hardly be concealed locally at the receiver.
A reference to a corrupt parameter set normally has fatal results to the decoding process.
Corruption could occur, for example, due to the erroneous transmission or loss of a parameter set NAL unit but also due to the untimely transmission of a parameter set update.
A parameter set update refers to a change of at least one parameter in a picture parameter set or sequence parameter set for which the picture parameter set or sequence parameter set identifier remains unchanged.
Therefore, the following recommendations are provided as a guideline for the implementer of the RTP sender.

Parameter set NALUs can be transported using three different principles:

A.  Using a session control protocol (out-of-band) prior to the actual RTP session.
B.  Using a session control protocol (out-of-band) during an ongoing RTP session.
C.  Within the RTP packet stream in the payload (in-band) during an ongoing RTP session.

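Note

Principles A and B are recommended. If principle C (in-band) is used, the RTP stream should be carried over a reliable transport, because the loss of a parameter set causes subsequent decoding to fail.

Because a parameter set update affects all subsequent decoding, the following recommendations apply when principles B and C are used: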

When parameter sets are added or updated, care SHOULD be taken to ensure that any parameter set is delivered prior to its usage.
When a parameter set is updated, the sender SHOULD ensure that no NAL unit still awaiting decoding needs the old version before it is overwritten.
In a multiparty session, one participant MUST associate parameter sets coming from different sources with the source identification whenever possible.
Principles B and C MUST NOT both be used in the same session unless sufficient synchronization can be provided, because adding or modifying parameter sets by using both B and C in the same RTP session may lead to inconsistencies of the parameter sets.

In case a loss of a parameter set is detected, recovery may be achieved using a Decoder Refresh Point procedure, for example, using RTCP feedback Full Intra Request (FIR).

8.5. Decoder Refresh Point Procedure Using In-Band Transport of Parameter Sets (Informative)¶

When a sender receives a request for a decoder refresh point, the encoder shall enter the fast update mode by using one of the procedures specified in Sections 8.5.1 or 8.5.2.
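When a parameter set is updated, refreshing the decoder ensures that the whole decoder state matches the latest parameter sets, which avoids decoding errors.
After a network interruption or packet loss, refreshing the decoder at a recovery point discards the current decoder state and restarts decoding from that point with the latest parameters, which limits the impact of the loss.
When switching bitstreams, refreshing the decoder ensures that the decoder state matches the new bitstream so that the switch succeeds.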

8.5.1. IDR Procedure to Respond to a Request for a Decoder Refresh Point¶

This section gives one possible way to respond to a request for a decoder refresh point. The encoder shall, in the order presented here:

1) Immediately prepare to send an IDR picture.

2) Send a sequence parameter set to be used by the IDR picture to be sent.

3) Send a picture parameter set to be used by the IDR picture to be sent.

4) Send the IDR picture.

5) From this point forward in time, send any other sequence or picture parameter sets that have not yet been sent in this procedure, prior to their reference by any NAL unit, regardless of whether such parameter sets were previously sent prior to receiving the request for a decoder refresh point.
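
Steps 2 to 4 above amount to a fixed sending order, which the following sketch makes explicit. The function names are hypothetical and the example byte strings are placeholders; the NAL unit type is read from the low five bits of the one-byte NAL unit header, with types 7, 8, and 5 denoting a sequence parameter set, a picture parameter set, and a coded slice of an IDR picture.

```python
NAL_IDR_SLICE = 5   # coded slice of an IDR picture
NAL_SPS = 7         # sequence parameter set
NAL_PPS = 8         # picture parameter set


def nal_unit_type(nalu):
    """The type is the low five bits of the one-byte NAL unit header."""
    return nalu[0] & 0x1F


def idr_refresh_order(sps, pps, idr_slices):
    """Return the NAL units in the order of steps 2 to 4."""
    if nal_unit_type(sps) != NAL_SPS or nal_unit_type(pps) != NAL_PPS:
        raise ValueError("expected an SPS and a PPS")
    if any(nal_unit_type(s) != NAL_IDR_SLICE for s in idr_slices):
        raise ValueError("the refresh picture must consist of IDR slices")
    # Parameter sets go first so they are available before the IDR picture.
    return [sps, pps, *idr_slices]


# Placeholder NAL units: only the header byte is meaningful here.
order = idr_refresh_order(b"\x67\x42", b"\x68\xce", [b"\x65\x88", b"\x65\x99"])
```

Step 5 is not modelled: any further parameter sets would have to be queued ahead of the first NAL unit that references them.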

8.5.2. Gradual Recovery Procedure to Respond to a Request for a Decoder Refresh Point¶

The encoder shall, in the order presented here:

1) Send a recovery point SEI message (see Sections D.1.7 and D.2.7 of ITU-T Recommendation H.264 [1]).

2) Repeat any sequence and picture parameter sets that were sent before the recovery point SEI message, prior to their reference by a NAL unit.

12. Informative Appendix: Application Examples¶

This payload specification is very flexible in its use, in order to cover the extremely wide application space anticipated for H.264.
However, this great flexibility also makes it difficult for an implementer to decide on a reasonable packetization scheme.
To help with that decision, some preliminary usage scenarios are described here.

12.1. Video Telephony According to Annex A of ITU-T Recommendation H.241¶

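This section covers Annex A of ITU-T Recommendation H.241, that is, the packetization mechanism used in video telephony.
H.323-based video telephony systems are required to support Annex A of H.241 as a packetization scheme.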

12.2. Video Telephony, No Slice Data Partitioning, No NAL Unit Aggregation¶

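This section discusses a video telephony deployment, in particular how parameter sets are conveyed. In video telephony, picture parameters rarely change, so all necessary parameter sets (usually only one) can be sent during SDP negotiation. No parameter set NAL units then need to be sent in the stream, and slice data partitioning is not used. The encoder chooses the size of coded slice NAL units for best performance and uses an intra refresh algorithm to clean up the effects of packet loss.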

12.3. Video Telephony, Interleaved Packetization Using NAL Unit Aggregation¶

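This section describes a packetization scheme that is also used in H.263-based designs with RFC 4629 packetization. All macroblocks of one macroblock row are assigned to one slice; the slices with even row addresses are then combined into one STAP and the slices with odd row addresses into another, and each STAP is sent as an RTP packet. STAPs are used because the number of individual slices is so large that sending each one separately would make the IP/UDP/RTP header overhead too high. The scheme also helps communication between wired and wireless networks: a gateway can convert between STAPs and RTP packets that each carry a single NAL unit, which helps adapt to the settings of each network.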
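
The even/odd grouping can be sketched as follows. The function name `split_rows_for_staps` is hypothetical, the input is assumed to hold one coded slice NAL unit per macroblock row in row order, and the sketch only decides which slices share a STAP; it does not build the aggregation packet itself.

```python
def split_rows_for_staps(row_slices):
    """Group one-slice-per-row output into the contents of two STAPs.

    `row_slices[i]` is the coded slice NAL unit for macroblock row i.
    """
    even = row_slices[0::2]  # rows 0, 2, 4, ...
    odd = row_slices[1::2]   # rows 1, 3, 5, ...
    return even, odd


# Six placeholder slices standing in for six macroblock rows.
slices = [bytes([row]) for row in range(6)]
even_stap, odd_stap = split_rows_for_staps(slices)
```

If one of the two packets is lost, every missing row still has its vertical neighbours in the other packet, which is what makes concealment effective with this layout.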

12.4. Video Telephony with Data Partitioning¶

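This section describes a video telephony scheme that uses data partitioning and performs well at high packet loss rates. Data partitioning is useful only when some form of unequal error protection is available, whereas RTP environments mostly assume that all packets have the same loss probability. The loss probability of individual packets can nevertheless be reduced, for example by letting the receiver recover them from forward error correction (FEC) packets. This mechanism does not add delay to the system, and the parameter sets are established entirely through control protocol means.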

12.5. Video Telephony or Streaming with FUs and Forward Error Correction¶

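The most effective way to combat packet loss is forward error correction (FEC). When FEC is applied at the application layer, a binary code is generated by XOR-ing the bits at the same position in different packets. The code is described by the parameters (n, k), where k is the number of information packets and n is the total number of packets generated for them; that is, n-k parity packets are generated for k information packets. A NAL unit can be split into several fragmentation units (FUs) to which the code is then applied, which gives good error correction performance. Combining FUs with FEC makes it possible to apply FEC with reasonable values of k and n even when the encoder does not produce slices of equal length. Using FUs and FEC incurs some overhead, but it is comparable to what intra-coded macroblocks would cost.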
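
The principle can be illustrated with the simplest such code, a single XOR parity packet over k equal-length fragments, that is, (n, k) = (k+1, k). This is a sketch of the idea only: the function names are hypothetical, the fragments are plain byte strings without FU or RTP headers, and zero padding of the last fragment is an assumption made so that all fragments have equal length.

```python
def split_equal(nalu, k):
    """Split into k equal-length fragments, zero-padding the last one."""
    size = -(-len(nalu) // k)  # ceiling division
    padded = nalu.ljust(size * k, b"\x00")
    return [padded[i * size:(i + 1) * size] for i in range(k)]


def xor_parity(fragments):
    """XOR the bytes at the same position across all fragments."""
    parity = bytearray(len(fragments[0]))
    for frag in fragments:
        for i, value in enumerate(frag):
            parity[i] ^= value
    return bytes(parity)


def recover(received, parity):
    """Rebuild a single missing fragment (marked None) from the parity."""
    missing = [i for i, frag in enumerate(received) if frag is None]
    if len(missing) > 1:
        raise ValueError("one parity packet can repair only one loss")
    repaired = list(received)
    if missing:
        present = [frag for frag in received if frag is not None]
        # XOR of all surviving fragments and the parity yields the lost one.
        repaired[missing[0]] = xor_parity(present + [parity])
    return repaired


nalu = bytes(range(10))
fragments = split_equal(nalu, 4)           # k = 4 information packets
parity = xor_parity(fragments)             # n - k = 1 parity packet
damaged = [fragments[0], fragments[1], None, fragments[3]]
restored = recover(damaged, parity)
```

Splitting first is what gives the code equal-length inputs, which is the role the text assigns to FUs. The receiver needs the original NAL unit length to strip the padding again.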

12.6. Low Bitrate Streaming¶

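This section describes a low bitrate streaming scheme that uses large packets and builds MTAPs (multi-time aggregation packets) containing interleaved slices from several pictures, in order to preserve stream quality. The same approach applies to H.264. For low bitrate video, large packets reduce the RTP/UDP/IP header overhead, but the loss of a single packet then has a severe effect on visual quality; interleaving slices from several pictures in an MTAP is therefore recommended to limit the damage caused by a lost packet.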

12.7. Robust Packet Scheduling in Video Streaming¶

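This section describes packet scheduling for H.264 video streaming and notes that similar or better results are achievable with H.264. A streaming client has a receiver buffer that can store a relatively large amount of data, and it buffers incoming data to maintain continuous playback. Coded pictures differ in importance and can be ranked by their subjective importance to the decoded sequence, so that the more important parts are sent first. With retransmission, lost data of important slices and data partitions can then be recovered sooner.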

13. Informative Appendix: Rationale for Decoding Order Number¶

13.3. Example of Robust Packet Scheduling¶

The communication system used in the example consists of the following components in the order that the video is processed from source to sink:

o camera and capturing
o pre-encoding buffer
o encoder
o encoded picture buffer
o transmitter
o transmission channel
o receiver
o receiver buffer
o decoder
o decoded picture buffer
o display