What Is the MPEG Video Compression Standard?
There are many important ramifications of the technologies incorporated
into the MPEG specifications, but what seems to get the most
press is the video compression system. Robust video compression and
transport approaches are essential for operations over emerging
ATM-based broadband ISDN. The MPEG video compression standard is discussed
in this paper.
Most physical entities convey some type of "information" and need a fixed number
of parameters for this purpose. In many instances, however, this fixed number
is prohibitively large for storage and transmission purposes. The compression
process attempts to represent the entity by employing fewer than the total
set of parameters. Compression techniques fall into two classes.
If all the information is conveyed using the subset of the parameters, the
compression is called lossless. On the other hand, if less than the complete
information is conveyed, it is termed lossy compression.
Video signals are spatio-temporal signals or, simply stated, a sequence
of time-varying images. The information they convey is "visual". A monochromatic
still image can be mathematically represented by x(h,v), where x is
the intensity value at the horizontal location h and vertical location
v. A monochromatic video signal can be represented by x(h,v,t),
where x is the intensity value at horizontal location h, vertical location v
and temporal location t. Figure 1 shows these representations
of the video signal.
A color video signal is a superposition of the intensity distributions of the
three primary colors (R, G, B) or, equivalently, of one luminance component
(Y) and two chrominance components (U, V). The relationship is shown in Table 1.
Table 1: Y, U and V in terms of R', G' and B'
Y = 0.30R' + 0.59G' + 0.11B'
U = -0.15R' - 0.29G' + 0.44B'
V = 0.62R' - 0.52G' - 0.10B'
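The conversion from primaries to luminance and chrominance can be sketched in a few lines of Python. This is only an illustration: the function name `rgb_to_yuv` is hypothetical, inputs are assumed to be R', G', B' values in the range 0.0 to 1.0, and the blue weight of U is taken as the usual rounded value 0.44, so that a grey pixel (R' = G' = B') has zero chrominance.

```python
def rgb_to_yuv(r, g, b):
    """Convert R', G', B' values (assumed range 0.0-1.0) to Y, U, V."""
    y = 0.30 * r + 0.59 * g + 0.11 * b    # luminance
    u = -0.15 * r - 0.29 * g + 0.44 * b   # chrominance, blue-difference (0.44 is an assumed rounding)
    v = 0.62 * r - 0.52 * g - 0.10 * b    # chrominance, red-difference
    return y, u, v


if __name__ == "__main__":
    for name, rgb in [("white", (1.0, 1.0, 1.0)), ("red", (1.0, 0.0, 0.0))]:
        y, u, v = rgb_to_yuv(*rgb)
        print(f"{name}: Y={y:.2f} U={u:.2f} V={v:.2f}")
```

For a white or grey input both chrominance values come out as zero, which is why the luminance component alone is enough to drive a monochrome display.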
After a brief description of analog video signals, this paper examines the need
for digital video and for video compression. Finally, the main part of this
paper analyzes the MPEG video compression standard.
2. MPEG Compression Standard
The most common form of the video signal in use today is still analog. This signal
is obtained through a process known as scanning. In this section the analog
representation of the video signal and its disadvantages are discussed. This part
also describes the need for a digital representation of the video signal. After
describing the need for compression of the video signal, this paper describes the MPEG compression
technique for video signals.
2.1 Analog Video Signal
The analog signal is obtained through a process known as scanning. This is shown in
Figure 2. Scanning records the intensity values of the spatio-temporal signal only
in the h direction. This signal is coupled with the horizontal and vertical
synchronization pulses to yield the complete video signal. Scanning can be either
progressive or interlaced. Progressive scanning scans all the horizontal lines to
form the complete frame. In interlaced scanning, the even and the odd
horizontal lines of a picture are scanned separately, yielding the two fields of
a picture. There are three main analog video standards: composite, component and S-Video.
In the composite standard, the luminance and the two chrominance components
are encoded together as a single signal. This is in contrast to the component
standard, where the three components are coded as three distinct signals.
S-Video consists of separate Y and C analog video signals.
Today, technology is attempting to integrate the video, computer and
telecommunication industries on a single multimedia platform. The video signal
is required to be scalable, platform independent, able to provide interactivity, and
robust. The analog signal unfortunately fails to address these requirements. Moving to
digital not only eliminates most of the above-mentioned problems but also opens the door
to a whole range of digital video processing techniques which can make the picture
2.2 Digital Video Signal
To digitize the spatio-temporal signal x(h,v,t), usually, the component form
of the analog signal is sampled in all three directions. Each sample point in a frame
is called a pixel. Sampling in the horizontal direction yields the pixels per
line, which defines the horizontal resolution of the picture. Vertical resolution
is controlled by sampling vertically. Temporal sampling determines the frame rate.
Digital video, too, has its share of bottlenecks. The most important one is the
huge bandwidth requirement. In spite of being digital, the signal still needs to be
transmitted and stored. The logical solution to this problem is digital video compression.
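The size of the bandwidth problem can be estimated with simple arithmetic. The sketch below multiplies the sampling parameters described above; the function name `raw_bitrate` and the example figures (720x480 pixels, 30 frames per second, 24 bits per pixel) are illustrative assumptions, not values taken from this paper.

```python
def raw_bitrate(width, height, frames_per_second, bits_per_pixel):
    """Return the uncompressed bit rate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


if __name__ == "__main__":
    # Assumed example: 720x480 pixels, 30 frames/s,
    # 8 bits for each of three components = 24 bits per pixel.
    bps = raw_bitrate(720, 480, 30, 24)
    print(f"{bps / 1e6:.1f} Mbit/s uncompressed")
```

Under these assumptions the uncompressed signal needs roughly 249 Mbit/s, which gives a feel for why compression is unavoidable.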
2.3 MPEG Compression Standard
Compression aims at lowering the total number of parameters required to represent
the signal, while maintaining good quality. These parameters are then coded for
transmission or storage. A result of compressing digital video is that it becomes
available as computer data, ready to be transmitted over existing communication networks.
There are many different redundancies present in video signal data.
Spatial redundancy occurs because neighboring pixels in each individual frame of a video
signal are related. The pixels in consecutive frames of the signal are also related, leading
to temporal redundancy. The human visual system does not treat all visual information
with equal sensitivity, leading to psychovisual redundancy. Finally, not all parameters
occur with the same probability in an image. As a result, they do not require an equal
number of bits to code them; this coding redundancy is exploited by entropy coders such as Huffman coding.
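To make the last point concrete, the following sketch builds a Huffman code for a short list of symbols, so that frequent symbols receive short codes. It is a generic textbook construction, not the fixed code tables that MPEG itself specifies, and the function name `huffman_code` is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a Huffman code (symbol -> bit string) from a non-empty sequence."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c0.items()}
        merged.update({s: "1" + code for s, code in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    data = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
    code = huffman_code(data)
    total = sum(len(code[s]) for s in data)
    print(code, "total bits:", total, "fixed-length bits:", 2 * len(data))
```

In this example the most frequent symbol gets a one-bit code, and the 16 symbols need 28 bits instead of the 32 bits of a fixed two-bit code.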
There are several different compression standards around today (for example, CCITT Recommendation H.261).
MPEG, which stands for Moving Picture Experts Group, is a joint committee of the ISO
and IEC. It has been responsible for the MPEG-1 (ISO/IEC 11172) and MPEG-2 (ISO/IEC 13818)
standards in the past and is
currently developing the MPEG-4 standard. MPEG standards are generic and universal. There are
three main parts in the MPEG-1 and MPEG-2 specifications, namely Systems, Video and Audio.
The Video part defines the syntax and semantics of the compressed video bitstream. The Audio
part defines the same for the audio bitstream, while the Systems part specifies the combination of
one or more elementary streams of video and audio, as well as other data, into a single stream or multiple
streams suitable for storage or transmission.
The MPEG-2 standard also contains a part called DSM-CC, which defines a set of protocols for
the retrieval and storage of MPEG data. We shall now examine the structure of a non-scalable video
bitstream in some detail to understand the video compression.
The video bitstream consists of video sequences. Each video sequence consists of a variable number
of groups of pictures (GOPs). A GOP contains a variable number of pictures (p), as shown in Figure 3.
Mathematically, each picture is a union of the pixel values of the luminance and the
two chrominance components. The picture can also be subsampled at a lower resolution in the
chrominance domain because the human eye is less sensitive to high-frequency color shifts (there are more rods
than cones on the retina). There are three formats, shown in Format.fig:
- 4:4:4---the chrominance and luminance planes are sampled at the same resolution.
- 4:2:2---the chrominance planes are subsampled at half resolution in the horizontal direction.
- 4:2:0---the chrominance information is subsampled at half the rate both vertically and horizontally.
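The effect of the three formats on the amount of data per picture can be worked out directly. The sketch below is illustrative only; the names `CHROMA_DIVISORS`, `plane_sizes` and `samples_per_picture` are hypothetical, and picture dimensions are assumed to be even.

```python
# (horizontal divisor, vertical divisor) applied to each chrominance plane.
CHROMA_DIVISORS = {
    "4:4:4": (1, 1),
    "4:2:2": (2, 1),
    "4:2:0": (2, 2),
}


def plane_sizes(width, height, chroma_format):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h)) for one picture."""
    div_h, div_v = CHROMA_DIVISORS[chroma_format]
    return (width, height), (width // div_h, height // div_v)


def samples_per_picture(width, height, chroma_format):
    """Total samples: one luminance plane plus two chrominance planes."""
    (lw, lh), (cw, ch) = plane_sizes(width, height, chroma_format)
    return lw * lh + 2 * cw * ch


if __name__ == "__main__":
    for fmt in CHROMA_DIVISORS:
        print(fmt, samples_per_picture(720, 480, fmt))
```

Relative to 4:4:4, the 4:2:2 format keeps two thirds of the samples and 4:2:0 keeps half, before any other compression step is applied.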
Pictures can be divided into three main types based on their compression schemes:
- I or Intra pictures.
- P or Predicted pictures.
- B or Bidirectional pictures.
Frames that can be predicted from previous frames are called P-frames. But what happens if transmission
errors occur in a sequence of P-frames? To avoid the propagation of transmission errors and to allow
periodic resynchronization, a complete frame which does not rely on information from other frames is
transmitted approximately once every 12 frames. These stand-alone frames are "intra coded" and are called
I-frames.
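A typical arrangement of picture types within one group of pictures can be generated as follows. This is a sketch under assumptions: the function name `gop_pattern` and the parameter `ref_spacing` (distance between successive I or P pictures) are hypothetical, and real encoders are free to choose other patterns.

```python
def gop_pattern(gop_length=12, ref_spacing=3):
    """Return picture types in display order for one group of pictures."""
    types = []
    for i in range(gop_length):
        if i == 0:
            types.append("I")          # stand-alone picture starts the group
        elif i % ref_spacing == 0:
            types.append("P")          # predicted from the previous I or P picture
        else:
            types.append("B")          # predicted from neighbours on both sides
    return "".join(types)


if __name__ == "__main__":
    print(gop_pattern())        # one I picture every 12 pictures
    print(gop_pattern(12, 1))   # no B pictures at all
```

With the default arguments the pattern is IBBPBBPBBPBB, so a transmission error can persist for at most one group before the next I picture resynchronizes the decoder.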
The coding technique for I pictures falls in the category of transform coding. Each picture is divided
into non-overlapping blocks of 8x8 pixels. Four of these blocks are additionally arranged into a bigger
block of size 16x16, called a macroblock. The DCT is applied to each 8x8 block individually (Figure 4).
This transform converts the data into a series of coefficients which represent the magnitudes of
cosine functions at increasing frequencies. The quantization process allows the high-energy,
low-frequency coefficients to be coded with a greater number of bits, while using fewer or zero bits for
the high-frequency coefficients. Retaining only a subset of the coefficients reduces the total number
of parameters needed for representation by a substantial amount. The quantization process also helps
the encoder to output a bitstream at a specified bit rate.
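The transform and quantization steps can be illustrated with a direct, unoptimized implementation. The sketch below is not the normative MPEG procedure: `dct_2d` uses the orthonormal DCT-II definition, and `quantize` uses a made-up step size that grows with frequency (`base_step * (1 + u + v)`) in place of the quantization matrices defined by the standard. Both function names are hypothetical.

```python
import math

N = 8  # block size


def dct_2d(block):
    """Forward 8x8 DCT-II with orthonormal scaling (direct O(N^4) form)."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):          # vertical frequency index
        for v in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, base_step):
    """Divide each coefficient by a step that grows with frequency, then round.

    The step rule base_step * (1 + u + v) is an assumption for illustration.
    """
    return [[int(round(coeffs[u][v] / (base_step * (1 + u + v))))
             for v in range(N)] for u in range(N)]


if __name__ == "__main__":
    flat = [[128] * N for _ in range(N)]      # a block with no detail
    quantized = quantize(dct_2d(flat), 16)
    nonzero = sum(1 for row in quantized for value in row if value != 0)
    print("DC level:", quantized[0][0], "non-zero coefficients:", nonzero)
```

A block without detail collapses to a single non-zero (DC) coefficient, which shows how the transform concentrates the energy of smooth image regions into very few parameters.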
The DCT coefficients are coded using a combination of two special coding schemes: run-length and Huffman coding.
The coefficients are scanned in a zigzag pattern to create a 1-D sequence. MPEG-2 provides an alternative
scanning method. The resulting 1-D sequence usually contains a large number of zeros due to the DCT and
the quantization process. Each non-zero coefficient is associated with a pair of pointers: first, its
position in the block, which is indicated by the number of zeros between itself and the previous non-zero
coefficient; second, its coefficient value. Based on these two pointers, it is given a variable-length
code from a lookup table. This is done in such a manner that a highly probable combination gets a code
with fewer bits, while the unlikely ones get longer codes. However, since spatial redundancy is
limited, I pictures provide only moderate compression.
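The zigzag scan and the pairing of zero runs with coefficient values can be sketched as below. This is a simplified illustration: it treats every coefficient alike, omits the variable-length code lookup that follows, and uses the hypothetical function names `zigzag_order` and `run_level_pairs`.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    positions = [(r, c) for r in range(n) for c in range(n)]
    # Positions on one anti-diagonal share r + c; the direction of travel
    # alternates from one diagonal to the next.
    return sorted(positions,
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))


def run_level_pairs(block):
    """Turn a quantized block into (zero_run, level) pairs in zigzag order."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1                    # count zeros before the next non-zero value
        else:
            pairs.append((run, value))
            run = 0
    return pairs                        # trailing zeros are simply dropped here


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[2][0] = 64, -3, 5
    print(run_level_pairs(block))
```

Each resulting pair is what would then be looked up in the variable-length code table, so a block with three non-zero coefficients is described by three short codes rather than 64 values.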
The P and B pictures are where MPEG derives its maximum compression efficiency. This is done by a
technique called motion compensation (MC) based prediction, which exploits temporal redundancy.
Since frames are closely related, it is assumed that the current picture can be modelled as a translation
of the picture at the previous time. It is then possible to accurately "predict" the data of one frame
based on the data of a previous frame. In P pictures, each 16x16 macroblock is predicted from
a macroblock of a previously encoded I or P picture. Since frames are snapshots in time of a moving object,
the macroblocks in the two frames may not correspond to the same spatial location. The encoder searches
the previous frame (for P-frames), or the frames before and after (for B-frames), in half-pixel increments
for other macroblock locations that are a close match to the information contained in the current
macroblock. The displacements in the horizontal and vertical directions of the best-match macroblock
from the co-sited macroblock are called motion vectors (Figure 5).
If no matching macroblock
is found in the neighboring region, the macroblock is intra coded and its DCT coefficients are encoded.
If a matching block is found in the search region, the coefficients are not transmitted; a motion vector
is used instead.
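The search for a matching macroblock can be illustrated with an exhaustive block-matching sketch. Several things here are assumptions made for brevity: the search uses whole-pixel steps rather than half-pixel increments, the match criterion is the sum of absolute differences (SAD), and the names `sad` and `find_motion_vector` are hypothetical. The standard does not prescribe how an encoder performs its search.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def find_motion_vector(cur, ref, top, left, size=16, search=7):
    """Full search over +/- search pixels around the co-sited block.

    Returns (dy, dx, best_sad), where (dy, dx) is the displacement of the
    best match in the reference picture relative to (top, left).
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference picture
            cost = sad(cur, ref, top, left, ry, rx, size)
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best


if __name__ == "__main__":
    ref = [[0] * 32 for _ in range(32)]
    cur = [[0] * 32 for _ in range(32)]
    ref[12][7] = 255     # a bright detail in the reference picture...
    cur[10][10] = 255    # ...that has moved in the current picture
    print(find_motion_vector(cur, ref, 8, 8, size=8, search=4))
```

An encoder built along these lines would compare the best cost against a threshold to decide between sending the motion vector and falling back to intra coding; the threshold itself is an encoder design choice.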
The motion vectors can also be used for motion prediction in case of corrupted data, and
sophisticated decoder algorithms can use these vectors for error concealment (refer to article1).
For B pictures, MC prediction and interpolation are performed using reference frames present on either
side of the picture (Figure 6).
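The interpolation step for B pictures amounts to averaging a prediction taken from the past reference with one taken from the future reference. The sketch below shows that averaging and the remaining prediction error for a small block; the function names are hypothetical, and rounding half values upward is an assumption of this example.

```python
def interpolate_prediction(forward_block, backward_block):
    """Average a prediction from the past reference with one from the future reference."""
    return [[(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
            for f_row, b_row in zip(forward_block, backward_block)]


def residual(current_block, prediction):
    """Prediction error that would remain to be coded."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current_block, prediction)]


if __name__ == "__main__":
    past = [[100, 50], [20, 200]]
    future = [[110, 51], [24, 202]]
    current = [[104, 51], [22, 201]]
    prediction = interpolate_prediction(past, future)
    print(prediction, residual(current, prediction))
```

Because the two references usually carry independent noise, the averaged prediction tends to be closer to the current block than either reference alone, which leaves a smaller residual to code.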
Compared to I and P pictures, B pictures provide the maximum compression. There are other advantages of B pictures:
- Reduction of noise due to the averaging process.
- Use of future pictures for coding.
B pictures themselves are never used for prediction and hence do not propagate errors. MPEG-2 allows for both
frame- and field-based MC. Field-based MC is especially useful when the video signal includes fast motion.
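Using future pictures for coding implies that pictures cannot be transmitted in display order: both references of a B picture must reach the decoder before the B picture itself. A minimal sketch of this reordering follows; the function name `coding_order` is hypothetical, and the input is assumed to be a string of picture types in display order.

```python
def coding_order(display_types):
    """Reorder picture indices so each B picture follows both of its references."""
    order = []
    pending_b = []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)    # wait until the next reference has been sent
        else:
            order.append(index)        # I or P picture goes first...
            order.extend(pending_b)    # ...then the B pictures displayed before it
            pending_b = []
    order.extend(pending_b)            # trailing B pictures with no later reference here
    return order


if __name__ == "__main__":
    print(coding_order("IBBPBBP"))
```

For the display sequence IBBPBBP the pictures are sent in the order 0, 3, 1, 2, 6, 4, 5, so the decoder has to buffer reference pictures and restore display order on output.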
3. Concluding Remarks
In this paper, the principles of video compression using the MPEG standard have been discussed. I-, P- and B-frames
and their encoding techniques have been described.
The MPEG compression algorithm is a clever combination of a number of diverse tools, each of which exploits
a particular data redundancy. Spatial, temporal, psychovisual and coding redundancies were discussed in this
paper. The end result is that the coded video needs a far lower bandwidth than the original, while
maintaining extremely good quality. Currently, the technology is gearing up towards an exciting phase with
the advent of HDTV and DVD. Video compression is a key factor in these new technologies, and MPEG has become
the most sophisticated industry standard. There are many advantages in choosing MPEG. MPEG guarantees a means
for universal interoperability. It also reduces the cost of video compression codecs by triggering
mass production of ASICs.
ISO ----International Organization for Standardization.
IEC ----International Electrotechnical Commission.
GOP ----Group Of Pictures.
DCT ----Discrete Cosine Transform.
MC ----Motion Compensation.
DVD ----Digital Video Disk.
HDTV ----High Definition TeleVision.
- ISO IS 13818-1, ITU-T Recommendation H.222.0, Information technology - Generic coding of moving pictures
and associated audio information, Part 1: Systems, November 1994.
- ISO IS 13818-2, ITU-T Recommendation H.262, Information technology - Generic coding of moving pictures
and associated audio information, Part 2: Video, November 1994.
- D. Le Gall. MPEG: A video compression standard for multimedia applications. Communications
of the ACM, April 1991.
- ISO WD 13818-9, Draft ITU-T Recommendation H.222.x, Information technology - Generic coding
of moving pictures and associated audio information, Part 9: Real-time interface specifications,
- MPEG FAQs
- Introduction to MPEG
- MPEG compression by SHAN(sab@doc)
Last modified by k-c-rajh (email@example.com)