Digital Video : An Introduction
Information Systems Engineering
Department of Computing and
Department of Electrical and Electronic Engineering
Shanawaz Basith
Email : sab@doc.ic.ac.uk
2nd June 1996

MPEG : Standards, Technology and Applications


Abstract

This article looks at the MPEG-1 and MPEG-2 standards. The compression technology is explained and the data structures involved are described. The many applications of this technology are discussed.


Contents

1.Introduction
2.MPEG Compression
3.Applications of MPEG
4.Conclusions
5.References and Related Articles


1. Introduction

At the end of the eighties the audio and video industry faced the prospect of saturated markets and overcapacity. What was required were new products and services that would capture consumers' imagination. However, the limited data capacity of existing digital storage media and transmission links restricted the potential for further exploitation without a generic coding scheme for video and its associated audio.

The MPEG (Moving Picture Experts Group) committee was founded in late 1988 by Leonardo Chiariglione and Hiroshi Yasuda with the immediate goal of standardising video and audio for compact discs. A meeting between the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) in 1992 resulted in a standard for audio and video coding, known as MPEG-1.

MPEG-2 became a bona fide standard in 1994 after a five-day meeting of ISO and IEC in Singapore. The original application for MPEG-2 was all-digital transmission of broadcast TV quality video, but it now also covers High Definition Television (HDTV). HDTV applications were to be covered by the MPEG-3 standard, but it was discovered that with some fine-tuning MPEG-2 could be used for this purpose, and MPEG-3 was subsequently dropped.

This document aims to discuss MPEG-1 and 2. The companion document looks at the emerging MHEG standard.


2. MPEG Compression

The job of MPEG is to take analogue or digital video signals and convert them to packets of digital data that are more efficiently transported over a network. Being digital, it has a number of advantages.

MPEG is derived from the original work by the Joint Photographic Experts Group (JPEG). The JPEG standard is for still images and is a lossy technique. It takes advantage of the nature of the human eye and removes redundant information that we just do not see.

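The MPEG system consists of two layers : a system layer, which carries timing information and multiplexes the audio and video streams, and a compression layer, which contains the compressed audio and video streams.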

I shall be focussing on the compression layer. The technology behind MPEG-1 and MPEG-2 is essentially the same; some differences are identified later.

Decoding

A generalised decoding system for the audio and video streams is shown in figure 2.1. The system decoder extracts the timing information from the MPEG system stream and sends it to the other system components. The system decoder also demultiplexes the video and audio streams from the system stream and passes each on to the appropriate decoder. The video and audio decoders decompress the information as specified in parts 2 and 3 of the MPEG standard respectively[2].

Figure 2.1 - General MPEG Decoding System

Figure 2.2 - MPEG Data Hierarchy

Data Hierarchy

Video Hierarchy

The standard defines a hierarchy of data structures in the video stream (see figure 2.2).

Video Sequence

Begins with a sequence header (and optionally other sequence headers), includes one or more groups of pictures and ends with an end-of-sequence-code.

Group Of Pictures (GOP)

This consists of a header and a series of one or more pictures intended to allow random access into the sequence.

Picture

This is the primary coding unit of a video sequence. A picture consists of three rectangular matrices, one representing luminance/brightness (Y) and two representing chrominance (Cb and Cr, or U and V). YCbCr can be thought of as an alternative to the RGB representation : the conversion between the two is a linear transformation.
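As a rough illustration of such a linear transformation, the sketch below converts one 8-bit RGB pixel to YCbCr. The luminance weights (0.299, 0.587, 0.114) are the widely used ITU-R BT.601 values and the offset of 128 on the chrominance values is the common full-range 8-bit convention; both are assumptions made here for illustration, since this article does not state which matrix it uses. The function name is hypothetical.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr.

    Assumption: BT.601 luminance weights, chrominance centred on 128.
    Results are left unrounded; a real converter would round and clamp to 0..255.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chrominance is a scaled colour difference, so the whole mapping stays linear.
    cb = 128 + (b - y) * 0.5 / (1 - 0.114)
    cr = 128 + (r - y) * 0.5 / (1 - 0.299)
    return y, cb, cr


if __name__ == "__main__":
    print(rgb_to_ycbcr(128, 128, 128))  # a mid grey pixel has neutral chrominance
```

A grey pixel maps to equal Cb and Cr values at the centre of the range, which is why the chrominance matrices carry little information in areas without colour.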

Slice

A slice is one or more adjacent macroblocks. The order of the macroblocks within a slice is from left-to-right and top-to-bottom. Slices are important for handling errors. If the bitstream contains an error, the decoder can skip to the start of the next slice. Having more slices in the bitstream allows better error concealment, but uses space that could otherwise be used to improve picture quality.

Macroblock

A macroblock is built from a 2 by 2 matrix of blocks.

Block

A block is an 8-pixel by 8-line set. More technical information may be found in one of the references.
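The video hierarchy described above can be modelled as a set of nested containers. The sketch below is only a mental model: the class and field names are hypothetical, headers are omitted, and a macroblock is represented by its 2 by 2 matrix of blocks as described in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    samples: List[List[int]]  # 8 lines of 8 pixels


@dataclass
class Macroblock:
    blocks: List[Block]  # 2 by 2 matrix of blocks, stored row by row


@dataclass
class Slice:
    macroblocks: List[Macroblock]  # left-to-right, top-to-bottom


@dataclass
class Picture:
    picture_type: str  # "I", "P" or "B"
    slices: List[Slice]


@dataclass
class GroupOfPictures:
    pictures: List[Picture]


@dataclass
class VideoSequence:
    groups: List[GroupOfPictures]


def count_blocks(sequence):
    """Walk the whole hierarchy and count the 8 by 8 blocks it contains."""
    return sum(
        len(macroblock.blocks)
        for group in sequence.groups
        for picture in group.pictures
        for picture_slice in picture.slices
        for macroblock in picture_slice.macroblocks
    )


def blank_macroblock():
    """Helper for the example: a macroblock of four all-zero blocks."""
    return Macroblock([Block([[0] * 8 for _ in range(8)]) for _ in range(4)])
```

Walking the structure from the sequence down to the blocks, as count_blocks does, mirrors the order in which a decoder meets the data in the stream.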

Audio Hierarchy

The MPEG standard defines a hierarchy of data structures that accept, decode and produce digital audio output. The audio stream, like the video stream, consists of a series of packets (see figure 2.3). More information can be found in one of the references given below.

Figure 2.3 - Audio Stream Structure

Inter-Picture Coding

Much of the information in a picture within a video sequence is similar to information in a previous or subsequent picture. The MPEG standard takes advantage of this temporal redundancy by representing some pictures in terms of their differences from other (reference) pictures. This is called inter-picture coding.

Picture Types

The MPEG standard specifically defines three types of pictures (frames) : intra (I) pictures, predicted (P) pictures and bidirectional (B) pictures.

Let's consider these in turn.

Intra Pictures

Intra (or I) pictures are coded using only information found in the picture itself. I-frames provide potential random access points into the compressed video data. I-frames use only transform coding and provide moderate compression, typically around 2 bits per coded pixel.

Predicted Pictures

Predicted (or P) pictures are coded with respect to the nearest previous I or P-frame. This technique is called forward prediction (see figure 2.4). Like I-frames, P-frames serve as a prediction reference for B-frames and future P-frames. However, P-frames use motion compensation to get a higher compression ratio than is possible for I-frames. Unlike I-frames, P-frames can propagate coding errors, because P-frames are predicted from previous reference (P or I) frames.

Figure 2.4 - Forward Prediction

Bidirectional Pictures

Bidirectional (or B) pictures use both a past and future picture as a reference. This technique is called bidirectional prediction as shown in figure 2.5. B-frames provide the most compression and do not propagate errors because they are never used as a reference. Bidirectional prediction also reduces the effects of noise by averaging two pictures.

Figure 2.5 - Bidirectional Prediction

Video Stream Composition

The MPEG algorithm allows the encoder to choose the frequency and location of I-frames. This choice depends on the application's need for random access and the location of scene cuts in the video sequence. In applications where random access is important, I-frames are typically used twice a second.

The encoder also chooses the number of B-frames between any pair of reference (I or P) frames. This choice is based on factors such as the amount of memory in the encoder and the characteristics of the material being encoded. A typical display order of frames is shown in figure 2.6.

The MPEG encoder re-orders pictures in the video stream to present the pictures to the decoder in the most efficient sequence. In particular, the reference pictures needed to reconstruct B-frames are sent before the associated B-frames. Figure 2.7 shows this ordering for the first section of the sequence in the example above.

Figure 2.6 - Typical Ordering of Frames

Figure 2.7 - Video Stream and Display Ordering
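The re-ordering between display order and video stream order can be sketched as follows. This is a minimal illustration, assuming each B-frame depends on the reference frames immediately before and after it in display order; the function name and the frame labels are hypothetical, and B-frames left over at the end of the input are simply appended.

```python
def display_to_stream_order(display_types):
    """Re-order frames so every reference frame precedes the B-frames that need it.

    display_types is a string such as "IBBPBBP" giving the frame types in
    display order. Each output label is the frame type followed by its
    display position.
    """
    stream, waiting_b = [], []
    for number, kind in enumerate(display_types):
        label = f"{kind}{number}"
        if kind == "B":
            # A B-frame cannot be sent until its future reference has been sent.
            waiting_b.append(label)
        else:
            stream.append(label)      # send the reference (I or P) frame first
            stream.extend(waiting_b)  # then the B-frames that were waiting for it
            waiting_b = []
    return stream + waiting_b


if __name__ == "__main__":
    print(display_to_stream_order("IBBPBBP"))
```

For the display order I B B P B B P this gives I0 P3 B1 B2 P6 B4 B5, so the decoder always holds both references before it meets a B-frame.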

Motion Compensation

Motion compensation is a technique for enhancing the compression of P and B-frames by eliminating temporal redundancy. Motion compensation typically improves compression by a factor of three compared to intra-picture coding. Motion compensation algorithms work at a macroblock level. When a macroblock is compressed by motion compensation, the compressed file contains the spatial vector between the reference macroblock and the macroblock being coded (motion vectors) and the content difference between the reference and coded macroblocks (error terms).

Not all information in a picture can be predicted from a previous picture. For example, in a scene where a door opens, the visual contents of the newly revealed room cannot be predicted from the previous pictures. If this situation arises then the macroblock is coded using the transform technique.
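The motion vector and error terms of a macroblock can be illustrated with a simple block-matching search. How an encoder finds motion vectors is a matter of encoder design, so the exhaustive search, the sum-of-absolute-differences cost and the function names below are all assumptions chosen for clarity rather than speed.

```python
def sad(reference, current, rx, ry, cx, cy, size):
    """Sum of absolute differences between two size-by-size blocks."""
    return sum(
        abs(current[cy + j][cx + i] - reference[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_motion(reference, current, cx, cy, size=16, search=4):
    """Find the best match in the reference picture for one block.

    Returns the motion vector (dx, dy) pointing from the block position in the
    current picture to the matching area in the reference picture, plus the
    error terms (current minus prediction) for every pixel of the block.
    """
    height, width = len(reference), len(reference[0])
    best_cost, best_vector = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            # Skip candidate positions that fall outside the reference picture.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(reference, current, rx, ry, cx, cy, size)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dx, dy)
    dx, dy = best_vector
    errors = [
        [current[cy + j][cx + i] - reference[cy + dy + j][cx + dx + i] for i in range(size)]
        for j in range(size)
    ]
    return best_vector, errors
```

When the best match is still poor, as in the door example, the error terms are large and the encoder is better off coding the macroblock with the transform technique alone.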

The difference between B and P-frame motion compensation is that macroblocks in a P-frame use the previous reference only, while macroblocks in a B-frame are coded using any combination of the previous and future reference pictures. This implies that four codings are possible for each macroblock in a B-frame : intra coding (no prediction), forward prediction from the previous reference, backward prediction from the future reference, and bidirectional prediction from both.

Backward prediction can be used to predict uncovered areas that do not appear in previous pictures.
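The choice between the possible codings of a B-frame macroblock can be sketched as below. The sketch assumes the forward and backward predictions have already been motion compensated and are given as flat lists of pixel values. The decision rule (lowest sum of absolute differences, falling back to intra coding above a threshold) and all names are assumptions for illustration; real encoders use their own criteria.

```python
def choose_b_coding(block, forward_pred, backward_pred, intra_threshold):
    """Pick a coding mode for one B-frame macroblock.

    Returns (mode, cost), where cost is the sum of absolute errors of the
    chosen prediction, or None when the block is intra coded.
    """
    # Bidirectional prediction averages the two references (rounded average).
    interpolated = [(f + b + 1) // 2 for f, b in zip(forward_pred, backward_pred)]
    candidates = {
        "forward": forward_pred,
        "backward": backward_pred,
        "bidirectional": interpolated,
    }
    costs = {
        mode: sum(abs(pixel - predicted) for pixel, predicted in zip(block, prediction))
        for mode, prediction in candidates.items()
    }
    best = min(costs, key=costs.get)
    if costs[best] > intra_threshold:
        # No reference predicts the block well: code it without prediction.
        return "intra", None
    return best, costs[best]
```

An uncovered area that only appears in the future reference would give a low backward cost and a high forward cost, so this rule would select backward prediction for it.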

Intra-Picture (Transform) Coding

The MPEG transform coding algorithm includes these steps : the Discrete Cosine Transform (DCT), quantisation, and run-length encoding.

Both image blocks and prediction-error blocks have high spatial redundancy (e.g. a frame of a single colour). To reduce this, the MPEG algorithm transforms 8 by 8 blocks of pixels or 8 by 8 blocks of error terms from the spatial domain to the frequency domain with the aid of the Discrete Cosine Transform (DCT). A two-dimensional DCT can be computed by performing a one-dimensional DCT on each column and then a one-dimensional DCT on each row of the result.
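The column-then-row construction can be written down directly. The sketch below uses the orthonormal DCT-II in plain Python for readability; the scaling convention and function names are assumptions, and real codecs use fast integer approximations instead.

```python
import math


def dct_1d(values):
    """Orthonormal one-dimensional DCT-II of a list of numbers."""
    n = len(values)
    result = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        total = sum(
            v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i, v in enumerate(values)
        )
        result.append(scale * total)
    return result


def dct_2d(block):
    """Two-dimensional DCT built from one-dimensional DCTs.

    block[row][column] holds pixel values (or error terms). The result is
    indexed as [vertical frequency][horizontal frequency].
    """
    # Step 1: one-dimensional DCT down every column.
    transformed_columns = [dct_1d(list(column)) for column in zip(*block)]
    # Put the data back into row-major order.
    rows = [list(row) for row in zip(*transformed_columns)]
    # Step 2: one-dimensional DCT along every row of the intermediate result.
    return [dct_1d(row) for row in rows]
```

For a block of a single colour every coefficient except the first (the DC term) comes out as zero, which is the spatial redundancy the transform is meant to expose.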

Next the algorithm quantises the frequency coefficients. Quantisation is the process of approximating each frequency coefficient as one of a limited number of allowed values. The encoder chooses a quantisation matrix that determines how each frequency coefficient in the 8 by 8 block is quantised. Human perception of quantisation error is lower at high spatial frequencies, so high frequencies are typically quantised more coarsely (with fewer allowed values).
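Quantisation with a matrix can be sketched as a division and rounding per coefficient. This is a simplified model: the matrix values in the example are made up, the scale parameter stands in for the per-macroblock adjustment, and the exact rounding and scaling rules of the standard are not reproduced. The function names are hypothetical.

```python
def quantise(coefficients, matrix, scale=1):
    """Approximate each coefficient by a small integer level.

    A larger matrix entry (or scale) means a coarser step and fewer allowed
    values. Python's round() is used; the standard has its own rounding rules.
    """
    return [
        [round(c / (q * scale)) for c, q in zip(coefficient_row, matrix_row)]
        for coefficient_row, matrix_row in zip(coefficients, matrix)
    ]


def dequantise(levels, matrix, scale=1):
    """Rebuild approximate coefficients from the quantised levels."""
    return [
        [level * q * scale for level, q in zip(level_row, matrix_row)]
        for level_row, matrix_row in zip(levels, matrix)
    ]


if __name__ == "__main__":
    coefficients = [[800, 30], [-21, 3]]
    matrix = [[8, 16], [16, 32]]  # illustrative values, coarser at high frequencies
    levels = quantise(coefficients, matrix)
    print(levels, dequantise(levels, matrix))
```

The small high-frequency coefficient in the example becomes zero and cannot be recovered, which is where the loss in this lossy technique occurs.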

Figure 2.8 - Transform Coding Operations

The combination of the DCT and quantisation results in many of the frequency coefficients being zero, especially those at high spatial frequencies. To take maximum advantage of this, the coefficients are organised in a zig-zag order to produce long runs of zeros (see figure 2.8). The coefficients are then converted to a series of run-amplitude pairs, each pair indicating a number of zero coefficients and the amplitude of a non-zero coefficient. These run-amplitude pairs are then coded with a variable-length code, which uses shorter codes for commonly occurring pairs and longer codes for less common pairs.
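The zig-zag scan and the conversion to run-amplitude pairs can be sketched as follows. The sketch treats every coefficient alike and stops at the last non-zero value, leaving the trailing zeros to be implied by an end-of-block marker; the variable-length coding step is not shown. Function names are hypothetical.

```python
def zigzag_positions(n=8):
    """Return the (row, column) positions of an n by n block in zig-zag order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Visit one anti-diagonal at a time, alternating the direction of travel.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def run_amplitude_pairs(block):
    """Convert a quantised block into (zero run, amplitude) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_positions(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1  # extend the current run of zeros
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # zeros after the last pair are implied by an end-of-block marker


if __name__ == "__main__":
    block = [[0] * 8 for _ in range(8)]
    block[0][0], block[0][1], block[2][0] = 100, 2, -1
    print(run_amplitude_pairs(block))
```

A block with only three non-zero coefficients is reduced to three pairs, however many of the remaining 61 coefficients are zero.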

Some blocks need to be coded more accurately than others, e.g. blocks with a smooth intensity gradient. To deal with this inequality between blocks, the MPEG algorithm allows the amount of quantisation to be modified for each macroblock of pixels.

Differences between MPEG-1 and MPEG-2

The above list is by no means complete and you should refer to one of the references given below for a complete list.


3. Applications of MPEG

MPEG-1

Video Kiosk

Video kiosks, or information kiosks, are a new opportunity for the use of video. Shops, car dealerships and banks are finding that automated information kiosks are a way to increase sales. These came about due to the addition of professional quality video found in MPEG-1. Information that was once laboriously displayed as slides can be brought to life with video. Using MPEG-1 and a standard hard-disc or CD-ROM, the developer can easily update their kiosk information on a regular basis. Advanced kiosk features become possible due to the advent of friendly, personal help video tailored to the needs of each user. New applications are only limited by the imagination of the developer.

Video on Demand

Video on Demand (VOD) encompasses nearly all video-based applications. However, the most common application of VOD is movies on demand, initially in hotels and hospitals and eventually in our homes. All of us will have an interactive television set from which we can order movies on demand, at any given time. The missing ingredient for home use is low-priced interactive decoders (CD-I was one attempt). Given that this application is also considering the MPEG-2 standard, VOD to the home appears years away from large-scale implementation.

Video Dial Tone

The telephone and cable companies are preparing systems that will allow us to order our movies through the existing telephone infrastructure. Given the limited bandwidth of today's telephone lines, MPEG-1 becomes the ideal choice. Numerous pilot schemes were set up in early 1994 and are still being trialled. This application also has ramifications for the telecommunications industry and the corporate presentation market, since very high quality presentations can now be produced and distributed affordably across standard telephone lines.

Training

The training market has historically used laser disc players to deliver high quality video. MPEG-1 is an ideal replacement for the analogue laser disc player. The advantages of MPEG are lower costs, ease of delivery, ease of updating and networking capability. The training market is a large user of video equipment and MPEG is considered a mainstream product for this application.

Corporate Presentations

The presentation market evolved from 35mm slides to overheads to computer generated slide shows. As presentation software packages evolve, they are now beginning to support video. MPEG is a natural choice due to small file size, extremely high quality and ease of integration into existing presentation programs. Also, almost all conference rooms now include a VCR and a television as well as a computer. Another presentation medium that is growing in the U.S. as well as Europe is Video-CD. This allows you to create a presentation of graphics with hot-spots or buttons, and you can also include MPEG video.

Video Library

Organisations storing massive quantities of video cassettes for occasional playback can benefit by encoding their existing and new material. Storing the MPEG files on a digital library video server allows long-term storage and multiple playback without any quality degradation, fast random access retrieval and multi-point playback.

Museums, large libraries, government agencies and news agencies that use video footage are now converting to digital video.

MPEG-2

CATV (Cable Television)

CATV will use MPEG as the standard for compressing and decompressing video for distribution and for broadcasting. The need is perfect-quality video and the bandwidth is available to handle high bit rates. Because of this the industry has settled on MPEG-2 video although some are still using MPEG-1 on the interior.

DBS (Direct Broadcast Satellite)

This will use MPEG-2 audio and video for direct broadcast. DBS is a scheme of anywhere, anytime broadcasting. More information may be found at Alta Vista.

HDTV (High Definition Television)

A U.S consortium has already agreed to use MPEG-2. Refer to the related articles for further information.

Other Applications

Other applications include : Digital video tape; High Density CD; Video Conferencing and Digital Camcorders.


4. Conclusions

Many details have had to be left out due to the size and nature of this report. However, this document should still serve as a good foundation on which to base further work.


5. References and Related Articles

References

Reference | Author | Score (/10)
Tristan's MPEG Pages | Tristan Savatier (1996) | 9
MPEG-2 Digital Video Technology and Testing, BSTS Solution Note 5963-7511E | Dragos Ruiu (Hewlett Packard - 1995) | 8
Net Search | Netscape (1996) | 5
Unleashing a Broadcasting Revolution | Roy Rubenstein (New Electronics on Campus Autumn 1995) | 8
Multimedia Design Reaches a Higher Level | David Thon (New Electronics on Campus Autumn 1995) | 8
Never Mind the Quality, Look at the Quantity | David Boothroyd (New Electronics on Campus Autumn 1995) | 7
- the score unifies readability, usefulness, presentation and articulation

Related Articles

Article | Author(s)
Interactive Television | Keval Pindoria (khp1) and Gerald Wong Ping Hung (phgw)
Technology and Clinical Applications | Amere Oakman (ao2) and Constantine Prouskas (cbp)
MPEG Image Compression and ATM Networks | Arran Derbyshire (arad) and Chandrarath Kulanthai (ck4)
MHEG - A Multimedia Presentation Standard | Stephen Done (srd2)

Further Reading

Due to the limitations of this article a vast amount of information has had to be left out. You may want to consult these references for further information.

1. J. R. Allen et al., ``VCTV: A Video-on-Demand Market Test,'' AT&T
Technical Journal, Vol. 72, No. 1, January/February 1993, pp. 7-14.
2. ISO Committee Draft 11544, Coded representation of picture and audio
information -- Progressive bi-level image compression, ISO/IEC IS, 11544
3. Horst Hampel et al., ``Technical features of the JBIG standard for
progressive bi-level image compression,'' Signal Processing: Image
Communication, Vol. 4, No. 2, April 1992, pp. 103-110.
4. R. Hunter and A. H. Robinson, ``International digital facsimile coding
standards,'' Proceedings of the IEEE, Vol. 68, No. 7, July 1980, pp. 854-867.
5. CCITT Recommendation T.4, Standardisation of Group 3 facsimile
apparatus for document transmission, Geneva, 1980.
6. CCITT Recommendation T.6, Facsimile coding schemes and coding control
functions for Group 4 facsimile apparatus, Malaga-Torremolinos, 1984.
7. ISO Committee Draft 10918-1, Digital compression and coding of
continuous-tone still images -- Part 1: Requirements and guidelines, ISO/IEC DIS 10918-1, 1991.
8. R. W. Hamming, Coding and Information Theory, Prentice-Hall, Englewood
Cliffs, New Jersey, 1980, pp. 96-98.
9. A. S. Tanenbaum, Computer Networks, Prentice-Hall, Inc., Englewood
Cliffs, New Jersey, 1981.
10. N. Abramson, Information Theory and Coding, McGraw-Hill, New York, N.
Y., 1963, pp. 61-62.
11. Digital Compression and Coding of Continuous-Tone Still Images, Part 2:
Compliance Testing, ISO/IEC CD 10918-2, 1991.
12. A. N. Netravali and B. G. Haskell, Digital Pictures: Representation and
Compression, Plenum Press, New York, 1988.
13. D. A. Huffman, ``A Method for the Construction of Minimum-Redundancy
Codes,'' Proc. IRE, No. 40, September 1952, pp. 1098-1101.
14. J. Amsterdam, ``Data Compression with Huffman Coding,'' BYTE, Vol. 11,
No. 5, May 1986, pp. 99-108.
15. G. G. Langdon, Jr., ``An Introduction to Arithmetic Coding,'' IBM J.
Res. Develop., Vol. 28, No. 2, March 1984, pp. 135-149.
16. I. H. Witten, R. M. Neal, and J. G. Cleary, ``Arithmetic Coding for
Data Compression,'' Communications of the ACM, Vol. 30, No. 6, June 1987, pp. 520-540.
17. N. Ahmed, T. Natarajan, and K. R. Rao, ``Discrete Cosine Transform,''
IEEE Transactions on Computers, Vol. C-23, No. 1, January 1974, pp. 90-93.
18. R. J. Clarke, Transform Coding of Images, Academic Press, Orlando,
Florida, 1985.
19. H. Lohscheller, ``A subjectively adapted image communication system,''
IEEE Transactions on Communications, Vol. COM-32, December 1984, pp. 1316-1322.
20. M. Bierling and R. Thoma, ``Motion Compensating Field Interpolation
Using a Hierarchically Structured Displacement Estimator,'' Signal Processing, Vol. 11, No. 4, Dec. 1986, pp. 387-404.
21. ISO Committee Draft 11172, Information Technology-Coding of moving
pictures and associated audio for digital storage media up to about 1.5 Mbit/s, 1993
22. D. J. LeGall, ``MPEG: A Video Compression Standard for Multimedia
Applications,'' Communications of the ACM, Vol. 34, No. 4, April 1991, pp. 47-58.
23. R. K. Jurgen, ``Digital Video,'' IEEE Spectrum, Vol. 29, No. 3, March
1992, pp. 24-30.
24. A. Puri, ``Video Coding Using the MPEG-1 Compression Standard,'' Proc.
International Symposium: Society for Information Display, Boston,
Massachusetts, May 1992, pp. 123-126.
25. A. Puri and R. Aravind, ``Motion-Compensated Video Coding with Adaptive
Perceptual Quantisation,'' IEEE Transactions on Circuits and Systems
for Video Technology, Vol. CSVT-1, December 1991, pp. 351-361.
26. W. Lee, R.J. Gove, C.J. Read, and Y. Kim, ``UWGSP5: A Highly-Integrated Multimedia
System,'' accepted for IEEE Multimedia Magazine, April 1994.
27. W. Lee, J. Golston, R.J. Gove, and Y. Kim, ``Real-time MPEG Video Codec on a
Single-chip Multiprocessor,'' Digital Video Compression on Personal Computers:
Algorithms and Technologies, Proc. SPIE, Vol. 2187, Feb 1994
28. K. Guttag, R.J. Gove, and J.R. Van Aken, ``A Single-Chip Multiprocessor For Multimedia:
The MVP,'' IEEE Computer Graphics & Applications, Nov 1992, pp. 53--64
29. R.J. Gove, ``MVP: A Highly-Integrated Video Compression Chip,'' Proc. of Data
Compression Conference, March, 1994.
30. Didier J. Le Gall, ``The MPEG video compression algorithm,'' Signal Processing:
Image Communication, Vol. 4, No. 2, April 1992, pp. 129-140.


© Shanawaz Basith - Last Updated : 2nd June 1996