Video signals are spatio-temporal signals, or simply stated, a sequence of time-varying images. The information they convey is visual. A monochromatic still image can be mathematically represented by x(h,v), where x is the intensity value at horizontal location h and vertical location v. A monochromatic video signal can be represented by x(h,v,t), where x is the intensity value at the horizontal, vertical, and temporal locations h, v, and t respectively. Figure 1 shows these representations of the video signal.
Y = 0.30R' + 0.59G' + 0.11B'
U = -0.15R' - 0.29G' + 0.47B'
V = 0.62R' - 0.52G' - 0.10B'
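The colour conversion above can be sketched directly in code. This is a minimal illustration that applies the rounded coefficients given here; the function name and the 0.0-1.0 value range are assumptions for the example, not part of any standard API.

```python
def rgb_to_yuv(r, g, b):
    """Convert gamma-corrected R'G'B' values (assumed 0.0-1.0) to
    Y, U, V using the rounded coefficients given above."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = -0.15 * r - 0.29 * g + 0.47 * b
    v = 0.62 * r - 0.52 * g - 0.10 * b
    return y, u, v

# White carries full luminance and (nearly) zero chrominance; the
# rounded coefficients leave a small residual in U.
print(rgb_to_yuv(1.0, 1.0, 1.0))
```

Note how the luminance row sums to 1.0 while the chrominance rows sum to (approximately) zero: equal R', G', B' means no colour information.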
Today, technology is attempting to integrate the video, computer, and telecommunication industries on a single multimedia platform. The video signal is required to be scalable, platform independent, able to provide interactivity, and robust. Analog video unfortunately fails to meet these requirements. Moving to digital not only eliminates most of the problems mentioned above but also opens the door to a whole range of digital video processing techniques that can make the picture sharper.
Digital video too has its share of bottlenecks. The most important one is the huge bandwidth requirement: despite being digital, the signal still needs to be stored and transmitted. The logical solution to this problem is digital video compression.
There are many different redundancies present in video signal data. Spatial redundancy occurs because neighboring pixels in each individual frame of a video signal are related. The pixels in consecutive frames of the signal are also related, leading to temporal redundancy. The human visual system does not treat all visual information with equal sensitivity, leading to psychovisual redundancy. Finally, not all parameters occur with the same probability in an image; as a result, they do not require an equal number of bits to code them (Huffman coding).
There are several different compression standards around today (e.g. CCITT recommendation H.261). MPEG, which stands for Moving Pictures Experts Group, is a joint committee of the ISO and IEC. It has been responsible for the MPEG-1 (ISO/IEC 11172) and MPEG-2 (ISO/IEC 13818) standards in the past and is currently developing the MPEG-4 standard. MPEG standards are generic and universal. There are three main parts in the MPEG-1 and MPEG-2 specifications, namely, Systems, Video and Audio. The Video part defines the syntax and semantics of the compressed video bitstream. The Audio part defines the same for the audio bitstream, while the Systems part specifies the method of combining one or more video and audio elementary streams into a single stream, Fig. 1. The MPEG-2 standard contains a fourth part called DSM-CC, which defines a set of protocols for the retrieval and storage of MPEG data. We shall now examine the structure of a non-scalable video bitstream in some detail to understand the video compression.
The video bitstream consists of video sequences. Each video sequence consists of a variable number of groups of pictures (GOP). A GOP contains a variable number of pictures, Figure 3.
The DCT coefficients are coded using a combination of two special coding schemes: run-length and Huffman. The coefficients are scanned in a zigzag pattern to create a 1-D sequence. MPEG-2 provides an alternative scanning method. The resulting 1-D sequence usually contains a large number of zeros due to the DCT and the quantization process. Each non-zero coefficient is associated with a pair of pointers: first, its position in the block, indicated by the number of zeros between itself and the previous non-zero coefficient; second, its coefficient value. Based on these two pointers, it is given a variable-length code from a lookup table. This is done in a manner such that a highly probable combination gets a code with fewer bits, while unlikely ones get longer codes. However, since spatial redundancy is limited, I pictures provide only moderate compression.
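The zigzag scan and the (zero-run, value) pairing described above can be sketched as follows. This is a simplified illustration (function names are mine); a real encoder would feed the resulting pairs into the variable-length-code lookup table rather than keep them as tuples.

```python
def zigzag_indices(n=8):
    """Zigzag scan order for an n x n block: walk the anti-diagonals,
    alternating direction, so low-frequency coefficients come first."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_length_pairs(block):
    """(zero-run, value) pairs for the non-zero coefficients of a
    block after zigzag scanning, i.e. the two 'pointers' that index
    the variable-length-code table."""
    seq = [block[i][j] for i, j in zigzag_indices(len(block))]
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1            # count zeros since last non-zero value
        else:
            pairs.append((run, v))
            run = 0
    return pairs

# A typical quantized block: a few non-zero coefficients near the
# top-left (low frequencies), everything else zero.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, 3, -2
print(run_length_pairs(block))  # → [(0, 50), (0, 3), (1, -2)]
```

The long tail of zeros after the last non-zero coefficient produces no pairs at all, which is exactly why the zigzag order compresses so well.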
The P and B pictures are where MPEG derives its maximum compression efficiency. This is done by a technique called motion compensation (MC) based prediction, which exploits the temporal redundancy. Since frames are closely related, it is assumed that a current picture can be modelled as a translation of the picture at a previous time. It is then possible to accurately "predict" the data of one frame based on the data of a previous frame. In P pictures, each 16x16 macroblock is predicted from a macroblock of the previously encoded I picture. Since frames are snapshots in time of a moving object, the macroblocks in the two frames may not correspond to the same spatial location. The encoder searches the previous frame (for P frames, or the frames before and after for B frames) in half-pixel increments for other macroblock locations that are a close match to the information contained in the current macroblock. The displacements in the horizontal and vertical directions of the best-match macroblock from the cosited macroblock are called motion vectors, Figure 5.
If no matching macroblock is found in the neighboring region, the macroblock is intra coded and its DCT coefficients are encoded. If a matching block is found in the search region, the coefficients are not transmitted, but a motion vector is used instead.
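The block-matching search described above can be sketched as a toy full search using the sum of absolute differences (SAD) as the match criterion. This is an illustrative sketch only: it works on small 4x4 blocks at integer-pel positions, whereas MPEG uses 16x16 macroblocks and refines the search to half-pel accuracy; the function names are mine.

```python
def sad(ref, cur, dy, dx, by, bx, n=4):
    """Sum of absolute differences between the current block at
    (by, bx) and the reference block displaced by (dy, dx)."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))

def motion_vector(ref, cur, by, bx, search=2, n=4):
    """Full search over a +/-search window; the displacement of the
    best match is the motion vector."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # skip candidate blocks that fall outside the frame
            if not (0 <= by + dy and by + dy + n <= len(ref)
                    and 0 <= bx + dx and bx + dx + n <= len(ref[0])):
                continue
            cost = sad(ref, cur, dy, dx, by, bx, n)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

# Reference frame with distinct pixel values, and a current frame
# whose content has shifted down-right by one pixel:
ref = [[8 * i + j for j in range(8)] for i in range(8)]
cur = [[ref[i - 1][j - 1] if i and j else 0 for j in range(8)]
       for i in range(8)]

# The block at (2, 2) finds its perfect match one pixel up-left:
print(motion_vector(ref, cur, 2, 2))  # → (-1, -1)
```

When the best SAD is small, only the motion vector (and, in a real encoder, the coded prediction error) needs to be transmitted instead of the whole block.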
For B pictures, MC prediction and interpolation are performed using reference frames present on either side of them, Figure 6.
The first step in compression is to translate the information in the picture into the frequency domain. The red, green and blue intensity information in each pixel is translated into Y and (U,V). The pixels are grouped together into rectangular areas called blocks, and groups of blocks called macroblocks. These blocks are then translated into frequency information using the DCT, and the resulting coefficients are scanned in a zigzag order to increase the runs of zero coefficients.
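The DCT step above can be illustrated with a naive direct implementation; real encoders use fast factorizations, but the output is the same. A flat block of identical pixels, an extreme case of spatial redundancy, compacts into a single DC coefficient with every other coefficient zero, which is what makes the subsequent run-length coding so effective. The function name here is mine.

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of an n x n block (O(n^4), for illustration)."""
    n = len(block)
    def c(k):  # normalization factors
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
    return [[c(u) * c(v) * sum(
                block[i][j]
                * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                * math.cos((2 * j + 1) * v * math.pi / (2 * n))
                for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]

# A flat 8x8 block of value 100: all energy lands in the DC term.
coeffs = dct2([[100] * 8 for _ in range(8)])
print(round(coeffs[0][0]))  # → 800
```

Natural image blocks are not perfectly flat, but they are smooth, so most of their energy still concentrates in the few low-frequency coefficients at the top-left of the block.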
The MPEG compression algorithm is a clever combination of a number of diverse tools, each of which exploits a particular data redundancy. Spatial, temporal, psychovisual and coding redundancies are discussed in this paper. The end result is that the coded video needs a far lower bandwidth than the original, while maintaining extremely good quality. Currently, the technology is gearing up for an exciting phase with the advent of HDTV and DVD. Video compression is a key factor in these new technologies, and MPEG has become the industry standard.
There are many advantages in choosing MPEG:
- It guarantees a means for universal interoperability.
- It reduces the cost of video compression codecs by triggering mass production of ASICs.