Lossless Compression.
Lossy Compression.
Video signals are spatio-temporal signals or, simply stated, sequences of time-varying images. The information they convey is visual. A monochromatic still image can be represented mathematically by x(h,v), where x is the intensity value at horizontal location h and vertical location v. A monochromatic video signal can be represented by x(h,v,t), where x is the intensity value at horizontal location h, vertical location v and temporal location t. Figure 1 shows these representations of the video signal.
| Y | = 0.30R' + 0.59G' + 0.11B' |
| U | = -0.15R' - 0.29G' + 0.47B' |
| V | = 0.62R' - 0.52G' - 0.10B' |
Table 1
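The weights in Table 1 can be applied to one pixel at a time. The following Python sketch is only an illustration: the coefficients are copied from the table as printed, while the function name rgb_to_yuv and the assumption that R', G' and B' are normalised to the range 0.0 to 1.0 are not from the text.

```python
# Coefficients copied from Table 1. Inputs are ASSUMED to be gamma-corrected
# R', G', B' values normalised to the range 0.0-1.0.
YUV_WEIGHTS = (
    (0.30, 0.59, 0.11),    # Y row
    (-0.15, -0.29, 0.47),  # U row
    (0.62, -0.52, -0.10),  # V row
)


def rgb_to_yuv(r, g, b):
    """Return (Y, U, V) for a single pixel using the Table 1 weights."""
    return tuple(wr * r + wg * g + wb * b for wr, wg, wb in YUV_WEIGHTS)


# The Y weights sum to 1, so full-scale white keeps full-scale luminance.
y_white, u_white, v_white = rgb_to_yuv(1.0, 1.0, 1.0)
```

Because the three Y weights add up to one, a white pixel maps to Y = 1.0 (up to floating-point rounding).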
The most common form of the video signal in use today is still analog. This signal
is obtained through a process known as scanning. This section discusses the analog
representation of the video signal and its disadvantages, and explains the need for a
digital representation. After describing the need to compress the video signal, the
paper describes the MPEG compression technique.
Composite.
Component.
S-Video.
Today, technology is attempting to integrate the video, computer and telecommunication industries on a single multimedia platform. The video signal is required to be scalable, platform independent, interactive and robust. The analog signal unfortunately fails to meet these requirements. Moving to digital not only eliminates most of the above-mentioned problems but also opens the door to a whole range of digital video processing techniques, which can make the picture sharper.
Digital video has its share of bottlenecks too. The most important one is the huge bandwidth requirement. In spite of being digital, the signal still needs to be stored. The logical solution to this problem is digital video compression.
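A back-of-the-envelope calculation gives a feel for the size of the problem. The frame size, frame rate and sample depth below are hypothetical example values chosen for illustration; they are not taken from the text.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed video bit rate in bits per second."""
    return width * height * frames_per_second * bits_per_pixel


# ASSUMED example format: 720 x 576 pixels, 25 frames/s, and an average of
# 16 bits per pixel (8-bit luminance plus chrominance at half horizontal
# resolution).
rate = raw_bit_rate(720, 576, 25, 16)
print(f"{rate / 1e6:.1f} Mbit/s")
```

Under these assumptions the uncompressed stream needs about 165.9 Mbit/s, which is the kind of figure that motivates compression.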
There are many different redundancies present in video signal data:
Spatial.
Temporal.
Psychovisual.
Coding.
Spatial redundancy occurs because neighboring pixels in each individual frame of a video
signal are related. The pixels in consecutive frames of the signal are also related, leading
to temporal redundancy. The human visual system does not treat all visual information
with equal sensitivity, leading to psychovisual redundancy. Finally, not all parameters
occur with the same probability in an image. As a result, they need not all be coded with
the same number of bits; this coding redundancy is what Huffman coding exploits.
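Coding redundancy can be illustrated with a small Huffman coder. This is a minimal sketch of the general technique, not the fixed code tables that MPEG defines; the function name huffman_code and the sample symbol stream are made up for the example.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code (symbol -> bit string) from a sequence of symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)  # the two least frequent subtrees
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


# Hypothetical symbol stream: 'a' is common, 'd' is rare.
data = "aaaaaaaabbbbccd"
code = huffman_code(data)
encoded_bits = sum(len(code[s]) for s in data)
fixed_bits = 2 * len(data)  # four symbols need 2 bits each with a fixed-length code
```

For this stream the frequent symbol 'a' gets a 1-bit code and the rare symbols get 3-bit codes, so the message takes 25 bits instead of the 30 bits a fixed-length code would need.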
There are several different compression standards around today (for example, CCITT Recommendation H.261). MPEG, which stands for Moving Picture Experts Group, is a joint committee of the ISO and IEC. It has been responsible for the MPEG-1 (ISO/IEC 11172) and MPEG-2 (ISO/IEC 13818) standards in the past and is currently developing the MPEG-4 standard. MPEG standards are generic and universal. There are three main parts in the MPEG-1 and MPEG-2 specifications, namely Systems, Video and Audio. The Video part defines the syntax and semantics of the compressed video bitstream. The Audio part defines the same for the audio bitstream, while the Systems part specifies the combination of one or more elementary streams of video and audio, as well as other data, into single or multiple streams suitable for storage or transmission. The MPEG-2 standard has a fourth part called DSM-CC, which defines a set of protocols for the retrieval and storage of MPEG data. We shall now examine the structure of a non-scalable video bitstream in some detail to understand the video compression.
The video bitstream consists of video sequences. Each video sequence consists of a variable number of groups of pictures (GOPs). A GOP contains a variable number of pictures (p), as shown in Figure 3. There are three types of pictures:
I or Intra pictures.
P or Predicted pictures.
B or Bidirectional pictures.
The DCT coefficients are coded using a combination of two special coding schemes: run-length and Huffman coding. The coefficients are scanned in a zigzag pattern to create a 1-D sequence. (MPEG-2 provides an alternative scanning method.) The resulting 1-D sequence usually contains a large number of zeros because of the DCT and the quantization process. Each non-zero coefficient is associated with a pair of pointers: first, its position in the block, indicated by the number of zeros between it and the previous non-zero coefficient; second, its coefficient value. Based on these two pointers, it is given a variable-length code from a lookup table, such that a highly probable combination gets a code with fewer bits while unlikely ones get longer codes. However, since spatial redundancy is limited, the I pictures provide only moderate compression.
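The zigzag scan and the (zero run, value) pairing can be sketched in a few lines of Python. This is an illustrative simplification: the function names and the sample block are hypothetical, every coefficient is treated alike, and the final lookup of a variable-length code for each pair is left out.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk the anti-diagonals in turn: odd ones with the row increasing,
        # even ones with the row decreasing.
        key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else -p[0]),
    )


def run_level_pairs(block):
    """Convert a quantised block into (zero_run, value) pairs."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1  # count zeros preceding the next non-zero coefficient
        else:
            pairs.append((run, value))
            run = 0
    return pairs  # trailing zeros produce no pair


# Hypothetical quantised 8x8 block with only four non-zero coefficients.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0], block[1][2] = 12, 5, -3, 1
pairs = run_level_pairs(block)
```

For this block the 64 coefficients collapse to the four pairs (0, 12), (0, 5), (1, -3) and (3, 1), each of which would then be mapped to a variable-length code.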
The P and B pictures are where MPEG derives its maximum compression efficiency. This is done by a
technique called motion compensation (MC) based prediction, which exploits temporal redundancy.
Since frames are closely related, it is assumed that the current picture can be modelled as a translation
of the picture at the previous time. It is then possible to accurately "predict" the data of one frame
from the data of a previous frame. In P pictures, each 16x16 macroblock is predicted from a
macroblock of a previously encoded I or P picture. Since frames are snapshots in time of moving objects,
the macroblocks in the two frames may not correspond to the same spatial location. The encoder searches
the previous frame (for P-frames, or the frames before and after for B-frames) in half-pixel increments
for other macroblock locations that closely match the information contained in the current
macroblock. The displacements in the horizontal and vertical directions of the best-match macroblock
from the co-sited macroblock are called motion vectors (Figure 5).
If no matching macroblock
is found in the neighboring region, the macroblock is intra coded and its DCT coefficients are encoded.
If a matching block is found in the search region, the coefficients are not transmitted, but a motion vector
is used instead.
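The search for a motion vector can be sketched as an exhaustive block-matching loop. The code below is a simplified illustration under stated assumptions: it searches in whole-pixel steps rather than half-pixel steps, it uses the sum of absolute differences (SAD) as the matching criterion, and the function names, search range and test frames are all hypothetical.

```python
def sad(cur, ref, cy, cx, ry, rx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(cur[cy + i][cx + j] - ref[ry + i][rx + j])
        for i in range(size)
        for j in range(size)
    )


def find_motion_vector(cur, ref, top, left, size=16, search=7):
    """Search ref for the best match to the block of cur at (top, left).

    Returns ((dy, dx), cost): the displacement of the best match from the
    co-sited block, and the SAD of that match.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = top + dy, left + dx
            if ry < 0 or rx < 0 or ry + size > height or rx + size > width:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, top, left, ry, rx, size)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


# Hypothetical 12x12 frames: a 4x4 patch moves 2 rows down and 1 column right.
ref = [[0] * 12 for _ in range(12)]
cur = [[0] * 12 for _ in range(12)]
for i in range(4):
    for j in range(4):
        ref[2 + i][3 + j] = 10 + 4 * i + j
        cur[4 + i][4 + j] = 10 + 4 * i + j
mv, cost = find_motion_vector(cur, ref, top=4, left=4, size=4, search=3)
```

Here the best match lies 2 rows up and 1 column to the left in the reference frame, so the vector is (-2, -1) with a cost of 0. An encoder could compare the returned cost with a threshold of its choosing to decide whether a match was found or the macroblock should be intra coded.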
For B pictures, MC prediction and interpolation are performed using reference frames on either side of the picture (Figure 6).
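One common way to interpolate is to average the forward and backward motion-compensated predictions sample by sample. The sketch below assumes that form of interpolation with rounding to the nearest integer; the function name and the 2x2 sample blocks are hypothetical.

```python
def interpolate_prediction(forward_block, backward_block):
    """Average two predictions sample by sample, rounding halves up."""
    return [
        [(f + b + 1) // 2 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(forward_block, backward_block)
    ]


# Hypothetical predictions taken from the frames before and after the B picture.
forward = [[100, 102], [104, 106]]
backward = [[104, 102], [101, 110]]
prediction = interpolate_prediction(forward, backward)
```

For these blocks the interpolated prediction is [[102, 102], [103, 108]]; only the difference between it and the actual macroblock would then remain to be coded.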
In this paper, the principles of video compression using the MPEG standard have been discussed. I-, P- and B-frames and their encoding techniques have been described. The MPEG compression algorithm is a clever combination of a number of diverse tools, each of which exploits a particular data redundancy. Spatial, temporal, psychovisual and coding redundancies were discussed in this paper. The end result is that the coded video needs far less bandwidth than the original while maintaining extremely good quality. Currently, the technology is gearing up for an exciting phase with the advent of HDTV and DVD. Video compression is a key factor in these new technologies, and MPEG has become the most sophisticated industry standard. There are many advantages in choosing MPEG. MPEG guarantees a means for universal interoperability. It also reduces the cost of video compression codecs by triggering mass production of ASICs.