Video Networking

Video Transmission Over Broadband Networks


New networking technologies are now paving the way for broadband integrated services. This is providing an environment for the widespread availability of a wide range of true video networking services. The ATM networking standard and the MPEG video compression standards are becoming increasingly important in this area. What are these technologies and how do they relate to the new interactive video service, 'video on demand'?

By A.Derbyshire and K.C.Rajh, 17th June 1996



The integration of network services

The telephone has developed into an efficient long distance communication tool. It provides the capability to communicate with other people all over the world by voice. The main assets of telephone networks are availability, real time response and user-friendliness. These networks have experienced growth in terms of size and use and the significant development of digital signalling and computer data communications.

The availability of the public telephone networks made them an attractive option for basic data communications. Systems to transmit computer data over these analogue networks were developed, but this was soon recognised as a non-optimal solution to data communications. The limited bandwidth and high noise of telephone networks, although adequate for voice transmission, cannot provide reliable high rate data transfer.

As computer systems advanced, the interconnection demands increased in terms of transfer rate, and soon the telephone network dimensions were exceeded. For high speed data communications, specialised data networks were established with networking technology specifically designed to transport data. This service specific dimensioning of networks formed a general trend throughout communications. All services, from voice, through data and to television distribution, had their own networks.

The disadvantages of this situation are that:

In 1984, a solution to these problems was suggested by the international telecommunications standards body of the time, the International Telegraph and Telephone Consultative Committee (CCITT). A network concept was proposed which would integrate different service types onto a single network. This was known as the Integrated Services Digital Network (ISDN).
The CCITT stated that:
'an ISDN is a network ... that provides end-to-end digital connectivity to support a wide range of services, including voice and non-voice services, to which users have access by a limited set of standard multi-purpose user-network interfaces'

The Integrated Services Digital Network

As stated, ISDN is a digital system, designed to transport voice and other services by digital signalling. The ISDN standard interfaces provide a series of channels of different characteristic rates.

Channel   Bit rate (kbit/s)   Interface
B         64                  Basic access
H0        384                 Primary rate access
H11       1536                Primary rate access
H12       1920                Primary rate access
D16       16                  Basic access
D64       64                  Primary rate access

The standard interface, called basic access, comprises two 64 kbit/s B channels and a 16 kbit/s signalling D channel. A second interface, called primary rate access, provides bit rates up to 2 Mbit/s. The channels can be mixed and duplicated to give a range of gross bit rates. The basic rate was specifically dimensioned to handle voice transmissions as digital data, and it was derived as:
	Bandwidth of voice signal    = 3.4 kHz
	Nyquist sampling frequency   = 8 kHz
	Sampling resolution          = 8 bit

	Bit rate                     = 64 kbit/s
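
The derivation above is simple arithmetic, and the short Python sketch below reproduces it. The function and variable names are illustrative only and do not come from any standard.

```python
def pcm_bit_rate(sampling_hz: int, bits_per_sample: int) -> int:
    """Bit rate in bit/s of a uniformly sampled, uniformly quantised signal."""
    return sampling_hz * bits_per_sample

voice_bandwidth_hz = 3400
sampling_hz = 8000        # chosen above twice the 3.4 kHz voice bandwidth
bits_per_sample = 8

# The sampling theorem requires more than two samples per cycle of the
# highest frequency present in the signal.
assert sampling_hz > 2 * voice_bandwidth_hz

print(pcm_bit_rate(sampling_hz, bits_per_sample))  # 64000
```
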
The ISDN concept was first recommended by the CCITT in 1984, and then enhanced in 1988. The aim of the ISDN concept was to provide:

The ISDN transfer mode

The term transfer mode refers to the way in which information is transmitted and switched through a network. An important enhancement of ISDN over standard telephone networks is that, in addition to the circuit switching transfer mode, it has an extra transfer mode which has been tailored for handling data communications.

The original telephone networks used circuit switching to establish the connections in the network. This system connects the users together via switches across the network to form a circuit. Once a connection is established, it remains throughout the duration of the communications session, so a constant amount of network resources is allocated to the users.

Circuit switching has generally remained the switching mode used for speech, despite the fact that conversations are up to 50% silent each way. A technique to reassign this silence, known as Time Assignment Speech Interpolation (TASI), is used on expensive links (e.g. satellite links). However, this is not cost effective for most links and so circuit switching has remained in telephone networks.

Typically, data signals are generated in sparsely separated bursts, with silent periods exceeding 50% of the total connection time. A transfer mode known as packet switching was devised to optimise the use of network resources under these circumstances. With this system, a digital data stream is broken up into pieces known as packets, which are then launched individually onto the network. Packets are not sent during periods of silence, so the important consequence is that network resources are only used when data is actually being transferred.
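
As a rough illustration of this idea, the Python sketch below splits a sample stream into fixed-size packets and sends only those that carry something other than silence. The function name `packetise`, the packet size and the silence test are all assumptions made for the example; real packet networks add headers, addressing and much more.

```python
def packetise(samples, packet_size, is_silent):
    """Split a sample stream into fixed-size packets and drop the silent ones.

    Returns (sequence_number, payload) tuples so that a receiver could tell
    where the suppressed gaps were.
    """
    packets = []
    for seq, start in enumerate(range(0, len(samples), packet_size)):
        payload = samples[start:start + packet_size]
        if not is_silent(payload):
            packets.append((seq, payload))
    return packets

# A bursty source: two short bursts separated by long silences (zeros).
stream = [0] * 8 + [5, 7, 6, 4] + [0] * 12 + [9, 8, 9, 7] + [0] * 4
sent = packetise(stream, 4, lambda p: all(s == 0 for s in p))
total = len(stream) // 4

print(f"{len(sent)} of {total} packets sent")  # 2 of 8 packets sent
```

A circuit switched connection would have carried all eight slots, silent or not.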

The 64 kbit/s ISDN was designed to offer both circuit switching and packet switching services to cater for voice and other types of services. Speech packetisation was experimented with in the 1970's (by the Advanced Research Projects Agency (ARPA) ) but overall it was learned that although it is feasible in principle, significant modifications to existing protocols were necessary in order to achieve an acceptable quality of voice service. This is because varying delays are introduced as packets travel through the network paths and are processed at the switches. These delays cause problems concerning the real-time nature and correct reconstruction of the speech streams.

Supporting network services efficiently

ISDN was designed to provide packet switching to give it the ability to handle bursty data applications efficiently. This idea can be extended more generally by considering the natural information rate of network applications.

For a digital network, the application source will typically generate a bit stream of a given rate. For instance, a source could be a person speaking in a conversation, where the appropriate hardware will generate a bit stream with a bit rate in the order of 64 kbit/s. However, a bit stream may not necessarily convey a constant amount of information. In terms of speech, no information will be conveyed at all during silence, so the natural information rate is not constant.

The important extension to this idea is that the natural information rate of a source is a stochastic process with respect to time. The figure below shows such a stochastic process s(t), which represents the fluctuations of the natural bit rate of a source.

Fig. 1.0. Natural bit rate fluctuation in time

Acknowledgement to the source : 'Asynchronous Transfer Mode, Solution for Broadband ISDN' - second edition by Martin De Prycker.

De Prycker states that three important parameters of this process are:

The information from a source can be obtained from a generated bit stream by encoding only non-redundant data. Systems to do this are known as compression systems, because they compress the size of the generated data down towards the minimal natural information size. The result is that a source has a variable bit rate with mean value E[ s(t) ] and burstiness B, which are characteristic of it.
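
To make the two quantities concrete, the sketch below estimates them from a sampled bit rate trace. It assumes that burstiness is taken as the ratio of the peak bit rate to the mean bit rate, a common definition which the text above does not spell out. The trace values are hypothetical.

```python
def mean_rate(trace):
    """Sample estimate of E[s(t)] from equally spaced bit rate samples."""
    return sum(trace) / len(trace)

def burstiness(trace):
    """Peak-to-mean ratio of the trace (assumed definition of B)."""
    return max(trace) / mean_rate(trace)

# Hypothetical speech source: 64 kbit/s while talking, 0 during silence.
trace_kbit_s = [64, 64, 0, 0, 64, 0, 64, 0]

print(mean_rate(trace_kbit_s))   # 32.0
print(burstiness(trace_kbit_s))  # 2.0
```
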

Packet switching architectures are well suited to this because packets need only be sent when non-redundant data is available, i.e. packets can be sent at the fluctuating information bit rate of a source. A constant bit rate channel with a fixed bandwidth usage will transfer the information either inefficiently or with a reduction in quality. This is due to a trade-off between two situations:

These principles are important to the efficient use of network channels, especially for broadband services, where network bandwidth cannot affordably be wasted.

Broadband integrated services

ISDN is an important concept because it is a step towards the integration of network services, but the system has a relatively limited bandwidth, which makes it only capable of providing transmission rates of up to 2 Mbit/s. By taking a step back from ISDN and making a general observation of network services it is possible to appreciate the scope of ISDN. The following table gives some examples of services, their typical transmission networks and their quantitative parameters.

Service            Network                   E[ s(t) ]          Burstiness B
Voice              PSTN                      32 kbit/s          2
Interactive data   PSTN                      1 - 100 kbit/s     10
Bulk data          ISDN / SMDS               1 - 10 Mbit/s      1 - 10
Video              Terrestrial / Cable TV    1.5 - 15 Mbit/s    2 - 3

It is clear that ISDN would only support the services in the range of voice to medium rate data transmissions. ISDN was not originally designed to transport high speed data and moving images (video), but in the 1980's the demand for these services started to be recognised. Such high speed services are known as broadband services because of the large amount of network bandwidth which is required for transmission. The CCITT put forward the recommendation of 'Broadband - ISDN' (B-ISDN) in an attempt to develop an integrated network capable of supporting a larger range of services.

B-ISDN

CCITT defined broadband with reference to ISDN as:
... a service or system requiring transmission channels capable of supporting rates greater than the primary rate.
It was first proposed that B-ISDN would be an enhanced version of ISDN, achieved by just adding broadband channels and broadband user-network interfaces to the existing ones. However, concerns arose as to the suitability of this concept. The channels would have had to be dimensioned fairly rigidly to the then contemporary broadband services, which would have made them potentially unsuitable for any unforeseen future services. Also, the CCITT could not come to a decision as to the traffic type orientation of the channels, i.e. whether they should have a circuit-type or burst-type traffic orientation.

The following factors influenced the overall design of B-ISDN:

It was identified that B-ISDN should be an extremely flexible network capable of catering for the entire range of contemporary and potential future services. CCITT gave the following recommendation, which is a natural extension from ISDN:
'A key element of service integration is the provision of a wide range of services to a broad variety of users utilising a limited set of connection types and multipurpose user-network interfaces.'
The problems of the first technical concepts of B-ISDN were never resolved, since a different principle was put forward to be the solution for B-ISDN. CCITT abandoned the original concepts and eventually stated:
Asynchronous transfer mode (ATM) is the transfer mode for implementing B-ISDN...
ATM was this different principle, different because it was effectively wiping the ISDN slate clean and starting afresh with a more flexible solution. It was designed as a universal transfer mode with the capability to provide solutions to the following B-ISDN requirements:

Broadband networking and video

All these developments are significant to the move towards the transmission of moving pictures as video signals through networks. B-ISDN using ATM is designed to be able to provide widespread interactive video services as well as the distribution services common to the existing television distribution networks. This means that B-ISDNs could provide a full range of video services to homes and businesses, with the potential accessibility of the current telephone system.

The CCITT classified interactive and distribution services generally as follows:

Interactive Services

Distribution Services

Out of these services, video type services pose the greatest challenges to networking because of their sensitivity to delays and high bit rates (and therefore broadband nature). Video services are characterised as follows:

Video service              Generated bit rate (Mbit/s)   Average natural bit rate E[s(t)] (Mbit/s)   Burstiness B
Video-telephony            1 - 10                        0.2 - 2                                     5
Video-conferencing         10 - 100                      1 - 10                                      1 - 5
Videotex/video retrieval   10 - 100                      1 - 10                                      1 - 20
TV services                50 - 100                      1.5 - 15                                    2 - 3
HDTV services              100+                          15 - 150                                    1 - 2

With video networking, speed is of the essence. In order to transport these services effectively and use network bandwidth efficiently, a fast variable bit rate network transfer mode and a powerful video compression system are required. The generally accepted solutions are the ATM network concept, and MPEG compression. Significantly, these two solutions combine in essence to good effect, giving a feasible form for flexible video networking.

The Asynchronous Transfer Mode

ATM was developed in the early 1980's with the objective of providing a transfer mode with very high throughput. It is based on the principle of fast packet switching, which is simply an evolved form of conventional packet switching - essentially ATM is not a totally new scheme. The ATM system concept has been developed in line with advances in modern digital technologies, namely optical fibre links and high speed electronics, which are the foundations of high speed communications.

Technological influences on the development of the ATM system concept

The switching and link technology of a network ultimately determines the speed at which data can be moved through the network. In a packet switching architecture, packets are constructed from the source data at the sender node and are then launched onto the network. Each packet moves through the network and is routed by the intermediate network nodes (or switches). On receipt of a packet, a node performs some processing on the packet header, which contains routing information and error checking data. The packet is then sent to the next appropriate node on its route and eventually reaches the destination node, where it is reassembled with the other packets into the information stream.

It is the routing and error checking performed by the nodes, and the associated technologies, which determine the overall transmission delays in the network. The noise susceptibility (and therefore the error susceptibility) and bandwidth of link technologies have a big influence on the way in which such concepts are designed. All this determines a network's ability to handle delay sensitive real-time applications (e.g. video) and the overall transfer rate. The ATM system concepts have been developed for use with advanced networking technologies.

High speed electronics

At present, semiconductor based electronics must be used to construct the node processing elements. Current systems are able to process packets at a transfer rate in the order of 10^8 bits per second (100+ Mbit/s). Electronic systems are currently considered to be the bottleneck in networks consisting of electronic and optical technologies.

Optical technologies

Fibre optic cables provide a very wide transmission bandwidth, greater than that of electrical cables. They are also less susceptible to channel noise than electrical cable, and so offer a lower probability of bit errors occurring. Current systems are able to provide enough bandwidth for transmission also in the order of 10^8 bits per second, although the theoretical limits of optical systems extend up to 10^12 bits per second. All-optical network systems, where even the switching is performed by optical elements (e.g. optical logic/computers), are currently being experimented with by AT&T. This would remove the electronic bottlenecks of networking and provide very high transfer rates.

The ATM system concept - fast packet switching

ATM packets are small and are referred to as cells. They have been dimensioned to a size suitable for delay sensitive real-time applications, which was finalised as a total of 53 bytes, comprising a 5 byte header and a 48 byte information payload. Small packets keep processing delays (queuing delays) in network nodes to a minimum, which helps to keep the overall delay low. The packet length is fixed in order to keep system complexity minimal, again with the motivation of minimising processing delays.
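
The consequences of the 53 byte cell can be checked with a few lines of arithmetic. The Python sketch below computes how many cells a message needs, the fraction of each cell taken by the header, and how long a source takes to fill one payload. The helper names are illustrative, and the 64 kbit/s source is just the voice rate used earlier in this report.

```python
CELL_SIZE = 53
HEADER_SIZE = 5
PAYLOAD_SIZE = CELL_SIZE - HEADER_SIZE  # 48 bytes of information per cell

def cells_needed(n_bytes):
    """Number of cells required to carry n_bytes of payload (ceiling division)."""
    return -(-n_bytes // PAYLOAD_SIZE)

def header_overhead():
    """Fraction of every cell that is header rather than information."""
    return HEADER_SIZE / CELL_SIZE

def fill_delay_ms(source_bit_rate):
    """Time in milliseconds a source needs to produce one full cell payload."""
    return PAYLOAD_SIZE * 8 * 1000 / source_bit_rate

print(cells_needed(1000))            # 21
print(round(header_overhead(), 4))   # 0.0943
print(fill_delay_ms(64_000))         # 6.0
```

The last figure shows why a small cell suits real-time traffic: a 64 kbit/s voice source fills a cell in 6 ms, whereas a payload ten times larger would add ten times that delay before the first bit could leave the sender.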

The error control concept of ATM was modified from previous packet switching architectures. Such architectures would perform some degree of error control at every intermediate network node between the edges of the network (i.e. the communicating nodes). The susceptibility of the older links to noise meant that such substantial error control systems were necessary. However, modern high-quality links (e.g. optical) are less error prone, so ATM was designed with error control functions only at the edges of the network in order to reduce delays due to error control in the intermediate network nodes.

ATM is connection-orientated which means that before information can be transferred across the network, a logical/virtual connection must be set up. During setup, the necessary resources for the session will be allocated, but only if they are available. If not, then the connection is refused, which is a similar connection mode to circuit switching. With this scheme, packet flow control need not be performed on the packet queues at each network node to check for overloading. This is because the resource checks will have been performed in advance for all the connections so overloading is unlikely. Overall, the absence of flow control reduces the complexity of the nodes and therefore delays.
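
The accept-or-refuse behaviour at connection setup can be sketched as follows. This is a deliberately simplified model that reserves a single bandwidth figure per connection on a single link; the class and method names are hypothetical, and real ATM admission control takes account of more traffic parameters than this.

```python
class Link:
    """A link of fixed capacity shared by a set of virtual connections."""

    def __init__(self, capacity_mbit_s):
        self.capacity = capacity_mbit_s
        self.reserved = 0
        self.connections = {}

    def setup(self, conn_id, requested_mbit_s):
        """Reserve resources if they are available, otherwise refuse."""
        if self.reserved + requested_mbit_s > self.capacity:
            return False
        self.connections[conn_id] = requested_mbit_s
        self.reserved += requested_mbit_s
        return True

    def release(self, conn_id):
        """Tear down a connection and free its reservation."""
        self.reserved -= self.connections.pop(conn_id)

link = Link(capacity_mbit_s=150)
print(link.setup("tv", 100))     # True
print(link.setup("hdtv", 100))   # False: not enough capacity left
link.release("tv")
print(link.setup("hdtv", 100))   # True
```

Because the check is made once at setup, the nodes do not need to police their queues cell by cell during the session.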

It is important to note that the concepts of ATM refer only to switching and multiplexing techniques. B-ISDN is actually intended to be transmitted synchronously at the physical level (e.g. by SONET/SDH). It is the cell streams which are mixed asynchronously by time-division multiplexing, as described by the ATM concept. As well as being an important high speed transfer mode, ATM is extremely flexible and efficient, as demanded by the B-ISDN concept. This makes it a desirable networking standard for all services from voice to video.

The B-ISDN protocol reference model

This model is a representation of the information flows in ATM networks. The model is a series of planes, each of which contains a layered architecture. The model is shown schematically below:

Fig. 2.0. The B-ISDN protocol reference model



The function and associated information of the planes is as follows:

Plane                                   Function                            Information flow
User                                    Transfer of user application data   User application
Control                                 Call and connection control         Signalling related to calls and connections
Management (layer / plane management)   Network supervision                 Network status and performance


A full description of the planes is out of scope here, but for more details see "Standardisation for ATM and Related B-ISDN Technologies" or the references given in the bibliography section below.

Outline of the user plane

The user plane is divided into 4 parts: the physical layer, the ATM layer, the ATM adaptation layer (AAL) and the higher layers. Each layer uses the services of the lower layers and in turn provides services to the layer above it. This layered structure was adopted to allow for independence in the design and implementation of each layer (e.g. at the physical layer, ATM could be implemented on copper cable or on optical fibre independently).

The higher layers

The user services are supported in these layers. Four classes of service have been identified:

The ATM adaptation layer

The AAL converts application specific data into ATM data units in order to provide support for user applications. All AAL functions are performed at the edges of the network, and all AAL information is carried within the ATM cell information fields transparently by the ATM layer.

The ATM layer

This layer takes data from the AAL and generates cells, and in the other direction interprets cells and returns data to the AAL. The layer is service independent: it transports the data transparently, according to the protocol information in the cell headers.

The physical layer

This is the lowest level layer, and it is responsible for the transmission of ATM cells as bitstreams across a physical medium.

MPEG Compression For Digital Video.

What is MPEG?

The need for digital video compression has resulted in several compression standards. The MPEG (Moving Picture Experts Group) committee began its life in late 1988 with the immediate goal of standardizing video and audio for compact discs. MPEG is a joint committee of the ISO and IEC. It has been responsible for the MPEG-1 and MPEG-2 standards and is currently developing the MPEG-4 standard. The MPEG standards are generic and universal in the sense that they merely specify a compressed bitstream syntax. There are three main parts of the MPEG-1 and MPEG-2 specifications, namely systems, video and audio. The video part defines the syntax and semantics of the compressed video bitstream. The audio part defines the same thing for the audio bitstream, while the systems part addresses the problem of multiplexing the audio and video streams into a single system stream with all necessary timing information.

Table-1 shows the typical bit rate, the purpose, and the screen size for the different standards specified by MPEG.

Standard  Bit rate            Purpose                              Screen size
MPEG-1    1.14 to 3.0 Mbit/s  Delivery of video for a CD-ROM       352x240 pixels at 30 frames per second
                                                                   for NTSC
MPEG-2    6 to 8 Mbit/s       Broadcast quality compressed video   720x480 pixels at 60 fields per second
                                                                   for NTSC and 720x576 pixels at 50
                                                                   fields per second for PAL
MPEG-4    64 kbit/s           Low bit rate video phones,           Under development
                              interactive databases,
                              interactive newspapers etc.
Table-1

MPEG-3 was targeted at HDTV. However, it was discovered that with some tweaking MPEG-2 would work for HDTV.

Formal description of the video signal.

Video signals are spatio-temporal signals or, simply stated, a sequence of time varying images. A monochromatic video signal can be mathematically represented by x(h,v,t), where x is the intensity value at horizontal location h, vertical location v and time t. The color video signal is a superposition of the three main color primitives (R,G,B) or, equivalently, of one luminance (Y) and two chrominance components (U,V). To digitize the spatio-temporal signal x(h,v,t), the component form of the analog signal is usually sampled in all three directions. Each sample point in a frame is called a pixel. The sampling process yields the complete set of parameters necessary to represent a digital video signal. For example, sampling in the horizontal direction yields the pixels per line parameter, which defines the horizontal resolution.
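
The sampling parameters directly determine the uncompressed bit rate, which is what makes compression necessary. The sketch below multiplies them out for one assumed format: 720x576 pixels at 25 frames per second with an average of 16 bits per pixel (8 bits of luminance plus 8 bits of chrominance shared between U and V). These figures are an illustrative choice, not taken from the text above.

```python
def raw_bit_rate(width, height, frames_per_second, bits_per_pixel):
    """Uncompressed bit rate in bit/s of a sampled video signal."""
    return width * height * frames_per_second * bits_per_pixel

# Assumed example format (see the paragraph above the code).
rate = raw_bit_rate(width=720, height=576, frames_per_second=25, bits_per_pixel=16)

print(rate)  # 165888000, i.e. roughly 166 Mbit/s
```
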

The MPEG video Compression.

There are many important ramifications of the technologies incorporated into the MPEG specifications, but what seems to get the most press is the video compression system. The basic job of MPEG is to take analog or digital video signals and convert them to packets of digital information that are more efficient to transport on modern networks. MPEG compresses the video into much less information, consuming less transmission bandwidth.

Video Signal Hierarchy.


figure-3 The hierarchy of video signal.

At the top level of the hierarchy, the video bitstream consists of video sequences. MPEG-1 allows only progressive sequences, while MPEG-2 allows both progressive and interlaced sequences. Each video sequence contains a variable number of groups of pictures (GOPs). A GOP contains a variable number of pictures (P). A picture can be either a frame picture or a field picture. In a frame picture, the two interlaced fields are coded together to form a frame, while a field picture is a coded version of an individual field. In MPEG the video frames/pictures are broken down into 8x8 pixel regions called blocks. Four of these blocks can be put together to create a 16x16 macroblock. The macroblocks are then grouped together into runs of macroblocks called slices. The slice structure allows the receiver to resynchronize at the beginning of a slice in case of data corruption, because each slice begins with a unique header (figure-3).
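
As a small worked example of this hierarchy, the sketch below counts the macroblocks and 8x8 luminance blocks in one picture. The 352x240 picture size is the MPEG-1 format quoted earlier; the function name is illustrative.

```python
def macroblock_grid(width, height, mb_size=16):
    """Columns and rows of macroblocks needed to cover a picture."""
    cols = -(-width // mb_size)   # ceiling division
    rows = -(-height // mb_size)
    return cols, rows

cols, rows = macroblock_grid(352, 240)
macroblocks = cols * rows
luminance_blocks = macroblocks * 4   # four 8x8 blocks per 16x16 macroblock

print(cols, rows, macroblocks, luminance_blocks)  # 22 15 330 1320
```
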

Inside each GOP, two kinds of coding are permitted: intra frame coding and inter frame coding. The intra coding of a frame proceeds without any reference to other frames, exploiting only its spatial redundancy. The intra coded frames (I-frames) provide the access points to the coded sequence.

The inter coding of a frame uses motion compensated prediction from a previous or subsequent frame, in order to exploit not only spatial but also temporal redundancy. In the MPEG algorithm two kinds of inter coded frames are distinguished: P-frames, which are motion compensated from a past I or P-frame, and B-frames, which require both past and future reference frames for motion compensation. Since B-frames use both past and future frames for prediction, the highest degree of compression is obtained for B-frames, but they cannot be used as a reference for prediction (figure-4).


figure-4 The P and B frame prediction.
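
One practical consequence of B-frames needing a future reference is that frames cannot be sent in display order: the later reference frame has to arrive before the B-frames that depend on it. The sketch below shows one simple way to derive such a transmission order from a display-order GOP. It is an illustration of the dependency only, not the procedure of any particular encoder, and the function name is hypothetical.

```python
def coding_order(display):
    """Reorder frame types from display order into a valid transmission order.

    B-frames are held back until the next I or P-frame (their future
    reference) has been emitted.
    """
    out, pending_b = [], []
    for idx, ftype in enumerate(display):
        if ftype == "B":
            pending_b.append((idx, ftype))
        else:
            out.append((idx, ftype))
            out.extend(pending_b)
            pending_b = []
    # Trailing B-frames would wait for a reference frame in the next GOP.
    out.extend(pending_b)
    return out

gop = list("IBBPBBP")
print([f"{t}{i}" for i, t in coding_order(gop)])
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```
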

Inter Frame coding ( Motion Compensation(MC) )

MPEG derives its maximum compression from P and B-frames. This is done by a technique called motion compensation (MC) based prediction, which exploits temporal redundancy. Since frames are closely related, it is assumed that a current picture can be modelled as a translation of the picture at a previous time. It is then possible to accurately predict the data of one frame based on the data of a previous frame. The encoder searches the previous frame (for P-frames, or the frames before and after for B-frames) in half pixel increments for other macroblock locations that are a close match to the information contained in the current macroblock (figure-5). The displacements in the horizontal and vertical directions of the matching macroblock from the co-sited macroblock are called motion vectors. If a matching block is found in the search region, the motion vectors are encoded. If no match is found in the neighboring region, the macroblock is intra-coded and the DCT coefficients are encoded. For B pictures, MC prediction and interpolation are performed using reference frames present on either side of the picture. B pictures themselves are never used for prediction and hence do not propagate errors.

figure-5 The search window for motion compensation.
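
The search described above can be sketched as an exhaustive block matching loop. The Python example below is a simplification under stated assumptions: it searches in whole-pixel steps rather than the half pixel increments mentioned in the text, it uses the sum of absolute differences (SAD) as the matching criterion, which the text does not specify, and it uses a tiny 2x2 block on an 8x8 frame so that the result is easy to check by hand. All names are illustrative.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry)."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def motion_vector(cur, ref, cx, cy, n, search):
    """Find the displacement (dx, dy), within +/- search pixels, of the block
    in ref that best matches the block of cur at (cx, cy)."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if 0 <= rx <= width - n and 0 <= ry <= height - n:
                cost = sad(cur, ref, cx, cy, rx, ry, n)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost

def frame_with_patch(x, y, size=8, patch=2):
    """A black frame with one bright square whose top-left corner is (x, y)."""
    frame = [[0] * size for _ in range(size)]
    for j in range(patch):
        for i in range(patch):
            frame[y + j][x + i] = 255
    return frame

ref = frame_with_patch(2, 2)   # previous frame
cur = frame_with_patch(4, 3)   # object has moved 2 right and 1 down

print(motion_vector(cur, ref, cx=4, cy=3, n=2, search=3))  # ((-2, -1), 0)
```

The vector (-2, -1) points from the co-sited position back to where the matching block lies in the previous frame, and a cost of 0 means the match is exact, so only the vector would need to be coded for this block.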

Intra (I-frame) coding

The I-frame coding involves the following steps.

Discrete Cosine Transform.

The first step in the compression is to translate the information in the picture into the frequency domain. The R, G and B intensity information in each pixel is translated into Y and (U,V). Each picture is divided into 8x8 pixel blocks. Four of these blocks are additionally arranged into a bigger block of size 16x16, called a macroblock (figure-6).

figure-6 Intra coding..

The DCT is applied to each 8x8 block individually to transform the data into frequency information. This transformation converts the data into a series of coefficients which represent the magnitudes of cosine functions at increasing frequencies. The low frequency coefficients contain more energy than the high frequency ones.
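
For readers who want to see the transform itself, the sketch below is a direct (and deliberately slow) implementation of the two-dimensional DCT-II on an 8x8 block, using the usual orthonormal scaling. Practical encoders use fast factorised algorithms instead; the function name is illustrative.

```python
import math

N = 8

def dct_2d(block):
    """Two-dimensional DCT-II of an N x N block, computed from the definition."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = 0.25 * c(u) * c(v) * s   # 0.25 is 2/N for N = 8
    return out

# A completely flat block has all of its energy in the DC coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)

print(round(coeffs[0][0]))  # 800
```

For a flat block every coefficient other than the first is zero (up to rounding error), which is the extreme case of the energy concentration described above.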

Quantization and coding.

The high frequency (low energy) coefficients can afford to be dropped because the eye is less able to detect high frequency changes. This means that the high energy (low frequency) coefficients can be coded with a greater number of bits, while fewer or zero bits are used for the low energy coefficients. The quantization step drops some of the least significant bits of information, making some of the coefficients (the high frequency coefficients) go to zero. The coefficients are then entropy-coded. Entropy coding converts the coefficients into variable bit length codes, with the most common values being coded with the fewest bits (Huffman coding).
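
The sketch below illustrates the quantization step and the way runs of zero coefficients are prepared for entropy coding. It makes several simplifying assumptions that go beyond the text: a single uniform step size is used instead of a quantization matrix, the coefficients are read out in a zig-zag order from low to high frequency, and the result is left as (zero run, level) pairs rather than being mapped to actual Huffman code words. All names and values are illustrative.

```python
def zigzag_indices(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def quantise(block, step):
    """Divide every coefficient by the step size, truncating towards zero."""
    return [[int(c / step) for c in row] for row in block]

def run_length(block):
    """(number of preceding zeros, level) for each non-zero coefficient."""
    pairs, run = [], 0
    for r, c in zigzag_indices(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1
        else:
            pairs.append((run, level))
            run = 0
    return pairs   # trailing zeros would be signalled by an end-of-block code

# Hypothetical DCT output: energy concentrated in the low frequencies.
coeffs = [[0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[0][2], coeffs[2][2] = 800, -45, 30, 40, 7

print(run_length(quantise(coeffs, step=16)))
# [(0, 50), (0, -2), (0, 1), (2, 2)]
```

The small coefficient at position (2,2) quantises to zero and disappears, and the whole 64-value block is reduced to four pairs.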

As we have seen, P-frame (and B-frame) coding predicts frames from previous frames (past and future frames for B-frames). This implies that any error in a P-frame will be propagated through the transmission (remember that B-frames are never used for prediction). To avoid the propagation of errors and to allow periodic resynchronization, I-frames (stand-alone frames) are transmitted approximately once every 12 frames.

Two accompanying movies illustrate the effect of error forwarding: one shows the normal video, the other shows the errors being forwarded through the P-frames.

MPEG-2, the successor of MPEG-1

MPEG-1 supports only progressive video. Unfortunately, today's TV scanning pattern is interlaced. This introduces a duality in block coding: do local redundancy areas (blocks) exist exclusively in a field or in a frame? This and other advanced requirements (e.g. support for HDTV) led to the need for a new standard. The MPEG-2 video standard specifies the coded bit stream for high quality video. As a compatible extension, MPEG-2 video builds on the completed MPEG-1 video standard by supporting interlaced video formats and a number of other advanced features. The following lists some of the MPEG-2 features, but it is by no means a complete list.

A very important goal of MPEG-2 was to extend the range of video formats capable of being carried (scalability). This is useful in application areas such as video communication, video on ATM, HDTV with embedded TV etc. Scalability enables different decoders to construct different versions of the same video source by using a subset of the total encoded bitstream. This includes spatial scalability, temporal scalability, SNR scalability, and data partitioning.

Spatial scalability.

Specifies carrying a video signal in a two part format that lets inexpensive decoders extract a low-resolution signal and, with additional processing, lets more capable decoders extract a higher resolution picture using more data and bandwidth. This provides a smooth transition to the HDTV system while maintaining compatibility with existing standard TV systems.

Temporal scalability.

MPEG-2 allows one signal to be transmitted and displayed at different frame rates.

Signal-to-Noise-Ratio (SNR) scalability.

Allows one encoded signal to be compatible with different levels of decoding quality.

Data partitioning scalability.

Allows the MPEG signal to be transmitted over a two priority channel, with one channel containing critical information such as the DC values, motion vectors etc. The other channel carries less critical information such as higher order coefficients. This two-part transmission is well suited to systems like ATM, where the cell loss priority bit in the cell header can be used to mark the cell discard priority.

Another feature of the MPEG-2 format is the concept of Levels and Profiles, which serves the needs of different kinds of applications. Several profiles and several levels are defined. The profile defines limits on the algorithmic complexity (the set of coding tools) that may be used in the video signal. The level defines limits on parameters such as the resolution and bit rate of the video. Table-2 shows the profiles and their applications.

PROFILE   Application
Simple    Intended for software decoders (without B-frames)
Main      Cable TV and satellite uplink compression
Spatial   HDTV
SNR       Spatial with SNR scalability
High      SNR with 4:4:4 chrominance in the macroblock

Table-2

MPEG system part.

The systems document (ISO/IEC 13818-1 for MPEG-2) specifies two systems: one, the program stream, for multiplexing together the video, audio and data of a single program to be transmitted in a relatively error free environment, and another, the transport stream, which can be used for broadcast, VOD and cable TV. Figure-7 describes the system part.

Program stream specification descibe how to encode the data from the multiple set of video, audio, and data ( Packetized Elementary Streams:PES) within a single set of varialble-length packets. These packets have headers that specifies timing information and buffer information for the decoder. The header also include the program, such as frame and audio sampling rate.

figure-7. The MPEG encoder.


The transport stream (a major extension of MPEG-1) defines a packetized protocol for multiplexing multiple MPEG compressed programs into a fixed-length (188 byte) packet format for transmission on digital networks. It also includes some sophisticated timing information, jitter correction and so on. The additional timing information and small fixed-size packets allow a whole range of new applications for MPEG-2. The 188 byte TS packets map very well into 48 byte ATM cell payloads, allowing MPEG-2 to be used in switched video architectures.

The packetization and multiplexing are done in such a manner as to maintain synchronization among all of the elementary streams associated with the same program input. A common system time clock (STC) time base is used throughout the encoding process. The system layer function periodically samples the STC itself, encodes the samples as time-stamps, and embeds them into the bit stream as part of the system layer syntax. Presentation times, in relation to the STC, for coded video and audio frames are also encoded as time-stamps and embedded as part of the system layer.
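The role of the two kinds of time-stamp can be illustrated with a small sketch. It is a simplification under stated assumptions: the `Packet` type, the function names, the starting STC value and the fixed presentation delay are all hypothetical, and time is counted in 90 kHz ticks. The encoder stamps every unit against one STC; the receiver uses the first clock sample to line up its own clock and then schedules each unit on that clock.

```python
from dataclasses import dataclass

CLOCK_HZ = 90_000  # tick rate assumed for this sketch


@dataclass
class Packet:
    kind: str       # "clock_ref", "video" or "audio"
    timestamp: int  # STC sample (clock_ref) or presentation time (video/audio)


def encode(units, stc_start=1_000_000, delay_ticks=9_000):
    """units: list of (kind, capture_time_seconds), sorted by time.

    Every unit is preceded by a sample of the encoder STC and carries a
    presentation time expressed on the same time base.
    """
    packets = []
    for kind, t in units:
        stc = stc_start + round(t * CLOCK_HZ)
        packets.append(Packet("clock_ref", stc))
        packets.append(Packet(kind, stc + delay_ticks))
    return packets


def schedule(packets, local_clock_at_first_ref):
    """Map presentation times onto the receiver's local clock.

    Assumes the stream starts with a clock reference, as encode() guarantees.
    """
    offset = None
    out = []
    for p in packets:
        if p.kind == "clock_ref":
            if offset is None:
                offset = p.timestamp - local_clock_at_first_ref
        else:
            out.append((p.kind, p.timestamp - offset))
    return out
```

Video and audio units captured at the same instant receive the same local presentation tick, which is what keeps the elementary streams of one program in step.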

Decoder.

figure-9 MPEG-2 decoder.

In the decoding process, the system layer function recovers the embedded STC time-stamps (called program clock references in the transport stream systems layer format), and demultiplexes and reassembles the elementary streams. The PES are passed to the appropriate compression layer decoding function and decoded. The recovered time-stamps are used to adjust the local time-base to match the time-base used to encode the program. The uncompressed digital streams are finally converted to the appropriate display format. Figure-9 shows a typical MPEG-2 decoder.

The time-stamps can be used for effects such as slow motion and pausing the video, to achieve VCR-like control. The TS specification also makes it feasible to create a gradual transition from an analog TV broadcast architecture to a digital TV architecture. The digital channels can co-exist with traditional analog basic services, and a user can still access the basic services with an existing TV set, adding a set-top box (as shown in the diagram) for the digital channels.

MPEG over ATM.

The new bandwidth availability, together with the new video compression techniques for digital television, will make it possible to provide a new kind of TV environment, characterized by a wide set of available video services. Of these, the most attractive video service is Video-On-Demand (VOD). Before we describe the VOD services in detail, we will look at the problem of mapping MPEG-2 PES into ATM cells.

So how do we map MPEG packets into ATM cells, "dammit"?

figure-10 MPEG -> ATM cell transform.
An area of potential incompatibility is the mapping of MPEG (MPEG-2) packets into ATM cells. Several mapping schemes are used in existing systems today. Some use the AAL-1 constant bit rate AAL (ATM adaptation layer) to map one 188 byte transport stream (TS) packet into four ATM cell payloads with one byte of overhead per cell. Other schemes use the ATM AAL-5 data adaptation layer to concatenate and map one, two or more MPEG TS packets into five, eight or more ATM cells. Figure-10 illustrates this point. The ATM Forum has chosen AAL-5.
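The cell counts quoted above follow from simple arithmetic, sketched here. The 188 byte packet, the 48 byte payload and the one byte of AAL-1 overhead per cell come from the text; the 8 byte AAL-5 trailer per protocol data unit is an assumption added for this sketch, and the function names are hypothetical.

```python
TS_PACKET = 188     # bytes per MPEG-2 transport stream packet
CELL_PAYLOAD = 48   # bytes of payload per ATM cell
AAL1_OVERHEAD = 1   # bytes of AAL-1 overhead per cell
AAL5_TRAILER = 8    # bytes of AAL-5 trailer per PDU (assumed for this sketch)


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def aal1_cells(n_packets):
    """Cells needed when each cell carries 47 bytes of TS data."""
    usable = CELL_PAYLOAD - AAL1_OVERHEAD
    return ceil_div(n_packets * TS_PACKET, usable)


def aal5_cells(n_packets):
    """Return (cells, padding bytes) for one AAL-5 PDU holding n_packets."""
    data = n_packets * TS_PACKET + AAL5_TRAILER
    cells = ceil_div(data, CELL_PAYLOAD)
    return cells, cells * CELL_PAYLOAD - data
```

Under these assumptions one TS packet fills exactly four AAL-1 cells (4 x 47 = 188), one TS packet in an AAL-5 PDU needs five cells with 44 bytes of padding, and two TS packets fill eight cells exactly (2 x 188 + 8 = 384 = 8 x 48).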

Video On Demand (VOD).

The VOD service allows the user to select the video information. Two levels of VOD service can be identified: pure video on demand (PVOD) and near video on demand (NVOD). PVOD is closer to the ideal case. We can also define further hierarchies, distinguished by the degree of control and interactivity given to the end user. NVOD (where the user has no control during the delivery of the video) involves an unavoidable but predictable delay between the choice and the program delivery (e.g. delayable video). There are different ways to provide selection capabilities, and these identify different VOD services.

For the purpose of this paper we will discuss the issues concerning a specific PVOD service, Video Rental. We will pay particular attention to the impact of the Digital Storage Media Command and Control (DSM CC) on the network. The video rental service allows the user to control the chosen movie with typical VCR commands. The most important issue for the video rental service is the interaction between the user and the agent delivering the video information. This user capability has to be taken into account in the user network interface, in the service architecture and in the design of the transmission resources. The NVOD and PVOD services require an upstream customer control channel, so that the user is able to control the presentation of video information with functions similar to VCR commands.

Another important issue concerns the transmission resources. In the long term, fibre will reach each user, allowing services to be supported by means of ATM. The network architecture for the delivery of VOD must also be carefully considered.

So what are the service architectures needed for the VOD service?

Three possible service architectures are proposed (depending on the video database location): fully centralized, fully distributed and quasi-centralized.

Fully centralized service architecture.
In this architecture the video information databases are placed in the network core. Because these databases provide the video service to many users, they make efficient use of the memory resources needed to store the movies. The main problems with this architecture are:

figure-11 Centralized service architecture.

Figure-11 is an example of the centralized architecture, in which the video databases are connected to a central switch. This is an ATM based switch that is connected to the local offices by a large number of loops. The users are connected to the local offices. Three levels of memory hierarchy are used. The library (read only) is a very large memory with a long access time. The copier memory is a medium-sized, fast-access memory. The stop-start buffer is a small per-user memory. After a request, the appropriate movie is loaded into a player. The switch then sets up a connection from the player directly to the user loop. This method is only used for less popular movies. For other movies, the system may cache the movie in the copier memory, and the switch sets up a connection from the copier memory to the user (the same movie is transmitted in several copies). To allow the user to use VCR-like commands, each user is given a stop-start buffer. The major disadvantage of this architecture is the high bandwidth required to transmit the movies to the users. The advantage is that every disc is shared by a number of users.

Fully distributed service architecture.
In this architecture the video information databases are located in the network access zone, each providing the video service to a few users. The advantage of this architecture is that it reduces the transmission cost. The main problem is that the same movie must be buffered in different service centers.

Quasi-centralized service architecture.
In this architecture the video databases are located both in the network core (CSC: Core Service Center) and in the access zone (USC: User Service Center). Figure-12 shows the quasi-centralized architecture. The less frequently requested movies are stored in the CSCs. When a user requests one of these movies, it is transported first to the USC and then to the user, so the access delay is high. Because of this, the popular movies are buffered in the USC. In this case the number of users connected to the service center is not very large, some copies of the same movie are buffered, and the access delay is not high. If a user requests a movie that is not in their USC, the USC makes a request to the CSC connected to it, and the CSC starts to transmit the requested movie to the USC. With this architecture, therefore, only the less popular movies suffer a high access delay and take a very large bandwidth. In the diagram below the USC is called the Central Office (CO) and the CSC is called the Information Warehouse (IWH).
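The request handling just described can be summarised in a few lines. This is only an illustrative sketch: the function name and the representation of each service center as a set of titles are assumptions made here.

```python
def delivery_path(title, usc_titles, csc_titles):
    """Return the hops a movie takes to reach the user.

    usc_titles: titles buffered in the user's USC (the popular movies).
    csc_titles: titles stored in the CSC connected to that USC.
    """
    if title in usc_titles:
        return ["USC", "user"]          # short path, low access delay
    if title in csc_titles:
        return ["CSC", "USC", "user"]   # fetched from the core first
    raise KeyError(title)
```

Only requests that miss the USC take the longer path through the core, which is why the extra delay and bandwidth fall on the less requested movies.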


figure-12


After careful consideration, the quasi-centralized architecture seems to be the better one in terms of bandwidth/storage costs. This architecture can also include the three-level memory hierarchy. This is shown in figure-13.


figure-13

The video entities are distributed on three different levels, each corresponding to a different type of node: zonal nodes, area nodes and regional nodes. When the user asks to view a movie, the zonal node searches for it; if the requested movie is not available there, it is searched for first in some other zonal nodes, and then in an area node or in the regional node. The movies must be reallocated as their popularity changes. Consequently, a rule for updating the database needs to be defined.
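The search order, and one possible update rule, might be sketched as follows. The data layout (a dictionary of zonal nodes and two sets of titles) is an assumption, and the `reallocate` rule in particular is hypothetical: the text only says that some rule is needed, not which one.

```python
def locate(title, home_zone, zonal_nodes, area_node, regional_node):
    """Return the name of the first node found to hold the title, or None.

    zonal_nodes: dict mapping a zonal node name to the set of titles it holds.
    area_node, regional_node: sets of titles held at the higher levels.
    """
    if title in zonal_nodes[home_zone]:
        return home_zone
    for name, titles in zonal_nodes.items():
        if name != home_zone and title in titles:
            return name
    if title in area_node:
        return "area"
    if title in regional_node:
        return "regional"
    return None


def reallocate(request_counts, zonal_capacity):
    """Hypothetical update rule: keep the most requested titles at the zonal node."""
    ranked = sorted(request_counts, key=lambda t: (-request_counts[t], t))
    return set(ranked[:zonal_capacity])
```

Running `reallocate` periodically on recent request counts is one simple way to follow changes in popularity; other rules would fit the same structure.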

Digital Storage Media Command and Control (DSM CC).

The Digital Storage Media Command and Control (DSM CC) is essential in interactive video services. Work is in progress to make DSM CC an integral part of MPEG-2. DSM CC provides general applications with a set of commands for establishing or deleting a network connection and for communication between a client and a server across a network. The MPEG data flow must be independent of the particular kind of DSM, of whether the DSM is local or remote, and of the network protocol. Since DSM CC only defines command syntax and semantics, different servers may generate different MPEG bitstreams in response to the same DSM CC sequence.

Currently two classes of DSM CC operations are defined. They are:

In this paper we will consider a subset of the DSM CC Client-Server primitives: the so-called stream playback primitives, which provide the user with the typical VCR commands. The stream playback primitives imply unexpected peaks or silent periods in the transferred MPEG-2 bitstreams, which are reflected in the bit rate carried by the transport stream in the network. The table below lists the stream playback primitives with a short description of their functions.

Primitives.               Functions.
dsm stream open           Activate a stream by its name.
dsm stream play           Send normal play stream.
dsm stream pause          Stop sending stream.
dsm stream scanforward    Send the stream at a forward speed other than normal play.
dsm stream scanreverse    Send a stream at a reverse speed.
dsm stream jump           Jump to a time or stream position, relative to the current position or absolute from the beginning.
dsm stream status         Obtain the status of a stream.
dsm stream close          De-activate a stream.
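One way to picture how a server might react to these primitives is a small state machine. This is a sketch only: the primitive names come from the table, but the `StreamSession` class and its state names are assumptions and are not part of DSM CC.

```python
from enum import Enum


class Primitive(Enum):
    OPEN = "dsm stream open"
    PLAY = "dsm stream play"
    PAUSE = "dsm stream pause"
    SCAN_FORWARD = "dsm stream scanforward"
    SCAN_REVERSE = "dsm stream scanreverse"
    JUMP = "dsm stream jump"
    STATUS = "dsm stream status"
    CLOSE = "dsm stream close"


# State entered after each primitive; STATUS and JUMP leave the state unchanged.
_NEXT_STATE = {
    Primitive.OPEN: "idle",
    Primitive.PLAY: "playing",
    Primitive.PAUSE: "paused",
    Primitive.SCAN_FORWARD: "scan forward",
    Primitive.SCAN_REVERSE: "scan reverse",
    Primitive.CLOSE: "closed",
}


class StreamSession:
    """Hypothetical server-side session for one stream."""

    def __init__(self):
        self.state = "closed"

    def handle(self, primitive):
        """Apply one primitive and return the resulting state."""
        if self.state == "closed" and primitive is not Primitive.OPEN:
            raise ValueError("stream is not open")
        if self.state != "closed" and primitive is Primitive.OPEN:
            raise ValueError("stream is already open")
        new_state = _NEXT_STATE.get(primitive)
        if new_state is not None:
            self.state = new_state
        return self.state
```

In this sketch a jump changes the position in the stream but not the playback state, so a stream that was playing keeps playing from the new position.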


The DSM stream open, status and close primitives are related only to signalling traffic. The DSM play primitive invocation causes the transfer of an MPEG-2 transport stream conforming to a particular Profile and Level. Hence the characteristics of the data traffic due to this command are:

The DSM stream pause must produce a "freeze" frame in the user terminal. From the MPEG-2 decoder specification, we note that frame memories exist in the decoder. This structure could allow a local freeze frame simply by having the server stop the transmission of the MPEG-2 data flow. Consequently, synchronization techniques have to be defined to ensure the clock (time stamp) alignment between encoding and decoding. The MPEG-2 standard does not provide this facility explicitly (in addition, a generic video server usually does not include a real time encoder). The alignment must therefore be done through a suitable post-processing method. Alternatively, the server could generate a data stream (conforming to the MPEG-2 syntax and to the profile and level) that represents a still frame. This could be realized by continuously transmitting an ad hoc generated GOP. The basic strategy may be to intra code the selected frame so that the transmitted GOP contains empty P and B-frames. If the freeze involves a P or B-frame, the solution depends on the server post-processing capability: as the still frame has to be coded in intra mode, the server could decode the current GOP and recode the B or P still frame in intra mode.
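The second option, a repeated still-frame GOP, can be sketched very simply. The representation of a GOP as a list of (frame type, payload) pairs and the default frame pattern are assumptions made for illustration; `None` stands for an "empty" predicted frame that carries no new picture data.

```python
def still_gop(frame_id, pattern="IBBPBBPBBPBB"):
    """Return one GOP that shows a single still picture.

    frame_id: identifier of the frame to freeze, assumed already intra coded.
    pattern:  frame types of the GOP (a hypothetical default pattern).
    """
    if not pattern.startswith("I"):
        raise ValueError("a GOP in this sketch must start with an I-frame")
    return [("I", frame_id)] + [(kind, None) for kind in pattern[1:]]
```

While the pause lasts, the server in this sketch would keep sending the same GOP, so the stream stays syntactically valid while the picture does not change.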

The effect of a DSM stream scanforward/reverse primitive invocation is a change in the presentation speed and direction of the sequence. In this case, an ad hoc sequence (of GOPs) has to be generated to realize the command. For reverse scan, the server has to decode the sequence and generate a new one for the scan presentation. Otherwise the server can generate a reversed sequence of intra frames, in coded form, from the original one.

In the case of fast forward, the server can generate a new sequence by recoding the original one after discarding a number of P and B-frames. This can lead to a new coded sequence with a GOP structure similar to the original one. In this case the frames of the scan sequence are more widely spaced in time; this does not significantly change the traffic statistics, but it requires a real time encoder and decoder as a post-processing facility.
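The frame selection step for fast forward might look like the sketch below. It covers only the planning, not the recoding itself, and it simplifies the selection to keeping every `speed`-th source frame; the function name, that selection rule and the default GOP pattern are assumptions.

```python
def scan_forward_plan(n_frames, speed, gop_pattern="IBBPBBPBBPBB"):
    """Plan a fast-forward sequence.

    Keeps every speed-th source frame (in display order) and assigns each
    kept frame the type it would have after recoding with gop_pattern.
    Returns a list of (source_frame_index, new_frame_type) pairs.
    """
    if speed < 1:
        raise ValueError("speed must be at least 1")
    kept = range(0, n_frames, speed)
    return [(src, gop_pattern[i % len(gop_pattern)])
            for i, src in enumerate(kept)]
```

Because the recoded sequence reuses the same GOP pattern, its mix of I, P and B-frames stays similar to that of the original, consistent with the remark above about the traffic statistics.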

The DSM stream jump primitive invocation causes the MPEG-2 transport stream to jump to a particular time stamp or position. If the jump position corresponds to a P or B-frame, the solution depends on the server coding capability. The server could decode the relevant GOP and restart the transmission by generating a new GOP beginning from the frame at the jump position. If the server is not able to perform this operation, it can simply transmit the GOP containing the jump position. The influence of the DSM jump primitive on the traffic should be negligible.
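The two cases can be expressed as a short lookup. This sketch assumes the server keeps a sorted list of the frame indices at which each GOP begins; the function name and arguments are hypothetical.

```python
from bisect import bisect_right


def jump_start(gop_starts, target_frame, can_recode):
    """Return the frame index from which transmission restarts after a jump.

    gop_starts: sorted frame indices at which each GOP begins.
    can_recode: True if the server can decode the GOP and generate a new
                one beginning at the target frame.
    """
    if target_frame < gop_starts[0]:
        raise ValueError("target lies before the start of the stream")
    if can_recode:
        return target_frame
    # Otherwise send the whole GOP that contains the target frame.
    index = bisect_right(gop_starts, target_frame) - 1
    return gop_starts[index]
```

For example, with GOPs starting at frames 0, 12, 24 and 36, a jump to frame 30 restarts at frame 24 on a server without recoding capability, and at frame 30 on one that can recode.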


Concluding remarks...

Communications networks have gradually evolved towards the concept of the integration of services. This, coupled with advances in networking technologies, has created the possibility of integrated networking for a wide range of services. The global networking standards now point to B-ISDN, the universal network solution concept. This has led to the development of the fast packet switching transfer mode ATM, which is now accepted to be the optimum solution for B-ISDN. These technologies have provided the chance to realise real-time broadband services, most notably video networking. Coupled with this, advanced compression schemes to reduce the volume of video data have been developed, and the MPEG systems have been adopted as the standard.

The ATM concept is flexible enough to accommodate MPEG compressed bit streams efficiently, so making video transfer over networks feasible. Many video networking services, including interactive services, are planned. VOD is a significant new service in which a lot of research has been invested, and which is considered to be a realistic prospect for the not so distant future. All these developments are set to revolutionise the communications arena, and the technology is virtually established. All that remains is for the social, economic and political forces to will this into being.


Bibliography

Usefulness rating out of 10.
JOURNALS
  1. ISO IS 13818-1 - ITU-T Recommendation H.222.0, Information technology - Generic coding of moving pictures and associated audio information, Part 1: Systems, November 1994.
    Usefulness: 7
  2. ISO IS 13818-2 - ITU-T Recommendation H.262, Information technology - Generic coding of moving pictures and associated audio information, Part 2: Video, November 1994.
    Usefulness: 7
  3. D. Le Gall, "MPEG: A video compression standard for multimedia applications", Communications of the ACM, April 1991.
    Usefulness: 9
  4. ISO WD 13818-9 - Draft ITU-T Recommendation H.222.x, Information technology - Generic coding of moving pictures and associated audio information, Part 9: Real time interface specifications, November 1994.
    Usefulness: 5
  5. Advanced Television Research Consortium, "Advanced Digital Television: Prototype Description", FCC WPI certification document, Feb. 1992.
    Usefulness: 3
  6. H. Sun and W. Kwok, "Adaptive Concealment for Block-based compression video".
    Usefulness: 8
  7. W. Kwok and H. Sun, "Multi-directional Interpolation for Spatial Error Concealment", to be submitted.
    Usefulness: 7
  8. H. Sun and J. Zdepski, "Adaptive Error Concealment Algorithm for MPEG compressed video", SPIE Proc. Visual Comm. and Image Processing 92, Vol. 1818, Nov. 18-20, 1992, pp. 814-824.
    Usefulness: 8
  9. F. Kishino, K. Manabe, Y. Hayashi and H. Yasuda, "Variable Bit-rate Coding of Video Signals for ATM Networks", IEEE J. Selected Areas in Comm., Vol. 7, No. 5, June 1989.
    Usefulness: 3
  10. "The Lines Unleashed" in Byte - May 1996, plus articles: "The New WAN" by Salvatore Salamone, "The Price of WAN Connectivity" by Liza Henderson and "Playing the ATM Card" by Lane F. Cooper.
    Usefulness: 9
BOOKS
  1. ASYNCHRONOUS TRANSFER MODE Solution for Broadband ISDN 2nd Edition, Martin de Prycker, Ellis Horwood Publishing.
    Usefulness: 9
  2. "Asynchronous transfer mode: the ultimate broadband solution?" by M. Jeffrey, Electronics & Communications Engineering Journal June 1994.
    Usefulness: 5
  3. "Management of patient records with NHS local area supported by ATM" by Constantinos TSIBANIS, Department of Computer Science at Bristol.
    Usefulness: 3
  4. ATM Switching Systems, Thomas M. Chen and Stephen S. Liu, Artech House Publishing.
    Usefulness: 7
  5. Integrated Broadband Networks - An Introduction to ATM-Based Networks, Rainer Handel and Manfred N.Huber, Addison-Wesley Publishing.
    Usefulness: 8
  6. "ISDN" by Thomas A. Fine from "Recent advances in networking"
    Usefulness: 4
  7. "ATM Asynchronous Transfer Mode" by Brendan McKeon from "The OSI Reference Model"
    Usefulness: 5
INTERNET
  1. Is ATM the future of all global communications?
  2. Why has communications evolved towards the ATM concept?
  3. ATM transport and cell-loss concealment techniques for MPEG video.
  4. What is MPEG video compression standard?


Acronyms

B-ISDN Broadband Integrated Services Digital Network
MPEG Moving Picture Experts Group
ATM Asynchronous Transfer Mode
LAN Local Area Network
WAN Wide Area Network
HDTV High Definition Television
AAL ATM Adaptation Layer
ISO International Organization for Standardization
IEC International Electrotechnical Commission
GOP Group Of Pictures
MC Motion Compensation
DCT Discrete Cosine Transform
VOD Video On Demand
PS Program Stream
TS Transport Stream
STC System Time Clock
PTS Presentation Time Stamps
PES Packetized Elementary Streams
DSM CC Digital Storage Media Command and Control
PVOD Pure Video On Demand
NVOD Near Video On Demand
CSC Core Service Centre
USC User Service Centre
CCITT International Telegraph and Telephone Consultative Committee
ARPA Advanced Research Projects Agency
SMDS Switched Multimegabit Data Services
NTSC National Television System Committee
SDH Synchronous Digital Hierarchy
AT&T American Telephone and Telegraph
SONET Synchronous Optical Network