Video Networking
Video Transmission Over Broadband Networks
New networking technologies are now paving the way for broadband integrated
services. This is providing an environment for the widespread availability
of a wide range of true video networking services. The ATM networking
standard and the MPEG video compression standards are becoming increasingly
important in this area. What are these technologies and how do they relate
to the new interactive video service, 'video on demand'?
By A.Derbyshire and K.C.Rajh, 17th June 1996
The integration of network services
The telephone has developed into an efficient long distance
communication tool. It provides the capability to communicate with
other people all over the world by voice. The main assets of telephone
networks are availability, real time response and user-friendliness.
These networks have experienced growth in terms of size and use
and the significant development of digital signalling and computer
data communications.
The availability asset of the public telephone networks made them an
attractive option for basic data communications. Systems to transmit
computer data over these analogue networks were developed, but
this was soon recognised as being a non-optimal solution to data
communications. This is because the limited bandwidth and high
noise attributes of telephone networks, although adequate for voice
transmissions, cannot provide reliable high rate data transfer.
As computer systems advanced, the interconnection demands
increased in terms of transfer rate, and soon the telephone network
dimensions were exceeded. For high speed data communications,
specialised data networks were established with networking technology
specifically designed to transport data. This service specific
dimensioning of networks formed a general trend throughout
communications. All services, from voice, through data and to television
distribution, had their own networks.
The disadvantages of this situation are that:
- Each network requires its own implementation and maintenance
phases. Therefore, any associated costs are duplicated, leading to
diseconomies.
- Specific service dimensioning reduces a network's ability to
adapt to changes in its designated service. This reduces its ability to
embrace future developments and advances in the service.
- Service specialisation prevents the networks from providing
other services efficiently (e.g. computer data over the telephone networks
is inefficient). This means that any free resources on a network will be
wasted because they cannot be assigned efficiently to other services.
In 1984, a solution was suggested to these problems by the
international telecommunications standards body of the time, the
International Telegraph and Telephone Consultative Committee
(CCITT). A network concept was proposed which would
integrate different service types onto a single network. This was
known as the Integrated Services Digital Network (ISDN).
The CCITT stated that:
'an ISDN is a network ... that provides end-to-end digital connectivity
to support a wide range of services, including voice and non-voice
services, to which users have access by a limited set of standard
multi-purpose user-network interfaces'
The Integrated Services Digital Network
As stated, ISDN is a digital system, designed to transport voice and
other services by digital signalling. The ISDN standard interfaces
provide a series of channels of different characteristic rates.
| Channel | Bit rate (kbit/s) | Interface |
| B | 64 | Basic access |
| H0 | 384 | Primary rate access |
| H11 | 1536 | Primary rate access |
| H12 | 1920 | Primary rate access |
| D16 | 16 | Basic access |
| D64 | 64 | Primary rate access |
The standard interface, called basic access, comprises two 64 kbit/s
B channels and a 16 kbit/s signalling D channel. Another interface,
called primary rate access, provides bit rates up to
2 Mbit/s. The channels can be mixed and duplicated to give a range of
gross bit rates. The basic rate was specifically dimensioned to handle
voice transmissions as digital data, and it was derived as:
Bandwidth of voice signal = 3.4 kHz
Nyquist sampling frequency = 8 kHz
Sampling resolution = 8 bit
Bit rate = 64 kbit/s
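The derivation above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the function and constant names are made up for this example and the figures are those quoted in the text.

```python
# Illustrative check of the 64 kbit/s B channel rate (names are hypothetical).
VOICE_BANDWIDTH_HZ = 3_400   # bandwidth of the voice signal, from the text
SAMPLING_RATE_HZ = 8_000     # sampling frequency, from the text
BITS_PER_SAMPLE = 8          # sampling resolution, from the text


def pcm_bit_rate(sampling_rate_hz: int, bits_per_sample: int) -> int:
    """Return the bit rate in bit/s of a uniformly sampled signal."""
    return sampling_rate_hz * bits_per_sample


# The sampling rate must exceed twice the signal bandwidth (Nyquist criterion).
assert SAMPLING_RATE_HZ > 2 * VOICE_BANDWIDTH_HZ

bit_rate = pcm_bit_rate(SAMPLING_RATE_HZ, BITS_PER_SAMPLE)
print(f"{bit_rate // 1000} kbit/s")  # 64 kbit/s
```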
The ISDN concept was first recommended by the CCITT in 1984, and
then enhanced in 1988. The aim of the ISDN concept was to provide:
- A common user-network access interface, capable of providing
access to a variety of services
- Service integration over a single network
- Enhanced channel bandwidth capability to support higher
signalling rates
- The capability to handle new services
The ISDN transfer mode
The term transfer mode refers to the way in which information
is transmitted and switched through a network. An important
enhancement of ISDN over standard telephone networks is that, in
addition to a circuit switching transfer mode, it has an extra
transfer mode which has been tailored for handling data communications.
The original telephone networks used circuit switching to establish
the connections in the network. This system connects the users
together via switches across the network to form a circuit. Once a
connection is established, it remains throughout the duration of the
communications session, so a constant amount of network
resources is allocated to the users.
This has generally remained as the switching mode used for speech
despite the fact that conversations are up to 50% silent each way. A
technique to reassign this silence is used for expensive links
(e.g. satellite links) and is known as Time Assignment Speech
Interpolation (TASI). However, this is not cost effective for most links and
so circuit switching has remained in telephone networks.
Typically, data signals are generated in sparsely separated bursts,
with silent periods exceeding 50% of the total connection
time. A transfer mode to optimise the use of network resources
under these circumstances was devised, known as packet
switching. With this system, a digital data stream is broken up
into pieces known as packets, which are then launched
individually onto the network. Packets will not be sent during periods
of silence, so the important consequence is that network resources
are only used when data is actually being transferred.
The 64 kbit/s ISDN was designed to offer both circuit switching
and packet switching services to cater for voice and other types of
services. Speech packetisation was experimented with in the 1970s
(by the Advanced Research Projects Agency (ARPA)) but overall
it was learned that although it is feasible in principle, significant
modifications to existing protocols were necessary in order to
achieve an acceptable quality of voice service. This is because
varying delays are introduced as packets travel through the network
paths and are processed at the switches. These delays cause
problems concerning the real-time nature and correct reconstruction
of the speech streams.
Supporting network services efficiently
ISDN was designed to provide packet switching to give it the ability
to handle bursty data applications efficiently. This idea can be extended
more generally by considering the natural information rate of network
applications.
For a digital network, the application source will typically generate a
bit stream of a given rate. For instance, a source could be a person
speaking in a conversation, where the appropriate hardware will
generate a bit stream with a bit rate in the order of 64 kbit/s. However,
a bit stream may not necessarily convey a constant amount of
information. In terms of speech, no information will be conveyed at
all during silence, so the natural information rate is not constant.
The important extension to this idea is that the natural information
rate of a source is a stochastic process with respect to time. The
figure below shows such a stochastic process s(t), which represents
the fluctuations of the natural bit rate of a source.
Fig. 1.0. Natural bit rate fluctuation in time
Acknowledgement to the source : 'Asynchronous Transfer Mode,
Solution for Broadband ISDN' - second edition by Martin De Prycker.
De Prycker states that three important parameters of this process are:
- S = peak value of the natural information rate
- E[ s(t) ] = mean value of the natural information rate
- B = burstiness of the source = S / E[ s(t) ]
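As a rough illustration of these three parameters, the sketch below computes them from a sampled rate trace. The sample values are hypothetical (a speech source alternating between 64 kbit/s and silence) and the function name is an assumption made for this example.

```python
def source_parameters(rates):
    """Return (S, E, B) for sampled values of the natural bit rate s(t).

    S is the peak rate, E the mean rate and B = S / E the burstiness.
    """
    peak = max(rates)
    mean = sum(rates) / len(rates)
    return peak, mean, peak / mean


# Hypothetical samples of s(t) in kbit/s: speech that is silent half the time.
samples = [64, 64, 0, 0, 64, 0, 64, 0]
S, E, B = source_parameters(samples)
print(S, E, B)  # 64 32.0 2.0
```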
The information from a source can be obtained from a generated bit
stream by encoding only non-redundant data. Systems to do this
are known as compression systems, because they compress
the size of the generated data down towards the minimal natural
information size. The result is that a source has a variable bit rate of
mean value E[ s(t) ] and burstiness B, which are characteristic of it.
Packet switching architectures are well suited to this because
packets need only be sent when non-redundant data is available,
i.e. packets can be sent at the fluctuating information bit rate of a
source. A constant bit rate channel with a fixed bandwidth usage will
transfer the information either inefficiently or with a quality
reduction. This is due to a trade-off between two situations:
- If the transfer rate is equal to (or greater than) the peak
(or generated) bit rate, then channel bandwidth will be wasted
when the natural information bit rate is lower than the transfer
rate since some redundant bits will be sent.
- If the transfer rate is less than the peak rate, then a
quality reduction will occur when the natural information rate
exceeds the transfer rate since some information bits will not
be sent.
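The trade-off can be made concrete with a small calculation. This is a simplified sketch that assumes equal time slots and no buffering in the channel; the rate values and the function name are hypothetical.

```python
def cbr_tradeoff(natural_rates, channel_rate):
    """Return (wasted, lost) summed over equal time slots.

    wasted: channel capacity unused while the source is below the channel rate
    lost:   source information above the channel rate (no buffering assumed)
    """
    wasted = sum(max(channel_rate - r, 0) for r in natural_rates)
    lost = sum(max(r - channel_rate, 0) for r in natural_rates)
    return wasted, lost


rates = [2, 6, 10, 4, 2, 6]      # hypothetical s(t) in Mbit/s: peak 10, mean 5
print(cbr_tradeoff(rates, 10))   # channel at the peak rate -> (30, 0)
print(cbr_tradeoff(rates, 5))    # channel at the mean rate -> (7, 7)
```

With the channel at the peak rate nothing is lost but most of the capacity is wasted; at the mean rate less is wasted but some information cannot be sent.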
These principles are important to the concept of the efficient use
of network channels, especially for broadband services,
where network bandwidth is too costly to waste.
Broadband integrated services
ISDN is an important concept because it is a step towards the
integration of network services, but the system has a relatively
limited bandwidth which makes it capable of providing
transmission rates of only up to 2 Mbit/s. By taking a step back from
ISDN and making a general observation of network services it
is possible to appreciate the scope of ISDN. The following
table gives some examples of services, their typical transmission
networks and their quantitative parameters.
| Service | Network | E[ s(t) ] | Burstiness B |
| Voice | PSTN | 32 kbit/s | 2 |
| Interactive data | PSTN | 1 - 100 kbit/s | 10 |
| Bulk data | ISDN / SMDS | 1 - 10 Mbit/s | 1 - 10 |
| Video | Terrestrial / Cable TV | 1.5 - 15 Mbit/s | 2 - 3 |
It is clear that ISDN would only support the services in the range of
voice to medium rate data transmissions. ISDN was not originally
designed to transport high speed data and moving images (video),
but in the 1980s the demand for these services started to be
recognised. Such high speed services are known as broadband
services because of the large amount of network bandwidth which
is required for transmission. The CCITT put forward the recommendation
of 'Broadband - ISDN' (B-ISDN) in an attempt to develop an integrated
network capable of supporting a larger range of services.
B-ISDN
CCITT defined broadband with reference to ISDN as:
... a service or system requiring transmission channels capable of
supporting rates greater than the primary rate.
It was first proposed that B-ISDN was to be an enhanced version of
ISDN achieved by just adding broadband channels and broadband user-network
interfaces to the existing ones. However, concerns arose as to the
suitability of this concept. This is because the channels would have
had to have been dimensioned fairly rigidly to the then contemporary
broadband services, which would have made the channels potentially
unsuitable for any unforeseen future services. Also, the CCITT could
not come to a decision as to the traffic type orientation of the
channels, i.e. whether they should have a circuit-type or burst-type
traffic orientation.
The following factors influenced the overall design of B-ISDN:
- An emerging demand for broadband networking.
- An emerging availability of high speed transmission
and switching technologies
- Advances in software applications
- The advantages of integrating services into one universal
network
- The need for flexibility with respect to handling different
service types and qualities
It was identified that B-ISDN should be an extremely flexible
network capable of catering for the entire range of contemporary
and potential future services. CCITT gave the following
recommendation, which is a natural extension from ISDN:
'A key element of service integration is the provision of a wide range
of services to a broad variety of users utilising a limited set of
connection types and multipurpose user-network interfaces.'
The problems of the first technical concepts of B-ISDN were never
resolved, since a different principle was put forward to be the solution
for B-ISDN. CCITT abandoned the original concepts and eventually stated:
Asynchronous transfer mode (ATM) is the transfer mode for
implementing B-ISDN...
ATM was this different principle, different because it was effectively
wiping the ISDN slate clean and starting afresh with a more flexible
solution. It was designed as a universal transfer mode with the
capability to provide solutions to the following B-ISDN requirements:
- The ability to handle services of significantly different
bit rates, and therefore different bandwidth
requirements.
- The ability to support variable bit rate traffic
efficiently
- The ability to cope with both delay and loss sensitive
applications
Broadband networking and video
All these developments are significant to the move towards the
transmission of moving pictures as video signals through networks. The
B-ISDN, using ATM, is designed to be able to provide widespread
interactive video services as well as the distribution
services common to the existing television distribution networks. This
means that B-ISDNs could provide a full range of video services to homes
and businesses, with the potential accessibility of the current telephone
system.
The CCITT classified interactive and distribution services generally as
follows:
Interactive Services
- Conversational
- Speech
- Mutual exchange of data
- Mailing of documents/images/sound
- Video-conferencing
- Hi-speed LAN/MAN
- Retrieval
- Video on demand
- Remote software/data library
- Tele-shopping
Distribution Services
- Electronic publishing
- TV programme distribution
- HDTV programme distribution
- Information services
Out of these services, video type services pose the greatest challenges
to networking because of their sensitivity to delays and high bit rates
(and therefore broadband nature). Video services are characterised as
follows:
| Video service | Generated bit rate (Mbit/s) | Average natural bit rate E[s(t)] (Mbit/s) | Burstiness B |
| Video-telephony | 1 - 10 | 0.2 - 2 | 5 |
| Video-conferencing | 10 - 100 | 1 - 10 | 1 - 5 |
| Videotex/video retrieval | 10 - 100 | 1 - 10 | 1 - 20 |
| TV services | 50 - 100 | 1.5 - 15 | 2 - 3 |
| HDTV services | 100+ | 15 - 150 | 1 - 2 |
With video networking, speed is of the essence. In order to transport
these services effectively and use network bandwidth efficiently, a
fast variable bit rate network transfer mode and a powerful video
compression system are required. The generally accepted solutions are
the ATM network concept and MPEG compression. Significantly,
these two solutions work well together, giving a feasible
basis for flexible video networking.
The Asynchronous Transfer Mode
ATM was developed in the early 1980s with the objective of providing
a transfer mode with very high throughput. It is based on the principle
of fast packet switching, which is simply an evolved form of
conventional packet switching, so ATM is essentially not a totally new scheme.
The ATM system concept has been developed in line with advances in modern
digital technologies, namely optical fibre links and high speed electronics,
which are the foundations of high speed communications.
Technological influences on the development of the ATM system concept
The switching and link technology of a network ultimately determines the
speed at which data can be moved through the network. For a packet
switching architecture, packets are constructed from the source data at the
sender node and then are launched onto the network. Each packet will move
through the network and is routed correctly by the intermediate network nodes
(or switches). On receipt of a packet, a node will perform some processing on
the packet header, which will contain routing information and error
checking data. The packet will then be sent to the next appropriate node on
its route and eventually, it will reach the destination node where it will
be reconstructed with the other packets into the information stream.
It is the routing and error checking concepts performed by the nodes and the
associated technologies which determine the overall transmission delays in
the network. The noise susceptibility (and therefore the error susceptibility)
and bandwidth of link technologies have a big influence on the way in which
such concepts are designed. All this determines a network's ability to handle
delay sensitive real-time applications (e.g. video) and the overall transfer
rate. The ATM system concepts have been developed for use with advanced
networking technologies.
High speed electronics
At present, semi-conductor based electronics must be used to construct the
node processing elements. Current systems will be able to process packets
to give a transfer rate in the order of 10^8 bits per second
(100+ Mbit/s). Electronic systems are currently considered to be a bottleneck
in networks consisting of electronic and optical technologies.
Optical technologies
Fibre optic cables provide a very wide transmission bandwidth, greater than
that of electrical cables. They are also less susceptible to channel noise than
electrical cable, and so offer a lower probability of bit errors occurring.
Current systems are also able to provide enough bandwidth for transmission in
the order of 10^8 bits per second, although the theoretical limits
of optical systems extend up to 10^12 bits per second. All-optical
network systems, where even the switching is performed by optical elements
(e.g. optical logic/computers), are currently being experimented with by AT&T.
This would remove the electronic bottlenecks of networking and provide very
high transfer rates.
The ATM system concept - fast packet switching
ATM packets are small and are referred to as cells. They have been
dimensioned to a size suitable for delay sensitive real-time applications, which
was finalised as a total of 53 bytes, comprising a 5 byte header and a 48
byte information payload. Small packets keep processing delays (queuing
delays) in network nodes to a minimum, which helps to keep the overall delay low.
The packet length is fixed in order to keep system complexity minimal, again with
the aim of minimising processing delays.
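To give a feel for the cell dimensions, the following sketch works out how many cells a message needs, the share of each cell taken by the header, and how long a 64 kbit/s source takes to fill one payload. It is a simplification that ignores any adaptation layer overhead carried inside the payload, and the function names are hypothetical.

```python
import math

CELL_BYTES = 53
HEADER_BYTES = 5
PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES  # 48 bytes of information per cell


def cells_needed(message_bytes: int) -> int:
    """Number of cells needed to carry a message, padding the last cell."""
    return math.ceil(message_bytes / PAYLOAD_BYTES)


def header_overhead() -> float:
    """Fraction of each cell occupied by the header."""
    return HEADER_BYTES / CELL_BYTES


def payload_fill_time_ms(source_rate_bit_s: int) -> float:
    """Time in milliseconds for a source to produce one full cell payload."""
    return PAYLOAD_BYTES * 8 * 1000 / source_rate_bit_s


print(cells_needed(1000))            # 21
print(round(header_overhead(), 3))   # 0.094
print(payload_fill_time_ms(64_000))  # 6.0
```

The last figure shows why a small payload suits real-time traffic: a 64 kbit/s voice source fills a cell in 6 ms, whereas a much larger packet would add a correspondingly longer delay before it could even be sent.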
The error control concept of ATM was modified from previous packet
switching architectures. Such architectures would perform some degree of error
control at every intermediate network node between the edges of the network
(i.e. the communicating nodes). The susceptibility of the older links to noise
meant that such substantial error control systems were necessary. However,
modern high-quality links (e.g. optical) are less error prone, so ATM was designed
with error control functions only at the edges of the network in order to reduce
delays due to error control in the intermediate network nodes.
ATM is connection-orientated which means that before information can be
transferred across the network, a logical/virtual connection must be set up.
During setup, the necessary resources for the session will be allocated, but
only if they are available. If not, then the connection is refused, which is a
similar connection mode to circuit switching. With this scheme, packet flow
control need not be performed on the packet queues at each network node to check
for overloading. This is because the resource checks will have been performed
in advance for all the connections so overloading is unlikely. Overall, the
absence of flow control reduces the complexity of the nodes and therefore
delays.
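The accept-or-refuse behaviour at connection setup can be sketched as below. This is a deliberately simplified model, assuming a single link and a reservation of each connection's requested rate; real admission control is more elaborate, and the class, method names and capacity figure are all hypothetical.

```python
class Link:
    """A single link whose capacity is shared by virtual connections."""

    def __init__(self, capacity_mbit_s: float):
        self.capacity = capacity_mbit_s
        self.allocated = 0.0

    def request(self, rate_mbit_s: float) -> bool:
        """Admit a connection only if enough capacity is still available."""
        if self.allocated + rate_mbit_s > self.capacity:
            return False  # resources unavailable: the connection is refused
        self.allocated += rate_mbit_s
        return True

    def release(self, rate_mbit_s: float) -> None:
        """Return the resources of a finished connection."""
        self.allocated = max(0.0, self.allocated - rate_mbit_s)


link = Link(capacity_mbit_s=155.0)  # assumed link capacity
print(link.request(100.0))  # True
print(link.request(50.0))   # True
print(link.request(10.0))   # False, only 5 Mbit/s remain
```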
It is important to note that the concepts of ATM refer only to switching
and multiplexing techniques. B-ISDN is intended to actually be transmitted
synchronously at the physical level (e.g. by SONET/SDH). It is the cell streams
which are mixed asynchronously by time-division multiplexing as
described by the ATM concept. As well as being an important high speed transfer
mode, it should be noted that ATM is extremely flexible and efficient, as
demanded by the B-ISDN concept. This makes it a desirable networking standard
for all services from voice to video.
The B-ISDN protocol reference model
This model is a representation of the information flows in ATM networks. The
model is a series of planes, each of which contains a layered architecture. The
model is shown schematically below:
Fig. 2.0. The B-ISDN protocol reference model
The function and associated information of the planes is as follows:
| Plane | Function | Information flow |
| User | Transfer of user application data | User application |
| Control | Call and connection control | Signalling related to calls and connections |
| Management (layer / plane management) | Network supervision | Network status and performance |
A full description of the planes is out of scope here, but for more details
see "Standardisation for ATM and Related B-ISDN Technologies" or the references
given in the bibliography section below.
Outline of the user plane
The user plane is divided into four parts: the physical layer, the ATM layer,
the ATM adaptation layer (AAL) and the higher layers. Each layer uses the
services of the lower layers and in turn provides services to the layer
above it. This layered structure was adopted to allow for independence in the
design and implementation of each layer (e.g. for the physical layer, ATM
could be implemented on copper cable or optical fibre independently).
The higher layers
The user services are supported in these layers. Four classes of service
have been identified:
- Class A - connection oriented constant bit rate, e.g. speech
- Class B - connection oriented with timing between the source and
destination, e.g. video
- Class C - connection oriented without timing, e.g. data
- Class D - connectionless, e.g. data from LANs and MANs
The ATM adaptation layer
The AAL converts application specific data into ATM data units in order to
provide support for user applications. All AAL functions are performed at the
edges of the network, and all AAL information is carried within the ATM cell
information fields transparently by the ATM layer.
The ATM layer
This layer takes and returns data from the AAL and generates and interprets
the cells respectively. The layer is service independent: it transports
the data transparently, according to the protocol information in the cell
headers.
The physical layer
This is the lowest level layer, and it is responsible for the transmission
of ATM cells as bitstreams across a physical medium.
MPEG Compression For Digital Video.
What is MPEG?
The need for digital video compression resulted in several compression standards in the past.
The MPEG (Moving Picture Experts Group) committee began its life in late 1988 with the immediate
goal of standardizing video and audio for compact discs. MPEG is a joint committee of the ISO and
IEC. It has been responsible for the MPEG-1 and MPEG-2 standards in the past and is currently developing
the MPEG-4 standard. The MPEG standards are generic and universal in the sense that they merely
specify a compressed bitstream syntax. There are three main parts of the MPEG-1 and MPEG-2 specifications,
namely systems, video and audio. The video part defines the syntax and semantics of the compressed
video bitstream. The audio part defines the same thing for the audio bitstream, while the systems part
addresses the problem of multiplexing the audio and video streams into a single system stream with all
necessary timing information.
Table 1 shows the typical bit rates, purpose and screen size for the different standards specified
by MPEG.
| Standard | Bit rate | Purpose | Screen size |
| MPEG-1 | 1.14 to 3.0 Mbit/s | Delivery of video for a CD-ROM | 352x240 pixels at 30 frames per second for NTSC |
| MPEG-2 | 6 to 8 Mbit/s | Broadcast quality compressed video | 720x480 pixels at 60 fields per second for NTSC and 720x576 pixels at 50 fields per second for PAL |
| MPEG-4 | 64 kbit/s | Low bit rate video phones, interactive databases, interactive newspapers etc. | Under development |
Table 1
MPEG-3 was targeted at HDTV. However, it was discovered that with some tweaking MPEG-2 would
work for HDTV.
Formal description of the video signal.
Video signals are spatio-temporal signals or, simply stated, sequences of time varying images.
A monochromatic video signal can be mathematically represented by x(h,v,t), where x is the
intensity value at the horizontal location h, vertical location v and temporal location t. The color
video signal is a superposition of the three main color primitives (R,G,B) or equivalently of
one luminance (Y) and two chrominance components (U,V). To digitize the spatio-temporal signal
x(h,v,t), the component form of the analog signal is usually sampled in all three directions.
Each sample point in a frame is called a pixel. The sampling process yields the complete set of
parameters necessary to represent a digital video signal. For example, sampling in the horizontal
direction yields the pixels per line parameter, which defines the horizontal resolution.
The MPEG video Compression.
There are many important ramifications of the technologies
incorporated into the MPEG specifications, but what seems to get the most press is the video compression
system.
The basic job of MPEG is to take analog or digital video signals and convert them to packets of
digital information that are more efficient to transport on modern networks. MPEG compresses the
video into much less information, consuming less transmission bandwidth.
Video Signal Hierarchy.
figure-3 The hierarchy of video signal.
At the top level of the hierarchy, the video bitstream consists of video sequences. MPEG-1 allows only
progressive sequences, while MPEG-2 allows both progressive and interlaced sequences. Each video sequence
contains a variable number of groups of pictures (GOPs). A GOP contains a variable number of pictures (P). A picture can either be
a frame picture or a field picture. In a frame picture, the two interlaced fields are coded together to form
a frame, while a field picture is a coded version of an individual field. In MPEG the video frames/pictures
are broken down into 8x8 pixel regions called blocks. Four of these blocks can be put together to create
a 16x16 macroblock. The macroblocks are then grouped together into runs of macroblocks called slices. The slice structure allows
the receiver to resynchronize at the beginning of a slice in case of data corruption, because each slice begins with a unique header (figure-3).
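As a small worked example of the lower levels of this hierarchy, the sketch below counts the macroblocks and 8x8 blocks covering a picture. It counts only the four blocks per macroblock mentioned above, and the function name is hypothetical; the 352x240 picture size is the MPEG-1 NTSC size quoted earlier.

```python
BLOCK_SIZE = 8
MACROBLOCK_SIZE = 16


def macroblock_grid(width: int, height: int):
    """Return (columns, rows) of 16x16 macroblocks, rounding partial ones up."""
    columns = -(-width // MACROBLOCK_SIZE)   # ceiling division
    rows = -(-height // MACROBLOCK_SIZE)
    return columns, rows


columns, rows = macroblock_grid(352, 240)
macroblocks = columns * rows
blocks = macroblocks * (MACROBLOCK_SIZE // BLOCK_SIZE) ** 2  # 4 blocks each
print(columns, rows, macroblocks, blocks)  # 22 15 330 1320
```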
Inside each GOP, two kinds of coding are permitted: intra frame coding and inter frame coding. The intra coding
of a frame proceeds without any reference to other frames, exploiting only its spatial redundancy. The intra coded
frames (I-frames) provide the access points to the coded sequence.
The inter coding of a frame uses motion compensated prediction from a previous or subsequent frame, in order to
exploit not only spatial but also temporal redundancy. In the MPEG algorithm two kinds of inter coded frames are
distinguished: P-frames, which are motion compensated from a past I or P-frame, and B-frames, which require both
past and future reference frames for motion compensation. Since B-frames use both past and future frames for prediction,
the highest degree of compression is obtained for
B-frames, but they cannot be used as a reference for prediction (figure-4).
figure-4 The P and B frame prediction.
Inter Frame coding ( Motion Compensation(MC) )
MPEG derives its maximum compression from P and B-frames. This is done by a technique called motion
compensation (MC) based prediction, which exploits temporal redundancy. Since successive frames are closely related, it is
assumed that a current picture can be modelled as a translation of the picture at a previous time. It is then possible
to accurately predict the data of one frame based on the data of a previous frame. The encoder searches the previous
frame (for P-frames, or the frames before and after for B-frames) in half pixel increments for other macroblock
locations that are a close match to the information contained in the current macroblock (figure-5). The displacements
in the horizontal and vertical directions of the matching macroblock from the co-sited macroblock are called motion vectors. If
a matching block is found in the search region, the motion vectors are encoded. If no match is found in the neighboring
region, the macroblock is intra-coded and the DCT coefficients are encoded. For B pictures, MC prediction and interpolation
are performed using the reference frames present on either side. B pictures themselves are never used for prediction and hence
do not propagate errors.
figure-5 The search window for motion compensation.
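The search for a matching block can be illustrated with a minimal full search. This sketch is not the encoder's actual procedure: it assumes whole pixel steps rather than half pixel increments, uses the sum of absolute differences as the match criterion, and works on a small 4x4 block so that the example stays short. All names and test data are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between the n x n block of cur at (cx, cy)
    and the n x n block of ref at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def find_motion_vector(cur, ref, cx, cy, n=4, search=2):
    """Full search over +/- search pixels around the co-sited block.

    Returns ((dx, dy), cost) for the best matching block in ref.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue  # candidate lies outside the reference frame
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if best is None or cost < best[1]:
                best = ((dx, dy), cost)
    return best


# Reference frame, and a current frame whose content has moved one pixel
# to the right and one pixel down.
ref = [[x * x + 10 * y * y for x in range(8)] for y in range(8)]
cur = [[ref[y - 1][x - 1] if x > 0 and y > 0 else 0 for x in range(8)]
       for y in range(8)]
print(find_motion_vector(cur, ref, 2, 2))  # ((-1, -1), 0)
```

The vector (-1, -1) points from the current block back to where its content sat in the reference frame, and a cost of 0 means the match is exact.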
Intra (I-frame) coding
I-frame coding involves the following steps.
- Discrete Cosine Transform (DCT).
- Quantization.
- Coding.
Discrete Cosine Transform.
The first step in the compression is to translate the information in the picture into the frequency domain. The
R, G and B intensity information in each pixel is translated into Y and (U,V). Each picture is divided into 8x8 pixel blocks.
Four of these blocks are additionally arranged into a bigger block of size 16x16, called a macroblock (figure-6).
figure-6 Intra coding.
The DCT is applied to each 8x8 block individually to transform the data into frequency information.
This transformation converts the data into a series of coefficients which represent
the magnitudes of cosine functions at increasing frequencies. The low frequency coefficients contain more energy than the
high frequency ones.
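For illustration, a direct (and slow) implementation of the two-dimensional 8x8 DCT is sketched below, using the common orthonormal DCT-II definition. Real encoders use fast algorithms; the function name is hypothetical. A completely flat block shows the energy compaction in its simplest form: everything ends up in the single lowest frequency (DC) coefficient.

```python
import math

N = 8


def dct_8x8(block):
    """Two-dimensional DCT-II of an 8x8 block given as 8 rows of 8 samples."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for v in range(N):          # vertical frequency index
        for u in range(N):      # horizontal frequency index
            total = 0.0
            for y in range(N):
                for x in range(N):
                    total += (block[y][x]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            coeffs[v][u] = 0.25 * scale(u) * scale(v) * total
    return coeffs


flat = [[100] * N for _ in range(N)]   # a block with no spatial detail
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))          # 800.0, the DC coefficient
print(max(abs(coeffs[v][u]) for v in range(N) for u in range(N)
          if (u, v) != (0, 0)) < 1e-9)  # True: all other coefficients vanish
```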
Quantization and coding.
The high frequency (low energy) coefficients can be dropped because the eye is less able to detect high frequency changes.
This means that the high energy (low frequency) coefficients can be coded with a greater number of bits, while using fewer or zero bits
for the low energy coefficients.
The quantization step drops some of the least significant bits of information, making some of the coefficients
(the high frequency coefficients) go to zero.
These coefficients are then entropy-coded. Entropy coding converts the coefficients into variable bit length codes, with
the most common coefficients being coded with the fewest bits (Huffman coding).
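The two steps can be sketched as follows. This is only an illustration under simplifying assumptions: a single uniform quantization step is applied to every coefficient, whereas MPEG uses frequency dependent step sizes, and the Huffman code is built from the data itself, whereas MPEG uses predefined variable length code tables. The coefficient values and function names are hypothetical.

```python
import heapq
from collections import Counter


def quantise(coefficients, step):
    """Uniform quantisation: small coefficients collapse to zero."""
    return [round(c / step) for c in coefficients]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code of the symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {symbol: 1 for symbol in counts}
    # Heap entries: (total count, tie breaker, {symbol: depth so far}).
    heap = [(count, index, {symbol: 0})
            for index, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    while len(heap) > 1:
        count_a, _, lengths_a = heapq.heappop(heap)
        count_b, _, lengths_b = heapq.heappop(heap)
        merged = {s: n + 1 for s, n in {**lengths_a, **lengths_b}.items()}
        heapq.heappush(heap, (count_a + count_b, next_index, merged))
        next_index += 1
    return heap[0][2]


coefficients = [800.0, -45.2, 12.7, 3.1, -1.4, 0.6, 0.2, -0.1]
levels = quantise(coefficients, step=16)
print(levels)          # [50, -3, 1, 0, 0, 0, 0, 0]
lengths = huffman_code_lengths(levels)
print(lengths[0])      # 1: the most common value gets the shortest code
```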
As we have seen, P-frame (and B-frame) coding predicts frames from previous frames (past and future frames for B-frames).
This implies that any error in a P-frame (remember that B-frames are never used for prediction) will be propagated to the frames predicted from it.
To avoid the propagation of errors and to allow periodic resynchronization, I-frames (stand-alone frames) are transmitted approximately
once every 12 frames.
Two movie clips illustrate the effect of error propagation: the normal video, and the same sequence with errors propagating through the P-frames.
MPEG-2 the successor of MPEG-1
MPEG-1 supports only progressive video. Unfortunately, today's TV scanning pattern is interlaced. This introduces
a duality in block coding: do local redundancy areas (blocks) exist exclusively in a field or in a frame? This and other advanced requirements
(e.g. support for HDTV) led to the need for a new standard.
The MPEG-2 video standard specifies the coded bit stream for high quality video. As a compatible extension, MPEG-2 video builds
on the completed MPEG-1 video standard, by supporting interlaced video formats and a number of other advanced features.
The following lists some of the MPEG-2 features. This is by no means a complete list.
- Support for different aspect ratios (e.g. wide-screen, HDTV) and frame sizes up to 16k x 16k.
- Macroblock formats for 4:2:2 and 4:4:4 chrominance formats.
- Scalable picture information in a variety of formats.
- Concealment motion vectors used by the decoder to conceal transmission errors.
- Transport and Program stream packet structures for video or audio digital networks.
- Audio expanded to include surround channels and alternate language channels.
A very important goal of MPEG-2 was to extend the range of video formats capable of being carried (scalability).
This is useful in application areas such as video communication, video on ATM, HDTV with embedded TV etc. Scalability
enables different decoders to construct different versions of the same video source by using a sub-set of the total
encoded bitstream. This includes spatial scalability, temporal scalability, SNR scalability, and data partitioning.
Spatial scalability.
Specifies carrying a video signal in a two part format that lets inexpensive decoders extract a low-resolution signal, while
more capable decoders, with additional processing, extract a higher resolution picture using more data and bandwidth.
This provides a smooth transition to the HDTV system while maintaining compatibility with existing standard TV systems.
Temporal scalability.
Allows one signal to be transmitted and displayed at different frame rates.
Signal-to-Noise-Ratio (SNR) scalability.
Allows one encoded signal to be compatible with different levels of decoding quality.
Data partitioning scalability.
Allows the MPEG signal to be transmitted over a two priority channel, with one channel containing critical information
such as the DC values, motion vectors etc. The other channel carries less critical information such as higher order
coefficients etc. This two-part transmission is well suited to systems like ATM, where the cell loss priority bit in the cell
header can be used to mark cell discard priority.
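As a rough sketch of the idea, the data of one block can be split into a critical part and a less critical part, each tagged with the cell loss priority (CLP) value its cells would carry. The function names and the dictionary layout are hypothetical and only serve the illustration. In ATM, cells with CLP set to 1 are the ones the network may discard first.

```python
HIGH_PRIORITY = 0   # CLP = 0: cell should be kept under congestion
LOW_PRIORITY = 1    # CLP = 1: cell may be discarded first


def partition_block(dc_value, motion_vector, ac_coefficients):
    """Split one block's data into a critical and a less critical partition."""
    critical = {"clp": HIGH_PRIORITY, "dc": dc_value, "motion_vector": motion_vector}
    enhancement = {"clp": LOW_PRIORITY, "ac": list(ac_coefficients)}
    return critical, enhancement


def rebuild_block(critical, enhancement, ac_count):
    """Rebuild the block; if the low priority part was lost, use zero coefficients."""
    ac = enhancement["ac"] if enhancement is not None else [0] * ac_count
    return {"dc": critical["dc"], "motion_vector": critical["motion_vector"], "ac": ac}


critical, enhancement = partition_block(64, (2, -1), [5, -3, 1])
complete = rebuild_block(critical, enhancement, ac_count=3)
degraded = rebuild_block(critical, None, ac_count=3)   # low priority cells discarded
```

Losing the low priority partition degrades the picture detail, but the block can still be placed and reconstructed at its average level.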
Another feature of the MPEG-2 format is the concept of Levels and Profiles. This serves the needs of different kinds of
applications by defining several profiles and several levels of decoder capability. The profile
defines limits on the algorithmic complexity (the set of coding tools) that may be used in the video signal. The level defines limits on
parameters such as the resolution and the bit rate of the video. Table-2 shows the profiles and their applications.
| Profile | Application |
| Simple | Intended for software decoders (without B-frames) |
| Main | Cable TV and satellite uplink compression |
| Spatial | HDTV |
| SNR | Spatial with SNR scalability |
| High | SNR with 4:4:4 chrominance in the macroblock |
Table-2
MPEG system part.
The system document (ISO/IEC 13818-1 for MPEG-2) specifies two systems: the program stream, for multiplexing together the video, audio and data of a single program to be
transmitted in a relatively error free environment, and the transport stream, which can be used for
broadcast, VOD and cable TV. Figure-7 describes the system part.
The program stream specification describes how to encode the data from multiple video, audio, and data streams (Packetized Elementary Streams: PES)
within a single set of variable-length packets. These packets have headers that specify timing information and buffer information for the decoder. The headers
also include program information, such as the frame rate and audio sampling rate.
figure-7. The MPEG encoder.
The transport stream (a major extension over MPEG-1) defines a protocol for multiplexing multiple MPEG compressed programs into
fixed length (188 byte) packets for transmission on digital networks. It also includes some sophisticated timing
information, jitter correction etc. The additional timing information and small fixed size packets allow a whole range of new
applications for MPEG-2. The 188 byte TS packets map very well into 48 byte ATM cell payloads, allowing MPEG-2 to be used in switched
video architectures.
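To make the fixed packet format concrete, the sketch below decodes the 4 byte header that starts every 188 byte TS packet. The field layout (sync byte 0x47, 13 bit packet identifier, 4 bit continuity counter) comes from the MPEG-2 systems standard rather than from this article, and the function name and example bytes are illustrative assumptions.

```python
TS_PACKET_BYTES = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the 4-byte header at the start of one transport stream packet."""
    if len(packet) != TS_PACKET_BYTES:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte")
    return {
        # set when a new PES packet begins in this TS packet
        "payload_unit_start": bool(packet[1] & 0x40),
        # identifies which elementary stream the payload belongs to
        "pid": ((packet[1] & 0x1F) << 8) | packet[2],
        "adaptation_field_control": (packet[3] >> 4) & 0x03,
        # increments per packet of a stream so that losses can be detected
        "continuity_counter": packet[3] & 0x0F,
    }


# Hand-built example packet: header bytes followed by 184 zero payload bytes.
example = bytes([0x47, 0x41, 0x00, 0x17]) + bytes(184)
header = parse_ts_header(example)
```

A demultiplexer would use the packet identifier to route each payload to the video, audio or data decoder of the selected program.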
The packetization and multiplexing are done in such a manner as to maintain synchronization among all of the elementary streams associated
with the same program input. A common system time clock (STC) time base is used throughout the encoding process, and the system
layer function periodically samples the STC itself, encodes the samples as time-stamps, and embeds them into the bit stream as part
of the system layer syntax. Presentation times for coded video and audio frames, in relation to the STC, are also encoded as time-stamps
and embedded as part of the system layer.
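The relation between a presentation time-stamp and the recovered STC can be sketched as below. The 90 kHz tick rate and the 33 bit field width are taken from the MPEG-2 systems standard, not from this article, and the function name is a hypothetical one.

```python
PTS_CLOCK_HZ = 90_000      # presentation time-stamps count ticks of a 90 kHz clock
PTS_MODULUS = 1 << 33      # the time-stamp field is 33 bits wide and wraps around


def seconds_until_presentation(pts, stc):
    """Return how long the decoder must wait before presenting a frame.

    Both arguments are tick counts. A negative result means the frame is late.
    The modular arithmetic copes with the counter wrapping around.
    """
    delta = (pts - stc) % PTS_MODULUS
    if delta >= PTS_MODULUS // 2:
        delta -= PTS_MODULUS
    return delta / PTS_CLOCK_HZ


wait = seconds_until_presentation(90_000, 0)
late = seconds_until_presentation(0, 45_000)
wrapped = seconds_until_presentation(10, PTS_MODULUS - 80)
```

Because video and audio frames are stamped against the same clock, presenting each one when its wait reaches zero keeps the streams synchronized.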
Decoder.
figure-9 MPEG-2 decoder.
In the decoding process, the system layer function recovers the embedded STC time-stamps (called program clock references in the transport stream
systems layer format), and demultiplexes and reassembles the elementary streams. The PES are passed to the appropriate compression layer
decoding function and decoded. The recovered time-stamps are used to adjust the local time-base to match the time-base used to encode
the program. The uncompressed digital streams are finally converted for the appropriate display. Figure-9 shows a typical MPEG-2
decoder.
The time-stamps can be used for effects such as slow motion and pausing the video to achieve VCR-like control. The TS specification
also makes it feasible to create a gradual transition from an analog TV broadcast architecture to a digital TV architecture. The digital
channels can co-exist with traditional analog basic services, and a user can still access basic services without a new TV set (by adding a
set-top box as shown in the diagram).
MPEG over ATM.
The new bandwidth availability, together with the new video compression techniques for digital television, will make it possible to
provide a new kind of TV environment, characterized by a wide set of available video services.
These services will include:
- TV(basic, pay-per-view etc.)
- Service navigator.
- Interactive entertainment.
- Video-on-demand.
- Home shopping.
- Interactive single and multiuser games.
- Digital multimedia libraries.
- Electronic versions of newspapers, magazines, TV program guides, and yellow pages.
Above all, the most attractive video service is Video-On-Demand (VOD).
Before we describe the VOD services in detail, we will look at the problem of mapping MPEG-2 PES into ATM cells.
So how do we map MPEG packets into ATM cells, "dammit"?
figure-10 MPEG -> ATM cell transform.
An area of potential incompatibility is the mapping of MPEG (MPEG-2) packets into ATM cells. Several schemes are
used in existing systems today. Some use AAL-1, the constant bit rate AAL (ATM adaptation layer),
to map one 188 byte transport stream (TS) packet into four ATM cell payloads
with one byte of overhead per cell. Other schemes use AAL-5, the data adaptation layer, to concatenate and map one, two or more
MPEG TS packets into five, eight or more ATM cells. Figure-10 illustrates this point. The ATM Forum has chosen AAL-5.
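The cell counts quoted above can be checked with a little arithmetic. The sketch assumes the standard 8 byte AAL-5 trailer and padding of the AAL-5 data unit to a whole number of 48 byte payloads. The function names are illustrative.

```python
TS_PACKET_BYTES = 188
ATM_PAYLOAD_BYTES = 48
AAL1_OVERHEAD_BYTES = 1    # one byte of overhead per cell, as stated above
AAL5_TRAILER_BYTES = 8     # assumption: standard AAL-5 trailer size


def aal1_cells(ts_packets):
    """Number of cells needed when each cell carries 47 bytes of TS data."""
    usable = ATM_PAYLOAD_BYTES - AAL1_OVERHEAD_BYTES
    total = ts_packets * TS_PACKET_BYTES
    return -(-total // usable)   # integer ceiling division


def aal5_cells(ts_packets):
    """Return (cells, padding bytes) for TS packets carried in one AAL-5 unit."""
    pdu = ts_packets * TS_PACKET_BYTES + AAL5_TRAILER_BYTES
    cells = -(-pdu // ATM_PAYLOAD_BYTES)
    padding = cells * ATM_PAYLOAD_BYTES - pdu
    return cells, padding


one_packet_aal1 = aal1_cells(1)
one_packet_aal5 = aal5_cells(1)
two_packets_aal5 = aal5_cells(2)
```

One TS packet fills exactly four AAL-1 cells (4 x 47 = 188). With AAL-5, one packet needs five cells with 44 bytes of padding, whereas two packets plus the trailer come to 384 bytes, which is exactly eight cells with no padding.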
Video On Demand (VOD).
The VOD service allows the user to select the video information. Two levels of VOD service can be identified: pure video on demand (PVOD)
and near video on demand (NVOD). PVOD is closer to the ideal case. We can also define further hierarchies, distinguished by the degree
of control and interactivity given to the end user. NVOD (where the user has no control during the delivery of the video) involves an unavoidable but
predictable delay between the choice and the program delivery (e.g. delayable video).
There are different ways to provide selection capabilities, and these identify different VOD services.
- Teleshopping: the user can consult a video catalogue and select products.
- Video rental: the user can choose a movie from a library and interact using typical VCR commands.
For the purpose of this paper we will discuss the issues concerning a specific PVOD service, Video Rental. We will pay particular
attention to the impact of the Digital Storage Media Command and Control (DSM CC) commands on the network.
The video rental service allows the user to control the chosen movie by typical VCR commands.
The most important issue of the video rental service is the interaction between
the user and the agent delivering video information. This user capability has to be taken into account in the user network interface,
in the service architecture and in the transmission resources design. The NVOD and the PVOD services require an upstream customer
control channel, so that the user is able to control the presentation of video information with functions similar to VCR commands.
Another important issue concerns the transmission resources. In the long term, fibre will reach each user, allowing
services to be supported by means of ATM. The network architecture for the delivery of VOD must also be carefully considered.
So what are the service architectures needed for the VOD service?
Three possible service architectures are proposed(depending on the video database location); fully centralized, fully distributed and
quasi-centralized.
Fully centralized service architecture.
In this architecture the video information databases are placed in the network core. Because these databases provide the video service
to many users they optimize memory resources to store the movies. The main problems with this architecture are:
- Very high transmission cost to transport the video.
- Large signalling traffic in the network.
figure-11 Centralized service architecture.
Figure-11 is an example of the centralized architecture, where it is proposed to connect the video databases to a central switch.
This is an ATM based switch that is connected to the local offices by a large number of loops. The users are connected to the local
offices. Three levels of memory hierarchy are used. The library (read only) is a very large memory with a long access time.
The copier memory is a medium size, fast access memory. The stop-start buffer is a small per-user memory. After a request the
appropriate movie is loaded in a player. Then the switch sets up a connection from the player directly to the user loop. This method
is only used for less popular movies. For other movies, the system may cache the movie in the copier memory, and the switch sets up
a connection from the copier memory to the user (the same movie is transmitted in several copies). To allow the user to use VCR-like commands,
each user is given a buffer (the stop-start buffer). The major disadvantage of this architecture is the high bandwidth required
to transmit the movies to users. The advantage is that every disc is shared by a number of users.
Fully distributed service architecture.
In this architecture the video information databases are located in the network access zone, each providing the video service to a few users.
The advantage of this architecture is that it reduces the transmission cost. The main problem is that the same movie
must be buffered in several different service centers.
Quasi-centralized service architecture.
In this architecture the video databases are located both in the network core (CSC: Core Service Center) and in the access zone (USC: User Service Center).
Figure-12 shows the quasi-centralized architecture. The less frequently requested movies are stored in the CSCs. When a user requests access to one of
these movies, the movie is transported first to the USC and then to the user, so the access delay is high. Because of this, the popular
movies are buffered in the USC. In this case the number of users connected to the service center is not very large, several copies of the
same movie are buffered, and so the access delay is not high. If a user requests a movie that is not in his USC, then the USC makes a request
to the CSC connected to it. On this request the CSC starts to transmit the requested movie to the USC. Therefore, with this architecture
only the less popular movies suffer a high access delay and take a very large bandwidth.
In the diagram below the USC is called the Central Office (CO) and the CSC is called the Information Warehouse (IWH).
figure-12
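The request flow of the quasi-centralized architecture can be sketched as a small cache in front of a central library. The class names, the capacity parameter and the "least recently requested" eviction rule are assumptions for illustration only. The text does not specify how popularity is measured.

```python
from collections import OrderedDict


class CoreServiceCenter:
    """Holds the complete movie library in the network core."""

    def __init__(self, library):
        self.library = set(library)

    def fetch(self, title):
        if title not in self.library:
            raise KeyError(title)
        return title


class UserServiceCenter:
    """Buffers recently requested movies close to the users."""

    def __init__(self, core, capacity):
        self.core = core
        self.capacity = capacity
        self.buffered = OrderedDict()   # oldest request first

    def serve(self, title):
        """Return which centre supplied the movie: 'USC' or 'CSC'."""
        if title in self.buffered:
            self.buffered.move_to_end(title)    # mark as recently requested
            return "USC"                        # low access delay
        self.core.fetch(title)                  # high delay, uses core bandwidth
        self.buffered[title] = True
        if len(self.buffered) > self.capacity:
            self.buffered.popitem(last=False)   # drop the least recently requested
        return "CSC"


csc = CoreServiceCenter(["A", "B", "C"])
usc = UserServiceCenter(csc, capacity=2)
sources = [usc.serve(title) for title in ["A", "A", "B", "C", "A"]]
```

Only the first request for a movie, or a request for one that has dropped out of the local buffer, has to travel to the core.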
After careful consideration, the quasi-centralized architecture seems to be the best one in terms of bandwidth/storage costs.
This architecture can also include the three level memory hierarchy. This is shown in figure-13.
figure-13
The video entities are distributed on three different levels, each corresponding to a different type of node. There are three
types of node: zonal nodes, area nodes and regional nodes. When the user asks to view a movie, the zonal node searches for it, and if the
requested movie is not available, it is searched for first in some other zonal nodes, and then in an area node or in the regional node. The
movies must be reallocated as their popularity changes. Consequently, a rule for updating the database must be defined.
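The search order described above can be written down directly. This is only a sketch: the function name, the labels and the use of plain sets as node catalogues are hypothetical.

```python
def locate_movie(title, home_zone, neighbour_zones, area_node, regional_node):
    """Return a label for the first node holding the movie, or None if not found."""
    search_order = [("home zonal node", home_zone)]
    search_order += [("neighbour zonal node", zone) for zone in neighbour_zones]
    search_order += [("area node", area_node), ("regional node", regional_node)]
    for label, catalogue in search_order:
        if title in catalogue:
            return label
    return None


home = {"popular film"}
neighbours = [{"local favourite"}]
area = {"popular film", "last year's hit"}
regional = {"popular film", "last year's hit", "rare documentary"}

found_locally = locate_movie("popular film", home, neighbours, area, regional)
found_nearby = locate_movie("local favourite", home, neighbours, area, regional)
found_far = locate_movie("rare documentary", home, neighbours, area, regional)
```

A reallocation rule would then move titles between these catalogues as request counts change.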
Digital Storage Media Command and Control (DSM CC).
Digital Storage Media Command and Control (DSM CC) is essential in interactive video services. Work is in progress to make
DSM CC an integral part of MPEG-2. DSM CC provides a general application with a set of commands for establishing or deleting
a network connection and for performing communication between a client and a server across a network.
The MPEG data flow must be independent of the particular kind of DSM, of the local or remote DSM location and of the
network protocol. Since DSM CC only defines command syntax and semantics, different servers may generate different MPEG bitstreams in response
to the same DSM CC sequence.
Currently two classes of DSM CC operations are defined:
- User-to-Network operations: These operations include commands for establishing, managing and deleting a network connection.
  These operations are performed by two classes of primitives:
  - User-to-network configuration primitives;
  - User-to-network primitives.
  These can be grouped together in three different sequences:
  - Client initiated command sequences;
  - Server initiated command sequences;
  - Network initiated command sequences.
- User-to-User operations: These involve the communication between a client and a server across a network (e.g. play a video stream).
  These operations are performed by two classes of primitives:
  - Application download primitives;
  - Client-Server primitives.
In this paper we will consider a sub-set of the DSM CC Client-Server primitives: the so-called stream playback primitives, which provide
the user with the typical VCR commands.
The stream playback primitives imply unexpected peaks or silence periods in the transferred MPEG-2 bitstreams, which are reflected in
the bit rate carried by the transport stream in the network.
The table below lists the stream playback primitives with a short description of their functions.
| Primitive | Function |
| dsm stream open | Activate a stream by its name. |
| dsm stream play | Send the stream at normal play speed. |
| dsm stream pause | Stop sending the stream. |
| dsm stream scanforward | Send the stream at a forward speed other than normal play. |
| dsm stream scanreverse | Send the stream at a reverse speed. |
| dsm stream jump | Jump to a time or stream position, relative to the current position or absolute from the beginning. |
| dsm stream status | Obtain the status of a stream. |
| dsm stream close | De-activate a video. |
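The behaviour listed in the table can be modelled with a toy server-side session object. This is a sketch only: the class, the method names and their parameters are hypothetical and do not reproduce the actual DSM CC syntax, which is not given here.

```python
class StreamSession:
    """Toy model of one stream as driven by the stream playback primitives."""

    def __init__(self, name, duration):
        self.name = name
        self.duration = duration   # seconds
        self.position = 0.0        # seconds from the beginning
        self.speed = 0.0           # 1.0 = normal play, 0.0 = nothing is sent
        self.is_open = False

    def _require_open(self):
        if not self.is_open:
            raise RuntimeError("stream is not open")

    def open(self):
        self.is_open = True

    def play(self):
        self._require_open()
        self.speed = 1.0

    def pause(self):
        self._require_open()
        self.speed = 0.0

    def scan_forward(self, factor):
        self._require_open()
        self.speed = abs(factor)

    def scan_reverse(self, factor):
        self._require_open()
        self.speed = -abs(factor)

    def jump(self, seconds, relative=False):
        self._require_open()
        target = self.position + seconds if relative else seconds
        self.position = min(max(target, 0.0), self.duration)  # stay inside the movie

    def status(self):
        return {"name": self.name, "open": self.is_open,
                "position": self.position, "speed": self.speed}

    def close(self):
        self.is_open = False
        self.speed = 0.0


session = StreamSession("movie", duration=100.0)
session.open()
session.play()
session.jump(30.0)                  # absolute position from the beginning
session.jump(10.0, relative=True)   # relative to the current position
snapshot = session.status()
```

Only play, pause, scan and jump change what is sent on the video channel. The open, status and close calls would generate signalling traffic only.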
The DSM stream open, status and close primitives are related only to signalling traffic. Invoking the DSM play primitive causes the transfer
of an MPEG-2 transport stream conforming to a particular Profile and Level.
Hence the characteristics of the data traffic due to this command depend on:
- The video coding procedure defined by the profile;
- The video coding procedure defined by the level;
- The content of the video sequence;
- The coding strategies adopted by the encoder.
The DSM stream pause must produce a "freeze" frame in the user terminal. From the MPEG-2 decoder specification, we note that frame memories
exist in the decoder. This structure could allow a local freeze frame simply by having the server stop the transmission of the MPEG-2 data flow.
Consequently, synchronization techniques have to be defined in order to ensure the clock (time-stamp) alignment between encoding and
decoding. The MPEG-2 standard does not provide this facility explicitly (in addition, a generic video server usually does not include a real
time encoder). Therefore the alignment must be achieved through a suitable post processing method.
Alternatively, the server could generate a data stream (conforming to the MPEG-2 syntax and to the profile and level) that represents a still frame.
This could be realized by continuously transmitting an ad hoc generated GOP. The basic strategy may be to intra code the selected frame so that
the transmitted GOP contains empty P and B-frames. If the freeze involves a P or B-frame, the solution depends on the server post-processing
capability. In fact, as the still frame has to be coded in intra mode, the server could decode the current GOP and recode the B or P still
frame in intra mode.
The effect of invoking a DSM stream scanforward/reverse primitive is a change in the presentation speed and direction of the sequence.
In this case, an ad hoc sequence (of GOPs) has to be generated to realize the command. In the case of reverse scan, the server has to decode the
sequence and generate a new one for the scan presentation. Otherwise the server can generate a reversed sequence of intra frames, in coded form,
from the original one.
In the case of fast forward, the server can generate a new sequence by recoding the original one after discarding a number of P and B-frames.
This can lead to a new coded sequence having a GOP structure similar to the "original" one. In this case the frames of the scan sequence
are more spaced out temporally; this does not significantly change the traffic statistics, but it requires a real time encoder and decoder
as a post processing facility.
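The simplest of these options, building the scan sequence from intra frames only, can be sketched as follows. The frame pattern and function names are assumptions. The sketch selects frame indices only and performs none of the decoding or recoding mentioned above.

```python
def intra_frames(frame_types):
    """Indices of the stand-alone (I) frames in a coded sequence."""
    return [i for i, kind in enumerate(frame_types) if kind == "I"]


def fast_forward_sequence(frame_types, step=1):
    """Discard every P and B-frame, then keep every step-th I-frame."""
    return intra_frames(frame_types)[::step]


def reverse_scan_sequence(frame_types, step=1):
    """The same I-frames, sent in reverse order."""
    return intra_frames(frame_types)[::-1][::step]


movie = "IBBPBBPBBPBB" * 3   # three GOPs of 12 frames (assumed pattern)

forward = fast_forward_sequence(movie)
faster = fast_forward_sequence(movie, step=2)
backward = reverse_scan_sequence(movie)
```

Since I-frames are the largest frames, a stream made only of them has a different bit rate profile from normal play, which is one reason the recoding approach described above may be preferred.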
Invoking the DSM stream jump primitive causes the MPEG-2 transport stream to jump to a particular time stamp or position. If the jump
position corresponds to a P or B-frame, the solution depends on the server coding capability. In fact the server could decode the relevant
GOP and restart the transmission, generating a new GOP beginning from the frame at the jump position. If the server is not able to perform
this operation, it can simply transmit the GOP containing the jump position. The influence of the DSM jump primitive on the traffic
should be negligible.
Concluding remarks...
Communications networks have gradually evolved towards the concept of the integration of services. This coupled
with advances in networking technologies has created the possibility for the integrated networking of a wide range of
services. The global networking standards now point to B-ISDN, the universal network solution concept.
This has led to the development of the fast packet switching transfer mode ATM, which is now accepted to be
the optimum solution for B-ISDN. These technologies have provided the chance for the realisation of real-time broadband services,
most notably video networking. Coupled with this, advanced compression schemes to reduce the volume of video data
have been developed, and the MPEG systems have been adopted as the standard.
The ATM concept is flexible enough to accommodate MPEG compressed bit streams efficiently, making video transfer over networks
feasible. Many video networking services, including interactive services, are planned. VOD is a significant new
service into which a lot of research has been put, and which is considered to be a realistic prospect for the
not so distant future. All these developments are set to revolutionise the communications arena, and the technology
is virtually established. All that remains is for the social, economic and political forces to will this into
being.
Acronyms
| B-ISDN | Broadband Integrated Services Digital Network |
| MPEG | Moving Picture Experts Group |
| ATM | Asynchronous Transfer Mode |
| LAN | Local Area Network |
| WAN | Wide Area Network |
| HDTV | High Definition Television |
| AAL | ATM Adaptation Layer |
| ISO | International Organization for Standardization |
| IEC | International Electrotechnical Commission |
| GOP | Group Of Pictures |
| MC | Motion Compensation |
| DCT | Discrete Cosine Transform |
| VOD | Video On Demand |
| PS | Program Stream |
| TS | Transport Stream |
| STC | System Time Clock |
| PTS | Presentation Time Stamp |
| PES | Packetized Elementary Stream |
| DSM CC | Digital Storage Media Command and Control |
| PVOD | Pure Video On Demand |
| NVOD | Near Video On Demand |
| CSC | Core Service Centre |
| USC | User Service Centre |
| CCITT | International Telegraph and Telephone Consultative Committee |
| ARPA | Advanced Research Projects Agency |
| SMDS | Switched Multimegabit Data Service |
| NTSC | National Television System Committee |
| SDH | Synchronous Digital Hierarchy |
| AT&T | American Telephone and Telegraph |
| SONET | Synchronous Optical Network |