MULTIPLE CHANNEL
ARCHITECTURE

by
Shahrin Imran

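CONTENTS:

Introduction
Problems With Current Interconnection Techniques
MCA System Organization
MCA Resource Allocation
MCA Modes of Operation
Advantages of the MCA
Conclusions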

INTRODUCTION

Multiple Channel Architecture (MCA) is a new optical interconnection strategy for parallel processing. The architecture seeks to exploit the high bandwidth available in optical communications by supporting multiple virtual buses, or selectable channels, on a single fiber. Each virtual bus corresponds to a different assigned channel frequency. Arbitrary interconnection patterns and machine partitions can be emulated through appropriate channel assignment, and parallel tasks can also execute simultaneously. Frequency division multiplexing (FDM) is used to provide the multiple channels. Currently, over 1000 independent high-bandwidth (100-200 Mb/s) channels can be supported on a single optical fiber, and in the future the per-channel rate is likely to reach 1 Gb/s.
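
As a rough back-of-the-envelope check of what these figures imply, the aggregate capacity of one fiber is the number of channels multiplied by the per-channel rate. The sketch below assumes exactly 1000 channels and ignores guard bands and protocol overhead; the function name is hypothetical.

```python
# Aggregate fiber capacity under FDM: channels x per-channel rate.
# Assumptions: exactly 1000 channels, no guard-band or protocol overhead.

def aggregate_gbps(channels: int, per_channel_mbps: float) -> float:
    """Return total capacity in Gb/s for `channels` FDM channels."""
    return channels * per_channel_mbps / 1000.0

for rate in (100, 200, 1000):  # Mb/s per channel; 1000 Mb/s = 1 Gb/s
    print(f"{rate:>4} Mb/s per channel -> {aggregate_gbps(1000, rate):.0f} Gb/s total")
```

Under these assumptions a single fiber carries on the order of 100-200 Gb/s today, and roughly 1 Tb/s if each channel reaches 1 Gb/s.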

In this article we give a basic overview of the MCA architecture and compare it with previous interconnection methods. We then look at how the MCA makes parallel processing possible. Finally, we conclude by outlining the advantages MCA has over these systems.

Problems With Current Interconnection Techniques

Enormous work has been done in developing interconnection technologies that balance performance and cost, in an effort to get as close as possible to an ideal network. Here we examine various attempts from both the electrical and the optical domains, and then summarise the characteristics an ideal system needs in order to overcome their problems.


Mesh Network (electrical)

figure 1

A two-dimensional mesh network is very effective for nearest-neighbour communication. The difficulty is that the network size must be an even power of two to maintain consistent mesh properties. Therefore, although large systems can be built, they can only grow in fixed size increments, and partitioning is likewise limited to these sizes.
A faulty node causes an entire column to be bypassed, so good processing elements (PEs) are unnecessarily taken off-line, which means weak fault tolerance.
However, the most limiting factor is the mesh's large diameter, which implies long access times to retrieve data as well as slow interprocessor communication.
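
To make the size constraint and the diameter concrete, the sketch below checks whether a node count is an even power of two and computes the worst-case hop count. It assumes a square mesh with no wraparound links, so an N-node mesh is √N × √N and its diameter is 2(√N − 1); the function names are illustrative only.

```python
import math

# Sketch assumptions: a square k x k mesh with no wraparound links,
# so legal sizes are N = 4**m (an even power of two).

def is_valid_mesh_size(n: int) -> bool:
    """True if n is an even power of two, i.e. a square 2**m x 2**m mesh."""
    if n < 1 or n & (n - 1):                 # not a power of two at all
        return False
    return (n.bit_length() - 1) % 2 == 0     # exponent must be even

def mesh_diameter(n: int) -> int:
    """Worst-case hop count between two nodes: corner to opposite corner."""
    side = math.isqrt(n)
    return 2 * (side - 1)

for n in (16, 64, 256, 1024, 4096):
    print(n, is_valid_mesh_size(n), mesh_diameter(n))
```

Each legal size is four times the previous one, and the diameter roughly doubles at each step (6, 14, 30, 62 and 126 hops), growing as 2√N.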


Hypercube (electrical)

figure 2

The problem with the hypercube structure is that it is not readily extensible, since the degree of every node changes when the size of the structure increases.
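
The extensibility problem follows directly from how a binary hypercube is wired: with 2^d nodes, each node links to the d nodes whose binary addresses differ from its own in one bit. The minimal sketch below (names are illustrative) shows that doubling the machine changes the degree of every existing node.

```python
def hypercube_degree(n_nodes: int) -> int:
    """Links per node in a binary hypercube with n_nodes = 2**d nodes (degree d)."""
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("a binary hypercube needs a power-of-two node count")
    return n_nodes.bit_length() - 1

def neighbours(node: int, dim: int) -> list:
    """Nodes whose binary address differs from `node` in exactly one bit."""
    return [node ^ (1 << i) for i in range(dim)]

# Growing from 8 to 16 nodes raises every node's degree from 3 to 4,
# so each existing node needs an extra port and link.
print(hypercube_degree(8), neighbours(0, 3))    # 3 [1, 2, 4]
print(hypercube_degree(16), neighbours(0, 4))   # 4 [1, 2, 4, 8]
```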


Simple Global Bus (electrical)

This is the most common means of interconnecting a relatively small number of processors. Its limitations include a degree of simultaneity of 1/N (where N is the number of nodes), since only one node can transmit at a time, and saturation when the amount of communication demanded is large.


Multiple Bus Structure (electrical)

figure 3

Although the availability of multiple buses provides alternative paths for nodes, the system is more expensive because every node must have a receiver for each bus. Furthermore, if the number of memory banks is large, the number of banks a node can access at any one time is limited to the number of buses available.


Aquarius (electrical)

figure 4

Here the system is multidimensional: memories and processors are packed together in a single node. However, the diameter depends on the number of dimensions. Broadcasting data over multiple buses is therefore complicated, because the data has to traverse several dimensions, and latency increases with the number of dimensions.


Multistage Interconnection Network (MIN) (electrical)

figure 5

This system provides greater flexibility than the previous methods, but at substantial cost. The first is the wiring cost of the switchboxes required for large networks. In addition, there is only a single path from a given source to a given destination and arbitrary connections are not possible, which gives poor fault tolerance. Furthermore, extension and partitioning are possible only in power-of-two increments.


Active Optical Splitters and Active Combiners (optical)

figure 6a : Splitters

figure 6b : Combiners

These are made of switching elements arranged in a binary tree structure. The design is sound, but it requires 2N(N-1) switching elements (where N is the number of nodes), so its cost grows quadratically in N.
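
One way to arrive at the 2N(N-1) figure, assumed here rather than taken from the source, is that each 1-to-N binary splitter tree contains N-1 two-way switching elements, and there are N splitters plus N combiners. The sketch tabulates the count to show the quadratic growth.

```python
def splitter_combiner_switches(n: int) -> int:
    """Switching elements for N active splitter trees plus N active combiner trees.

    Assumption: each 1-to-N binary tree has N - 1 two-way switches,
    so N splitters and N combiners need 2 * N * (N - 1) in total.
    """
    return 2 * n * (n - 1)

for n in (16, 64, 256, 1024):
    print(n, splitter_combiner_switches(n))
```

Going from 16 to 1024 nodes (64 times more) raises the count from 480 to 2,095,104 switches, roughly 4000 times more.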


Fast Optical Cross-Connect (FOX) (optical)

figure 7

Here the maximum number of nodes is limited to the number of channels, because frequencies are statically allocated to receivers. Bandwidth is therefore wasted: idle processors and memories hold communication bandwidth that cannot be used by other tasks.


Coherent Wavelength Switch (optical)

figure 8

Here a fiber carrying information from N different channels is broadcast to N separate tunable receivers via an optical splitter. Unfortunately, the number of tunable lasers and receivers required is N times the number of stages. Furthermore, because these connections are statically circuit-switched, periods of idle transmission waste bandwidth.
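
Both drawbacks can be expressed as simple formulas. The sketch below applies the "N times the number of stages" count from the text and measures the idle share of a statically held circuit; the 64-node, 3-stage configuration and the 3 ms / 10 ms timings are hypothetical values chosen only for illustration.

```python
def wavelength_switch_parts(n_nodes: int, stages: int) -> int:
    """Tunable lasers (and, equally, tunable receivers) needed: N x number of stages."""
    return n_nodes * stages

def idle_fraction(busy_ms: float, held_ms: float) -> float:
    """Share of a statically held circuit's time in which no data is sent."""
    return 1.0 - busy_ms / held_ms

# Hypothetical configuration: 64 nodes, 3 stages.
print(wavelength_switch_parts(64, 3), "lasers and as many receivers")
# Hypothetical circuit held for 10 ms but carrying data for only 3 ms.
print(f"{idle_fraction(3, 10):.0%} of the circuit's bandwidth is wasted")
```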



Based upon the limitations observed above, we can summarise the parameters/characteristics required for an ideal network to overcome these problems.


What MCA tries to do is incorporate all of these characteristics into its architecture.

MCA System Organization


figure 9 : MCA system block diagram

The MCA (figure 9) consists of a fiber-optic dual-cable broadband communication system, with processors, memories and I/O devices acting as nodes on the network. Each node is attached to the network via tunable laser transmitters and tunable heterodyne receivers. A passive star coupler divides the transmitter laser power evenly among all receivers. By selecting appropriate frequencies, processors, memories and I/O devices can be partitioned arbitrarily.

Each node address consists of three parts: a channel frequency, a function code and a physical address field. To reach a node, we first tune to its channel frequency. Next, the function code determines the type of node (processor, memory or I/O). Finally, the physical address field specifies a memory line address, an I/O block or a processor address.
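
The three-part address can be sketched as a small record that is resolved in the order described above. This is a minimal illustration: the channel is represented as an integer index rather than a frequency, and the numeric function codes are hypothetical, since the actual encodings are not given here.

```python
from dataclasses import dataclass
from enum import Enum

class Function(Enum):
    # Hypothetical function-code values; the real encoding is not specified.
    PROCESSOR = 0
    MEMORY = 1
    IO = 2

@dataclass(frozen=True)
class MCAAddress:
    channel: int        # index of the channel frequency to tune to
    function: Function  # which type of node is addressed
    physical: int       # memory line, I/O block or processor address

def resolve(addr: MCAAddress) -> str:
    """Walk the three address fields in order: channel, function code, physical address."""
    step1 = f"tune transmitter to channel {addr.channel}"
    kind = {Function.PROCESSOR: "processor", Function.MEMORY: "memory line",
            Function.IO: "I/O block"}[addr.function]
    return f"{step1}; select {kind} {addr.physical:#x}"

print(resolve(MCAAddress(channel=42, function=Function.MEMORY, physical=0x1F00)))
# tune transmitter to channel 42; select memory line 0x1f00
```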


figure 10 : MCA processor node

The processor node is shown in figure 10. The processor should be at least 32-bit to allow high-speed computation. The memory management unit controls access to the network and manages the local cache. The cache, which stores local data, uses least recently used (LRU) replacement and a write-back policy. The auxiliary function controller performs tasks that support parallel processing. Finally, the dual-port RAM lets the CPU and the auxiliary function controller operate in overlap, allowing maximum concurrency.
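
The cache policy described above can be sketched with an ordered dictionary: lookups move a line to the most-recently-used end, eviction removes the least recently used line, and a modified ("dirty") line is written back to memory only when it is evicted. The `backing` dictionary is a hypothetical stand-in for a remote memory node, and writes allocate a line without fetching it, which is a simplifying assumption.

```python
from collections import OrderedDict

class WriteBackLRUCache:
    """Local-data cache with LRU replacement and write-back.

    `backing` is a hypothetical stand-in for a memory node on the network.
    """

    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.backing = backing
        self.lines = OrderedDict()   # addr -> [value, dirty]
        self.writebacks = 0          # lines written back to the memory node

    def _evict_if_full(self) -> None:
        if len(self.lines) >= self.capacity:
            addr, (value, dirty) = self.lines.popitem(last=False)  # least recently used
            if dirty:                          # only modified lines go back to memory
                self.backing[addr] = value
                self.writebacks += 1

    def read(self, addr: int):
        if addr in self.lines:
            self.lines.move_to_end(addr)       # mark as most recently used
            return self.lines[addr][0]
        self._evict_if_full()
        value = self.backing.get(addr, 0)      # miss: fetch from the memory node
        self.lines[addr] = [value, False]
        return value

    def write(self, addr: int, value) -> None:
        if addr in self.lines:
            self.lines.move_to_end(addr)
        else:
            self._evict_if_full()
        self.lines[addr] = [value, True]       # defer the memory update (write-back)

memory = {0: 10, 1: 11, 2: 12}
cache = WriteBackLRUCache(capacity=2, backing=memory)
cache.write(0, 99)                   # dirty line; memory still holds 10
cache.read(1)
cache.read(2)                        # evicts line 0 (LRU) and writes 99 back
print(memory[0], cache.writebacks)   # 99 1
```

Write-back keeps network traffic down: a line written many times costs only one transfer, made when it is evicted.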


figure 11 : MCA memory node

The memory node is shown in figure 11. An excellent feature is that, because buffering is provided for both the transmitter and the receiver, a memory node can simultaneously transmit the results of a previous request, retrieve the contents for the present request and receive the next memory request.

There are three types of memory addresses: instructions, local data and shared data. Instructions are strictly read-only. Local data is cached, while shared data is not, to prevent cache coherence problems.
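
The rule for the three address types can be sketched as a small policy function. Leaving shared data uncached means no processor can hold a stale copy of a value that another processor has modified, so no coherence protocol is needed. Whether instructions are cached is not stated, so the sketch simply reports them as read-only fetches; the names are hypothetical.

```python
from enum import Enum

class Region(Enum):
    INSTRUCTION = "instruction"
    LOCAL = "local"
    SHARED = "shared"

def access_policy(region: Region, is_write: bool) -> str:
    """Decide how a memory reference is handled for each of the three address types."""
    if region is Region.INSTRUCTION:
        if is_write:
            raise PermissionError("instructions are read-only")
        return "read-only fetch"   # caching of instructions is not specified; left open
    if region is Region.LOCAL:
        return "cache"             # local data may live in the processor's cache
    return "uncached"              # shared data always goes to the memory node

print(access_policy(Region.LOCAL, is_write=True))    # cache
print(access_policy(Region.SHARED, is_write=False))  # uncached
```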


figure 12 : MCA I/O node

The I/O node is shown in figure 12. Its high-speed buffer provides disk block caching and allows data to be prefetched, which makes disk block access nearly as fast as memory block access.

The passive star is made of fused biconical taper couplers, which divide the light from a transmitting fiber (say, on the left side) among every receiving optical fiber (on the right).


figure 13 : MQW-DBR laser structure - active region, phase control region and DBR region

MQW-DBR (multiple quantum well, distributed Bragg reflector) lasers are used in the MCA transmitter (figure 13). Laser oscillation is produced by biasing the active region, while the phase control and DBR regions are biased below threshold. Tuning is done by adjusting the current in the DBR region, which shifts the emitted light to the desired frequency.

The MCA receiver also uses an MQW-DBR laser, but biased below threshold. When light of the correct frequency falls on it, photodetection occurs in the active layer, producing an external voltage across the forward-biased diode.
Using the same structure for both the transmitter and the receiver greatly simplifies coupling in the MCA.

Each node can switch channels (frequencies) during execution by dynamically changing the injection current to its laser. Additionally, transmission and reception can be performed on different channels, because nodes are not assigned to a fixed channel.

The bandwidth of the fiber is divided into many high-speed data channels, or virtual buses. Data is transferred serially over the medium using a message-based protocol with CSMA/CD arbitration. CSMA/CD allows multiple processes operating on different processing nodes to coexist on the same channel.
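
To illustrate how contention on one virtual bus resolves, the sketch below models a slotted channel in which every node has one single-slot message to send. A lone sender succeeds, and simultaneous senders detect the collision and back off. The backoff rule used here is Ethernet-style truncated binary exponential backoff, which is an assumption: the MCA backoff rule is not described in this article. Because every message fits in one slot, carrier-sense deferral is not modelled separately.

```python
import random

def simulate_channel(n_nodes: int, max_slots: int = 10_000, seed: int = 1) -> int:
    """Slotted contention sketch with collision detection; returns slots needed.

    Assumption: Ethernet-style truncated binary exponential backoff,
    not a rule taken from the MCA design.
    """
    rng = random.Random(seed)
    next_try = {node: 0 for node in range(n_nodes)}   # slot in which each node sends next
    collisions = {node: 0 for node in range(n_nodes)}
    for slot in range(max_slots):
        ready = [node for node, t in next_try.items() if t == slot]
        if len(ready) == 1:
            del next_try[ready[0]]              # lone sender: message delivered
        elif len(ready) > 1:                    # collision detected by every sender
            for node in ready:
                collisions[node] += 1
                k = min(collisions[node], 10)   # truncated exponent
                next_try[node] = slot + 1 + rng.randrange(2 ** k)
        if not next_try:
            return slot + 1                     # every message has been delivered
    raise RuntimeError("channel did not drain within max_slots")

print(simulate_channel(8))   # slots needed for 8 contending nodes (depends on seed)
```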

MCA Resource Allocation

Resource allocation is a very important aspect of any computer system. Resources must be fully utilised and well optimised during execution; poor allocation techniques can severely affect the efficiency and throughput of the system.

MCA allows communication bandwidth to be partitioned and allocated by task. Bandwidth is better utilised because the amount allocated matches the size of each multiprocessor task. High-priority tasks can also share a virtual bus with tasks that have low communication requirements.

Communication between nodes is established by assigning the source and the consumer of the data the same frequency and address space. Resources (memory, processors, etc.) are allocated by assigning them to specific channel addresses, and the operating system (OS) monitors this through dynamically changeable allocation tables. Furthermore, resources within a task can be reconfigured during execution for better utilization.
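
The allocation scheme can be sketched as a table from resource name to channel. Two resources can communicate when they share a channel, and each channel's set of resources forms a partition. This hypothetical `AllocationTable` tracks only the channel, not the shared address space, and simply overwrites entries to model retuning during execution.

```python
from collections import defaultdict

class AllocationTable:
    """Hypothetical OS allocation table mapping resources to channels."""

    def __init__(self) -> None:
        self.channel_of = {}                  # resource name -> channel number

    def assign(self, resource: str, channel: int) -> None:
        self.channel_of[resource] = channel   # reassignment models retuning at run time

    def can_communicate(self, a: str, b: str) -> bool:
        """Source and consumer must be tuned to the same channel."""
        return a in self.channel_of and self.channel_of.get(a) == self.channel_of.get(b)

    def partitions(self) -> dict:
        """Group resources by channel: each group is one machine partition."""
        groups = defaultdict(set)
        for resource, channel in self.channel_of.items():
            groups[channel].add(resource)
        return dict(groups)

table = AllocationTable()
table.assign("P0", 3); table.assign("M0", 3); table.assign("P1", 7)
print(table.can_communicate("P0", "M0"), table.can_communicate("P0", "P1"))  # True False
table.assign("P1", 3)                          # reconfigure P1 into the same task
print(table.can_communicate("P0", "P1"))       # True
```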

MCA allows multiple tasks to share a memory or I/O node, and multiple uniprocessor tasks to be assigned to the same processor node. To handle multiple tasks, operating system functions are divided among processing elements (PEs). "Supervisor" processors then oversee each task, i.e. each group of PEs, according to the functions and control given to each group.

MCA Modes of Operation

M. J. Flynn [1966] categorized computers into four modes according to the parallelism in their instruction and data streams:

1. Single instruction stream, single data stream (SISD, the uniprocessor)

2. Single instruction stream, multiple data stream (SIMD)

3. Multiple instruction stream, single data stream (MISD)

4. Multiple instruction stream, multiple data stream (MIMD)

The uniprocessor model, SISD, is easily accomplished in MCA by assigning a single processing node, several memory nodes and some available I/O nodes to the same frequency. To exploit the parallel capabilities of MCA, however, it is more advantageous to use the SIMD and MIMD modes of operation.
A single SIMD instruction triggers multiple parallel computations on different data. On the MCA this is done by assigning a single processor, several memory nodes and some available I/O nodes to multiple frequencies (cf. SISD).
An SIMD instruction must therefore be broadcast over multiple channels.

An MIMD process, on the other hand, requires multiple processors executing multiple tasks, each processor executing its own program on local data. This form of operation requires synchronization among all its processors, which the MCA provides through barrier synchronization over time-division multiplexed channels. Assume that n processors are required to halt at a barrier. Each processing node can be programmed to transmit a "1" bit during its assigned time slot when it reaches the barrier. As soon as all n "1" bits are detected, all processors are known to have reached the barrier.
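
The time-slot barrier can be sketched as repeated TDM frames of n slots, where slot i carries processor i's bit. The sketch assumes that a processor keeps asserting its "1" in every frame after it arrives, and that every node sees the whole frame. The arrival order below is hypothetical.

```python
def barrier_frame(arrived: set, n: int) -> list:
    """One TDM frame: slot i carries 1 if processor i has reached the barrier."""
    return [1 if i in arrived else 0 for i in range(n)]

def barrier_released(frame: list) -> bool:
    """The barrier is met once all n slots in a frame carry a 1."""
    return all(bit == 1 for bit in frame)

# Hypothetical arrival order of 4 processors, one arrival per frame.
n = 4
arrived = set()
for frame_no, proc in enumerate([2, 0, 3, 1]):
    arrived.add(proc)
    frame = barrier_frame(arrived, n)
    print(frame_no, frame, barrier_released(frame))
# 0 [0, 0, 1, 0] False
# 1 [1, 0, 1, 0] False
# 2 [1, 0, 1, 1] False
# 3 [1, 1, 1, 1] True
```

Because each processor owns a dedicated slot, no arbitration is needed, and every node learns that the barrier is met within one frame of the last arrival.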

The capability to switch modes during execution would improve performance considerably over both pure SIMD and pure MIMD operation. MIMD has an advantage over SIMD in algorithms requiring many conditional (if-then) branches. However, SIMD data transfers require much less overhead than MIMD transfers, which need interrupts and interprocessor communication. The auxiliary function controller and dual-port RAM (see MCA System Organization) allow quick transitions between SIMD and MIMD modes of operation. Furthermore, a processor could use a separate channel for SIMD and for MIMD.

The MCA allows processors to be partitioned into two channel groups. This is particularly useful for executing conditional statements. For instance, consider "if" statements: the processors evaluating "true" could remain on the same channel for their instruction stream, whereas those evaluating "false" would switch channels and receive their instructions from an alternate processor.
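
The channel split for an "if" statement can be sketched as a mapping from each processor to the instruction channel it should listen to after evaluating the condition on its own local data. The channel numbers 10 and 11 and the function name are hypothetical.

```python
def split_on_condition(values: dict, predicate, then_channel: int, else_channel: int) -> dict:
    """Return the instruction channel each processor listens to after an `if`.

    `values` maps processor id -> its local data. Processors whose test is true
    stay on `then_channel`; the rest retune to `else_channel`, where an alternate
    instruction-issuing processor drives the else-branch.
    """
    return {pid: then_channel if predicate(v) else else_channel
            for pid, v in values.items()}

local = {0: 5, 1: -2, 2: 7, 3: 0}
print(split_on_condition(local, lambda x: x > 0, then_channel=10, else_channel=11))
# {0: 10, 1: 11, 2: 10, 3: 11}
```

Unlike a pure SIMD machine, which would mask off the "false" processors while the "true" branch runs, both branches here can execute at the same time on separate channels.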

Pipelined implementations can also be performed easily on the MCA. A group of processors switched to SIMD mode can be controlled by a Processor Q placed on their SIMD instruction channel. When it finishes, Processor Q can move on to another group of processors while a new Processor R begins to issue instructions to the processors on this SIMD channel. A third processor could then take over the channel and run a third algorithm on the data.
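
The hand-over pattern can be sketched as a channel whose workers keep their data in place while successive controllers take the channel and broadcast their algorithms, so the data flows through a pipeline of stages. The `SIMDChannel` class, the worker representation and the third controller's name "S" are hypothetical. The sketch does not model Q serving another group in the meantime.

```python
from dataclasses import dataclass, field

@dataclass
class SIMDChannel:
    """One SIMD instruction channel shared by a fixed group of worker processors."""
    workers: list
    controller: str = ""                          # processor currently issuing instructions
    history: list = field(default_factory=list)   # controllers in the order they took over

    def hand_over(self, new_controller: str, algorithm) -> None:
        """A new controller takes the channel and broadcasts `algorithm` to every worker."""
        self.controller = new_controller
        self.history.append(new_controller)
        for w in self.workers:
            w["data"] = algorithm(w["data"])      # same instruction stream, each worker's own data

channel = SIMDChannel(workers=[{"data": 1}, {"data": 2}, {"data": 3}])
channel.hand_over("Q", lambda x: x * 2)    # first algorithm
channel.hand_over("R", lambda x: x + 1)    # Q leaves; R takes over the channel
channel.hand_over("S", lambda x: x ** 2)   # a third processor runs a third algorithm
print([w["data"] for w in channel.workers], channel.history)
# [9, 25, 49] ['Q', 'R', 'S']
```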

Advantages of the MCA

By analysing the MCA architecture and comparing it with previous interconnection technologies, it is clear that MCA offers many advantages.

Conclusions

The MCA is a new parallel architecture that seeks to exploit the high bandwidth available in fiber-optic systems by creating multiple high-speed optical buses that correspond to assigned channel frequencies. It can provide approximately 1000 high-speed data channels for interconnecting processing elements, memories and I/O devices, using widely tunable lasers and other optical components. Furthermore, it can adapt to the changing topology requirements of parallel programs while supplying large amounts of processing power. The transmission and reception mechanisms are also simplified because the same structure (MQW-DBR) is used for both.

However, a suitable operating system or compiler for the architecture would be difficult to create because of the flexible and dynamic nature of the system. The architecture is also suitable only for moderately large systems: small systems would be unable to fully utilize the abundant bandwidth of the interconnection system, so using it for them would be wasteful.

REFERENCES


1. Tom S. Wailes and David G. Meyer
"Multiple Channel Architecture"
IEEE, 1990, pp. 315-323.

2. Tom S. Wailes and David G. Meyer
"A New Optical Interconnection Strategy for Massively Parallel Computers"
Journal of Lightwave Technology, vol. 9, no. 12, December 1991.

3. Yi-Mo Zhang, Xiao-Qing He, Ge Zhou, Wen-Yao Liu and Zhan-Ping Yin
"Optical Fiber Interconnection System for Massively Parallel Processor Arrays"
IEEE, 1995, pp. 57-62.

4. John L. Hennessy and David A. Patterson
"Future Directions"
Computer Architecture: A Quantitative Approach, pp. 570-574, 1990.