by
Shahrin Imran

Multiple Channel Architecture (MCA) is a new optical interconnection strategy for
parallel processing. The architecture seeks to exploit the high bandwidth available
in optical communications by supporting multiple virtual buses, or selectable
channels, on a single fiber. Each virtual bus corresponds to a different assigned
channel frequency. Arbitrary interconnection patterns and machine partitions can be
emulated via appropriate channel assignment. Furthermore, simultaneous execution of
parallel tasks is also possible.
Here frequency division multiplexing (FDM) is used to provide multiple channels.
Currently, over 1000 high-bandwidth (100-200 Mb/s) independent channels can be
supported on a single optical fiber, and in the future this rate is likely to reach 1 Gb/s.
In this article we give a basic overview of the MCA
and compare it with previous interconnection methods. Then we
look at how parallel processing is made possible by the MCA.
Finally, we conclude by outlining the advantages MCA has over these systems.
INTRODUCTION

Problems With Current Interconnection Techniques
Enormous work has been done in developing interconnection technologies that balance
performance and cost in an effort to get as close as possible to an ideal network. Here we
will examine various attempts from both the electrical and
optical domains. We will then
list the characteristics an ideal system requires to overcome these problems.
Mesh Network (electrical)

figure 1
A two-dimensional mesh network is very effective for nearest-neighbour communication.
The difficulty is that the network size must be an even power of
two to maintain consistent mesh properties. Therefore, although large systems can be
created, size increments are constrained. Further,
partitioning is limited to the same sizes.
A faulty node causes an entire column to be bypassed.
Hence good processor elements (PEs) are unnecessarily taken off line, which gives weak fault
tolerance.
However, the most limiting factor is the large
diameter, which implies long access times to
retrieve data as well as slow interprocessor communication.
Hypercube (electrical)

figure 2
The problem with the hypercube structure is that it is not readily
extensible since the
degree of the nodes changes when the size of the
structure increases.
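The two limitations named above, the mesh's large diameter and the hypercube's size-dependent node degree, can be made concrete with the standard formulas for these topologies. The sketch below is illustrative only; it assumes a square mesh without wraparound links and a binary hypercube, and the function names are hypothetical.

```python
import math


def mesh_metrics(n_nodes: int) -> dict:
    """Diameter and maximum node degree of a square 2-D mesh (no wraparound links)."""
    side = math.isqrt(n_nodes)
    if n_nodes < 1 or side * side != n_nodes:
        raise ValueError("a square mesh needs a perfect-square node count")
    # Worst case is corner to opposite corner: (side - 1) hops in each dimension.
    diameter = 2 * (side - 1)
    # Interior nodes have 4 neighbours once the mesh is at least 3 x 3.
    degree = 2 * min(2, side - 1)
    return {"diameter": diameter, "degree": degree}


def hypercube_metrics(n_nodes: int) -> dict:
    """Diameter and node degree of a binary hypercube with n_nodes = 2**d nodes."""
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("a hypercube needs a power-of-two node count")
    d = n_nodes.bit_length() - 1
    # Each node has one link per dimension; the farthest node differs in all d bits.
    return {"diameter": d, "degree": d}


for n in (16, 64, 256, 1024):
    print(n, mesh_metrics(n), hypercube_metrics(n))
```

Growing from 64 to 1024 nodes raises the mesh diameter from 14 to 62 hops while its degree stays at 4. The hypercube's diameter only grows from 6 to 10, but every node then needs 10 links instead of 6, which is why it is not readily extensible.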
Simple Global Bus
(electrical)
This is the most common means of interconnecting a relatively small number of processors.
Limitations include a
degree of simultaneity of 1/N (where N is the
number of nodes) and
saturation when the amount of
communication demanded is large.
Multiple Bus Structure
(electrical)

figure 3
Although the availability of multiple busses provides alternatives for nodes, the
system is more expensive because every node must have a receiver for each bus.
Furthermore, if the number of memory banks is large, the number of banks
accessible by a node is limited to the number of busses available.
Aquarius (electrical)

figure 4
Here the system is multidimensional. Memories and processors are packed together in
a single node. However, the diameter depends
on the number of dimensions. Complications thus occur when broadcasting data on
multiple busses because the data has to traverse several dimensions. Hence
latency increases with the number of dimensions.
Multistage Interconnection Network (MIN)
(electrical)

figure 5
This system provides greater flexibility than the previous methods but at substantial
cost. First, there are the wiring costs of the switchboxes required for large networks. In
addition, there is only a single path from a given source to a given destination and
no arbitrary connections, which gives poor fault tolerance. Furthermore,
extensions and
partitioning are possible only in increments of a power of
two.
Active Optical Splitters and Active
Combiners (optical)
figure 6a : Splitters
figure 6b : Combiners
These are made of switching elements put together to form a binary tree structure.
This is a good design but unfortunately requires 2N(N-1) switching elements (where N
is the number of nodes), so the element count is quadratic in N.
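To give a feel for the quadratic growth, the short sketch below simply evaluates the 2N(N-1) count quoted above for a few network sizes. The function name is hypothetical.

```python
def active_switch_elements(n_nodes: int) -> int:
    """Switching elements needed for n_nodes, using the 2N(N-1) count quoted above."""
    if n_nodes < 1:
        raise ValueError("the network needs at least one node")
    return 2 * n_nodes * (n_nodes - 1)


for n in (8, 16, 32):
    print(n, active_switch_elements(n))
```

Doubling the number of nodes roughly quadruples the number of switching elements: 112 for 8 nodes, 480 for 16 and 1984 for 32.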
Fast Optical Cross-Connect (FOX)
(optical)

figure 7
Here the maximum number of nodes is limited to the exact number of channels. This
is because frequencies are statically allocated to receivers. Bandwidth is therefore
wasted because idle processors and memories are allocated communication bandwidth
that cannot be used by other tasks.
Coherent Wavelength Switch
(optical)

figure 8
Here a fiber containing information from n different channels is broadcast to n
separate tunable receivers via an optical splitter. Unfortunately the number of
tunable lasers and receivers required is N times the number of stages. Furthermore,
because of the static circuit-switched nature of these connections, periods of idle
transmission will waste some bandwidth.
Based upon the limitations observed above, we can summarise the
characteristics required for an ideal network to overcome these problems.
What MCA tries to do is to incorporate all these characteristics into its architecture.

MCA System Organization

The MCA (figure 9) consists of a fiber optic dual cable broadband communication system with processors, memories and I/O devices acting as nodes on the network. Each node is attached to the network via tunable laser transmitters and tunable heterodyne receivers. A passive star coupler is used to evenly divide laser transmitter power among all receivers. By selecting proper frequencies we can arbitrarily partition processors, memories and I/O devices.
Each node address consists of three parts: channel frequency, a function
code and a physical address field. To find a node we must first tune to the
channel frequency. Next, we determine the type of node (processor, memory or I/O) via
the function code. Finally we apply the physical address field to specify a memory
line address, I/O block or address of a processor.
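The three-part address can be pictured as a small record. The sketch below is only an illustration of the lookup order described above: the field names, the integer channel index standing in for a frequency, and the `receivers` helper are all assumptions, not part of the MCA specification.

```python
from dataclasses import dataclass
from enum import Enum


class NodeType(Enum):
    """Function code: which kind of node is being addressed."""
    PROCESSOR = 0
    MEMORY = 1
    IO = 2


@dataclass(frozen=True)
class NodeAddress:
    channel: int        # index of the channel frequency (hypothetical encoding)
    function: NodeType  # function code
    physical: int       # memory line address, I/O block or processor address


@dataclass
class Node:
    name: str
    kind: NodeType
    rx_channel: int     # channel the node's receiver is currently tuned to


def receivers(nodes, addr):
    """Names of nodes that would see a message sent to addr.

    Step 1: only nodes tuned to addr.channel hear the message at all.
    Step 2: the function code selects the node type.
    The physical field is then interpreted inside the selected node.
    """
    return [n.name for n in nodes
            if n.rx_channel == addr.channel and n.kind is addr.function]


nodes = [Node("P0", NodeType.PROCESSOR, 3),
         Node("M0", NodeType.MEMORY, 3),
         Node("M1", NodeType.MEMORY, 5)]
print(receivers(nodes, NodeAddress(3, NodeType.MEMORY, 0x40)))
```

In this example only `M0` responds: `M1` is a memory node but is tuned to a different channel, and `P0` is on the right channel but has the wrong function code.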

The processor node is shown in figure 10. The processor should be at least 32-bit to
allow high speed computations. The memory management unit controls access to the network
and manages the local cache. The cache, which stores local data, uses a
least recently used (LRU) replacement policy and a
write-back policy. The auxiliary function controller
performs tasks to support parallel processing. Finally, the dual port RAM allows
CPU and auxiliary function activity to overlap, thereby allowing maximum concurrency.

The memory node is shown in figure 11. An excellent feature is that, because buffering is provided for both the transmitter and the receiver, a memory node can simultaneously transmit the results of a previous request, retrieve the contents of the present request and receive the next memory request.
There are three types of memory addresses: instructions, local data
and shared data. Instructions are strictly read only. Local data is stored in
cache, while shared data is not cached, to prevent cache coherence problems.

The I/O node is as in figure 12. The high speed buffer provides disk block caching and allows prefetching of data. This enables disk block access to be nearly as fast as memory blocks.
The passive star is made of fused biconical taper
couplers, which divide the transmitted light (say, from the left side) among all the
receiving optical fibers (on the right).

MQW-DBR (multiple-quantum-well distributed Bragg reflector) lasers are used in the MCA transmitter, figure 13. Laser oscillation is provided by biasing the active region. The phase and DBR regions are biased below threshold. Tuning is done by adjusting the current in the DBR region, which shifts the emitted light to the desired frequency.
The MCA receiver also uses MQW-DBR lasers, but biased below threshold. When exposed
to a light source of the correct frequency, photodetection occurs in the active
layer. This causes an external voltage across the forward-biased diode.
The use of the same structure for both the transmitter and receiver greatly simplifies
the coupling of the MCA.
Each node can switch channel (frequency) during execution by dynamically changing the injection current to the laser. Additionally, transmission and reception can be performed on different channels (because nodes are not assigned to a fixed channel).
The bandwidth of the fiber is divided into many high speed data channels or
virtual busses. Data is transferred serially over the medium using a message-based
protocol and CSMA/CD arbitration. By using CSMA/CD,
multiple processes operating on different processing nodes can coexist.

MCA Resource Allocation
Resource allocation is a very important aspect to consider in a computer system. It is essential that resources are fully utilised and optimised during execution. Poor techniques can severely affect the efficiency and throughput of the system.
MCA allows partitioning and allocation of communication bandwidth by task. Here bandwidth is better utilised because bandwidth size is allocated according to how large a multiprocessor task is. High priority tasks can also combine with tasks with low communication requirements on the same virtual bus.
Communication between nodes is established by assigning the source of data and the consumer of the data the same frequency and address space. Allocation of resources is done by assigning resources (memory, processors, etc.) to specific channel addresses and is monitored by the operating system (OS) via dynamically changeable allocation tables. Furthermore, resources within tasks are reconfigurable during execution for better utilization.
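A dynamically changeable allocation table of this kind might look like the sketch below. It is a toy model under loud assumptions: the class and method names are hypothetical, channels are plain integer indices, and each task gets its own channel, which is a simplification since the text allows tasks to share a virtual bus.

```python
class AllocationTable:
    """Toy OS-side table mapping each task to a channel and a set of node names."""

    def __init__(self, n_channels: int):
        self.n_channels = n_channels
        self.channel_of_task = {}   # task name -> channel index
        self.nodes_of_task = {}     # task name -> set of node names

    def allocate(self, task, nodes):
        """Give the task the lowest free channel and tune its nodes to it."""
        used = set(self.channel_of_task.values())
        for ch in range(self.n_channels):
            if ch not in used:
                self.channel_of_task[task] = ch
                self.nodes_of_task[task] = set(nodes)
                return ch
        raise RuntimeError("no free channel")

    def move_node(self, node, src_task, dst_task):
        """Reconfigure during execution: retune one node to another task's channel."""
        self.nodes_of_task[src_task].remove(node)
        self.nodes_of_task[dst_task].add(node)

    def release(self, task):
        """Free the task's channel and nodes when it finishes."""
        del self.channel_of_task[task]
        del self.nodes_of_task[task]

    def channel_of_node(self, node):
        """Channel a node is currently assigned to, or None if it is idle."""
        for task, nodes in self.nodes_of_task.items():
            if node in nodes:
                return self.channel_of_task[task]
        return None


table = AllocationTable(n_channels=4)
table.allocate("fft", ["P0", "P1", "M0"])
table.allocate("sort", ["P2", "M1"])
table.move_node("P1", "fft", "sort")
print(table.channel_of_node("P1"))
```

Because a node belongs to a task only through its table entry, moving a processor between tasks amounts to one table update plus retuning its laser, with no rewiring.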
MCA allows multiple tasks to share a memory or I/O node and multiple
uniprocessor tasks to be assigned to the same processor node. To handle multiple
tasks, operating system functions are divided among processing elements (PEs).
"Supervisor" processors then oversee each task, i.e. each group of PEs, according
to the functions and control given to each group.

MCA Modes of Operation
M. J. Flynn [1966] categorized computers, according to the parallelism in their instruction
and data streams, into four modes:
1. Single instruction stream, single data stream (SISD, the uniprocessor)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
The uniprocessor model, SISD, is easily accomplished in MCA by assigning a single
processing node, several memory nodes and some available I/O nodes to the same
frequency. To exploit the parallel capabilities of MCA it is more advantageous to use the
SIMD and MIMD modes of operation.
A single SIMD instruction requires multiple
parallel computations on different data. On the MCA this is done by assigning a
single processor, several
memory nodes and some available I/O nodes to multiple frequencies (c.f. SISD).
An SIMD instruction must therefore be broadcast over multiple channels.
An MIMD process, on the other hand, requires multiple processors executing multiple tasks, with each processor executing its own program on local data. This form of operation requires synchronization among all its processors. In the MCA this is done using barrier synchronization over time-division multiplexed channels. Assume that n processors are required to halt at a barrier. Each processing node can be programmed to transmit a "1" bit during an assigned time slot when the processor reaches the barrier. As soon as all n "1" bits are detected, all processors are known to have reached the barrier.
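The barrier scheme can be modelled in a few lines. This is a minimal software sketch of the idea, assuming one time slot per processor in a repeating frame; the class name and its methods are hypothetical and say nothing about how the optical hardware detects the bits.

```python
class TdmBarrier:
    """One time slot per processor; a slot carries 1 once its owner reaches the barrier."""

    def __init__(self, n_processors: int):
        self.slots = [0] * n_processors

    def arrive(self, pid: int) -> None:
        """Processor pid reaches the barrier and starts sending 1 in its slot."""
        self.slots[pid] = 1

    def frame(self) -> str:
        """Bits seen on the synchronization channel during one TDM frame."""
        return "".join(str(bit) for bit in self.slots)

    def all_arrived(self) -> bool:
        """True once every slot carries a 1, so every processor may proceed."""
        return all(self.slots)

    def reset(self) -> None:
        """Clear the slots so the barrier can be reused."""
        self.slots = [0] * len(self.slots)


barrier = TdmBarrier(4)
for pid in (2, 0, 3, 1):          # processors arrive in arbitrary order
    barrier.arrive(pid)
    print(barrier.frame(), barrier.all_arrived())
```

Every processor listening to the synchronization channel sees the same frame, so each one can decide locally that the barrier has been met without any extra message exchange.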
The capability to switch modes during execution would improve performance considerably over both pure SIMD and pure MIMD modes of operation. MIMD has an advantage over SIMD in algorithms requiring many conditional (if-then) branches. However, SIMD data transfers require much less overhead than MIMD transfers, which need interrupts and interprocessor communication. The auxiliary function controller and dual port RAM (see MCA System Organization) allow quick transitions between SIMD and MIMD modes of operation. Furthermore, a processor could have separate channels for SIMD and MIMD.
The MCA allows processors to be partitioned into two channel groups. This is particularly useful in executing conditional statements. For instance, consider an "if" statement. The processors evaluating "true" could remain on the same channel for their instruction stream, whereas the processors evaluating "false" would switch channels and receive their instructions from an alternate processor.
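The channel split on a conditional can be sketched as a simple mapping from each processor's local data to the channel it listens on next. The function name, the integer channel numbers and the sample data are illustrative assumptions.

```python
def partition_on_condition(local_values, predicate, true_channel, false_channel):
    """Map each processing element to the instruction channel it should listen on.

    local_values: dict of PE id -> the local datum the condition is tested on.
    PEs where the predicate holds stay on true_channel; the others retune
    to false_channel, where an alternate processor issues the "else" branch.
    """
    return {pe: (true_channel if predicate(value) else false_channel)
            for pe, value in local_values.items()}


# Four PEs evaluate "if x > 0" on their own local data.
local_x = {0: 5, 1: -2, 2: 0, 3: 7}
channels = partition_on_condition(local_x, lambda x: x > 0,
                                  true_channel=10, false_channel=11)
print(channels)
```

Here PEs 0 and 3 stay on channel 10 while PEs 1 and 2 retune to channel 11, so both branches can be executed at the same time instead of one group idling while the other runs.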
Pipeline implementations could also be easily performed in the MCA. A group of
processors switched to SIMD mode can be controlled by a Processor Q placed on their
SIMD instruction channel. At the end of processing, Processor Q could move on to
another group
of processors while a new Processor R began to issue instructions to the
processors on this SIMD channel. A third processor could then take over the channel
and run a third algorithm on the data.

Advantages of the MCA
By analyzing the MCA and comparing it with previous interconnection technologies, it is clear that MCA offers many advantages.

Conclusions
The MCA is a new parallel architecture that seeks to exploit the high bandwidth available in fiber optic systems by creating multiple high-speed optical buses that correspond to assigned channel frequencies. It is able to assign approximately 1000 high speed data channels for interconnections among processing elements, memories and I/O devices. This is done by using widely tunable lasers and other optical components. Furthermore, it can adapt to the changing topology requirements of parallel programs while supplying large amounts of processing power. The transmitting and receiving mechanisms are also simplified because the same structure (MQW-DBR) is used for both.
However, a suitable operating system or compiler for the architecture would be
difficult to create due to the flexible and dynamic nature of the system. This system
is also suitable only for moderately large systems i.e. it would be wasteful for small
systems as they would not be able to fully utilize the abundance of bandwidth
available in the interconnection system.

2. Tom S. Wailes and David G. Meyer
"A New Optical Interconnection Strategy For Massively Parallel Computers"
Journal Of Lightwave Technology, vol.9, No.12, December 1991.
3. Yi-Mo Zhang, Xiao-Qing He, Ge Zhou, Wen-Yao Liu and Zhan-Ping Yin
"Optical Fiber Interconnection System for Massively Parallel Processor Arrays"
IEEE 1995, pg 57-62.
4. John L. Hennessy and David A. Patterson
"Future Directions"
Computer Architecture: A Quantitative Approach, pg 570-574, 1990.