INTERCONNECT STRATEGIES
Designing a Non-Blocking, Multi-Stage Switching Network
Multi-stage switching networks using commercially available crosspoint switch ICs combine a fully non-blocking architecture with true scalability for I/O-intensive design requirements.
JOHN BERGEN, FAIRCHILD SEMICONDUCTOR
Communication systems designers have often used crossbar-switching schemes in telecommunications, networking, digital signal processing and multi-processing systems. These switching schemes are implemented in many different architectures depending on design requirements, such as the number of signals, predictability of connections and overall system cost. The basic building block in these switching schemes is an M input by N output (M x N) crosspoint switch, which has the ability to spatially connect any one of the M inputs to any one of the N outputs.
In practice, engineers can use crosspoint switches to build a complete switching network, but it is not feasible to fully interconnect every I/O using one enormous crosspoint switch matrix. To solve this problem, designers have turned to multi-stage switching networks that deliver a fully non-blocking architecture with a reasonable, if not minimal, number of crosspoint switch ICs.
Crosspoint Switch Background
Telecommunication system designers have utilized crossbar switches since they were first implemented as electromechanical assemblies in telephone switching offices in the early 1900s. System designers can construct non-blocking crossbar switch matrices by using crosspoint switch fabric ICs, mostly built in advanced VLSI technology. As in their electromechanical predecessors, closing the “switch” at the appropriate crosspoint in the matrix creates a connection between an input and an output. Crosspoint switch matrices are found in a variety of telecommunication systems such as digital cross-connects, media gateways and add-drop multiplexers.
Designers use high-performance crosspoint switches for many reasons. For one, they are non-blocking, ensuring that all inputs can find an uncongested path to an output and erasing the bandwidth limitations of one-at-a-time connections. Additionally, crosspoint switches have the flexibility of connecting any input to any output, and are easily scalable for the construction of larger switches from smaller switch elements. “Rearrangeability” is also a key architectural characteristic that makes crosspoint switches useful. This attribute allows one connection path to be changed without affecting the connections for the other paths.
There are two basic methods to implement crossbar switching arrays in semiconductor devices. Custom ASICs, FPGAs and some standard products often rely on an N-way multiplexer (Figure 1). In this architecture there is a multiplexer at each output port to select data from the input ports. Many semiconductor vendors have built crossbar switch products using this easy-to-understand methodology. However, these devices are limited in terms of architectural flexibility (fixed number of inputs and outputs) and performance, and they are difficult to implement efficiently in silicon.
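The multiplexer-per-output approach can be pictured with a small behavioral model. The sketch below is illustrative only: the class name `MuxCrossbar` and its methods are hypothetical and do not correspond to any vendor's device or API. Each output port holds a select value that names the input it listens to, which is why one input can feed several outputs while each output has exactly one source.

```python
class MuxCrossbar:
    """Behavioral model of a crossbar built from one N-way multiplexer per output.

    Hypothetical illustration; not a model of any specific IC.
    """

    def __init__(self, num_inputs, num_outputs):
        self.num_inputs = num_inputs
        # select[j] is the input index routed to output j, or None if unconnected.
        self.select = [None] * num_outputs

    def connect(self, inp, out):
        """Point the multiplexer at output `out` to input `inp`."""
        if not 0 <= inp < self.num_inputs:
            raise ValueError("input index out of range")
        if not 0 <= out < len(self.select):
            raise ValueError("output index out of range")
        self.select[out] = inp

    def forward(self, inputs):
        """Return the value seen at every output for the given input values."""
        return [None if s is None else inputs[s] for s in self.select]


xbar = MuxCrossbar(4, 4)
xbar.connect(0, 2)
xbar.connect(0, 3)  # the same input drives a second output (broadcast)
xbar.connect(3, 0)
outputs = xbar.forward(["a", "b", "c", "d"])
print(outputs)  # ['d', None, 'a', 'a']
```

Changing one select value leaves every other output untouched, but the number of inputs and outputs is fixed when the model is created, which mirrors the fixed input/output structure noted above.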

The other implementation uses a crosspoint array (Figure 2) that has a switching element at each input/output intersection. This method offers a high-density crosspoint array with greater flexibility than traditional crossbar switches. The architecture is not limited to fixed input and output structures, since each I/O pin can be configured as either an input or an output. This type of implementation makes it feasible to build large switch matrices in a cost-effective manner.

Crosspoint Switch Networks
The ability to construct larger switch networks from smaller switch elements is fundamental to the theory of switching architectures. However, there is a basic limitation to building larger crosspoint switch networks: doubling the port count of a single-stage switch fabric quadruples the number of chips required. For example, scaling a 64-input by 64-output (64 x 64) switch up to a 128 x 128 switch requires four 64 x 64 switches. As a result, this methodology can become impractical because so many devices are needed to build a larger interconnect network.
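The quadratic growth described above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the larger switch is tiled from identical square devices; the function name `single_stage_chip_count` is introduced here for illustration only.

```python
import math


def single_stage_chip_count(ports, chip_ports):
    """Chips needed to tile a ports x ports single-stage switch from
    chip_ports x chip_ports devices (square tiling is an assumption)."""
    tiles_per_side = math.ceil(ports / chip_ports)
    return tiles_per_side ** 2


# Each doubling of the port count multiplies the chip count by four.
for ports in (64, 128, 256, 512):
    print(ports, single_stage_chip_count(ports, 64))
```

With 64 x 64 devices the counts run 1, 4, 16 and 64, which shows how quickly a single-stage design outgrows a single card.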
For example, one way to create larger networks is to cross-couple crosspoint switches in a large single-stage configuration as shown in Figure 3. This arrangement shows four 256 x 256 crosspoint switch ICs combined to make a 512 x 512 crosspoint switch. Any of the 512 signals from the left side can be connected to one (or more) of the 512 signals on the right side. Moreover, the signals on the left and the right can be either an input or an output, in any combination, and the input-to-output delay through the switch is the same as the pin-to-pin delay inside a single IC.
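One way to reason about such a cross-coupled arrangement is to ask which of the four ICs holds the crosspoint for a given connection. The sketch below assumes a simple grid layout in which each group of 256 left-side signals is shared by a row of chips and each group of 256 right-side signals by a column of chips. That layout, and the function name `locate_crosspoint`, are assumptions made for illustration; the actual wiring in Figure 3 may differ.

```python
CHIP_SIZE = 256  # ports per side of each crosspoint IC in the 512 x 512 example


def locate_crosspoint(left_port, right_port, chip_size=CHIP_SIZE):
    """Return ((chip_row, chip_col), (local_left, local_right)) for a connection.

    Assumes a grid of chips: the row is chosen by the left-side port group,
    the column by the right-side port group.
    """
    chip = (left_port // chip_size, right_port // chip_size)
    local = (left_port % chip_size, right_port % chip_size)
    return chip, local


print(locate_crosspoint(300, 10))  # ((1, 0), (44, 10))
```

Because every connection closes exactly one crosspoint in exactly one chip, the signal passes through a single device, which is consistent with the single pin-to-pin delay noted above.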

Today’s designers face the challenge of creating even larger switch networks with thousands of inputs and outputs. It is not practical to create so large a network using a single stage, so the designer must find a technique to split up the signals into a manageable set of inputs per card while preserving the performance of the network.
Multi-Stage Networks
Multi-stage crossbar switching schemes are commonly used in telecommunications and computer networking to overcome the limitations of individual chips or boards. Scalable and expandable, a multi-stage network provides several benefits. Using a number of devices, multi-stage networks can be completely non-blocking, provide good broadcast and multicast features, and build in redundancy and reliability with no single point of failure in the system. Additionally, designers can expand these systems incrementally by adding one or more modules to the existing design.
These switching schemes can be implemented in many different architectures depending on design requirements, such as the number of signals, predictability of connections and cost. One of the most popular designs is the three-stage Clos network (named after a researcher at AT&T Bell Labs), which is commonly used in a variety of telecommunications and networking systems. The basic building block in these switching schemes is an N-input by N-output (N x N) crosspoint switch, which has the ability to spatially connect any one of the inputs to one or more of the outputs.
In this arrangement, any pairing of signals between the left side and the right side can be achieved. It has 100% predictability since every connection made through the switch travels through three stages of crosspoint switches, resulting in uniform and predictable delays. A Clos network is a fully non-blocking switch matrix without a single point of failure since there are multiple paths from each input to each output (Figure 4).
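The properties of a three-stage Clos network are usually described with three parameters: n inputs per first-stage switch, r first-stage (and last-stage) switches, and m middle-stage switches. The sketch below applies the classical textbook conditions for point-to-point connections, which are not stated in this article: m >= 2n - 1 guarantees that a new connection never blocks, while m >= n is sufficient if existing connections may be rearranged. The function name and the returned keys are hypothetical.

```python
def clos_properties(n, m, r):
    """Summarize a symmetric three-stage Clos network.

    n: inputs per first-stage switch
    m: number of middle-stage switches
    r: number of first-stage (and last-stage) switches
    Conditions are the classical results for unicast connections.
    """
    return {
        "ports": n * r,
        "strictly_nonblocking": m >= 2 * n - 1,
        "rearrangeably_nonblocking": m >= n,
        # One distinct path per middle-stage switch for any input/output pair.
        "paths_per_pair": m,
        # r first-stage (n x m) + m middle (r x r) + r last-stage (m x n) switches.
        "crosspoints": 2 * r * n * m + m * r * r,
    }


props = clos_properties(n=32, m=63, r=32)
for key, value in props.items():
    print(key, value)
```

For this 1024-port example the three stages need 193,536 crosspoints in total, compared with 1,048,576 for a single 1024 x 1024 matrix, and every input/output pair has 63 alternative paths.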

Design Example
A designer needs to implement a 1024 input x 1024 output switch using commercially available crosspoint switches. In order to effectively implement this switch, the designer can choose either a single-stage network or a three-stage Clos network configuration.
Designers are encouraged to explore the websites of the various commercially available crosspoint switch vendors, where interactive design and evaluation tools like Fairchild’s “SwitchSelector” (www.fairchildsemi.com/products/interface/switchselector/switch.jsp) may be used to:
- Provide the engineer with an easy-to-use tool for creating a switch design without having to fully understand the features of the available crosspoint switching devices.
- Allow the engineer to evaluate different implementations for cost, performance, type of chips, number of chips, or approximate required PCB area.
These user-friendly design and evaluation tools require the user to enter the number of inputs and outputs for the size of the switch matrix desired. The user can also select between the different product families, I/O technologies, available data rates and other performance parameters. For a given set of design inputs, the design tool will recommend the different options available, allowing the engineer to select the most appropriate method for the application.
In our example of designing a 1024 input by 1024 output switch matrix, let’s say that we have decided to consider any and all crosspoint switch families—utilizing TTL, LVTTL, LVDS or LVPECL technology—and that data rate does not matter, in order to maximize the number of potential solutions that will be available. The required information was entered, and the online design tool responded as shown in Figure 5 with the following five different switch architecture options:
Method 1: Bit switch, single group; with total number of required ports <= available ports.
Method 2: Bit switch, single group; with inputs <= outputs; inputs <= available ports.
Method 3: Bit switch, single group; with inputs >= outputs; outputs <= available ports.
Method 4: Large NxN arrays from small nxn arrays; with inputs = or similar to outputs.
Method 5: Three-stage network; with inputs = outputs.

Since not every architecture is suitable for every size of switch matrix, the design tool only generates solutions for suitable architectures. In this case, Methods 4 and 5 (Figure 5) are the only applicable solutions.
The first solution option that the design tool came up with for this application is a single-stage switch architecture, shown as Method 4 in Figure 5. The design tool indicates that this implementation of a 1024 x 1024 non-blocking switch fabric would require a total of 16 individual chips in size 792 thermally enhanced ball grid array (TBGA) packages. In addition to more than 50 square inches of board space, placing sixteen chips on a single card would require a large card with a tremendous I/O connector, thus presenting significant PCB routing challenges. While this type of single-stage network does have its advantages, a more efficient solution can be achieved by using a multi-stage network. The second viable implementation offered by the design tool, shown as Method 5 in Figure 5, is a Clos network.

As Figure 6 shows, a 1024 x 1024 Clos network can be achieved using a variety of devices. The third choice from the top in the chart in Figure 6 suggests using four examples of a particular switch type in each stage, for a total of 12 devices. This choice provides a solution that uses fewer devices than the single-stage solution suggested by Method 4 (16 devices) and offers the benefits of a Clos network described previously. A block diagram of this solution is shown in Figure 7.

The drawback to this solution is that each signal must go through all three stages of the network instead of just a single stage in the other solution, essentially tripling the propagation time. It is up to the user to weigh the pros and cons of each solution when making a design decision.
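The trade-off between the two solutions can be put side by side in a short calculation. This sketch rests on several assumptions that are not given in the article: 256 x 256 devices, one device per group of 256 ports in each of the three stages (as the 12-device option suggests), a hypothetical 5 ns pin-to-pin delay, and no allowance for board trace delay between stages.

```python
import math


def compare_architectures(ports, chip_ports, pin_to_pin_ns):
    """Compare chip count and propagation delay for the two approaches.

    Assumes square tiling for the single-stage design and one chip per
    port group in each stage of the three-stage design. The latter only
    works when the inter-stage links divide evenly among the chips.
    """
    groups = math.ceil(ports / chip_ports)
    return {
        "single_stage": {"chips": groups ** 2, "delay_ns": pin_to_pin_ns},
        "three_stage": {"chips": 3 * groups, "delay_ns": 3 * pin_to_pin_ns},
    }


# pin_to_pin_ns=5.0 is a placeholder value, not a datasheet figure.
result = compare_architectures(1024, 256, pin_to_pin_ns=5.0)
for name, figures in result.items():
    print(name, figures)
```

Under these assumptions the three-stage design saves four devices at 1024 ports at the cost of three device delays per signal. The gap in chip count widens as the port count grows, since one design scales with the square of the number of port groups and the other scales linearly.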
While designs such as digital cross-connects, add/drop multiplexers and backplanes are traditional applications for multi-stage crosspoint switch networks, there are many new applications in emerging markets. For example, a recently introduced media gateway uses crosspoint devices to form the switch fabric. This system can switch TDM and voice-over-IP (VoIP) traffic across packet networks. The crosspoint switch adds to the flexibility of the system, which can offload TDM and VoIP traffic across a range of interfaces and backbone networks.
Today’s designers have a number of options available to them to solve their switching problems. While crosspoint switches have increased in port count through the years, there is still not a single chip that is large enough to meet all design requirements. Multi-stage networks allow designers to build very large scalable switch networks. The selection of devices and implementation must be based on the specific design requirements of the application, a process that includes a thorough consideration of a variety of product types, features and performance.
Fairchild Semiconductor
South Portland, ME.
(800) 341-0392.
[www.fairchildsemi.com].

