Expanding PCI Express Horizons with Low-Cost Switches

Just over two years after the release of the PCI Express (PCIe) core specification
1.0a, products based on this new technology are already on the market. This rapid
market adoption is a testament to the viability of this next-generation interconnect
standard and its suitability to the needs of the market. The first products are
graphics chips and system controllers with native PCIe support, which link video
components to the rest of the system.

This quick implementation was due partly to the software's complete backward
compatibility with PCI and partly to the quantum leap in bandwidth. In the near
future a wide variety of servers, storage devices and, ultimately, communication
systems will design in this new interconnect technology, which includes
bridging, endpoint and switching devices. Within this wide array of PCIe offerings,
one critical device is the PCIe switch, which replaces the shared multi-drop bus
of PCI with point-to-point links. New system architectures will require flexible
solutions with high port and lane count switches.

Other requirements for these designs include peer-to-peer switching, multiple
virtual channel support and non-transparent port capabilities. These high-end
applications using advanced PCIe features are just scratching the surface of the
potential for the PCIe industry standard.

A new class of switches coming to market will address the need for lower-cost
solutions with a minimal number of lanes and ports. These products serve
many classes of applications, including printer engine controllers, intelligent
adapter cards and redundant-system controller boards, where the primary usage
is data aggregation rather than data switching. Although aimed at highly
cost-sensitive markets, these switches still offer ultra-low latency,
non-transparent bridging, aggregation, peer-to-peer switching and Hot Plug
capabilities.

PCI Express Switches in Office Automation

One “mass market” application for PCIe switches is in printers
and other office automation products. A low-end PCIe switch can be very useful
in an embedded print engine or a similar system controller application. Such
designs require switches with low port/lane counts and low cost, yet the system
throughput can be very high for graphics-oriented data. Figure 1 illustrates
a switch used to transfer data from a scanner and other input devices to the
print engine CPU and back to the marking engine. Because endpoint devices now
incorporate native PCIe support, the switch's additional ports can provide I/O
based on technologies such as USB, IEEE 1394 (FireWire) and Ethernet.

This particular topology requires the input sources to be switched to the CPU
for image processing and switched back to the marking engine, so the switch must
support both data aggregation and peer-to-peer transfers. To match bandwidth,
the switch must provide eight lanes spread among five ports, and the lane-to-port
assignment must be flexible enough to allow a variety of topologies. For fully
non-blocking operation at wire speed, the total lane count must allow full
bandwidth matching; in this application that means eight lanes. Because
potentially removable input sources are tied directly to downstream ports of
the PCIe switch, Hot Plug capability is very important: the switch must provide
the nine Hot Plug pins (ten with Vaux support) for each Hot Plug controller it supports.
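The lane-matching rule above can be checked mechanically. The sketch below assumes one possible assignment for Figure 1 (an x4 upstream port to the print engine CPU plus four x1 downstream ports); the `Port` class and `check_config` function are hypothetical illustrations, not a vendor configuration API. It checks the eight-lane and five-port budgets, legal link widths, and that the downstream lanes do not oversubscribe the upstream port.

```python
from dataclasses import dataclass

# Link widths defined by the PCIe base specification.
LEGAL_WIDTHS = {1, 2, 4, 8, 12, 16, 32}


@dataclass
class Port:
    name: str
    lanes: int
    upstream: bool = False


def check_config(ports, total_lanes=8, max_ports=5):
    """Return a list of problems; an empty list means the assignment passes."""
    problems = []
    if len(ports) > max_ports:
        problems.append(f"{len(ports)} ports exceed the {max_ports}-port budget")
    used = sum(p.lanes for p in ports)
    if used > total_lanes:
        problems.append(f"{used} lanes exceed the {total_lanes}-lane budget")
    for p in ports:
        if p.lanes not in LEGAL_WIDTHS:
            problems.append(f"{p.name}: x{p.lanes} is not a legal link width")
    upstream = [p for p in ports if p.upstream]
    if len(upstream) != 1:
        problems.append("exactly one upstream port is required")
    else:
        down = sum(p.lanes for p in ports if not p.upstream)
        if down > upstream[0].lanes:
            # Aggregate downstream bandwidth would exceed the upstream link.
            problems.append(
                f"downstream x{down} oversubscribes upstream x{upstream[0].lanes}")
    return problems


# Hypothetical assignment: x4 to the print engine CPU, x1 to each other device.
printer = [
    Port("cpu", 4, upstream=True),
    Port("scanner", 1),
    Port("other_input", 1),
    Port("marking_engine", 1),
    Port("usb_io", 1),
]
print(check_config(printer))  # [] : 8 lanes, 5 ports, x4 downstream matches x4 upstream
```

Widening any downstream port to x2 in this example breaks both the lane budget and the bandwidth match, which is why the lane-to-port split has to be chosen together with the topology.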

Multi-Port Intelligent Adapter Cards

Intelligent adapter cards are used in a variety of applications including TCP/IP
offload engines, remote direct memory access (RDMA), RAID controller functionality,
iSCSI protocol processing and encryption/decryption engines. They offload these
tasks from the server and allow it to concentrate on running applications instead
of networking protocols. With new operating systems supporting such offloads and
new database applications supporting RDMA and other standards, these novel adapters
will soon become more common. To reduce the number of I/O blades within such systems,
adapter card vendors are expanding the number of Gigabit Ethernet (GE) ports.
Low lane/port count switches can be used on intelligent adapter cards to support
dual GE I/O, and GE endpoint solutions on the market today already have native
support for PCIe.

Figure 2 illustrates a PCIe switch providing both fan-in/fan-out and peer-to-peer
transfers. Data from the I/O and the local CPU are aggregated to the host through
the switch’s upstream x4 port. Traffic also moves between the local CPU and the
GE I/O through peer-to-peer transfers. A single PCIe lane amply supports a Gigabit
Ethernet I/O. With a 4-port PCIe switch, cards can support dual GE I/Os while
simultaneously transferring this data to the local CPU for connection termination
and other local processing. The x4 upstream port balances the data throughput.
With a fully non-blocking architecture, deep buffer memory and optimized flow
control, the switch can maintain wire-speed operation without dropping packets.
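The claim that one lane amply supports a GE port, and that an x4 upstream port balances two of them, follows from simple arithmetic. This back-of-the-envelope sketch uses the PCIe 1.0a signaling rate (2.5 GT/s per lane with 8b/10b encoding) and the 1 Gb/s GE line rate; it deliberately ignores packet header and flow-control overhead, so real usable throughput is somewhat lower than these raw figures.

```python
# PCIe 1.0a signals at 2.5 GT/s per lane; 8b/10b encoding carries 8 data bits
# in every 10 transmitted bits. Integer math keeps the figures exact.
LANE_MBAUD = 2500
GE_MBIT = 1000  # Gigabit Ethernet line rate


def pcie_mb_per_s(lanes: int) -> int:
    """Raw per-direction data bandwidth of a PCIe 1.0a link, in MB/s.

    Ignores TLP header, DLLP and flow-control overhead.
    """
    data_mbit = lanes * LANE_MBAUD * 8 // 10
    return data_mbit // 8


GE_MB_PER_S = GE_MBIT // 8  # 125 MB/s per direction

x1 = pcie_mb_per_s(1)
x4 = pcie_mb_per_s(4)
print(f"x1: {x1} MB/s vs GE {GE_MB_PER_S} MB/s")
print(f"x4 upstream: {x4} MB/s; two GE ports use {2 * GE_MB_PER_S} MB/s, "
      f"leaving {x4 - 2 * GE_MB_PER_S} MB/s for local CPU traffic")
```

An x1 link gives roughly twice the raw bandwidth a GE port needs in each direction, and the x4 upstream port leaves room for local CPU traffic on top of two GE ports.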

Local processing requires the adapter to have an onboard CPU or NPU. Normally this
would cause a system memory-mapping conflict, because both the host and the local
processor would try to enumerate the devices on the card and assign them addresses.
Through the use of the flexibly located non-transparent (NT) port on the PCIe switch,
the local processor can be isolated from the host. The system discovery process will
enumerate and configure all system elements but not the adapter card devices, as the
NT port will act as an endpoint to the root complex. The local processor can undertake
all on-card configuration steps. Address translation between the local processor
and system host processor domains is supported within the switch’s NT port;
this applies to both address and ID routing-based transaction transfers. Mailbox
and doorbell registers are available for direct processor communications. For more
on non-transparency, its use and its adherence to the PCIe specifications, please
refer to “Enabling Multi-Host System Designs with PCI Express Technology,”
RTC, May 2004, p. 14.
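The NT address translation and doorbell described above can be modeled in a few lines. This is a minimal sketch, assuming a single direct-translation window (a host-side BAR whose hits are redirected to a base address in the local processor's domain) and a 16-bit write-1-to-clear doorbell; the class names, window sizes and register width are hypothetical and do not describe any particular switch's register map.

```python
class NtWindow:
    """Hypothetical NT BAR window with direct address translation.

    A host-side access inside [bar_base, bar_base + size) is forwarded
    into the local processor's domain at translated_base + offset.
    """

    def __init__(self, bar_base: int, size: int, translated_base: int):
        if size <= 0 or size & (size - 1):
            raise ValueError("BAR size must be a power of two")
        if bar_base % size:
            raise ValueError("BAR base must be aligned to its size")
        self.bar_base = bar_base
        self.size = size
        self.translated_base = translated_base

    def translate(self, addr: int):
        """Return the local-domain address, or None if the window does not claim addr."""
        offset = addr - self.bar_base
        if not 0 <= offset < self.size:
            return None
        return self.translated_base + offset


class Doorbell:
    """Hypothetical 16-bit doorbell: the sender sets bits, the receiver
    clears them by writing 1s, and any set bit means an interrupt is pending."""

    def __init__(self):
        self.bits = 0

    def ring(self, mask: int):
        self.bits |= mask & 0xFFFF

    def clear(self, mask: int):
        self.bits &= ~mask & 0xFFFF

    @property
    def interrupt_pending(self) -> bool:
        return self.bits != 0


# 1 MB window at host address 0x9000_0000 mapped to local address 0x0020_0000.
win = NtWindow(0x9000_0000, 0x10_0000, 0x0020_0000)
print(hex(win.translate(0x9000_1234)))  # 0x201234
```

Because the translation is a fixed offset, neither side needs to know the other's address map; each processor only programs its own side of the window.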

As with all communications solutions, speed and latency are critical in such
designs. New PCIe switches coming to market will provide ingress-to-egress latency
as low as 150 ns, even for large 256-byte data payloads. This is achieved in part
through a cut-through switching architecture, which begins forwarding a packet as
soon as its header has arrived rather than waiting for the entire packet. For
high-priority data, multiple virtual channels (VCs) with several arbitration
techniques are imperative to ensure true quality of service (QoS). These new
switch architectures must be low cost yet offer the VCs needed for data
prioritization and for carrying high-priority “control path” data.
These are absolute “musts” in server cluster and high-performance
computing platforms.
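Why cut-through keeps latency low for 256-byte payloads can be seen by counting how many bytes a switch must receive before it may start transmitting. This simplified sketch assumes a memory-write TLP with a 4-DW header and no ECRC, and it models only that serialization wait, not the switch's internal pipeline delay, which adds to both cases.

```python
# 8b/10b at 2.5 Gb/s per lane: each byte occupies 10 bits = 4 ns on one lane.
NS_PER_BYTE_PER_LANE = 4.0

# Assumed framing of a memory-write TLP with a 4-DW header and no ECRC.
STP = 1      # start-of-TLP framing symbol
SEQ = 2      # data link layer sequence number
HEADER = 16  # 4-DW transaction layer header
LCRC = 4     # link CRC
END = 1      # end framing symbol


def wire_ns(nbytes: int, lanes: int) -> float:
    """Time to clock nbytes across a link, with bytes striped over all lanes."""
    return nbytes * NS_PER_BYTE_PER_LANE / lanes


def bytes_before_forwarding(payload: int, cut_through: bool) -> int:
    """Bytes a switch must receive before it can start transmitting on egress."""
    if cut_through:
        return STP + SEQ + HEADER  # routing needs only the header
    return STP + SEQ + HEADER + payload + LCRC + END  # whole TLP, LCRC verified


for lanes in (1, 4):
    sf = wire_ns(bytes_before_forwarding(256, cut_through=False), lanes)
    ct = wire_ns(bytes_before_forwarding(256, cut_through=True), lanes)
    print(f"x{lanes}: store-and-forward waits {sf:.0f} ns, cut-through waits {ct:.0f} ns")
```

The store-and-forward wait grows with payload size (over 1 µs on an x1 link for a 256-byte payload), while the cut-through wait depends only on the header. If a cut-through switch later finds a bad LCRC, it ends the already-forwarded TLP as nullified so that the next device discards it.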

Server Blade Isolation for Host Failover

Many complex designs such as application servers, storage systems and routers
require high availability (HA). HA as defined by the HA Forum is a combination
of structural, temporal and spatial redundancy. Structural redundancy involves
verifying that transmitted data was received without errors; here PCIe technology
provides CRC checks. Temporal redundancy uses system bandwidth to achieve
availability, for example through PCIe ACK/NAK packets that trigger retransmission
of corrupted data. Spatial redundancy requires “spare” devices that can take over
if the primary system fails. In each case, these new PCIe switching solutions
support HA and rapid host failover.

To further support structural redundancy, these switches go beyond the basic
link-level CRC (LCRC) checks and add optional end-to-end CRC (ECRC) checks. The
LCRC is checked and regenerated on every link, so it cannot catch data corrupted
inside an intermediate PCIe element such as a switch; the ECRC, computed once by
the originating device, can. If data is corrupted at the transaction level in some
intermediate PCIe network element, that data is marked as such, and the terminating
device (and system manager) will be made aware of the event and can take appropriate
action. PCIe transaction layer headers also provide an optional poison bit (EP) that
marks data as invalid while still allowing it to be transferred.
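The difference between link-level and end-to-end checking can be demonstrated with a toy model. This sketch uses Python's `zlib.crc32` as a stand-in CRC: it shares the 32-bit generator polynomial (0x04C11DB7) used by the PCIe LCRC and ECRC, but it does not reproduce PCIe's bit ordering or the header bits the ECRC excludes. The TLP dictionary and the `switch_forward` function are hypothetical.

```python
import zlib


def crc(data: bytes) -> int:
    # Stand-in for the PCIe CRC-32 (same polynomial, different bit ordering).
    return zlib.crc32(data)


def make_tlp(header: bytes, payload: bytes, with_ecrc: bool) -> dict:
    tlp = {"header": header, "payload": payload, "ecrc": None}
    if with_ecrc:
        tlp["ecrc"] = crc(header + payload)  # computed once, by the requester
    return tlp


def link_send(tlp: dict) -> dict:
    # LCRC protects one link only; every transmitter regenerates it.
    return {**tlp, "lcrc": crc(tlp["header"] + tlp["payload"])}


def switch_forward(tlp: dict, corrupt_internally: bool = False) -> dict:
    """Hypothetical switch: verify ingress LCRC, buffer the TLP, regenerate LCRC."""
    if tlp["lcrc"] != crc(tlp["header"] + tlp["payload"]):
        raise ValueError("bad LCRC on ingress link: NAK and replay")
    payload = tlp["payload"]
    if corrupt_internally:  # e.g. a bit flip in the switch's packet buffer
        payload = bytes([payload[0] ^ 0x01]) + payload[1:]
    return link_send({**tlp, "payload": payload})


def receiver_check(tlp: dict) -> str:
    body = tlp["header"] + tlp["payload"]
    if tlp["lcrc"] != crc(body):
        return "link error (NAK and replay)"
    if tlp["ecrc"] is not None and tlp["ecrc"] != crc(body):
        return "end-to-end error (report via AER)"
    return "ok"


hdr, data = bytes(16), bytes(range(64))
with_ecrc = link_send(make_tlp(hdr, data, with_ecrc=True))
without_ecrc = link_send(make_tlp(hdr, data, with_ecrc=False))
print(receiver_check(switch_forward(with_ecrc, corrupt_internally=True)))
print(receiver_check(switch_forward(without_ecrc, corrupt_internally=True)))
```

With ECRC the internal corruption is reported; without it, the freshly regenerated LCRC matches the corrupted payload and the receiver accepts bad data as "ok".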

With support for optional advanced error reporting (AER) in the switches, the system
manager can use reports of these corrupted packets to better control network elements
and to monitor and maintain data integrity. To expand on temporal redundancy,
these switches can use the multiple VCs mentioned above. Control path data
used to alert system elements of a device failure can be marked with a high-priority
traffic class (TC). While all PCIe elements must support eight TCs, only some PCIe
components support the multiple hardware-based VCs needed to actually transport
this control data ahead of normal traffic.
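The gap between labeling traffic with a TC and actually expediting it lies in the TC-to-VC mapping. This sketch assumes a hypothetical policy that gives TC7 its own VC when a second VC exists and drains VCs in strict priority (highest-numbered VC first), which is one of the arbitration schemes PCIe permits; on a single-VC component every TC lands in VC0, so control data waits behind bulk traffic.

```python
from collections import deque


def build_tc_vc_map(num_vcs: int) -> dict:
    """Hypothetical policy: TC7 gets VC1 when it exists; all other TCs use VC0."""
    mapping = {tc: 0 for tc in range(8)}
    if num_vcs >= 2:
        mapping[7] = 1
    return mapping


class StrictPriorityPort:
    """Egress port that always drains the highest-numbered non-empty VC first."""

    def __init__(self, num_vcs: int):
        self.tc_to_vc = build_tc_vc_map(num_vcs)
        self.queues = [deque() for _ in range(num_vcs)]

    def enqueue(self, tc: int, packet: str):
        self.queues[self.tc_to_vc[tc]].append(packet)

    def dequeue(self):
        for queue in reversed(self.queues):
            if queue:
                return queue.popleft()
        return None


for vcs in (1, 2):
    port = StrictPriorityPort(vcs)
    for i in range(3):
        port.enqueue(0, f"bulk{i}")
    port.enqueue(7, "failure-alert")  # control path data, queued last
    print(vcs, "VC(s):", [port.dequeue() for _ in range(4)])
```

With two VCs the alert leaves first even though it was queued last; with one VC it leaves fourth, despite carrying the same TC label.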

To provide spatial redundancy, switches need to support rapid failover. In many
of these redundant systems, dual processors are employed, where one processor is
the backup and comes into use only if the primary processor fails. As noted
earlier, PCIe switches can use non-transparency to isolate the hosts in such
dual-host systems. Figure 3 illustrates two generic server boards. The PCIe switch
prevents each processor board host from enumerating and configuring the other
board. In addition, the NT ports provide doorbell and mailbox registers for
inter-processor communication. The backup processor monitors heartbeat messages
from the primary system and, upon failure, takes over as the primary host. These
switches have the very low latency required to assist in a rapid failover to the
backup system. Figure 3 shows only one switch, with a single NT port; in practice,
to keep board layouts identical, each board would include a switch with
non-transparency. This approach generalizes to systems in which each processing,
storage or I/O board includes a switch to isolate its processor domain.
Hot Plug is also crucial on this NT port, allowing the administrator to swap out
boards while maintaining system up-time.
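The heartbeat monitoring described above can be sketched as a simple watchdog on the backup side. This assumes the primary increments a counter in an NT mailbox register and the backup samples it periodically; `HeartbeatMonitor`, the timeout value and the sampling scheme are hypothetical, and a real design would read the mailbox through the NT port and supply times from a monotonic clock such as `time.monotonic()`.

```python
class HeartbeatMonitor:
    """Backup-side watchdog: declare the primary failed if the (hypothetical)
    mailbox heartbeat counter has not changed for timeout_ms."""

    def __init__(self, timeout_ms: int):
        self.timeout_ms = timeout_ms
        self.last_value = None
        self.last_change_ms = None

    def sample(self, mailbox_value: int, now_ms: int) -> bool:
        """Record one mailbox reading; return True if the primary should be
        considered failed."""
        if mailbox_value != self.last_value:
            self.last_value = mailbox_value
            self.last_change_ms = now_ms
            return False
        return now_ms - self.last_change_ms >= self.timeout_ms


monitor = HeartbeatMonitor(timeout_ms=50)
# (mailbox value, time in ms): the primary stops counting after t = 10 ms.
readings = [(1, 0), (2, 10), (2, 40), (2, 60)]
for value, t in readings:
    if monitor.sample(value, t):
        print(f"t={t} ms: primary silent, backup takes over as host")
```

The timeout sets the trade-off: a shorter value speeds failover but risks a false takeover if the primary is merely busy, which is where the switch's low latency helps keep heartbeat delivery predictable.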

Just as PCIe technology is replacing AGP in mainstream consumer PC graphics
cards, this new standard is capable of addressing many other high-volume applications.
These designs require low-cost switching solutions yet demand extensive feature
sets and high performance. Today, companies such as PLX Technology offer x16
and x32-lane devices that these applications can use now. In the future, these
designs can migrate to lower lane count offerings as they become available. Such
switches will remain fully interoperable with the endpoint solutions and the
PCIe-to-PCI and PCIe-to-PCI-X bridges available now. Current architectures based
on legacy PCI and PCI-X or on proprietary interconnects can leverage this
next-generation interconnect for lower total costs and improved end-user benefits.

PLX Technology
Sunnyvale, CA.
(408) 774-9060.
[www.plxtech.com].