INDUSTRY INSIGHT
InfiniBand for Industry and Communications
InfiniBand Fabrics Weave Performance into Communications and Industrial Solutions
Vendors are getting onto the InfiniBand wagon with standard chassis solutions that trump the costs and development times of proprietary high-speed interconnects.
THAD OMURA, MELLANOX TECHNOLOGIES
Industry-standard fabric technologies have gained significant traction against proprietary solutions in the embedded marketplace, especially for performance-driven applications. Just a couple of years ago, terabit switching performance was reserved for super-optimized (and super-expensive) systems based on multi-chip silicon fabrics that took years to develop, tune and eventually productize. Developers now realize that the cost of maintaining these fabrics, in both hardware and software, is too high, as is the investment required to scale them for future generations.
InfiniBand, an industry-standard interconnect technology backed by a well-supported trade association, has not only established itself as an out-of-the-box interconnect for clustering commodity servers and storage into an efficient supercomputer, but has also found a home in traditional performance networking and industrial applications. InfiniBand is the only standard fabric that supports up to 60 Gbit/s blade-to-blade performance (by using 12 SerDes interfaces running in parallel at 5 Gbit/s) inside a single chassis, and it can extend this fabric externally to other systems via copper or fiber cables.
More typical of today’s deployments are links composed of four SerDes interfaces running at 2.5 Gbit/s or 5 Gbit/s, for aggregate data rates of 10 Gbit/s or 20 Gbit/s. A single silicon chip available from Mellanox acts as a switch fabric with nearly a terabit of switching capacity, and multiple chips can be aggregated to scale well beyond this. All of this performance is available off the shelf, without complex protocol development or costly, time-consuming silicon tape-outs.
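The link rates quoted above follow from multiplying the lane count by the per-lane signaling rate. The short Python sketch below reproduces that arithmetic. The function names and the SDR/DDR labels are illustrative choices, and the 8b/10b line-coding factor (which leaves 80 percent of the signaling rate for data) is background on these InfiniBand generations rather than a figure stated in this article.

```python
# Raw signaling rate of an InfiniBand link: lanes x per-lane rate.
# Lane counts and per-lane rates are the ones quoted in the article.

def link_rate_gbps(lanes: int, lane_rate_gbps: float) -> float:
    """Aggregate raw signaling rate in Gbit/s."""
    return lanes * lane_rate_gbps


def payload_rate_gbps(lanes: int, lane_rate_gbps: float,
                      efficiency: float = 0.8) -> float:
    """Rate left for data after line coding.

    The default efficiency of 0.8 assumes 8b/10b encoding
    (an assumption added here, not a number from the article).
    """
    return link_rate_gbps(lanes, lane_rate_gbps) * efficiency


# Labels are illustrative: 4 lanes at 2.5 or 5 Gbit/s, 12 lanes at 5 Gbit/s.
LINKS = [("4X SDR", 4, 2.5), ("4X DDR", 4, 5.0), ("12X DDR", 12, 5.0)]

for name, lanes, rate in LINKS:
    print(f"{name}: {link_rate_gbps(lanes, rate):g} Gbit/s raw, "
          f"{payload_rate_gbps(lanes, rate):g} Gbit/s after 8b/10b")
```

The raw figures (10, 20 and 60 Gbit/s) are the ones used throughout this article; the post-encoding figures are shown only to indicate how much of that rate carries data.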
One of the major costs associated with proprietary fabrics is the need to bridge from a customized physical and link layer protocol to something standard. Typically, large, expensive FPGAs are used to bridge proprietary switch fabrics to a standard bus interface so that an NPU or general-purpose processor can eventually do something useful with the data. InfiniBand solves this problem with low-priced, standard silicon adapters the size of a U.S. dime, which connect the fabric to PCI Express. PCI Express has quickly become the de facto I/O on-ramp to the processing complexes of blades and has had a profound effect on the proliferation of InfiniBand technology.
Prior to PCI Express, fabrics latched onto PCI-X, which provided blades with I/O throughput of up to ~8.5 Gbit/s. A PCI Express x8 interface significantly increases the bandwidth to processing blades, up to 20 Gbit/s in each direction or 40 Gbit/s full duplex. This perfectly balances the bandwidth of an InfiniBand DDR adapter on a blade and enables InfiniBand to be the highest performing industry-standard fabric on the market today. Once this connection is made, commodity x86, PowerPC, MIPS and even specialized network processing solutions can take advantage of the high-throughput fabric and communicate with the rest of the system.
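The bus figures in this comparison can be checked with simple arithmetic. The sketch below assumes a 64-bit PCI-X bus clocked at 133 MHz and first-generation PCI Express lanes signaling at 2.5 Gbit/s; those parameters are not spelled out in the article and are supplied here as assumptions that reproduce its numbers. The function names are illustrative.

```python
# Rough host-interface bandwidth comparison (raw rates, no protocol overhead).

def pcix_gbps(bus_width_bits: int = 64, clock_mhz: float = 133.33) -> float:
    """Peak PCI-X rate in Gbit/s: bus width x clock.

    Defaults assume 64-bit PCI-X at 133 MHz (assumption).
    """
    return bus_width_bits * clock_mhz / 1000.0


def pcie_gbps_per_direction(lanes: int = 8,
                            lane_rate_gbps: float = 2.5) -> float:
    """Raw PCI Express rate per direction in Gbit/s.

    Defaults assume a x8 link with 2.5 Gbit/s lanes (assumption).
    """
    return lanes * lane_rate_gbps


infiniband_4x_ddr = 4 * 5.0  # four lanes at 5 Gbit/s, from the article

print(f"PCI-X:            {pcix_gbps():.1f} Gbit/s")
print(f"PCI Express x8:   {pcie_gbps_per_direction():g} Gbit/s per direction, "
      f"{2 * pcie_gbps_per_direction():g} Gbit/s full duplex")
print(f"InfiniBand 4X DDR: {infiniband_4x_ddr:g} Gbit/s per direction")
```

Under these assumptions a x8 PCI Express interface matches a 20 Gbit/s InfiniBand link in each direction, whereas PCI-X would limit the same adapter to less than half of that rate.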
Another problem proprietary solutions face is that once the fabric hardware is available, the complex software drivers and communication stacks that must follow can take months to debug and tune for maximum utilization and throughput from one blade to another. InfiniBand solves this with open source drivers and upper layer protocol stacks that are included directly with every Linux kernel distribution (version 2.6.11 and beyond). The organization responsible for the development and maintenance of this software support, OpenIB.org, is also working on standard Windows drivers, and several vendors offer Wind River VxWorks packages as well as a plethora of Unix, Solaris and MacOS support.
All of this widely available software has tuned drivers that exploit InfiniBand’s remote direct memory access (RDMA) capability, which allows applications running on processing blades to pass large blocks of data directly back and forth without involving the kernel, freeing CPU cycles for application processing. Often referred to as kernel bypass, RDMA enables two applications on different blades in a system to communicate with a measured latency of 2.7 µs, compared with latencies of more than 50 µs observed on standard Gigabit Ethernet fabrics. The end result is a more efficient embedded computing blade, and ultimately peak performance for the entire system.
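To see why the latency gap matters most for small messages, consider a deliberately simple model in which the time to deliver a message is the fixed end-to-end latency plus the time to serialize the payload onto the link. The sketch below uses the 2.7 µs and 50 µs latencies quoted above; pairing them with 10 Gbit/s and 1 Gbit/s link rates, and ignoring protocol overhead, are simplifying assumptions made for illustration only.

```python
# Simplified message-time model: fixed latency + serialization time.
# Ignores protocol headers, encoding overhead and host-side effects.

def message_time_us(size_bytes: int, latency_us: float,
                    link_rate_gbps: float) -> float:
    """Approximate one-way delivery time in microseconds."""
    bits = size_bytes * 8
    bits_per_us = link_rate_gbps * 1000.0  # 1 Gbit/s = 1000 bits per microsecond
    return latency_us + bits / bits_per_us


for size in (64, 4096, 1_000_000):
    ib = message_time_us(size, latency_us=2.7, link_rate_gbps=10.0)
    gbe = message_time_us(size, latency_us=50.0, link_rate_gbps=1.0)
    print(f"{size:>9} bytes: InfiniBand ~{ib:.1f} us, Gigabit Ethernet ~{gbe:.1f} us")
```

In this model the fixed latency dominates for short messages, while for large blocks the difference is governed mainly by link bandwidth.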
InfiniBand in Standard AdvancedTCA Chassis
Combining the InfiniBand fabric with a standard chassis provides even greater benefits to the end user, including increased reliability and bandwidth with lower resource utilization, at a much lower cost than proprietary blade installations. One vendor, Diversified Technology (DTI), has introduced an InfiniBand ATCA solution compliant with the PICMG 3.2 specification. The company’s Targa B solution features an InfiniBand fabric built from a switch card and a CPU blade. By using InfiniBand as the fabric in ATCA, users can immediately benefit from the bandwidth and reliability of InfiniBand in a completely industry-standard form-factor.
Because InfiniBand is implemented as a channel-based I/O model, connections between fabric nodes are inherently more reliable than those of conventional I/O technologies. This level of reliability can take months or even years to design into proprietary fabrics. The InfiniBand message passing structure moves away from the traditional “load store” model, creating a more efficient and reliable transfer of data. The core of PICMG’s ATCA specification provides link redundancy via multiple connections to multiple switches.
InfiniBand fits this model naturally, because a node can be attached to the fabric through more than one link. If a link goes down, not only is the fault confined to that link, but the additional link also ensures that connectivity to the fabric continues. If one path fails, traffic can be automatically re-routed to the final endpoint destination. InfiniBand also supports a reliable link model in which each packet received is acknowledged by a hardware mechanism, reducing the time it takes for a system to be alerted to a problematic link so that proper action can be taken. With ATCA’s redundant fabrics and InfiniBand, an entire fabric can fail without creating downtime or even dropped packets.
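The failover behavior described here can be pictured with a toy model of a node that has one port on each of two redundant fabrics. The sketch below is purely illustrative: the class names, the "fabric-A" and "fabric-B" labels and the selection logic are hypothetical and do not represent InfiniBand's actual path migration mechanism or any vendor API.

```python
# Toy model of a dual-attached node choosing a healthy path.
# Hypothetical names; not an InfiniBand API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Path:
    name: str
    up: bool = True


@dataclass
class DualPortNode:
    # One path per redundant fabric (labels are made up for this example).
    paths: List[Path] = field(
        default_factory=lambda: [Path("fabric-A"), Path("fabric-B")]
    )

    def select_path(self) -> Path:
        """Return the first path that is still up."""
        for path in self.paths:
            if path.up:
                return path
        raise RuntimeError("no path to fabric")


node = DualPortNode()
print("active path:", node.select_path().name)

node.paths[0].up = False  # simulate a link or switch failure on the first fabric
print("after failure:", node.select_path().name)
```

Traffic continues over the second fabric as long as at least one path remains up, which is the property that redundant ATCA fabrics are intended to provide.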
Since InfiniBand technology is implemented as an I/O fabric, InfiniBand benefits extend down to ATCA end applications. Advances in transport hardware mean that InfiniBand-enabled CPU nodes are free to run applications and are not bogged down with carrying fabric overhead, like complex software protocol stacks required of a fabric based on LAN technology.
The heart of the Targa B system is DTI’s ATS2148 Hub Board, which provides separate control plane and data plane switching. It supports data plane switching via a 10 Gbit/s InfiniBand switch with a built-in Subnet Management Agent (SMA) and Performance Management Agent (PMA). Multi-pathing and automatic path migration are fully supported, enabling fault tolerance and failover. The ATS2148 also features a 24-port Gigabit Ethernet switch for ATCA control switching. Ports are provided to support a full 16-slot shelf with redundant switches on both base and fabric, with newly supported links to redundant shelf managers. Up to eight uplink ports on the base can be provided for connections between shelves and to outside networks. A complete block diagram of the ATS2148 switch card is shown in Figure 1.

For application blade processing, DTI offers the ATC5232, featuring dual low-voltage Intel Xeon processors with an 800 MHz system bus. Paired with the Intel E7520 chipset, it supports up to 16 Gbytes of memory via four ECC-protected double-data-rate (DDR) SDRAM modules. Onboard integration features a single 64-bit/66 MHz PMC site with PCI-X compliance and two Gigabit Ethernet control channels. To connect to the InfiniBand fabric, this processing blade takes full advantage of a PCI Express x8 interface and supports dual 10 Gbit/s data plane InfiniBand ports (active-active or active-failover) via Mellanox’s InfiniHost III Ex silicon adapter. A complete block diagram of the ATC5232 CPU card is shown in Figure 2. Figure 3 demonstrates how a dual redundant fabric is created by connecting both ATS2148 ATCA switch cards and ATC5232 CPU cards.

InfiniBand in Standard VME Chassis
After two decades on the market, the VMEbus is a proven and reliable industry-standard chassis form-factor with many legacy industrial and commercial applications. However, as a parallel bus technology, the VMEbus has its limitations, including the inability to perform high-speed synchronous data transfers and the inability to transfer large data packets.
For applications that need high-bandwidth communication between boards within a system, or between system chassis, fabrics like InfiniBand are ideal switched serial topologies for data transfer. As with other chassis solutions, the use of such switched serial fabrics with VMEbus has been standardized, in this case through the VITA trade association, as demand for these features and performance is being driven by the military and industrial markets.
The VMEbus Switched Serial Standard (VXS) or VITA 41 specification defines the pin out and backplane requirements for the evolutionary path of switched fabric systems for legacy VMEbus installations. Since VMEbus connectors are parallel, new serial connectors were developed to accommodate the new switched standards. The VITA 41 specification facilitates high speed and large packet data transfers, extends the life of legacy systems and facilitates an industry-wide migration path from parallel buses to switched serial fabrics.
Another important technological enhancement to the VMEbus standard is the addition of an interface chip designed to support two-edge source synchronous transfer (2eSST). This technology doubles the theoretical bandwidth of the VMEbus to 320 Mbytes/s and enables existing applications to increase performance with only minor system software changes.
For example, the VITA 41-compliant VXS1 6U VME PowerPC Single Board Computer from SBS Technologies offers the advantages of InfiniBand technology with the Dual Port Mellanox InfiniHost Adapter, which includes two independent InfiniBand 4X links. The VXS1 is powered by the MPC7447A G4 PowerPC from Freescale, which provides core processor speeds from 500 MHz to 1 GHz and a 167 MHz system bus. The VXS1 has all the bells and whistles one would expect from a processor board that redefines VMEbus performance, including a Marvell MV64460 PowerPC System Controller (Discovery III) bridge chip and two Gigabit Ethernet ports. A block diagram of the VXS1 PowerPC SBC is shown in Figure 4.

The rugged 6U VXS1 SBC hosts one single-wide PMC site that supports a standard IEEE 1386.1 PMC module to easily expand I/O capability. The VXS1 also supports the VME 2eSST synchronous protocol, which can provide data transfer rates of up to 320 Mbytes/s.
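To put the two interfaces on this board on a common scale, the 2eSST rate can be converted from Mbytes/s to Gbit/s and set against a 4X InfiniBand link. The sketch below uses only figures from this article (320 Mbytes/s for 2eSST, four lanes at 2.5 Gbit/s for a 4X link); the helper name is illustrative, and the comparison is of raw peak rates only.

```python
# Compare the peak 2eSST VMEbus rate with a 4X InfiniBand link (raw rates).

def mbytes_to_gbps(mbytes_per_s: float) -> float:
    """Convert Mbytes/s to Gbit/s (decimal units)."""
    return mbytes_per_s * 8 / 1000.0


vme_2esst_gbps = mbytes_to_gbps(320)  # 2eSST peak rate from the article
ib_4x_gbps = 4 * 2.5                  # one 4X link at 2.5 Gbit/s per lane

print(f"VME 2eSST:     {vme_2esst_gbps:.2f} Gbit/s")
print(f"InfiniBand 4X: {ib_4x_gbps:g} Gbit/s per link")
print(f"Ratio:         {ib_4x_gbps / vme_2esst_gbps:.1f}x")
```

Keep in mind that the VMEbus figure is for a parallel bus shared by all boards in the chassis, whereas the InfiniBand figure is for a single point-to-point link.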
The technology advances of the VITA 41 specification coupled with the addition of 2eSST support have made the VMEbus specification eminently suitable for high-bandwidth military and industrial applications for quite some time. Couple this with InfiniBand and the performance of the latest processors, and the VMEbus specification will continue to be significant for another 20 years.
The feature set, the standardization of chassis form-factors and a clearly defined industry-standard performance roadmap to 120 Gbit/s data rates will further entrench InfiniBand in performance-driven communication, military and industrial applications, especially as off-the-shelf components can be leveraged to build world-class performance systems. Add to this built-in reliability and robust, widely available software support, and it becomes clear why investment in proprietary performance fabrics is becoming a thing of the past.
Mellanox Technologies
Santa Clara, CA.
(408) 970-3400.
[www.mellanox.com].
InfiniBand Trade Association
[www.infinibandta.org].

