10 Gigabit Ethernet Solutions
Consolidating Network Fabrics to Streamline Data Center Connectivity
Cost and performance issues are pushing developers to seek convergence of interconnects in data centers. Both 10 Gigabit Ethernet and InfiniBand appear to have potential, but the demands are militating against Fibre Channel.
DAN TUCHLER, MELLANOX TECHNOLOGIES
Data centers have evolved over time to accommodate a variety of devices and interfaces. For storage, Fibre Channel is firmly established as the de facto standard. For server-to-server connectivity, both Ethernet and InfiniBand are used. And Ethernet is the common language for connecting routers, desktops, WANs, LANs, and other devices. While data center managers can and do implement all three technologies—InfiniBand, Fibre Channel and Ethernet—it’s not a pretty picture. For example, imagine servers populated with three different adapters from three different vendors, running three different drivers. It’s an expensive situation that’s difficult to maintain, so data center managers are looking at some kind of consolidation to simplify the infrastructure and reduce costs (Figure 1).
Fundamentally, data center architects must meet several different connectivity requirements, but they also want to maximize the performance of their server investments while minimizing the risk of unproven technologies. Before we go into the requirements for network fabric consolidation, let’s take a brief look at the three main connectivity choices: Ethernet, InfiniBand and Fibre Channel.
Ethernet and the TCP/IP protocol suite have become so broadly deployed that they are commonly used across the WAN, across vast disparities in the computing power of the attached end devices, and over a dizzying array of intermediate network devices. This breadth of application comes at a price, though—engineered for flexibility and ubiquity, Ethernet has required some tradeoffs that preclude its full optimization for more specific uses. And TCP has been extended so much that it is no longer easy to implement in a dedicated device (more on that later).
Ethernet convergence at speeds below 10 Gbits/s may be useful only for lower performance applications. 10 Gigabit Ethernet (10GbE) Network Interface Cards (NICs) initially cost thousands of dollars each, but prices are expected to come down in 2007. Ethernet standards are governed by the IEEE, which has recently decided that the next speed step will be 100 Gbits/s, and is expected to ratify that standard in 2010. Recently ratified options for running over “Augmented Cat 6” cable will enable products using the familiar RJ45 connector in 2007. 10GbE can run today over CX4 copper cable or fiber optics.
InfiniBand is a standards-based, low-latency, high-bandwidth interconnect, created specifically to address the problem of connecting servers and storage in close proximity to each other. While TCP/IP is general and broad, InfiniBand transport is optimized for low-latency server-to-server and server-to-storage links and is commonly implemented in silicon to maintain high speed while offloading the host server.
Products running at 20 Gbits/s have been deployed in production networks, and 40 Gbits/s products are expected in 2008, about the same time that server PCIe slots will be upgraded to the same speed. 12X InfiniBand switch-to-switch connections, which use three times as many lanes as a 4X link, will be available in the same time frame at three times the speed, so 120 Gbits/s connections will be deployed in 2008. Very large InfiniBand clusters have been deployed, and the technology is now considered mature and low-risk for clusters. The InfiniBand standard is governed by the InfiniBand Trade Association (http://www.infinibandta.org/home), and standards have been completed and ratified for speeds from 2.5 to 120 Gbits/s using either copper cable or fiber.
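The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only, assuming the InfiniBand per-lane signaling rates of 2.5 Gbits/s (SDR), 5 Gbits/s (DDR) and 10 Gbits/s (QDR) and link widths of 1X, 4X and 12X. The names `LANE_RATE_GBITS`, `LINK_WIDTHS` and `link_rate` are invented for this example.

```python
# Per-lane signaling rate in Gbits/s for each InfiniBand speed grade
# (assumed values for this sketch: SDR, DDR and QDR).
LANE_RATE_GBITS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}

# Number of lanes in each supported link width (1X, 4X, 12X).
LINK_WIDTHS = (1, 4, 12)


def link_rate(width: int, grade: str) -> float:
    """Return the raw signaling rate of a link in Gbits/s."""
    if width not in LINK_WIDTHS:
        raise ValueError(f"unsupported link width: {width}X")
    if grade not in LANE_RATE_GBITS:
        raise ValueError(f"unknown speed grade: {grade}")
    return width * LANE_RATE_GBITS[grade]


if __name__ == "__main__":
    # Print the full width-by-grade matrix of link rates.
    for grade in LANE_RATE_GBITS:
        for width in LINK_WIDTHS:
            print(f"{width:>2}X {grade}: {link_rate(width, grade):6.1f} Gbits/s")
```

Under these assumptions a 4X DDR link gives the 20 Gbits/s cited above, 4X QDR gives 40 Gbits/s, and 12X QDR gives 120 Gbits/s. These are signaling rates; the usable data rate is lower once line encoding overhead is subtracted.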
Fibre Channel is used to connect to storage. In networking a dropped packet can be retransmitted, but in storage, lost data and corrupt databases are unacceptable, so buyers are very conservative. This may explain why storage connectivity changes more slowly than any other. Fibre Channel devices are only now moving from 2 Gbits/s to 4 Gbits/s. It is interesting to note that many leading Fibre Channel vendors are investing in iSCSI and InfiniBand products, and no convincing consolidation strategy for Fibre Channel has been proposed.
InfiniBand adapters and switches are the most cost-effective and highest performance of the three options. Table 1 summarizes the characteristics of these three technologies.
Server-to-Server Connectivity Requirements
Now, let’s look at each data center connectivity requirement in more detail, starting with server-to-server connections. A growing number of applications rely on low-latency, high-bandwidth messaging among a group of servers. For example, cluster applications are optimized to create supercomputer power at a fraction of the cost, and are being used to solve specific problems in fluid dynamics, financial modeling and other areas. Database clusters and virtualized server farms also benefit from server-to-server optimization, because the savings in data movement between servers translates into more computing power for the money.
InfiniBand shines in these applications for several reasons, including bandwidth, latency and especially scale. InfiniBand uses high-performance techniques including remote direct memory access (RDMA) technology, so it typically bypasses the remote host CPU and OS kernel, increasing processor availability and creating maximum efficiency in transferring data from one server to another with latency as low as one microsecond. The switching architecture that has been used to scale InfiniBand networks to thousands of nodes is called a Clos fabric, or full bisection bandwidth architecture.
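To make the scaling property concrete, the sketch below sizes a non-blocking Clos fabric built from identical switches. It is a minimal illustration and does not describe any specific product: the function names are hypothetical, and the 24-port radix used in the usage note is an assumption chosen to match the 24-port switch size mentioned later in this article.

```python
def two_tier_clos(radix: int) -> dict:
    """Size a non-blocking two-tier Clos (leaf/spine) fabric.

    Each leaf switch splits its ports evenly: half face servers and half
    face spine switches, so every leaf has as much uplink capacity as
    downlink capacity (full bisection bandwidth).
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even port count of at least 2")
    down_ports = radix // 2   # server-facing ports per leaf
    spines = radix // 2       # one uplink from every leaf to every spine
    leaves = radix            # each spine port connects to a distinct leaf
    return {
        "leaf_switches": leaves,
        "spine_switches": spines,
        "server_ports": leaves * down_ports,
    }


def three_tier_clos_ports(radix: int) -> int:
    """Maximum server ports of a non-blocking three-tier fat tree."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even port count of at least 2")
    return radix ** 3 // 4


if __name__ == "__main__":
    print(two_tier_clos(24))
    print(three_tier_clos_ports(24))
```

With the assumed 24-port building block, two tiers yield 288 server ports from 36 switches, and three tiers yield 3,456 server ports, which is how a fabric of small switches reaches thousands of nodes without oversubscribing any link.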
Ethernet solutions for node-to-node communications have been roughly modeled after InfiniBand. These solutions duplicate the concept of offloading the transport layer to hardware in an attempt to gain bandwidth and reduce load on the CPU, while also reducing latency to single-digit microseconds. This technology is often called TCP Offload and is done via a “TCP Offload Engine,” or TOE. When low-latency RDMA protocols are layered over TOE, the standard is called iWARP, and this set of technologies is early in maturation, acceptance and OS support.
No large-scale 10GbE clusters have been deployed yet, because high-end Ethernet switches are optimized for Telco and ISP environments and are much too costly and complex for direct server connectivity.
There are several problems with TOE. First, TCP is so broadly deployed and has so many special cases that an offloaded implementation is very complex and difficult to get right as a stable product: TOE failed with earlier generations of Gigabit Ethernet, 100 Mbit/s Ethernet, and even 10 Mbit/s Ethernet. Second, CPU processing power keeps increasing and eliminates the need for TOE, especially in this era of multicore processors. Third, TOE uses external memory on the NIC to hold connection “state,” driving up NIC complexity and cost. Finally, the Linux community is opposed to TOE, in part because the proprietary offloaded stacks prevent security updates and open-source review and support.
Another challenge with Ethernet is that running it at 10 Gbits/s over copper cable presents a dilemma, one that didn’t exist when previous Ethernet speeds were developed. Latency is now a primary concern, but the physics of driving data at such high speeds on an 8-conductor cable are not simple. Going from a computer through one switch to another computer requires four 10GBASE-T circuits and currently has a latency of more than five microseconds. In all but the most trivial networks, there are more switch-to-switch hops, adding even more latency. So for latency-sensitive applications, the user is left with a choice of fiber optics or CX4 cable, the same choices as for InfiniBand. In a blade server backplane, traces can be used with either InfiniBand or Ethernet.
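The way latency grows with hop count can be sketched as a simple model. This is a rough illustration under stated assumptions, not a measurement: every link is assumed to have one 10GBASE-T circuit at each end, and the per-circuit figure of 1.25 microseconds is a hypothetical value obtained by spreading the five-microsecond figure above evenly across four circuits. The function and parameter names are invented for this example.

```python
def phy_count(switch_hops: int) -> int:
    """Number of 10GBASE-T circuits on a server-to-server path.

    A path through N switches has N + 1 links, and each link has a
    circuit at both ends, so one switch gives four circuits.
    """
    if switch_hops < 0:
        raise ValueError("switch_hops must be non-negative")
    return 2 * (switch_hops + 1)


def path_latency_us(switch_hops: int, phy_us: float,
                    switch_us: float = 0.0) -> float:
    """Estimate one-way path latency in microseconds.

    phy_us is the assumed latency of one circuit; switch_us is an
    optional assumed forwarding latency per switch.
    """
    return phy_count(switch_hops) * phy_us + switch_hops * switch_us


if __name__ == "__main__":
    ASSUMED_PHY_US = 1.25  # hypothetical: 5 us spread over four circuits
    for hops in (1, 2, 3):
        total = path_latency_us(hops, ASSUMED_PHY_US)
        print(f"{hops} switch(es): {phy_count(hops)} circuits, {total} us")
```

Each additional switch adds two more circuits to the path, so under this model latency rises linearly with hop count even before any switch forwarding delay is counted.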
No vendor has yet proposed using Fibre Channel to interconnect compute nodes.
There are several choices for storage connectivity, with new technologies emerging to enable convergence and backward compatibility. Native InfiniBand storage solutions now entering the market provide the best performance by harnessing the higher throughput of 20 Gbit/s InfiniBand. User APIs are defined using SCSI commands to minimize changes to the applications. Fibre Channel over InfiniBand allows connectivity to existing storage devices, utilizing an external gateway to make a common conversion point for a large number of compute nodes. iSCSI over InfiniBand can be used in a similar way.
Ethernet, using a combination of 10 Gbit/s speeds and iSCSI protocols, is starting to gain notice in the storage world and provides an acceptable mix of performance and convergence. However, the TOE engines required to make iSCSI work have the drawbacks previously described, and a TOE engine will be required on each node needing access to storage.
Fibre Channel for storage has been proven, is stable and mature, and is the low-risk choice. But with higher costs, a slower performance roadmap, and little prospect for convergence, it is easy to see why Fibre Channel vendors have been embracing iSCSI and InfiniBand in preparation for a possible decline in Fibre Channel products.
I/O and Connectivity Requirements
Ethernet is the champion for connectivity to infrastructure outside of the data center. However, a network deployed using Ethernet and TCP/IP as the convergence fabric would present a conflict. To get good node-to-node performance, iWARP protocols over TOE are needed. Similarly, to get to storage, iSCSI over TOE is used. But to get to the broader network, the TCP stack must be a universal, mature, hardened stack to avoid the large number of security threats that the broader network can expose. It’s naïve to think that firewalls will stop all the threats—multiple defense perimeters are a standard approach. The Linux community has come out against TOE because it creates vendor-specific, closed TCP implementations that are hard to maintain, and data center architects are loath to run separate TCP/IP stacks to connect inside and outside the data center.
InfiniBand has been optimized for servers in close proximity, but not for wide deployment. Thus a gateway would be needed to reach Ethernet backbones and possibly a second one to reach Fibre Channel storage. These gateways add cost and complexity, but typically the total cost of an InfiniBand network is still lower than that of the alternatives. Current products have reached the point of maturity where this option merits serious consideration.
Where We Stand Today
To sum up today’s options for data center fabric consolidation, both Ethernet at 10 Gbits/s and InfiniBand at 20 Gbits/s are reaching a level of completeness that makes them worth considering.
Ethernet offers iWARP over TOE for server-to-server connections, iSCSI over TOE for storage, and a long list of protocols for connectivity. As iSCSI is relatively new, in many cases a gateway is required to connect to legacy storage. Latency is reaching single-digit microseconds. For latency-sensitive applications, 10GBASE-T is too slow and CX4 or fiber optics may be required. Scale is still an issue for building full bandwidth (non-blocking) networks. For 24 ports or fewer, reasonably priced solutions exist; but for larger networks, it’s necessary to use inappropriately large switches, and this solution has rarely if ever been deployed. Creating networks using blocking switches defeats the purpose of higher-speed gear, so for medium or large performance-oriented groups of servers, both Ethernet and InfiniBand may still be needed.
InfiniBand offers native server-to-server connectivity at the lowest possible latencies, while adhering to published standards and open source software practices. Storage options include native storage over InfiniBand, or Fibre Channel or iSCSI over InfiniBand using gateways. Using IP over InfiniBand, any existing IP-based application can be supported. Scale is a strong point for InfiniBand, already proven in 4,500-node networks, and InfiniBand switching infrastructure is now accepted and mature.
In a blade server environment, using InfiniBand for the backplane and switching, combined with gateways and external switches, provides the most cost-effective and highest-performance total solution. Further, iSCSI and SCSI interfaces provide application compatibility. In fact, in a virtualized environment, applications are not even aware they are running over InfiniBand. Fibre Channel is not a candidate for network convergence, and may be replaced over time by Ethernet- or InfiniBand-attached storage in a converged environment.
As data centers scale their compute resources, the prospect of putting three different I/O adapters into each blade of a blade server, or into each separate node, is looking less appealing. Fortunately, there are some emerging options for simpler and less costly consolidated solutions.
Mellanox Technologies, Santa Clara, CA.