TECHNOLOGY IN CONTEXT
ATCA: Telecom and Beyond
Balancing Line Rate and Security: FPGA-Based MicroTCAs Enable 10-Gbits/s Network Traffic
The flexibility and scalability of MicroTCA and FPGAs keep traffic flowing quickly, smoothly and securely in 10GbE networks.
ROB KRAFT, ADVANCEDIO SYSTEMS
The spread of 10 Gbit Ethernet (10GbE) networks and the accompanying increase in data rates have created both opportunities and challenges for developers of high-performance, real-time applications. The number of Internet subscribers and their use of bandwidth continue to rise in response to richer video and image content. Consequently, service providers need a bigger, 10 Gbit/s pipe, which 10GbE can provide. This means aggregating 1 Gbit links into 10 Gbit links, and replacing single 10 Gbit links with multiple 10 Gbit links. Users expect consistent, smooth access to content, yet higher bandwidths increase the likelihood of performance bottlenecks, so network performance must be monitored and optimized. Meanwhile, because security attacks have become more sophisticated, more bandwidth must be monitored and packets must be analyzed more deeply.
The processors that service providers have been using in network performance monitoring/optimization and security systems cannot keep up with the data streaming from a 10GbE pipe, so these applications will not function on existing systems. Providers must therefore scale the internal bandwidth of these systems in step with the network. That scaling cannot be achieved with the current 1GbE hardware employed for security, packet inspection and load balancing functions, nor by adding more of the general-purpose processors often used in that hardware, because the congestion problems do not scale linearly.
MicroTCA provides a platform that can accommodate this scaling for performance monitoring and security applications. With their switched fat-pipe backplanes and ability to add in more AMC cards as required, MicroTCA architectures have the flexibility and capability to scale when the processing and data bandwidth exceeds the capacity of traditional server/line-card solutions.
Although the MicroTCA switched backplane architecture can provide what may be thought of as the expandable highway infrastructure to accommodate more traffic, there remains the problem of building new equipment with sufficient capacity to service that mass of traffic without simultaneously clogging it and creating a bottleneck during servicing. FPGA-based AMC cards are a key enabling technology for this equipment. FPGAs, when programmed with suitable algorithms, can perform the myriad inspecting, filtering and manipulation tasks on packets flowing by at 10 Gbits/s, a rate that would overwhelm general-purpose processors. The FPGAs may be thought of as “service stations” that “service” the packets and keep them moving along at these high rates.
Network performance monitoring/optimization applications investigate network performance patterns and identify potential architecture bottlenecks and equipment problems. These applications need wire-speed packet inspection and manipulation to identify usage trends and measure traffic flows to plan architecture upgrades and build-outs—for example, observing how new services are being handled by existing equipment. They also need rapid, real-time identification of not only major performance problems, but also more subtle ones that might otherwise go undetected, leading to frustrated customers who take their business elsewhere.
Security applications protect the network and its subscribers from the ever-increasing number and sophistication of cyber threats and attacks that consume network bandwidth and jeopardize personal information and transaction security. In order to detect such threats, the content of all data packets must be inspected. This is a computationally intensive task at 1 Gbit/s, let alone at 10 Gbits/s. Even offloading the processing of the 10GbE transport protocol, which a commercial network interface card (NIC) can provide, is not enough to help the CPU keep up with the application. By the time the incoming data leaves the 10GbE interface and arrives at the local fabric or CPU, it is already overwhelming that fabric or CPU (Figure 1).
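To put rough numbers on this, the per-packet time budget at line rate can be worked out directly. The sketch below assumes standard Ethernet framing overhead (an 8-byte preamble plus a 12-byte inter-frame gap, 20 bytes per frame on the wire) and a hypothetical 3 GHz CPU clock; neither figure comes from the article, and the function names are illustrative.

```python
# Per-packet time budget at Ethernet line rate (illustrative arithmetic only).
WIRE_OVERHEAD_BYTES = 20  # assumed: 8-byte preamble/SFD + 12-byte inter-frame gap


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frames per second for back-to-back frames of a given size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def ns_per_packet(link_bps: float, frame_bytes: int) -> float:
    """Time available to handle one frame before the next one arrives."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


if __name__ == "__main__":
    cpu_hz = 3e9  # assumed clock rate, for illustration only
    for name, bps in (("1GbE", 1e9), ("10GbE", 10e9)):
        for size in (64, 1518):  # minimum and maximum untagged frame sizes
            budget = ns_per_packet(bps, size)
            cycles = budget * cpu_hz / 1e9
            print(f"{name} {size:>4}-byte frames: "
                  f"{packets_per_second(bps, size) / 1e6:6.2f} Mpps, "
                  f"{budget:7.1f} ns (~{cycles:.0f} cycles) per packet")
```

For minimum-size 64-byte frames, 10GbE works out to about 14.88 million packets per second, or roughly 67 ns per packet, which is about 200 clock cycles at the assumed 3 GHz. That budget has to cover every inspection step applied to the packet.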
The alternative is to send out the incoming data to large processing farms, but the space and power required, as well as the cost of these, are prohibitive. Furthermore, this alternative is not even available if the data rate already would overwhelm the fabric used for distribution to and communication within the farm. Instead, the solution lies in doing more at the pipe itself—including load balancing, time stamping and packet inspection—before the incoming data ever leaves the 10GbE interface. This can be achieved by replacing the NIC with an FPGA-based 10GbE interface that supplies both NIC and packet processing offload functionality. For higher-bandwidth networks, a MicroTCA 10GbE interface appliance powered by FPGAs can be placed between the 10 Gbit pipe and the existing 1GbE hardware. The appliance provides offload and load balancing functionality to leverage existing hardware. In the highest-bandwidth networks, including future 40 Gbit/s and 100 Gbit/s systems, the same appliance can perform NIC and packet processing functions, as well as performance monitoring, by the addition of more cards. The appliance thus can be scaled with increasing bandwidth demands (Figure 2).
The FPGA Advantage
There are several main advantages to using FPGAs as “service stations” in this new class of compute-intensive equipment, the MicroTCA appliance. These advantages include the tasks they can perform before incoming data leaves the 10GbE interface, the optimized performance FPGAs can bring to these tasks, and the fact that FPGAs can be reprogrammed for new tasks as traffic evolves.
Examples of some of the tasks themselves are load balancing, line-rate packet inspection, packet slicing and filtering, and the creation of multiple DMA queues.
Load balancing is needed whenever the data rate exceeds the sustained capability of any one CPU, particularly when the processing in each of several CPUs may take different amounts of time to complete. In load balancing, a single 10GbE interface absorbs the data, which must then be distributed among the CPUs in a box, such as a MicroTCA box made up of AMC modules, for processing. The FPGA-based 10GbE interface can function in a control loop where feedback from the CPUs is used to determine where to send the data next in order to balance the load. An open-loop scheme may also be used, in which packets are inspected and the packet type or content determines the destination CPU. Load balancing is used in both network performance monitoring and security applications.
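The two schemes can be contrasted in a small software model. This is only a sketch of the decision logic, not of how an FPGA would implement it. The backlog-count feedback signal, the least-loaded policy, and all names (`open_loop_cpu`, `ClosedLoopBalancer`) are assumptions made for illustration.

```python
from typing import Dict, List


def open_loop_cpu(packet_type: str, type_to_cpu: Dict[str, int],
                  default_cpu: int = 0) -> int:
    """Open loop: the packet type alone determines the destination CPU."""
    return type_to_cpu.get(packet_type, default_cpu)


class ClosedLoopBalancer:
    """Closed loop: CPUs report their backlog; new packets go to the least loaded."""

    def __init__(self, num_cpus: int) -> None:
        self.backlog: List[int] = [0] * num_cpus

    def report(self, cpu: int, backlog: int) -> None:
        """Feedback path: a CPU reports how many packets it still has queued."""
        self.backlog[cpu] = backlog

    def dispatch(self) -> int:
        """Pick the CPU with the smallest backlog (lowest index wins ties)."""
        cpu = min(range(len(self.backlog)), key=self.backlog.__getitem__)
        self.backlog[cpu] += 1  # count the new packet until fresh feedback arrives
        return cpu


if __name__ == "__main__":
    balancer = ClosedLoopBalancer(num_cpus=3)
    print([balancer.dispatch() for _ in range(4)])
    balancer.report(0, 5)  # CPU 0 reports that it has fallen behind
    print(balancer.dispatch())
    print(open_loop_cpu("udp", {"tcp": 1, "udp": 2}))
```

The open-loop mapping needs no return path from the CPUs, while the closed-loop balancer adapts when one CPU's work takes longer than expected.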
One of the challenges in fully utilizing multicore Pentium processors is the fact that a typical NIC will dump data from multiple sockets into a single queue. This means that one core on the CPU needs to be the interface to the NIC and distribute data to the others, which can lead to the single-core interface getting swamped. One way to do load balancing that solves this problem is to create multiple DMA queues with an FPGA-based solution—for example, one that is based on the particular socket or socket type—so that individual cores can access the data independently, and the first core is freed up from the interfacing task. This technique is common to virtually any multiprocessor telco application or system.
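A minimal software model of per-core queues might look like the following. It assumes that a "socket" is identified by a five-field flow key and that a CRC32 hash of that key selects the queue; both are illustrative choices rather than details from the article, and `MultiQueueInterface` is a hypothetical name.

```python
import zlib
from collections import deque
from typing import Deque, List, Tuple

# Assumed flow key: src IP, src port, dst IP, dst port, protocol
FlowKey = Tuple[str, int, str, int, str]


def queue_for_flow(flow: FlowKey, num_queues: int) -> int:
    """Map a flow to a queue index; the same flow always maps to the same queue."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_queues


class MultiQueueInterface:
    """Toy model of an interface exposing one receive queue per core."""

    def __init__(self, num_queues: int) -> None:
        self.queues: List[Deque[bytes]] = [deque() for _ in range(num_queues)]

    def receive(self, flow: FlowKey, payload: bytes) -> int:
        """Steer an incoming payload into the queue chosen for its flow."""
        index = queue_for_flow(flow, len(self.queues))
        self.queues[index].append(payload)
        return index

    def poll(self, core: int) -> List[bytes]:
        """Each core drains only its own queue, with no dispatcher core in between."""
        drained = list(self.queues[core])
        self.queues[core].clear()
        return drained
```

Because every packet of a given flow lands in the same queue, a core can process its flows in order without coordinating with the other cores.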
Emerging, increasingly sophisticated attacks and threats require correspondingly sophisticated security measures. Statistics obtained by merely sampling packets in flows are insufficient to detect the more insidious modern threats. Instead, the content of every single packet must be inspected, and at line rates. This compute-intensive task, known as deep packet inspection, requires hardware acceleration. Although a general-purpose processor (GPP) cannot perform deep packet inspection at these rates, an FPGA can inspect packets at 10 Gbit/s line rates. Packets in which anomalies are detected can be flagged, and only those that really need further analysis are sent on to GPPs, so that the data rate flowing into the GPP is much lower than the 10 Gbit/s incoming data rate.
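The flag-and-forward idea can be illustrated with a simple software model. An FPGA would perform the matching in parallel hardware; this sketch shows only the logic of inspecting every payload and passing on the flagged ones. The byte signatures and function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical byte signatures, for illustration only.
SIGNATURES: Tuple[bytes, ...] = (b"cmd.exe", b"/etc/passwd")


def is_suspicious(payload: bytes, signatures: Sequence[bytes] = SIGNATURES) -> bool:
    """Inspect the whole payload, not just the header, for any known signature."""
    return any(sig in payload for sig in signatures)


def first_pass(packets: Iterable[bytes]) -> Tuple[List[bytes], float]:
    """Return the flagged packets and the fraction of input bytes forwarded."""
    flagged: List[bytes] = []
    total_bytes = 0
    for payload in packets:
        total_bytes += len(payload)
        if is_suspicious(payload):
            flagged.append(payload)
    forwarded = sum(len(p) for p in flagged)
    return flagged, (forwarded / total_bytes if total_bytes else 0.0)


if __name__ == "__main__":
    traffic = [b"GET /index.html", b"GET /etc/passwd", b"hello"]
    flagged, fraction = first_pass(traffic)
    print(flagged, f"{fraction:.0%} of bytes forwarded for deeper analysis")
```

The forwarded fraction is what the downstream GPPs actually see. In real traffic, where anomalies are rare, it would be far smaller than in this three-packet example.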
Both packet slicing and packet filtering can achieve data rate reductions for network performance monitoring/optimization systems and security systems, such as those that provide intrusion protection. In packet slicing, the headers are kept but varying amounts of payload are sliced off. In packet filtering, only packets matching certain criteria—such as a particular IP address, size, or protocol—are passed through to subsequent processing elements.
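Both operations are easy to express in a few lines. The sketch below uses a simplified, hypothetical `Packet` record rather than a real wire format. The filter criteria mirror the examples in the text (address, size, protocol), with size interpreted here as an upper bound.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Packet:
    """Simplified packet record (hypothetical; not a real wire format)."""
    src_ip: str
    protocol: str
    header: bytes
    payload: bytes

    @property
    def size(self) -> int:
        return len(self.header) + len(self.payload)


def slice_packet(packet: Packet, keep_payload_bytes: int) -> Packet:
    """Keep the header intact and truncate the payload to the first N bytes."""
    return Packet(packet.src_ip, packet.protocol, packet.header,
                  packet.payload[:keep_payload_bytes])


def filter_packets(packets: Iterable[Packet], src_ip: Optional[str] = None,
                   protocol: Optional[str] = None,
                   max_size: Optional[int] = None) -> Iterator[Packet]:
    """Pass through only packets that match every criterion that is set."""
    for p in packets:
        if src_ip is not None and p.src_ip != src_ip:
            continue
        if protocol is not None and p.protocol != protocol:
            continue
        if max_size is not None and p.size > max_size:
            continue
        yield p
```

Slicing reduces the bytes per packet while preserving every header. Filtering reduces the number of packets. The two can be chained, for example by filtering on protocol first and then slicing what remains.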
FPGAs can bring optimized performance to these tasks. They bring massive amounts of configured logic gates operating in parallel to bear on incoming data, giving them the ability to process packets on the fly at line rates. In contrast to alternatives such as multicore CPUs and specialized ASICs, FPGAs give the designer extremely tight control over all aspects of processing operations. They can therefore be programmed as hardware processors optimized for the performance of challenging tasks.
Furthermore, FPGA-based appliances can easily be reprogrammed for new tasks as traffic evolves. Such new tasks may consist of either new functionality added to existing tasks—such as load balancing based on an emerging traffic protocol—or additional functionality in the form of entirely new tasks, such as redeploying a network security appliance as a performance monitoring device for network optimization, or as a multi-protocol gateway.
The MicroTCA Advantage
The growth of MicroTCA in telco and other high-performance real-time applications has been propelled by the same types of advantages that FPGAs bring to box-level designs: small size, scalability, the flexibility to easily add new or different functions by changing or replacing up to 12 cards on a single backplane, and the ease of customization via changing the mix of cards in the chassis. The MicroTCA architecture facilitates the design of smaller, lower-power, yet highly scalable plug-in appliances that provide the hot-swap and shelf management capabilities found in their larger ATCA cousins. MicroTCA’s high-speed, protocol-agnostic, switched backplane meshes well with the needs of packet-switched communications. It has built-in hot-swap and redundancy via the backplane, which can run two fabrics to each card so that each one forms a separate multi-hub controller. This provides the high reliability needed in security systems.
These characteristics, when combined with its small 2U form factor and high communication bandwidth (from 40 Gbits/s to 1 Terabit/s or more), make MicroTCA an excellent design platform for building the FPGA-based 10GbE interface appliance described above. A MicroTCA-based appliance can start relatively small, with empty slots that can be filled with additional cards of the required type as needed. This architecture meshes well with the need to deploy these appliances in a variety of places in the network, at the edge as well as in the core. For example, the appliance can be placed at connections between wireless and wired networks, at various points within a wired network, and wherever enterprise networks interface with service provider networks. A single appliance architecture can be used in all of these locations, but scaled by the capacity needed. For example, in locations where small systems are required, only a few slots can be filled, and in locations that need higher bandwidth capacity, more slots in the box can be filled with the appropriate cards.
The V3021 from AdvancedIO Systems is an AMC module suited to network security and performance monitoring applications requiring wire-speed packet inspection and manipulation (Figure 3). The V3021 is equipped with two optical 10GbE interfaces, a Xilinx Virtex-5 FPGA, and multiple large, independent, high-speed memory buffers to handle high-speed packet capture, filtering and processing. The module runs AdvancedIO’s expressXG framework, which accelerates the development of high-bandwidth telecom applications. The framework, consisting of FPGA firmware and host software, abstracts the underlying FPGA hardware interfaces, board-level details, and control and set-up functionality, and provides key packet processing building blocks for high-bandwidth application development.
The higher bandwidth of 10GbE networks brings challenges such as performance bottlenecks, which require service providers to monitor and optimize network performance. At the same time, the increasing sophistication of security attacks demands the monitoring of even more bandwidth and requires deeper analysis of packets. But existing equipment cannot run these applications at higher network speeds, or provide the scaling that service providers need. The flexibility and architectural scalability of the MicroTCA architecture, combined with the flexibility and performance scalability of FPGAs, make possible a 10GbE interface appliance that performs the line-rate monitoring or security functions.