Harnessing FPGA Performance for HPEC

FPGA-based Front-End Processing with VPX

As the world moves toward high-performance embedded computing (HPEC), faster and more distributed computing power is being put into ever smaller spaces. FPGAs can bring both speed and distribution to bear on pre-processing data for specific applications while retaining flexibility.



Complex systems, such as real-time image processing, electronic warfare systems and software defined radio platforms, require real-time execution of complex math functions. When processing data streams generated by high-speed sensors, the system hardware must handle data flows that can exceed 1 Gbyte per second.
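To give a feel for how quickly a sensor front end reaches the 1 Gbyte per second mark, the following minimal sketch computes the raw payload rate of a sampled stream. The converter parameters (two channels, 16 bits, 250 Msamples/s) are illustrative assumptions, not values taken from any particular design.

```python
def sensor_rate_bytes_per_s(sample_rate_hz, bits_per_sample, channels):
    """Raw payload rate of a sampled sensor stream, in bytes per second."""
    return sample_rate_hz * bits_per_sample * channels / 8


# Hypothetical front end: two 16-bit A/D channels at 250 Msamples/s each.
rate = sensor_rate_bytes_per_s(250e6, 16, 2)
print(f"{rate / 1e9:.2f} Gbyte/s")
```

Under these assumed numbers a modest two-channel converter already fills a 1 Gbyte/s data path before any packet or framing overhead is counted.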

To adequately handle, and subsequently process, data in these compute-intensive applications, a suitable form factor and system architecture are integral to the design. For example, connectors that operate at 6.25 GHz and above are typically required in these high-bandwidth data applications, mandating hardware components such as high-speed backplanes and board interconnects.

The VITA OpenVPX standard, or VITA 65, defines fabric-based architectures useful in building high-performance embedded computing (HPEC) systems. The standard outlines form factors and allows mezzanine standards that enable front-end sensor interfaces to be connected to high-performance processing devices, such as FPGAs.

In order to implement this front-end processing, dense, high-speed logic is necessary, and FPGAs provide a means to effectively implement the high-speed logic, memory and DSP functions required to process the sensor data streams. 

Another specification developed by VITA addresses I/O interconnects to FPGAs. This specification, called FPGA Mezzanine Card or FMC (VITA 57), defines an I/O mezzanine card for FPGA carriers. With VITA 57.1, devices like A/D and D/A converters can be added in front of high-performance FPGAs.

Considering the data flow and the standards together, VITA 57 enables I/O mezzanine connections to FPGAs and, therefore, provides the I/O interfaces into the system. OpenVPX, which references VITA 46, defines both 3U and 6U VPX modules. These modules, or boards, provide a platform for FPGA processors and their interconnects.

Further, system design includes assessing the number of processing elements necessary to handle the data streams. When implementing HPEC systems based on VPX, a multistage processing pipeline is usually required. FPGAs offer a rich combination of hardware resources well suited for these applications. 

FPGAs provide large pools of configurable logic in which processing circuits can be implemented. To properly design a system, the size and type of FPGA must be considered. FPGA capacity is described in slices; the logic is organized into configurable logic blocks, or CLBs, each of which includes two slices. A logic slice includes look-up tables (LUTs), arithmetic chains, flip-flops, RAM and shift registers. FPGAs can be matched to the task by selecting devices with specific capabilities, including the number and types of slices and the amount of RAM.

Other specific FPGA resources include DSP slices, which consist of multipliers and accumulators and are used to implement DSP functions. Block RAM is another type of resource, providing on-chip storage blocks. These resources can be used to implement many circuit designs, including processors, memory systems and I/O interfaces, and are particularly useful in signal processing systems.
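As a rough software model of what a multiplier-accumulator does in a signal processing design, the sketch below runs a direct-form FIR filter with one multiply-accumulate (MAC) per tap. It is an illustration only: in an FPGA the taps would typically be mapped onto parallel DSP slices rather than looped over, and the function name and coefficient values here are hypothetical.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter using one multiply-accumulate per tap."""
    delay = [0] * len(coeffs)  # delay line, newest sample first
    out = []
    for x in samples:
        delay = [x] + delay[:-1]  # shift the new sample into the delay line
        acc = 0
        for d, c in zip(delay, coeffs):
            acc += d * c  # one MAC: multiply a tap, add to the accumulator
        out.append(acc)
    return out


# Illustrative 4-tap moving sum with integer coefficients.
print(fir_mac([1, 2, 3, 4, 5], [1, 1, 1, 1]))  # prints [1, 3, 6, 10, 14]
```

Integer arithmetic is used deliberately, since fixed-point multiply-accumulate is the kind of operation the multipliers and accumulators described above provide.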

Inherent System Flexibility

The extensive hardware resources of FPGAs allow flexibility in system architecture. In designing a system, if the processing pipeline must be extended or needs to become wider, high-speed interconnect ports can be used to share data between FPGAs to increase the data width or the number of stages in the pipeline. When using FPGAs to process the data, several stages of calculations can be done within the FPGA core itself. Newer FPGAs offer a logic footprint big enough to implement complete processing blocks, and incorporate hard-core ARM processors such as the Cortex-A9 as well as peripheral interconnects such as PCIe cores. For I/O, MAC blocks enable Ethernet implementation, and high-speed serial I/O on the chips now exceeds 12.5 Gbit/s.

Figure 1 shows a block diagram of a VPX-based carrier with dual Xilinx Virtex 6 FPGAs. In this design, two large-scale FPGA parts are interfaced to two VITA 57 FMC sites. The FPGAs are cross-connected with GTX x4 data paths, allowing them to exchange data directly. The board enables multiple groups of data lanes to be connected to the VPX backplane via a PCIe Gen 2 switch. To support control plane functions and general processing, a Freescale P2020 PowerPC SoC is also part of this particular design.

Figure 1
6U VPX Front End Processor Block Diagram.

Such a design can be used to build digital filters and execute Fast Fourier transforms in hardware. GTX I/O ports provide data paths that can support data rates of 6.25 Gtransfer/s per lane. Four data lanes can transfer more than 2 Gbytes/s between FPGAs. Remarkably, a 6U by 160 mm PCB can contain all of this hardware, as shown in Figure 2.

Figure 2
Xilinx Virtex 6 Front-End Processor.
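The figure of more than 2 Gbytes/s over four GTX lanes can be checked with a short calculation. This sketch assumes 8b/10b line coding (10 line bits for every 8 data bits) and ignores framing and protocol overhead; the function name is hypothetical.

```python
def payload_gbytes_per_s(line_rate_gbaud, lanes, data_bits=8, line_bits=10):
    """Payload throughput of a multi-lane serial link, in Gbytes/s.

    Assumes a block line code carrying data_bits in every line_bits
    (8b/10b by default) and no further protocol overhead.
    """
    return line_rate_gbaud * lanes * data_bits / line_bits / 8


# Four lanes at 6.25 Gbaud each, as in the FPGA-to-FPGA cross-connect.
print(payload_gbytes_per_s(6.25, 4))  # prints 2.5
```

Under these assumptions the four-lane link carries 2.5 Gbytes/s of payload in each direction, which is consistent with "more than 2 Gbytes/s".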

Why Use FPGAs?

FPGAs typically provide more processing per watt than conventional processors. Compared to CPUs, DSPs and GPUs, they deliver more computing throughput per unit of power: typically about ten times as much, on the order of 50 GOPS/watt for 16-bit integer operations, as noted in Figure 3. Given this processing efficiency and the fact that the processing configuration is programmable, FPGAs can easily and cost-effectively be tuned to application-specific compute requirements.

Connecting and Moving Data

VPX uses pairs of differential signals to transmit and receive serial data between devices. A lane consists of one transmit and one receive differential pair. Groups of lanes can be used to form ports with scalable bandwidths. In VPX, a group of lanes is called a pipe, and can be configured from one to 16 lanes. Pipes are grouped into planes; VPX defines data, control, expansion, management and utility planes.

Data paths are implemented using specific protocols on interconnecting planes. The data plane is used to move primary data, the control plane for control communication, and the expansion plane for local communication of high-speed data. The physical interface and protocol will vary depending on the function.

In moving data, the data rate is of interest. Data planes using a PCIe Fat Pipe can move data at 2.5, 5.0 or 8.0 Gbaud per lane depending on the generation of PCIe: Gen 1, Gen 2 or Gen 3. A PCIe Gen 2 Fat Pipe can move 2 Gbyte/s point to point. The control plane typically uses Ethernet based on serial interconnect, and can be 1000BASE-BX or 10GBASE-Kx.
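The 2 Gbyte/s figure for a Gen 2 Fat Pipe follows from the lane rate, the lane count and the line coding. The sketch below tabulates this per generation, assuming a Fat Pipe is four lanes and using the PCIe line rates and encodings (8b/10b for Gen 1 and Gen 2, 128b/130b at 8.0 GT/s for Gen 3). Packet and protocol overhead are ignored, and the names are hypothetical.

```python
# (line rate in Gbaud per lane, data bits, line bits) per PCIe generation
PCIE_GENERATIONS = {
    "Gen 1": (2.5, 8, 10),
    "Gen 2": (5.0, 8, 10),
    "Gen 3": (8.0, 128, 130),
}
FAT_PIPE_LANES = 4  # assumption: a Fat Pipe is a four-lane pipe


def fat_pipe_gbytes_per_s(generation):
    """Payload rate of a four-lane PCIe pipe in Gbytes/s, per direction."""
    rate, data_bits, line_bits = PCIE_GENERATIONS[generation]
    return rate * FAT_PIPE_LANES * data_bits / line_bits / 8


for gen in PCIE_GENERATIONS:
    print(f"{gen}: {fat_pipe_gbytes_per_s(gen):.2f} Gbyte/s")
```

Gen 3 gains more than its line rate alone suggests because its encoding spends far fewer line bits on overhead than 8b/10b does.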

The expansion plane can be PCIe, 10 Gb Ethernet, RapidIO or Aurora. Both Aurora and PCIe (Gen 1 and Gen 2) use 8b/10b encoding. Aurora is a useful protocol for FPGA-to-FPGA interconnect because it is lightweight and low latency. It can operate at 1.25, 2.5, 3.125, 5.0 and 6.25 Gbaud. The Aurora cores are standard interfaces supported by FPGA tool chains (Figure 4).

Figure 4
Aurora for low latency efficient FPGA data exchange.

A VPX backplane can be used to connect the VPX front-end processor cards (FEPs). The OpenVPX specification allows the data plane and control plane interfaces to be connected through the backplane where high-speed serial point-to-point interconnects are made.

Profiles are used to define and organize the connections made in the backplane. OpenVPX makes use of backplane and slot profiles to define interconnect topologies. Various topologies, including star and mesh, can be implemented. Figure 5 shows a mesh connection on the data plane. This sample view of a six-slot OpenVPX backplane profile allows up to five FPGA-based cards to be connected together. The profiles on the right define the I/O paths associated with the slots in the backplane.

Figure 5
Six-slot OpenVPX backplane.
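To illustrate what a mesh on the data plane implies for backplane routing, this minimal sketch enumerates the point-to-point links needed to connect every card directly to every other card. It assumes a full mesh among the five FPGA-based cards mentioned above; the slot numbers are hypothetical and do not correspond to any specific backplane profile.

```python
from itertools import combinations


def full_mesh_links(slots):
    """Return every unordered slot pair, one point-to-point link per pair."""
    return list(combinations(slots, 2))


payload_slots = [1, 2, 3, 4, 5]  # hypothetical slot numbering
links = full_mesh_links(payload_slots)
print(len(links))  # prints 10
print(links[:4])   # prints [(1, 2), (1, 3), (1, 4), (1, 5)]
```

With n cards a full mesh needs n(n-1)/2 links and n-1 ports per card, which is one reason a star topology through a switch becomes attractive as slot counts grow.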

Getting In and Out of the System

Moving data in and out of the system can be done with various interfaces, either taken from the front of the board or via the backplane out the rear of the system. First, a suitable interface must be chosen.

If the data is coming from a sensor, it may be analog or digital. If it’s analog, the data could be interfaced via coax to an A/D converter. If it’s digital, from a camera for example, it could run over 1 Gb or 10 Gb Ethernet. For external network interfacing, good old 1000BASE-T Ethernet can be used, directly from the front-end processor, a switch card or a control processor.

In applications that require off-load and high-rate external transfers, interfaces like 10 Gb Ethernet can be used, with either a copper or a fiber physical medium. PCIe or Aurora interfaces can be implemented using the quad small form-factor pluggable plus (QSFP+) standard.

Figure 6 is an example of a QSFP+ XMC mezzanine that shows the QSFP+ connector. QSFP+ can also be implemented on an FMC allowing external high-speed data to be directly connected to the FPGA resources. Using QSFP+ links, external data paths supporting up to 40 Gbit/s can be achieved.

Figure 6
QSFP+ interface.

Since system designs can be customized and tailored to fit the required hardware architecture, FPGAs become a highly flexible compute resource. High-performance embedded computers using FPGAs implemented on VPX can be used in a variety of compute-intensive applications that require a stable platform. Because FPGA compute operations per second per watt exceed those of traditional processor architectures, these systems are not only a reality, but can be developed cost-effectively.

Coupling FPGAs to the FMC sites allows sensors to be connected to the FPGAs. This concept, combined with VPX backplanes that interconnect larger format cards, enables more capable systems to be built. External interfaces, such as Ethernet, can be included on VPX cards and switches to allow system elements to be connected within networks.

Such designs implemented on VPX provide an open architecture solution specifically suited for high-performance digital signal processing. Since FPGAs can be reprogrammed, designs can be embedded and quickly modified. Initial investments can be preserved by updating system firmware and software, thereby improving overall system performance and life cycle.

Elma Electronic
Fremont, CA
(510) 656-3400