TECHNOLOGY IN CONTEXT
Mezzanines and Small Form-Factor Boards
Right-Sizing FPGA Mezzanines Expands Application Space
FPGA mezzanine modules are being offered for a broader base of application and user needs, typically at a much lower price point.
JOE PRIMEAU, ACROMAG
Field-programmable gate arrays provide the equivalent of custom hardware but require less development effort and cost significantly less. Manufacturers of FPGA-based modules have thus leveraged open-architecture bus board standards such as VME, PCI and PMC to serve high-end signal- and image-processing applications. The rigorous requirements of these and other sophisticated applications such as software-defined radio (SDR), however, have relegated field-programmable design to the high end, where large FPGAs, fast uplink/downlink converters and large, high-speed memories have driven the price of FPGA modules into the five-figure range.
Yet many applications would benefit from the use of FPGA-based board-level products. As a result, board manufacturers have begun offering FPGA modules for a broader base of application and developer needs, typically at a much lower price point (Figure 1). These modules are becoming popular for an ever-widening range of applications. Well suited to working alongside a CPU or serving as stand-alone controllers, they are also the most reasonable design approach when component obsolescence is an issue. Moreover, a rich trove of advanced tools and a wealth of IP cores help speed the development process.

Right-sizing FPGA modules for use across the gamut of embedded computing applications involves balancing variously sized FPGAs—ranging from roughly 5K to 50K logic cells each—with appropriate mixtures of memory, bus interface and I/O components.
Nevertheless, selecting the right FPGA module for a particular design can be difficult for newcomers to this product type. Successfully putting the selected module to work, in turn, depends on the scope and quality of the tools in the board support packages (BSPs), or engineering design kits, supplied by board vendors.
At the very least, a BSP should provide example VHDL code, in both compiled and source form, for everything the FPGA touches, including memory, I/O, buses, DMA and interrupt controllers. Having the source code on CD is essential. When code is also pre-installed in the module’s onboard memory, designers can exercise the module out of the box and confirm its proper operation without writing any new code. Many newer designs also allow code to be downloaded directly to the FPGA. If the newly downloaded programming doesn’t work, the developer can simply power down and reload the original programming from flash memory.
Widening the Application Field
A broad variety of possible FPGA, memory and I/O combinations can be integrated into the physical envelope of a mezzanine board standard such as Industry Pack or PMC. Modules can generally be characterized by tiers of complexity, as defined by the capacity of their FPGA, the types and availability of onboard memory, and the speed and type of bus interface, as well as by any specialized technologies they support, such as digital signal processing (DSP).
The application’s data processing speed requirement may also dictate the choice of FPGA chips and modules. For relatively slow applications, device families such as Altera Cyclone and Xilinx Spartan are adequate, while higher-speed requirements call for devices like an Altera Stratix or Xilinx Virtex-II. Applications requiring the utmost in FPGA speed call for the latest technology, such as the Xilinx Virtex-4 or Virtex-5 families or the Altera Stratix III. At the module level, the application also dictates the amount and speed of memory provided for the FPGA, the I/O and bus structures used, and other factors.
Module designs can be divided into three tiers: basic FPGA modules, medium-powered modules with expanded memory and bus support, and larger, more powerful modules supporting the latest high-performance FPGAs, with a full array of memory and high-speed bus support.
Entry-Level FPGA Processing
A basic module with an FPGA containing as few as 5K logic cells, some simple digital I/O and a modicum of SRAM can serve many basic applications. Such modules are usually offered as small mezzanine boards with a three-figure price tag. For designs of up to about 10K or 20K logic cells, devices from families such as the Altera Cyclone II and Xilinx Spartan are appropriate.
These modules are well suited for classic machine control applications, such as multi-axis controllers for satellite downlink systems. They also excel at collecting data, making control decisions based on this data and sending status information to the CPU, a common scenario in many machine control and test applications. Using the FPGA’s high-speed, localized processing capacities is common in applications where transferring data across the bus to the CPU would be too slow for proper control.
Basic FPGA modules are particularly suited to applications requiring fast processing of collected data, such as serial protocol conversion. Here, data must be collected at a relatively high speed (100K to 200K baud) and managed through control lines handled by the FPGA, such as request-to-send (RTS) and clear-to-send (CTS), on the I/O or “field” side of the board. As data is received, the FPGA can strip the packet overhead, leaving only the basic text, and pass this pre-processed data across the bus to the CPU. Applications appropriate for a basic, low gate-count FPGA module typically require only a small amount of relatively slow memory external to the FPGA (Figure 2). Many can get away with 64K x 16 of SRAM or less.
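To make the protocol-conversion step concrete, here is a minimal Python sketch of the packet-stripping logic an FPGA might implement in hardware. The frame layout (STX, length byte, payload, XOR checksum, ETX) and the names `strip_frames`, `make_frame` and `xor_checksum` are hypothetical, chosen only for illustration; real serial protocols define their own framing.

```python
from typing import Iterator

# Hypothetical frame format: STX | LEN | payload (LEN bytes) | CHK | ETX
STX = 0x02  # start-of-frame marker (assumed)
ETX = 0x03  # end-of-frame marker (assumed)


def xor_checksum(data: bytes) -> int:
    """XOR of all payload bytes, used here as a simple integrity check."""
    chk = 0
    for b in data:
        chk ^= b
    return chk


def make_frame(payload: bytes) -> bytes:
    """Wrap a payload (max 255 bytes) in the hypothetical frame format."""
    return bytes([STX, len(payload)]) + payload + bytes([xor_checksum(payload), ETX])


def strip_frames(stream: bytes) -> Iterator[bytes]:
    """Yield only the payload of each valid frame, discarding framing bytes."""
    i, n = 0, len(stream)
    while i < n:
        if stream[i] == STX and i + 1 < n:
            length = stream[i + 1]
            chk_at = i + 2 + length  # index of the checksum byte
            if chk_at + 1 < n:  # the ETX byte lies inside the buffer
                payload = stream[i + 2:chk_at]
                if stream[chk_at] == xor_checksum(payload) and stream[chk_at + 1] == ETX:
                    yield payload
                    i = chk_at + 2  # jump past the whole frame
                    continue
        i += 1  # no valid frame starts here: resynchronize on the next byte


if __name__ == "__main__":
    raw = b"\xff" + make_frame(b"TEMP=21.5") + make_frame(b"OK")
    print(list(strip_frames(raw)))  # [b'TEMP=21.5', b'OK']
```

Because the parser walks the stream byte by byte and resynchronizes on the next start marker after a bad frame, it maps naturally onto a small state machine in the FPGA fabric.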
As far as bus bandwidth is concerned, the needs of small FPGA applications are also minimal. The 8 MHz or 32 MHz operation of an 8- or 16-bit Industry Pack bus, for example, is usually sufficient for an FPGA-based mezzanine module to effectively interface to a CPU board or carrier board. Likewise, the demands on interfaces to the real world beyond the computer are not very high, and basic digital interfaces such as RS-422 or RS-485 and various TTL-level interfaces are usually sufficient.
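A quick back-of-the-envelope check shows why a modest mezzanine bus is rarely the bottleneck for a serial-conversion application like the one described above. The sketch below compares the payload rate of an asynchronous serial line (assuming 8N1 framing, i.e. 10 bits on the wire per data byte) with the raw peak of a 16-bit bus, computed naively as width × clock. Real Industry Pack cycles take more than one clock per transfer, so `clocks_per_transfer` is left as an assumed parameter and sustained throughput will be lower than these figures.

```python
def serial_payload_bytes_per_s(baud: float, bits_per_char: int = 10) -> float:
    """Data bytes/s on an async serial line; 8N1 uses 10 bits per byte."""
    return baud / bits_per_char


def bus_peak_bytes_per_s(width_bits: int, clock_hz: float, clocks_per_transfer: int = 1) -> float:
    """Naive upper bound: one width_bits transfer every clocks_per_transfer clocks."""
    return (width_bits / 8) * clock_hz / clocks_per_transfer


if __name__ == "__main__":
    serial = serial_payload_bytes_per_s(200_000)  # top of the 100K-200K baud range
    for clock_hz in (8e6, 32e6):
        bus = bus_peak_bytes_per_s(16, clock_hz)
        print(f"16-bit @ {clock_hz / 1e6:.0f} MHz: {bus / 1e6:.0f} MB/s raw, "
              f"{bus / serial:.0f}x the serial payload rate")
```

Even if wait states cut the bus figures by a large factor, the margin over a 20 KB/s serial stream remains orders of magnitude.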
Toward the higher end of this entry-level tier, though, and into the mid-performance FPGA arena, faster digital I/O interfaces are sometimes brought to bear, usually in the form of low-voltage differential signaling (LVDS).
Many basic FPGA implementations dedicate at least some capacity to DSP functions, and DSP cores are readily available. Moreover, even a relatively small FPGA can comfortably accommodate as many as 26 on-chip 18x18 multipliers. In many applications, the FPGA’s DSP portion is dedicated to pre-processing tasks and does not pass much data on to the CPU.
Even the simplest FPGAs typically accept at least one fixed reference clock and provide one or more phase-locked loops (PLLs) to derive internal and external clocks, giving the flexibility of multiple timing domains. Multiple clock managers and PLLs let the designer tailor each timing domain to the actual application characteristics.
There are further advantages. An FPGA such as a Cyclone II, with up to four PLLs on-chip, can be programmed to generate multiple clock frequencies. This capability is needed to handle the different data rates the FPGA must accommodate. Even in a relatively simple application, the FPGA might have to generate external clocks to synchronize serial inputs running at various baud rates while simultaneously communicating with the bus at 32 MHz.
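The need for programmable clocks shows up as soon as the FPGA must divide one reference down to several baud rates. The minimal sketch below assumes a UART-style receiver that samples each bit 16 times and a hypothetical 32 MHz reference; it computes the nearest integer divisor and the resulting rate error. The function name `baud_divisor` is illustrative only.

```python
from typing import Tuple


def baud_divisor(clock_hz: float, baud: float, oversample: int = 16) -> Tuple[int, float, float]:
    """Return (divisor, actual_baud, error_percent) for an integer clock divider.

    Assumes the receiver samples each bit `oversample` times, so the divided
    clock must run at oversample * baud.
    """
    divisor = max(1, round(clock_hz / (oversample * baud)))
    actual = clock_hz / (oversample * divisor)
    error_pct = (actual - baud) / baud * 100.0
    return divisor, actual, error_pct


if __name__ == "__main__":
    for baud in (100_000, 115_200, 200_000):
        d, actual, err = baud_divisor(32e6, baud)  # assumed 32 MHz reference
        print(f"{baud:>7} baud: divide by {d:>3} -> {actual:,.0f} baud ({err:+.2f}%)")
```

Here 100K and 200K baud divide evenly from 32 MHz, but 115.2K baud lands about 2.1% fast. A PLL that synthesizes a reference such as 29.4912 MHz (16 × 16 × 115,200 Hz) gives an exact divisor of 16, which is one reason on-chip PLLs are valuable when several unrelated rates must coexist.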
Basic FPGA Modules in Action
The flexibility of an FPGA-based design is particularly useful in automated test and in-circuit diagnostics, allowing new configurations and diagnostic routines to be quickly downloaded, either to on-module memory by means of a bus or to flash memory via a JTAG port. One Acromag customer uses a basic FPGA module in a roll-up tarmac tester for commercial aircraft, which puts critical aircraft systems through their paces and monitors the process: a simple “go/no-go” scenario. Although this application requires very tight real-time coordination, the data collected by the CPU from the FPGA module is minimal.
In another customer application, a small FPGA module serves to simulate the different pieces of third-party hardware that comprise a satellite. At one time the module might simulate a power supply subsystem from one manufacturer in order to test its interaction with a solar panel controller from another manufacturer, or an RF subsystem from yet a third. Some time later, with new programming downloaded, the same module can be used to simulate additional subsystems.
The benefit of hardware simulation in this application is fairly obvious, since it’s not advisable in large, expensive systems to simply plug subsystems together just to see if they work. Even a relatively slow FPGA, such as a Spartan or Cyclone II family device, is fast enough here to deliver real-time simulation performance.
Into the Midrange
Large data sets and high-speed I/O are the primary reasons for moving to midrange FPGA modules (Figure 3). These typically utilize FPGAs with between 20K and 34K logic cells and provide a step up in bus bandwidth, memory and I/O.

While small FPGA modules can comfortably handle a pre-processing application, applications that must pre-process a lot of data and transfer it across a bus usually call for a medium-sized FPGA such as a Xilinx Virtex-II or Altera Stratix.
Midrange FPGA applications typically require 256K x 36 of SRAM or more. Real-world interfacing is also frequently more complex, with high-speed analog signaling often added to the I/O mix.
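Memory specifications in depth × width form are easy to misread, so it can help to convert them to bytes. The small helper below does that. The `parity_bits` parameter reflects the assumption, common for x36 synchronous SRAMs, that each 36-bit word carries 32 data bits plus 4 parity bits.

```python
K = 1024  # memory depths like "64K" and "256K" are powers of two


def sram_bytes(depth: int, width_bits: int, parity_bits: int = 0) -> int:
    """Usable data bytes in a depth x width SRAM, excluding any per-word parity bits."""
    return depth * (width_bits - parity_bits) // 8


if __name__ == "__main__":
    print(sram_bytes(64 * K, 16))                  # 131072  (128 KiB, entry level)
    print(sram_bytes(256 * K, 36))                 # 1179648 (all 36 bits counted)
    print(sram_bytes(256 * K, 36, parity_bits=4))  # 1048576 (1 MiB of data)
```

Under that assumption, the step from 64K x 16 to 256K x 36 is an eightfold increase in usable data storage, from 128 KiB to 1 MiB.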
Further, the amount of DSP processing is often significantly higher. These are typically algorithms developed in a MATLAB environment, where pre-processing can chain multiple stages of multipliers, FIR filters and other DSP functions. As many as 56 on-chip 18x18 multipliers can comfortably fit in a medium-sized FPGA.
Architectures of medium-sized FPGA modules may include several modifications to handle the higher memory and speed demands of more complex applications (Figure 3). The simple Industry Pack mezzanine bus can be replaced by PMC and its 32-bit, 33 MHz PCI bus, for example. A PCI bus interface chip can be incorporated to provide DMA service between the FPGA and the bus for greater transfer efficiency. The module’s SRAM may be dual-ported between the FPGA and the PCI chip to optimize data movement. Once the FPGA finishes processing and writes its results to SRAM, the PCI bus controller picks up the data and transfers it to the host under DMA control, independent of the FPGA.
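To illustrate how FIR stages consume hardware multipliers, here is a minimal fixed-point model in Python. It assumes 18-bit signed samples and coefficients with 17 fractional bits, so each tap maps onto one 18x18 multiplier feeding a wide accumulator. The number format and the function names are assumptions for illustration; vendor DSP blocks and filter-generator IP differ in rounding and pipelining details.

```python
from typing import List, Sequence

FRAC_BITS = 17                             # coefficients span [-1, 1) in an 18-bit word
MIN18, MAX18 = -(1 << 17), (1 << 17) - 1   # signed 18-bit range


def sat18(x: int) -> int:
    """Clamp a value to the signed 18-bit range of the multiplier inputs."""
    return max(MIN18, min(MAX18, x))


def to_fixed(c: float) -> int:
    """Quantize a real coefficient to 18 bits with 17 fractional bits."""
    return sat18(round(c * (1 << FRAC_BITS)))


def fir_fixed(samples: Sequence[int], coeffs: Sequence[int]) -> List[int]:
    """Direct-form FIR: one 18x18 multiply per tap, summed in a wide accumulator."""
    delay = [0] * len(coeffs)                  # tapped delay line
    out = []
    for s in samples:
        delay = [sat18(s)] + delay[:-1]        # shift in the newest sample
        acc = sum(x * c for x, c in zip(delay, coeffs))  # each product fits in 36 bits
        out.append(sat18(acc >> FRAC_BITS))    # arithmetic shift back to sample scale
    return out


if __name__ == "__main__":
    avg4 = [to_fixed(0.25)] * 4                # 4-tap moving average
    print(fir_fixed([1000] * 6, avg4))         # [250, 500, 750, 1000, 1000, 1000]
```

Each output here needs four multiplies. In a fully parallel hardware implementation, all four products are computed every sample clock, which is why the on-chip multiplier count bounds how many taps and stages a given FPGA can run at full rate.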
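One common way to let the FPGA and the PCI controller work on a dual-ported SRAM at the same time is ping-pong (double) buffering: the FPGA fills one half while the DMA engine drains the other. The text above does not specify a buffering scheme, so the Python model below is an assumption, and `PingPongBuffer`, `fpga_write` and `dma_transfer` are hypothetical names. It shows the hand-off and the overrun condition that occurs if DMA falls behind.

```python
from typing import List, Optional


class PingPongBuffer:
    """Model of an SRAM split into two halves shared by the FPGA and a DMA engine."""

    def __init__(self, half_words: int) -> None:
        self.half_words = half_words
        self.halves: List[List[int]] = [[], []]
        self.fill = 0               # index of the half the FPGA is writing
        self.full = [False, False]  # True while a half waits to be drained by DMA

    def fpga_write(self, word: int) -> None:
        """FPGA side: append one result word; hand off the half when it fills."""
        if self.full[self.fill]:
            raise OverflowError("overrun: DMA has not drained this half yet")
        half = self.halves[self.fill]
        half.append(word)
        if len(half) == self.half_words:
            self.full[self.fill] = True  # in hardware this might raise an interrupt
            self.fill ^= 1               # switch to the other half

    def dma_transfer(self) -> Optional[List[int]]:
        """DMA side: drain the oldest full half, or return None if nothing is ready."""
        # If both halves are full, the FPGA is stalled on the older one (self.fill).
        for idx in (self.fill, self.fill ^ 1):
            if self.full[idx]:
                data, self.halves[idx] = self.halves[idx], []
                self.full[idx] = False
                return data
        return None


if __name__ == "__main__":
    buf = PingPongBuffer(half_words=2)
    for word in range(4):
        buf.fpga_write(word)
    print(buf.dma_transfer(), buf.dma_transfer(), buf.dma_transfer())  # [0, 1] [2, 3] None
```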
Midrange FPGA Modules in Action
In one customer example, a midrange FPGA module takes the inputs from four audio signal sources, runs a series of algorithms on them and post-processes the resulting data, which is sent back out as analog outputs to a set of headphones. This application requires both higher memory capacity and a higher data rate, with processing and I/O operating simultaneously and continuously.
Sonar systems provide another scenario where midrange FPGA modules can be effective. Such a module can accommodate a high-speed analog front end feeding the FPGA, which pre-processes the data to remove noise, strips off extraneous data, repackages the remaining signal and forwards it to SRAM. At this point, the PCI bus controller picks up the data and transfers it by DMA to the CPU for display or further analysis.
Midrange FPGA modules are also appropriate for sophisticated simulation systems, such as flight simulation systems, which require a lot of high-speed control capability as well as the DMA capability to accommodate real-time image display.
Top-Tier Capacity and Speed
Current large FPGAs, such as members of the Xilinx Virtex-4/Virtex-5 and Altera Stratix III families, provide more than 35K logic cells. What differentiates this top tier is the degree of analysis performed by the FPGA-based module, which is called on to conduct complex processing tasks onboard rather than passing them off to a CPU (Figure 4). FPGAs in this realm can comfortably provide as many as 192 on-chip 18x18 multipliers.
High-end FPGAs are complex enough to host a soft CPU core in addition to other on-chip processing resources. As with each previous step up in complexity, modules containing such an FPGA must offer greater capabilities to handle more demanding applications.
High-end image and communications processors and sonar/radar downlink analyzers, for example, may demand as much as 256K x 36 of SRAM. When demands are particularly high, faster DDR DRAM can be used. PCI bus width and frequency may have to expand to 64 bits and 66 MHz. For the most demanding applications, a step up from PCI to the PCI-X bus may be required, or even a move to a serial, point-to-point link such as PCI Express.
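The bus upgrades listed above can be compared by their theoretical peak rates. The sketch below computes peak bandwidth for parallel buses as width × clock, and for first-generation PCI Express as 2.5 GT/s per lane with 8b/10b encoding (250 MB/s per lane in each direction). These are nominal ceilings; arbitration, protocol overhead and the host chipset reduce sustained throughput. The 133 MHz PCI-X entry is one of the clock rates PCI-X defines and is included only for comparison.

```python
def parallel_peak_mb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak MB/s of a parallel bus moving one word per clock (no overhead)."""
    return width_bits / 8 * clock_mhz


def pcie_gen1_peak_mb_s(lanes: int) -> float:
    """Peak MB/s per direction for PCIe 1.x: 2.5 GT/s per lane, 8b/10b encoding."""
    return lanes * 2500 * 8 / 10 / 8  # 2500 Mbit/s raw -> 2000 Mbit/s data -> 250 MB/s


if __name__ == "__main__":
    buses = {
        "PCI 32-bit/33 MHz": parallel_peak_mb_s(32, 33),
        "PCI 64-bit/66 MHz": parallel_peak_mb_s(64, 66),
        "PCI-X 64-bit/133 MHz": parallel_peak_mb_s(64, 133),
        "PCIe 1.x x4 (per direction)": pcie_gen1_peak_mb_s(4),
    }
    for name, mb_s in buses.items():
        print(f"{name:<28} {mb_s:7.0f} MB/s")
```

With the nominal 33.33 MHz PCI clock, the first figure is usually quoted as 133 MB/s rather than 132 MB/s.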
In the past, because of their cost, field-programmable modules were limited to the tip of the performance pyramid. Now, with the current trend toward right-sizing, the application field for FPGA-based design has broadened considerably. What’s required to fuel this revolution in flexibility are modules with appropriate combinations of FPGA, memory and interfaces to match real-world application needs.
Acromag
Embedded Board Group
Wixom MI.
(248) 624-1541.
[www.acromag.com].

