TECH FEATURE

Reconfigurable Computing

FPGAs and Multicomputers: A Formidable Blend

There are unique benefits to mixing algorithm-specific FPGA schemes with general-purpose multicomputing. An effective solution must marry the design flows of these diverse architectural approaches.

MARK LITTLEFIELD, MERCURY COMPUTER SYSTEMS

In embedded multicomputer systems, the use of field-programmable gate arrays (FPGAs) alongside more traditional microprocessors and digital signal processors (DSPs) has moved from novelty to necessity. For certain classes of problems, FPGAs deliver dramatically better performance than microprocessors. Custom application-specific integrated circuits (ASICs) offer a similar performance advantage, but FPGAs add flexibility because they are programmable.

High performance and operational reconfiguration form a compelling argument for using FPGAs in modern embedded systems. However, the difficulty of writing code for FPGAs and of integrating FPGA-based modules into larger multicomputer systems tends to temper developers’ enthusiasm. A tension remains between FPGAs’ performance and flexibility on the one hand and the difficulty of programming and integrating them on the other. System designers also face numerous architectural issues when integrating FPGAs into embedded multicomputer systems.

Heterogeneous Multicomputing

In the embedded realm a constant battle rages among increased performance, lower power consumption, lower cost and faster time-to-market or deployment. In many applications, one or more of these market forces is pushed beyond what Moore’s law can compensate for. As a result, developers are constantly searching for ways to improve one or more of these dimensions, and interest in heterogeneous multicomputing is on the rise. By mixing computing resources of different types (general-purpose microprocessors, special-purpose processors, DSPs, ASICs, FPGAs, and so on), developers can get the most out of a project’s power, size and cost budget.

One problem with heterogeneous multicomputing is that the components needed to solve a problem are rarely all available from a single vendor. Developers are therefore often faced with a jumble of incompatible parts that must be integrated to form a system. As a result, the cost, size or power benefits of heterogeneous multicomputing are often offset by increased development costs and time lost during implementation. Combining incompatible components from different vendors can also cost performance. When FPGAs are added to the mix, the general problem is compounded by the relative difficulty of developing for an FPGA, to say nothing of integrating the FPGA-based application into the larger system.

Multicomputing Problems

Many problems in real-time multicomputing are computationally challenging and often require tens or even hundreds of state-of-the-art microprocessors working in concert. Some of these difficult problems, such as convolution, rebinning, backprojection, and synthetic aperture radar (SAR) signal formation and range/azimuth compression, can be implemented in FPGAs with a 5:1 to 50:1 performance improvement over a single general-purpose microprocessor. That said, some algorithms, such as those that perform different types of processing on different types of data, are not well suited to FPGA implementation. An FPGA is rarely a good fit for every algorithm in an application.

For those algorithms that do perform well on an FPGA, there is still a catch: FPGAs are not easy to work with. Implementing an algorithm on an FPGA takes roughly 10 to 30 times as many hours of effort as programming it on a general-purpose device such as a RISC processor. And once the algorithm is running on the FPGA, there is still the task of creating the interfaces through which the FPGA communicates with the rest of the computing system: I/O, memory and other processors.
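The two ranges quoted above can be combined into a rough cost/benefit estimate. The following Python sketch is illustrative only: the 200-hour RISC baseline and the 40-processor stage are hypothetical, and the model assumes throughput scales linearly with the number of devices.

```python
import math

# Ranges quoted in the text: a 5:1 to 50:1 speedup over ONE general-purpose
# processor, at 10 to 30 times the development hours.
SPEEDUP_RANGE = (5, 50)
EFFORT_RANGE = (10, 30)


def fpga_tradeoff(risc_hours, cpus_needed, speedup, effort_factor):
    """Return (fpga_hours, fpgas_needed) for moving one algorithm onto FPGAs.

    risc_hours and cpus_needed are hypothetical inputs. The model assumes
    throughput scales linearly with device count and ignores I/O limits.
    """
    fpga_hours = risc_hours * effort_factor
    fpgas_needed = math.ceil(cpus_needed / speedup)
    return fpga_hours, fpgas_needed


# Hypothetical stage: 200 hours to code on RISC, 40 processors to meet its deadline.
best = fpga_tradeoff(200, 40, speedup=SPEEDUP_RANGE[1], effort_factor=EFFORT_RANGE[0])
worst = fpga_tradeoff(200, 40, speedup=SPEEDUP_RANGE[0], effort_factor=EFFORT_RANGE[1])
print("best case: ", best)   # (2000, 1)
print("worst case:", worst)  # (6000, 8)
```

Even at the pessimistic end, a stage that needs 40 processors fits on eight FPGAs, but at 30 times the engineering hours of the RISC version; which side of that trade-off wins depends on the application.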

This complex design situation can be divided into three groups of problems. First, an effective design must provide a simple, flexible way to partition an application for optimal performance, running some algorithms on FPGAs and others on different devices such as RISC processors. Second, to keep well-matched algorithms running very fast on an FPGA, the design needs equally fast memory and I/O access. And third, even though programming and integrating FPGAs is difficult, development projects must adhere to competitive schedules.

A general approach to solving the first two groups of problems is to link FPGAs with other types of processing devices via a switch fabric as seen in Figure 1. This approach affords application developers the flexibility to execute different types of algorithms on different types of processing nodes. I/O can be implemented directly to an FPGA or to another specialized device. Systems with this type of architecture can be adapted to a variety of application implementations.
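Because every node hangs off the same fabric, partitioning reduces to choosing a node type for each stage. The sketch below assumes a simple greedy rule and made-up relative speeds; the two compression stages come from the SAR example above, while target_detection and display_formatting are hypothetical. It is not a description of any vendor’s partitioning tool.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    # Hypothetical relative throughput on each node type (1.0 = one RISC CN).
    speed: dict = field(default_factory=dict)


def partition(stages, available):
    """Greedily map each pipeline stage to the fastest node type with a free node.

    `available` maps node type -> number of free nodes on the switch fabric.
    Because every node sits on the same fabric, any node can feed any other.
    """
    free = dict(available)
    mapping = {}
    for stage in stages:
        for kind in sorted(stage.speed, key=stage.speed.get, reverse=True):
            if free.get(kind, 0) > 0:
                free[kind] -= 1
                mapping[stage.name] = kind
                break
        else:
            raise RuntimeError(f"no free node for stage {stage.name!r}")
    return mapping


pipeline = [
    Stage("range_compression", {"fpga": 20.0, "risc": 1.0}),
    Stage("azimuth_compression", {"fpga": 20.0, "risc": 1.0}),
    # Data-dependent, irregular work: assumed to run worse on the FPGA.
    Stage("target_detection", {"fpga": 0.5, "risc": 1.0}),
    Stage("display_formatting", {"risc": 1.0}),
]
mapping = partition(pipeline, {"fpga": 1, "risc": 3})
print(mapping)
```

With only one FPGA free, azimuth compression falls back to a RISC node even though it would run faster on an FPGA; a real partitioner would also weigh each stage’s workload and the fabric traffic between stages.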

FPGA Design Flow

Regardless of the system architecture, the process of creating an FPGA solution includes two distinct challenges: development and integration. The first treats the FPGA as a programmable processor and consists of developing the "bitstream" that configures it to run the application. The second treats the FPGA as a "black box" component that must interface with the rest of the computing system.

The developmental stages in creating an FPGA-based bitstream follow a well-known design flow. First, off-the-shelf Intellectual Property (IP) modules are identified to solve portions of the problem. These modules may be commercially available or created by a developer for an earlier project. Meanwhile, new IP modules are designed and created to meet specific, customized requirements.

Next, glue code is developed to tie the various pieces together and to form the clock domains. The resulting code is exercised in a simulator, in a framework that typically includes bus-functional models of the devices and data channels attached to the target FPGA. The simulation results identify any corrections or adjustments needed in the baseline VHDL code.

Once the code is synthesized, it is combined with a constraint model and entered into a place-and-route tool. At this stage the physical placement of the various logic blocks is established. The result is either a valid bitstream or an indication that the bitstream cannot be realized, which forces changes to the base code.

The next step is to test the resulting bitstream on the target hardware, which may reveal the need for additional code changes to obtain the required functionality or performance. This is also the point where the FPGA application must interface with an I/O stream or with a microprocessor- or DSP-based application.
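The iterative nature of this flow, with simulation, place-and-route and hardware testing each able to send the designer back to the VHDL, can be modeled as a simple loop. In the Python sketch below, the step names are illustrative and the pass/fail sequences are hypothetical stand-ins for real tool and lab results.

```python
from enum import Enum, auto


class Step(Enum):
    ASSEMBLE_IP = auto()      # reuse off-the-shelf IP, design new IP modules
    GLUE_CODE = auto()        # tie modules together, form clock domains
    SIMULATE = auto()         # with bus-functional models of attached devices
    SYNTHESIZE = auto()
    PLACE_AND_ROUTE = auto()  # with the constraint model
    HW_TEST = auto()          # bitstream on the target hardware
    REVISE_CODE = auto()      # change the VHDL after a failed check
    DONE = auto()


CHECKED_SEQUENCE = (Step.SIMULATE, Step.SYNTHESIZE, Step.PLACE_AND_ROUTE, Step.HW_TEST)


def run_flow(outcomes, max_passes=10):
    """Return the list of steps taken until the design passes every check.

    `outcomes` maps a step to an iterator of pass/fail results, standing in
    for simulator, place-and-route and lab results. Steps with no entry
    always pass.
    """
    trace = [Step.ASSEMBLE_IP, Step.GLUE_CODE]
    for _ in range(max_passes):
        for step in CHECKED_SEQUENCE:
            trace.append(step)
            results = outcomes.get(step)
            if results is not None and not next(results):
                trace.append(Step.REVISE_CODE)
                break  # any failure loops back to the code
        else:
            trace.append(Step.DONE)
            return trace
    raise RuntimeError("design did not converge")


trace = run_flow({
    Step.SIMULATE: iter([False, True, True]),
    Step.PLACE_AND_ROUTE: iter([False, True]),  # first attempt cannot be realized
    Step.HW_TEST: iter([True]),
})
print([s.name for s in trace])
```

In this run the design goes around the loop three times: once for a simulation failure, once for a design that place-and-route cannot realize, and a final pass that succeeds on the hardware.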

FPGA-based Multicomputer Design Flow

The design flow of a multicomputer with integrated FPGAs is similar to that of any other large system. First, the general computational stages are segmented and sized for the computational power needed to meet the application’s latency or throughput requirements. Data flows throughout the system are analyzed, and the interconnect structure is designed to meet those same requirements. Next, the computational stages are designed individually, sometimes by different development groups, and are often tested using "canned" inputs.
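Sizing a stage amounts to dividing the throughput it must sustain by what one node can deliver, and the interconnect requirement follows the same way from the data each stage hands to the next. All workload and device figures in this Python sketch are hypothetical, and "Mops" is used loosely for millions of operations.

```python
import math


def size_stages(stages, frame_rate_hz, node_mops):
    """Nodes each stage needs to keep up with the input frame rate.

    stages: (name, mops_per_frame, node_type) tuples.
    node_mops: sustained Mops/s per node type (hypothetical figures).
    """
    plan = {}
    for name, mops_per_frame, kind in stages:
        required = mops_per_frame * frame_rate_hz  # Mops/s the stage must sustain
        plan[name] = (kind, math.ceil(required / node_mops[kind]))
    return plan


def flow_bandwidth(flows, frame_rate_hz):
    """Mbytes/s that each stage-to-stage data flow places on the interconnect."""
    return {(src, dst): mbytes * frame_rate_hz for src, dst, mbytes in flows}


stages = [
    ("range_compression", 400, "fpga"),
    ("azimuth_compression", 600, "fpga"),
    ("target_detection", 150, "risc"),
]
plan = size_stages(stages, frame_rate_hz=10, node_mops={"fpga": 2000, "risc": 100})
flows = [("range_compression", "azimuth_compression", 16),
         ("azimuth_compression", "target_detection", 16)]
bandwidth = flow_bandwidth(flows, frame_rate_hz=10)
print(plan)
print(bandwidth)
```

The bandwidth figures are what the interconnect design step would then have to accommodate between the nodes each stage is assigned to.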

Once the computational stages are complete, they’re integrated with the I/O subsystems on the target hardware. Finally, the system is debugged and its performance tuned.

The fundamental challenge is to marry the FPGA design flow and the multicomputer design flow in a natural infrastructure. Taking into account the device interfaces found in the preliminary hardware designs and the OS-based control and status requirements, developers need an infrastructure "kit". Such a kit should have:

  • IP modules for switch fabric interfaces, DMA engines, memory controllers, I/O support and utility functionality.
  • Command-line tools and C libraries to configure, query, and otherwise command an FPGA compute node (FCN) from the system host or from other compute nodes (CNs). (An FPGA with its associated memory and I/O interfaces is referred to as an FCN.)
  • Device drivers to control data movement in and out of the FPGA.
  • A simulation test harness including bus functional models of the memories, I/O ports, and RACEway ports.
  • An example UserIP module to give developers a functioning example from which to start their work.
  • Full documentation, examples, and other support files (makefiles, constraint files, etc.).

Using widely available tools such as Synplicity’s Synplify Pro, Model Technology’s ModelSim and Xilinx’s ISE, developers can design, simulate and build their application-specific algorithms, integrating whichever kit-supplied IP modules their application needs. The resulting application bitstream is then loaded onto an FCN.

Building a Productivity Kit for FPGA Developers

Even with a flexible system architecture and a clear application design flow, developers are still faced with the difficulty of programming and integrating FPGAs into a complete computing system. While some of this challenge is inherent to the nature of an FPGA, providing a kit of tools to address some common design components can mitigate other development difficulties (Figure 2).

One element of such a kit is help with managing memory access. For the most part, system designers working with FPGAs appreciate and use off-the-shelf memory controllers to move data in and out of the memories attached to the FCN. However, such controllers face competing requirements: some applications need the highest possible performance at the expense of ease of use or integration, while others can sacrifice some performance to make integration as quick and easy as possible. One solution is a tiered memory controller model.

The low-level controllers give the developer direct access to the individual memory channels. For example, in a design using double data rate (DDR) memory, the controller converts single data rate (SDR) user transactions into DDR memory signals and, in the case of DRAM, manages refresh automatically. However, the user-level code must handle all initialization and reset, channel selection and, in the case of DRAM, page and burst transfer management. The high-level controllers, on the other hand, provide a simple memory-mapped mechanism for reading from and writing to the memories.
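The division of labor between the two tiers can be illustrated with a behavioral model. The class names, the two-channel layout, page size, burst length and page-interleaving scheme are assumptions chosen for this Python sketch, not the interface of any particular controller IP.

```python
class LowLevelController:
    """Tier 1: direct access to individual memory channels (behavioral sketch).

    The caller must select the channel, open a page (row) and issue aligned,
    fixed-length bursts inside the open page. Refresh and SDR-to-DDR
    conversion would sit below this layer and are not modeled.
    """

    def __init__(self, channels=2, rows=64, page_words=256, burst_len=4):
        self.channels = channels
        self.page_words = page_words
        self.burst_len = burst_len
        self.mem = [[[0] * page_words for _ in range(rows)] for _ in range(channels)]
        self.open_row = [None] * channels

    def open_page(self, ch, row):
        self.open_row[ch] = row

    def _check(self, ch, col):
        if self.open_row[ch] is None:
            raise RuntimeError(f"no page open on channel {ch}")
        if col % self.burst_len or col + self.burst_len > self.page_words:
            raise ValueError("burst must be aligned and inside the open page")
        return self.open_row[ch]

    def burst_read(self, ch, col):
        row = self._check(ch, col)
        return self.mem[ch][row][col:col + self.burst_len]

    def burst_write(self, ch, col, words):
        row = self._check(ch, col)
        if len(words) != self.burst_len:
            raise ValueError("burst must be exactly burst_len words")
        self.mem[ch][row][col:col + self.burst_len] = list(words)


class HighLevelController:
    """Tier 2: flat, memory-mapped word access (behavioral sketch).

    Consecutive pages are interleaved across channels; page opens and
    bursting are handled internally, and single-word writes become
    read-modify-write bursts.
    """

    def __init__(self, low):
        self.low = low

    def _locate(self, addr):
        page, col = divmod(addr, self.low.page_words)
        ch, row = page % self.low.channels, page // self.low.channels
        if self.low.open_row[ch] != row:
            self.low.open_page(ch, row)
        base = col - col % self.low.burst_len
        return ch, base, col - base

    def read(self, addr):
        ch, base, offset = self._locate(addr)
        return self.low.burst_read(ch, base)[offset]

    def write(self, addr, value):
        ch, base, offset = self._locate(addr)
        burst = self.low.burst_read(ch, base)
        burst[offset] = value
        self.low.burst_write(ch, base, burst)


low = LowLevelController()
mem = HighLevelController(low)
mem.write(300, 9)         # page 1 -> channel 1, row 0, column 44
print(mem.read(300))      # 9
print(low.mem[1][0][44])  # 9
```

Application code that needs peak bandwidth would drive the low-level tier directly and schedule its own page opens and bursts; code that values simplicity would use the high-level tier and accept the extra read-modify-write traffic.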

Another important element in a developer’s kit is the IP module to manage I/O to a switch fabric. Such a module should provide the following:

  • A high-performance FCN interface to a multicomputer switch fabric to handle the fabric signaling.
  • A high-speed DMA engine with the ability to push data onto, or pull data from, the fabric. This DMA engine is controlled through a device driver on one of the microprocessor CNs.
  • A simple "inside" interface consisting of two FIFOs (one "input" to the module, one "output" from) and some simple control logic.
  • Adjustable size FIFOs and command packet (CP) RAM for the DMA engine. This enables the module to be "tuned" to a specific application, thus minimizing the consumption of precious FPGA resources.
  • A mapping window providing access to an internal control bus on the FPGA. This lets the host bypass the module itself and read or write resources in the user’s application logic directly.
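How the pieces of such a module fit together can be sketched behaviorally. In this Python model the fabric is a dictionary, transfers run sequentially rather than concurrently, and the class and method names (Fifo, CommandPacket, FabricDma, window_write) are hypothetical; the FIFO depth and CP RAM size are constructor parameters, mirroring the tuning described in the list above.

```python
from collections import deque
from dataclasses import dataclass


class Fifo:
    """Bounded FIFO whose depth is fixed when the module is built."""

    def __init__(self, depth):
        self.depth = depth
        self._q = deque()

    def push(self, word):
        if len(self._q) >= self.depth:
            raise OverflowError("FIFO full")
        self._q.append(word)

    def pop(self):
        return self._q.popleft()

    def __len__(self):
        return len(self._q)


@dataclass
class CommandPacket:
    op: str           # "pull": fabric -> input FIFO, "push": output FIFO -> fabric
    fabric_addr: int  # word address on the fabric side
    length: int       # number of words to move


class FabricDma:
    """Behavioral model of the fabric module's DMA engine and inside interface."""

    def __init__(self, fabric, fifo_depth=16, cp_ram_entries=8):
        self.fabric = fabric                 # dict standing in for remote memory
        self.input_fifo = Fifo(fifo_depth)   # fabric data into the user logic
        self.output_fifo = Fifo(fifo_depth)  # user results out to the fabric
        self.cp_ram_entries = cp_ram_entries
        self.cp_ram = []
        self.control_bus = {}                # user registers behind the mapping window

    def load(self, cp):
        """Called by the (hypothetical) host driver to queue a command packet."""
        if len(self.cp_ram) >= self.cp_ram_entries:
            raise OverflowError("CP RAM full")
        self.cp_ram.append(cp)

    def run(self):
        """Execute queued command packets in order, then empty CP RAM."""
        for cp in self.cp_ram:
            for i in range(cp.length):
                if cp.op == "pull":
                    self.input_fifo.push(self.fabric.get(cp.fabric_addr + i, 0))
                elif cp.op == "push":
                    self.fabric[cp.fabric_addr + i] = self.output_fifo.pop()
                else:
                    raise ValueError(f"unknown op {cp.op!r}")
        self.cp_ram.clear()

    def window_write(self, reg, value):
        """Mapping-window access: bypasses the DMA engine entirely."""
        self.control_bus[reg] = value


fabric = {0x100 + i: i for i in range(4)}
dma = FabricDma(fabric, fifo_depth=4, cp_ram_entries=2)
dma.load(CommandPacket("pull", 0x100, 4))
dma.run()
dma.window_write("gain", 3)
# Stand-in for the user's application logic between the two FIFOs.
while len(dma.input_fifo):
    dma.output_fifo.push(dma.input_fifo.pop() * dma.control_bus["gain"])
dma.load(CommandPacket("push", 0x200, 4))
dma.run()
print([fabric[0x200 + i] for i in range(4)])  # [0, 3, 6, 9]
```

Shrinking fifo_depth or cp_ram_entries in this model shows the trade-off the list describes: smaller buffers would save FPGA resources, but transfers that exceed them fail.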

The developer’s kit also needs a simplified example of a user application, or UserIP. This example performs no real computation, but it does include interfaces to memory, the fabric and other I/O. It gives application developers a working template to learn from and build on.

A Real-Life Example

After a period of research and a number of experimental efforts, Mercury began development of a set of FPGA-based adjunct processor products. Market requirements drove the development of a PCI-based product as a first target. Based on the RACE++ switch fabric and modeled on Mercury’s VantageRT 7410 product, the VantageRT FCN combines two PowerPC 7410-based CNs with a Xilinx Virtex II FPGA. Figure 3 shows the basic architecture of the VantageRT FCN.

The FPGA on the VantageRT FCN is combined with its local double data rate (DDR) SRAM, SDRAM, and a pair of LVDS user I/O ports to form the FPGA-based CN (FCN). The FCN is connected to the RACEway network by two RACE++ connections, providing up to 533 Mbytes/s of bandwidth with the rest of the multicomputer system. The VantageRT FCN is supported by an FPGA CN Developer’s Kit (FDK), which simplifies the integration of an FPGA-based application into a Mercury multiprocessor system.
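To put the quoted figure in application terms, the Python sketch below assumes the 533 Mbytes/s is split evenly across the two RACE++ connections and ignores protocol overhead and contention, so its numbers are upper bounds. The 16-Mbyte frame is hypothetical.

```python
# Aggregate bandwidth quoted for the FCN's two RACE++ connections.
AGGREGATE_MB_PER_S = 533
LINKS = 2
PER_LINK_MB_PER_S = AGGREGATE_MB_PER_S / LINKS  # assumes an even split: 266.5


def transfer_ms(mbytes, links=LINKS):
    """Best-case transfer time in ms, ignoring protocol overhead and contention."""
    return mbytes / (PER_LINK_MB_PER_S * links) * 1000


def max_frame_rate(mbytes_per_frame, links=LINKS):
    """Upper bound on frames/s if every frame must cross the fabric once."""
    return PER_LINK_MB_PER_S * links / mbytes_per_frame


# Hypothetical 16-Mbyte data set per frame.
print(round(transfer_ms(16), 1))           # 30.0
print(round(transfer_ms(16, links=1), 1))  # 60.0
print(round(max_frame_rate(16), 1))        # 33.3
```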

There is a strong movement in the embedded multicomputer community to find ways to harness the power of FPGA-based processing. Most commercial off-the-shelf (COTS) options available today require significant developer effort to overcome system integration problems. Using FPGAs within a heterogeneous multicomputer lets developers apply them to just those parts of an application where they are most effective. Supplying developers with a kit of tools and ready-to-integrate IP will greatly reduce the development and integration effort.

Mercury Computer Systems
Chelmsford, MA.
(978) 256-1300.
[www.mc.com].