Open up the simplest consumer device—cell phone, MP3 player, etc.—and you will see a dizzying number of chips, each performing a specific function. The model that you buy today will be replaced in six months by a new model that is smaller, lighter and packed with more functionality. Like a mouse on a wheel, embedded programmers are constantly running to make the complex simple, with a due date of yesterday.
As embedded systems evolve to support an ever-greater number of functions and capabilities, an increasing number of designers are turning to multiple processor architectures to achieve their performance, power, cost and time-to-market goals. Today’s embedded systems are no longer stand-alone, single-processor architectures that can be neatly controlled by a single team using a single development environment. In many cases, these architectures are a heterogeneous mix of single and multicore general-purpose processors (GPPs), digital signal processors (DSPs) and field programmable gate array (FPGA) processing components.
Further complicating design is the need for overall system flexibility. Not only do designers need to be able to repartition functionality between processing components to resolve bottlenecks and maximize efficiency, they also need mechanisms in place to enable them to quickly and effortlessly take advantage of future innovations in processors, operating systems and application-specific algorithms for next-generation designs.
Effectively developing the architecture of these new multiple processor systems—as well as the applications that span them—requires a new approach to system design. They must be treated as small networks unto themselves. System architects need an efficient and reliable communication infrastructure, commonly referred to as middleware, in order to pass data between processing components. While at first glance it may seem like a straightforward process to design and implement an efficient communication infrastructure specific to an application, doing so can actually place unnecessary and undesirable long-term constraints on a system.
Every time you repartition functionality, you must redesign multiple custom interfaces as well. When adjusting communication interfaces manually takes too much time and effort, the overhead can exceed the performance gained, and the benefit of optimizing the system through repartitioning is lost.
Standardizing the Communication Framework
The Common Object Request Broker Architecture (CORBA) standard was developed to provide a common communication framework between software components in a system. By abstracting applications, hardware components and the interfaces used to communicate between them, CORBA enables different parts of a system to communicate with each other, independently of how and where functionality is actually implemented (Figure 1).
For example, consider a military radar application that needs to process a high-speed signal. From an application perspective, what matters is the resulting signal, not whether it has been processed by a DSP or FPGA or both. CORBA ensures that the appropriate signal and signal processing workload are moved as efficiently as possible to the DSP or FPGA (or both) without requiring involvement from the application or developer. This is made possible through the use of a standard interface.
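The role of the standard interface can be sketched in plain Python rather than actual CORBA IDL. In this illustrative example (the `SignalProcessor` interface and both implementations are hypothetical stand-ins, not real ORB code), the application codes against one abstract interface, while interchangeable back ends stand in for servants that would run on a DSP or behind an FPGA wrapper:

```python
from abc import ABC, abstractmethod

# Hypothetical interface, analogous to what would be declared once in CORBA IDL.
class SignalProcessor(ABC):
    @abstractmethod
    def process(self, samples: list) -> list:
        ...

# Interchangeable implementations; in a real system the ORB would dispatch
# to a servant running on a DSP or behind an FPGA wrapper.
class DspProcessor(SignalProcessor):
    def process(self, samples):
        return [s * 2.0 for s in samples]  # placeholder for DSP filtering

class FpgaProcessor(SignalProcessor):
    def process(self, samples):
        return [s * 2.0 for s in samples]  # placeholder for FPGA logic

def radar_application(proc: SignalProcessor, samples):
    # The application sees only the interface; it never knows which
    # processing component actually performs the work.
    return proc.process(samples)
```

Because `radar_application` depends only on the interface, swapping `DspProcessor` for `FpgaProcessor` (or chaining both) requires no change to application code, which is the essence of the CORBA approach.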
As important as being able to move functionality to a DSP is the ability to easily and efficiently remove from the DSP any tasks that might interfere with its optimal performance. Thus, this same standard interface enables system architects to further break down large partitions into several smaller ones. It becomes both possible and simple to divide an algorithm initially assigned entirely to one processor among multiple processors, e.g., GPPs, DSPs and FPGAs. This further maximizes performance, minimizes latency and reduces system cost.
Flexibility of this magnitude is essential at the system level. System architects can then migrate logic transparently to optimize an architecture in many different ways, depending upon the application’s requirements. Voice applications, for example, may implement voice processing on a DSP with signaling handled by a GPP. Various stages of a video pipeline, such as color correction or scaling, can be moved back and forth between a DSP and an FPGA. When the cost of repartitioning is minimized, system architects can test out more partitioning and processor options to achieve the optimal system architecture.
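One way to picture this kind of repartitioning is a pipeline of pure stages plus a separate deployment map that records where each stage runs. The sketch below is hypothetical (the stage functions and the `deployment` map are illustrative, not an actual ORB mechanism), but it shows the key property: repartitioning means editing the map, not the stages or the application loop.

```python
# Illustrative video-pipeline stages, written as pure functions.
def color_correct(frame):
    return [min(255, p + 10) for p in frame]  # clamp to 8-bit range

def scale(frame):
    return frame[::2]  # naive 2:1 downsample

PIPELINE = [("color_correct", color_correct), ("scale", scale)]

# Assignment of stages to processing components; in a CORBA system this is
# where the ORB would route each invocation. Names are illustrative only.
deployment = {"color_correct": "DSP", "scale": "FPGA"}

def run_pipeline(frame):
    for name, stage in PIPELINE:
        # The deployment map decides where this stage's invocation is routed;
        # neither the stage code nor this loop changes when the map does.
        assert deployment[name] in ("GPP", "DSP", "FPGA")
        frame = stage(frame)
    return frame
```

Moving `color_correct` from the DSP to the FPGA is a one-line change to `deployment`; every other line is untouched.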
CORBA for Multiple Processor Applications
CORBA was originally created for enterprise applications by the Object Management Group (OMG), which currently comprises more than 800 companies. CORBA’s value in the development of complex architectures has been recognized for quite some time, and in 1999 Real-Time CORBA was released. In 2006, the OMG approved CORBA for embedded applications (CORBA/e). CORBA/e was specifically designed for real-time and embedded systems that require a small memory footprint and predictable, deterministic execution behavior. CORBA/e also provides a flexible architecture capable of supporting distributed processing. Commercial ORBs are now available that have been built from the ground up with very small footprints and high throughputs for real-time and embedded systems.
CORBA/e is primarily for GPPs and DSPs, which means that it only provides two-thirds of the answer required for today’s multiple processor systems. Fortunately, CORBA continues to evolve to serve all three processing platforms, enabling them to communicate efficiently. Through committed industry support, the capabilities of CORBA have been carried over to FPGAs.
Even with CORBA/e, FPGAs have historically been difficult to include in the CORBA framework. Without a communication framework, designers have had to create a custom proxy or bridge interface between the GPP and accelerated functions, and the result was that partitioning of functionality quickly became fixed.
FPGA-specific CORBA implementations, however, are now available to remedy this problem. With solutions such as ORBexpress FPGA from Objective Interface Systems, accelerated functions can be contained within a compact CORBA wrapper, which provides the necessary communications framework. Implementing CORBA directly in hardware provides the seamless interface between GPP and user-defined blocks necessary to support flexible partitioning of functionality. Additionally, wrappers are extremely efficient, utilizing only a small fraction of available gates while minimizing latency.
Within each development environment—GPP, DSP and FPGA—the various components that make up the CORBA framework are constructed and tuned to the specific implementation without requiring manual involvement from developers. For example, when components are co-located on the same processor (i.e., two partitions on a DSP communicating with each other), there is almost no overhead since all unnecessary ORB mechanisms can be eliminated. It is important to note that even when components are located on different processors, ORB overhead is often less than that of other communication mechanisms developers might use.
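The co-location optimization described above can be sketched as an invocation path that marshals arguments only when caller and servant live on different processors. This is a hypothetical illustration, not real ORB internals; `pickle` stands in here for CORBA's CDR encoding, and the node names are invented.

```python
import pickle

def invoke(servant_fn, args, caller_node, servant_node):
    if caller_node == servant_node:
        # Co-located: direct in-process call, so marshalling and transport
        # machinery are skipped entirely and overhead is near zero.
        return servant_fn(*args)
    # Remote: marshal the request as a transport would (pickle stands in
    # for CORBA's CDR encoding), then unmarshal on the servant side.
    request = pickle.dumps(args)
    unpacked = pickle.loads(request)
    return servant_fn(*unpacked)
```

The caller's code is identical in both cases; only the path taken inside the invocation layer differs, which is why co-located components pay almost no penalty.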
Location Transparency: Optimization through Migration
In order to manage complexity and enable systems to take advantage of new innovations in silicon and software as they develop, many developers are adopting a software-based approach to design. Software Defined Radio (SDR) is one example: 100% of the signal processing is defined in software instead of being hardwired in ASICs.
The beauty of substantially defining the system in software is that you can change the physical processing components of the system without having to modify the software. This brings greater flexibility and efficiency to the development process.
Because functionality can be migrated between processing components, assignment of functionality can be done later in the development process. In fact, even after a system has been deployed for some time, functionality can be reassigned as new advances in processors become available.
Consider the development of a complex embedded system, whether a military, communications, financial, industrial or medical application. First, developers need to prove out the concept, typically using a GPP or workstation. While a GPP is likely not the optimal platform for many of the application’s functions, the flexibility and rapid prototyping capabilities of the GPP’s development environment make it the fastest way to demonstrate the feasibility of a particular design approach.
In traditional development cycles, system architects find themselves forced to partition the various stages of the processing pipeline across the various processing components. In each case, a specialized interface must be created to connect pipeline stages (Figure 2). Both these interfaces and the component-specific implementation of each stage require substantial engineering resources. As the design progresses, bottlenecks and other inefficiencies invariably arise, making it clear that other partitioning choices would be more optimal. However, because of the work already invested in the current partitioning, repartitioning the system becomes increasingly cost-prohibitive over time. Repartitioning also increases time-to-market, threatening the timely release of a product.
Through the use of a common and well-defined communications framework such as CORBA, developers are able to more easily manage design complexity in multiple-processor embedded systems because they can abstract the functionality from its actual implementation. After creating a GPP-based proof-of-concept, system architects are now able to begin optimizing for performance or cost-reducing a system by migrating specific algorithms and functions to the processing components best suited to handle them. Because of the standard communications infrastructure enabled by CORBA, true location transparency becomes possible and application functionality can, for example, migrate to and from an FPGA at any point in the design cycle. The top-level application remains unchanged throughout the entire development process as developers address performance bottlenecks through seamless migration of functionality (Figure 3).
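Location transparency of this kind can be illustrated with a small rebinding sketch. Everything here is hypothetical (the registry, `bind`/`resolve`, and both filter implementations are stand-ins; in CORBA the lookup would go through a naming service and return an object reference), but it captures the point: migrating a function from a GPP prototype to an optimized component changes only a binding, never the top-level application.

```python
# Hypothetical name-to-implementation registry; in CORBA, resolve() would be
# a naming-service lookup returning an object reference.
_registry = {}

def bind(name, impl):
    _registry[name] = impl

def resolve(name):
    return _registry[name]

def application(samples):
    # Top-level application: identical before and after migration.
    fir = resolve("fir_filter")
    return fir(samples)

def gpp_fir(samples):
    # Phase 1: proof-of-concept implementation on the GPP.
    return [x * 0.5 for x in samples]

def fpga_fir(samples):
    # Phase 2: stand-in for the optimized, FPGA-wrapped implementation.
    return [x * 0.5 for x in samples]

bind("fir_filter", gpp_fir)
result_before = application([2.0])

bind("fir_filter", fpga_fir)   # migration is a rebind, nothing more
result_after = application([2.0])
```

The two results are identical because `application` never referenced a concrete implementation in the first place.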
Portability of Functionality
Many developers make the mistake of focusing on code portability rather than functional portability. Code portability means that a developer can move, for example, DSP code between processors within the same family or, with substantially more effort, between DSP architectures. The idea of code portability completely breaks down when considering moving code to an altogether different processor technology, such as from a DSP to FPGA. Yet, this is exactly the level of flexibility that developers require to effectively address processing bottlenecks.
For example, it is often the case that a processing pipeline is first proven using a GPP, re-implemented on a DSP, and then further broken down by moving some portion of the pipeline to be implemented on an FPGA. Additionally, each of these migrations can occur multiple times as developers discover that they didn’t select quite the right place to partition the pipeline.
Certainly, moving part of a pipeline from a DSP to an FPGA will require significant recoding. Repartitioning is even more difficult under these circumstances, however, because a new interface must also be created and implemented along the new partitioning lines. Each interface is unique to where the break in the algorithm occurs. System architects therefore end up allocating a substantial part of their constrained engineering resources to reinventing and re-implementing these interfaces. As interfaces change, so must the software components that use them, so changes can propagate throughout a system. As a consequence, instead of focusing effort at the application level, developers must spend precious time creating and debugging the communications infrastructure over and over again.
With CORBA/e for GPPs and DSPs and new CORBA solutions for FPGAs, developers are able to focus completely on functional portability. This allows system architects and other designers to stay focused on what the system needs to accomplish, not on the minutiae of how it will be accomplished. The use of standard interfaces between software components means that repartitioning an algorithm from one processing technology to another does not impact other partitions—an invaluable benefit. Rather than acting as a barrier to repartitioning because of the additional interface development required, the boundaries between software and processing components can instead reflect the optimal assignment of functions to the best-suited processing resources. This approach results in a more efficient and flexible architecture. Through CORBA, embedded systems can be deployed across platforms without modifying application-level software. This leads to a higher level of efficiency and allows the optimization of system performance, latency and cost in ways not previously possible.