Defining ‘High Performance Embedded Computing’

How advancements in parallel processing, power scalability and open development frameworks are transforming compute-intensive applications from avionics to medical imaging and beyond

by Colin Cureton and Mitch Furman, Advanced Micro Devices

The concept of ‘high performance computing’ (HPC) typically calls to mind a vast server farm underpinned by tens to hundreds of thousands of processing elements, orchestrating the extreme data crunching required for advanced applications like geothermal exploration and molecular dynamics simulation. ‘High performance embedded computing’ (HPEC) mirrors this construct in some ways, insofar as it relies on parallel processing techniques to run numerous computational threads at high speeds. And as with HPC, some will define HPEC based purely on FLOPS speed, anticipating the day when mainstream embedded processing platforms break the TFLOPS barrier – a benchmark that’s now within reach of some APU and GPU processors.

But the hardware compute level is where the meaningful comparisons between HPC and HPEC end. The divergence between the two rests largely on performance and power scalability, from the seemingly ‘infinite’ in the case of HPC, to the decidedly finite with many HPEC applications.

HPEC typically hinges on the performance of a single processor (or co-processors) within an embedded system that is often tailored to a single user’s requirements. In many cases the performance measurement is inextricably linked to power consumption, making ‘performance per watt’ (or FLOPS per watt) a more meaningful metric, particularly for battery-powered and handheld systems. Where HPC datacenter architects can readily throw more CPU cores and power at a formidable compute challenge, embedded system designers are far more constrained in this regard – the system size, weight and power budget requirements are ever more exacting.
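As a rough illustration of why performance per watt can rank candidates differently than raw throughput, the following minimal Python sketch compares processors under a fixed power budget. The `Processor` type, the function names and any figures passed in are hypothetical and serve only to show the arithmetic; they do not describe real parts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Processor:
    """Hypothetical processor description used only for this sketch."""
    name: str
    gflops: float  # sustained throughput in GFLOPS (assumed figure)
    watts: float   # power drawn at that throughput (assumed figure)


def gflops_per_watt(p: Processor) -> float:
    """Return throughput divided by power, the performance-per-watt metric."""
    if p.watts <= 0:
        raise ValueError("watts must be positive")
    return p.gflops / p.watts


def best_within_budget(candidates: Iterable[Processor],
                       power_budget_w: float) -> Optional[Processor]:
    """Pick the most power-efficient candidate that fits the power budget.

    Returns None when no candidate fits.
    """
    eligible = [p for p in candidates if p.watts <= power_budget_w]
    if not eligible:
        return None
    return max(eligible, key=gflops_per_watt)
```

Given a list of candidates, `best_within_budget(candidates, 30.0)` would return the part with the highest GFLOPS per watt among those drawing 30W or less, which is not necessarily the part with the highest raw GFLOPS.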

Heat dissipation is another key challenge for HPEC systems given the sheer density of today’s high-end embedded electronics, the narrowing airflow paths passing between embedded subsystems, and the wide variance in operating temperature that can be brought on by fluctuations in environmental conditions. Active fan cooling isn’t a viable thermal management approach for many of these systems given the reliability concerns that accompany moving parts, which can be subject to failure due to extreme shock and vibration, humidity, particulates and other variable and often harsh conditions.

Parallel Processing & Performance Per Watt

For HPEC, high-speed parallel processing is largely, but not exclusively, the domain of graphics processing units (GPUs) and accelerated processing units (APUs). Workloads have skyrocketed to demand ever-increasing processing performance, often overloading even the fastest CPUs operating at full power. Massively multicore processor architectures can provide an advantage for handling these compute-intensive workloads, and this is what distinguishes the parallel processing capabilities of a GPU from those of a high-end CPU.

Meanwhile, the emergence of the Heterogeneous System Architecture (HSA) is allowing programmers to take advantage of the parallel processing capabilities of GPUs applied as co-processors to traditional multithreaded CPUs. HSA brings together the specialized capabilities of the CPU, GPU and various other processing elements within a single-chip APU, making it possible to switch dynamically between CPU-optimized serial I/O tasks and GPU-optimized single instruction multiple data (SIMD) parallel graphics and multimedia tasks depending on workload requirements (Figure 1). This approach exploits the dual strengths of GPU parallel processing and CPU serial processing, the latter of which can be capable of much higher clock speeds. HSA also embraces a fully coherent memory model, allowing the CPU and GPU to share and access the entire memory address space.

Figure 1: The Heterogeneous System Architecture integrates parallel processing units such as GPUs with CPUs and other elements such as DSPs on a single chip across a unified interface architecture.

In terms of power management, configurable performance and power scaling capabilities can help a parallel processing-based HPEC system strike an optimal performance-per-watt balance. This advantage manifests as early as the processor selection/procurement stage. With the ability to scale power and performance, a customer designing for a 20W thermal budget, for example, doesn’t have to settle for a lower-performing 15W processor when he/she could use a higher-performing 25W processor that’s ‘tuned’ to 20W.
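The 20W example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Part` type, the configurable TDP ranges, and above all the linear scaling of performance with configured power, which is a crude placeholder and not a model of any real processor.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Part:
    """Hypothetical processor with a configurable TDP range."""
    name: str
    nominal_tdp_w: float   # TDP at which nominal_gflops is quoted
    min_tdp_w: float       # lowest configurable TDP
    max_tdp_w: float       # highest configurable TDP
    nominal_gflops: float  # assumed throughput at nominal TDP


def configured_tdp(part: Part, budget_w: float) -> Optional[float]:
    """Highest TDP setting that fits the budget, or None if the part cannot fit."""
    if budget_w < part.min_tdp_w:
        return None
    return min(budget_w, part.max_tdp_w)


def estimated_gflops(part: Part, budget_w: float) -> float:
    """Estimate throughput at the configured TDP.

    ASSUMPTION: performance scales linearly with configured power.
    Real parts do not behave this simply.
    """
    tdp = configured_tdp(part, budget_w)
    if tdp is None:
        return 0.0
    return part.nominal_gflops * tdp / part.nominal_tdp_w


def pick_part(parts: Iterable[Part], budget_w: float) -> Optional[Part]:
    """Choose the fitting part with the highest estimated throughput."""
    fitting = [p for p in parts if configured_tdp(p, budget_w) is not None]
    return max(fitting, key=lambda p: estimated_gflops(p, budget_w), default=None)
```

With a fixed 15W part and a 25W part configurable down to 20W, `pick_part` called with a 20W budget would select the tuned 25W part whenever its estimated throughput at 20W exceeds that of the 15W part, which is the trade-off described above.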

Once deployed and operational in the field, an HPEC system with real time power/performance scaling capabilities is well suited to support rapidly-shifting processing workloads and challenging thermal conditions. What’s needed here is the ability to reduce the power of underutilized cores while also allowing for dynamic allocation of the thermal budget between cores for improved performance and more agile heat dissipation. This is especially important for APUs and other platforms with multiple onboard processing engines and varied functional ‘blocks’.

AMD’s Turbo Core technology utilizes algorithms that assess a variety of frequency, voltage, temperature and logic activity inputs to determine in real time which core needs a performance boost and how much thermal headroom is available. This intelligence enables an AMD APU to dynamically downclock GPU frequency and increase CPU frequency, or vice versa, without straying outside its thermal envelope. This configurable thermal design power (TDP) capability ultimately equips a parallel processing-based HPEC system to conserve power while allocating it to where it can do the most good for the system at any given moment.
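The general idea of shifting a fixed thermal budget between engines can be shown with a toy Python sketch. This is not AMD's Turbo Core algorithm, whose internals are not described here; the function name, the per-engine power floors and the use of utilization as the only input are all assumptions chosen to keep the example small.

```python
from typing import Dict


def allocate_budget(total_w: float,
                    floors_w: Dict[str, float],
                    utilization: Dict[str, float]) -> Dict[str, float]:
    """Split a fixed power budget among engines (e.g. "cpu", "gpu").

    Each engine first receives its minimum (floor) power. The remaining
    headroom is shared in proportion to each engine's utilization, so the
    busier engine gets the larger boost. The total never exceeds total_w.
    """
    if set(floors_w) != set(utilization):
        raise ValueError("floors_w and utilization must name the same engines")
    floor_sum = sum(floors_w.values())
    if floor_sum > total_w:
        raise ValueError("power floors exceed the total budget")

    headroom = total_w - floor_sum
    demand = sum(utilization.values())

    allocation = {}
    for name, floor in floors_w.items():
        if demand > 0:
            share = utilization[name] / demand
        else:
            # Nothing is busy: share the headroom evenly.
            share = 1.0 / len(floors_w)
        allocation[name] = floor + headroom * share
    return allocation
```

Calling `allocate_budget(20.0, {"cpu": 4.0, "gpu": 4.0}, {"cpu": 0.9, "gpu": 0.1})` gives most of the headroom to the CPU; swapping the utilization figures shifts it to the GPU, while the sum of the allocations stays at the 20W envelope in both cases.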

New Frontiers in HPEC

Among the embedded applications at the forefront of HPEC innovation, avionics and medical imaging are perhaps the most prominent drivers of embedded processor performance requirements.

For graphics-intensive conventional military and civilian avionics applications, incremental gains in processing performance unlock new potential for improving responsiveness and situational awareness for pilots and UAV operators. New advancements in synthetic vision and video overlay capabilities hold the promise to transform modern aircraft control panels and displays, bringing photo-realistic 3D graphics clarity to the cockpit to enhance the pilot’s understanding of the flying environment in real time. This yields clear advantages in commercial air transport and military applications while helping to ensure greater overall safety.

To achieve these new levels of graphics performance, designers of avionics systems are increasingly transitioning away from FPGA and DSP platforms in favor of more versatile, higher performing embedded GPUs, which are optimized to handle the high-speed parallel processing required for tasks like radar processing, object recognition, 3D mapping and video manipulation.

In the medical imaging domain, advancements in HPEC extend from the point of diagnosis to the patient’s bedside. High-performance, low-power APUs and GPUs are enabling portable ultrasound capabilities – including 3D visualization – that previously had only been available in hospitals or clinics. These systems can now be deployed for ambulatory and battlefield usage, providing accelerated image transformation and delivering excellent image rendering for medical personnel. HPEC can also be applied to improve 3D image reconstruction in low-dose X-ray applications, helping to minimize patients’ radiation exposure by boosting the computation required to ‘fill in the blanks’ of the sparse data being collected in these instances.

The medical device market is also moving quickly to embrace 3D visualization at the point of care via a new generation of video- and graphics-optimized touchscreen panels that can be attached to hospital beds and dental chairs for use by medical staff and patients alike. Medical practices are beginning to transition away from conventional 2D X-ray film and light-box illuminators to sleek, chair-side monitors that provide 360-degree image visualization and other advanced graphics-driven capabilities in HD resolution with intuitive multi-touch interactivity à la tablet computers. These devices can help equip medical staff to assess patients’ medical imagery with great accuracy and process efficiency, while simultaneously providing new levels of visual detail to their patients. Where previously patients may have struggled to understand diagnoses and treatment recommendations made on the basis of static 2D renderings, they can now have a clear view of the treatment area and care methodology.

HPEC is finding its way to a host of other applications as well, enabling advanced facial recognition capabilities in video security and surveillance systems, and gesture interactivity capabilities in digital signage systems, for example. For many of these applications, and for digital gaming as well, HPEC also plays a significant role in powering multiscreen visualization and immersion.


Figure 2: High-performance embedded computing is enabling ever greater visualization in medical devices.

Agility Is Everything

The concept of HPEC extends beyond the processing hardware itself. It also encompasses the design processes and software tools that can help to accelerate application performance by maximizing parallel compute utilization for heterogeneous architectures across all supported mainstream parallel processing platforms. HPEC system designers are increasingly seeking out methodologies and tools that present the designer with an abstract platform model that conceptualizes all of these architectures in a similar way and targets all of the parallel processing elements available in a given system.

Cross-platform, non-proprietary programming frameworks like HSA, OpenCL and OpenGL have proven very valuable in this regard, enabling designers to focus on applications rather than chip architectures via a single, portable source code base. This approach helps designers achieve significant programming efficiency gains for parallel processing-driven HPEC systems while lowering costs by maximizing the ability to repurpose existing code for new systems, preserving the value of their accrued investments in code development.

Continued innovation in APU and GPU parallel processing is equipping embedded system designers to approach teraFLOPS processing performance while taking advantage of advanced power scaling capabilities that maximize core utilization and optimize thermal budget management to achieve HPEC-caliber performance-per-watt profiles. By leveraging open development frameworks to achieve new levels of parallel processing efficiency and design agility, system designers are well positioned to push the boundaries of HPEC for decades to come.

Advanced Micro Devices, Sunnyvale, CA (408) 749-4000.