Mobile Graphics CPU Promises High Performance Embedded for both Graphical and Compute-Intensive Applications

The demand for graphics performance is resulting not only in processors that deliver increasingly realistic and interactive graphics, but also capabilities for ever higher-performance numeric computing. Now these capabilities are going mobile—with big implications for the embedded world.


Many years ago, in the late 1970s, I attended one of the early personal computing shows where a major vendor (who shall remain nameless) was introducing its new system. This particular system could display patterns assembled from blocks of x by y white pixels to form rather crude images. But it was new; nobody else had it at the time, and the person manning the booth was overheard to say, “Yes, ma’am, full graphics capability.” I shook my head even then, and of course we know better now. Today, however, we can rightly say that we have “truly amazing graphics” and very high-performance computing.

In fact, these days, the term “high-performance computing” is increasingly popping up in things like marketing and conference programs. While it might be possible to dismiss this as somehow vague and self-serving, it nonetheless indicates a change in awareness. Something new is happening. And now we are also hearing the term, “high-performance embedded computing.” Does this mean that what is generally considered to be high-performance computing (i.e., the perceived performance capabilities) is now finding its way into embedded systems? There are strong indications that this is exactly what is happening.

The answer to the question, “How much performance can we actually put into an embedded system?” is, “Just as much as you can get without exceeding the size, weight and power restrictions.” And the answer to the question, “What can you do with all that performance?” is, “Not quite everything you might want to.” In other words, there are no conceivable limits. There are, however, moments where we can take stock and appreciate how far we have come. And it appears that with the introduction of its new Tegra K1 processor, NVIDIA has just driven such a stake in the ground.


Actually, “stake in the ground” may be the wrong metaphor. A better one might be, “a crop circle in a field.” A couple of weeks prior to the introduction of the Tegra K1, reports started emerging about a crop circle that had appeared in a barley field near Salinas, California. The pattern appeared to depict the diagram of a highly integrated IC and contained Braille code for the number 192. After weeks of speculation and learned-sounding analysis by numerous UFO “experts,” NVIDIA revealed at the introduction that it was behind the crop circle as a way of emphasizing its claim as to the advanced nature of the processor (Figure 1).


Figure 1
Crop circle, which “mysteriously” appeared in a barley field near Salinas, California.

It turns out that the number 192 (which some pointed out is the mass number of a radioactive isotope of iridium) referred to the number of cores in the Kepler-architecture GPU on the chip. The Tegra K1 is specifically targeted at mobile systems, at this point primarily mobile gaming systems. That makes perfect sense, since the gaming market is large enough and has sufficient demand for performance to justify the investment and design effort involved. But as we shall see, the ability of the GPU to also handle extremely intense mathematical applications from seismology to astrophysics makes it attractive in a much wider range of applications, many of them also mobile and embedded, such as robotic vision.


NVIDIA has actually developed two pin-compatible versions of the Tegra K1, a 32-bit and a 64-bit version, both based on the ARM instruction set. The 64-bit version appears to be scheduled for later release and is a dual Super Core CPU based on the ARMv8 architecture. The 32-bit version uses the 4-Plus-1 quad-core ARM Cortex-A15 CPU arrangement first used in the Tegra 4. This arrangement saves power through variable symmetric multiprocessing (vSMP): performance-intensive tasks run on the quad-core complex, while lower-performance tasks are switched to the fifth (plus-1) “battery saver” A15 core. NVIDIA states that it has optimized the 4-Plus-1 architecture to use half the power for the same CPU performance as the earlier Tegra 4, and to deliver almost 40% more performance at the same power consumption (Figure 2).


Figure 2
Tegra K1 delivers higher CPU performance and power efficiency.


In addition to the Kepler GPU and the Cortex-A15 complex, the Tegra K1 incorporates a dual ISP core that can process up to 1.2 gigapixels per second and support camera sensors of up to 100 megapixels. There is also a display engine that can simultaneously drive a 4K local display and an external 4K monitor via HDMI (Figure 3).


Figure 3
NVIDIA Tegra K1 Mobile Processor (32-bit version).
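As a rough feel for what the ISP figures imply, the following back-of-the-envelope sketch divides the quoted pixel throughput by a sensor's resolution to get an upper bound on frame rate. It assumes the 1.2-gigapixel figure is a per-second rate that can be devoted entirely to one sensor, and it ignores every other bottleneck (sensor interface, memory bandwidth, encoding). The function name and the 8-megapixel example are illustrative, not taken from NVIDIA documentation.

```python
# Back-of-the-envelope only: assumes the ISP's full quoted throughput
# is available to a single sensor and nothing else limits the frame rate.

ISP_PIXELS_PER_SECOND = 1_200_000_000   # 1.2 gigapixels per second (quoted figure)
MAX_SENSOR_PIXELS = 100_000_000         # 100-megapixel sensor (quoted figure)


def max_frames_per_second(sensor_pixels, isp_pixels_per_second=ISP_PIXELS_PER_SECOND):
    """Upper bound on frames per second for a sensor of the given resolution."""
    if sensor_pixels <= 0:
        raise ValueError("sensor_pixels must be positive")
    return isp_pixels_per_second / sensor_pixels


if __name__ == "__main__":
    # A 100-megapixel sensor: 1.2e9 / 1e8 = 12.0 frames per second at most.
    print(max_frames_per_second(MAX_SENSOR_PIXELS))
    # A hypothetical 8-megapixel phone sensor leaves far more headroom.
    print(max_frames_per_second(8_000_000))
```

In other words, the two numbers are consistent with each other: a 100-megapixel sensor could be read at roughly a dozen frames per second at best, while more typical mobile sensors would use only a fraction of the ISP's capacity.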

High Mobile Performance: Graphics and Otherwise

The driving force in the gaming industry, which has significant implications for all other aspects of computer systems, is the demand for an ever richer, more realistic and more interactive graphical experience. Kepler-based GPUs from NVIDIA have found their way into a number of advanced gaming systems and also into high-end workstations used for 3D visualization, medical imaging and a host of scientific applications. The demands of handling textures, tessellation and shading, and of providing anti-aliasing for smooth motion visuals, have called for ever greater graphics performance. In addition, there is a growing need for computational power to calculate the physics involved with motion and collisions (e.g., parts, rocks, etc., flying everywhere). All these and more must be addressed by a GPU like that in the Tegra K1.

Kepler GPUs range widely in size. The largest, used in desktops and supercomputers, include up to 2880 single-precision floating-point cores and consume hundreds of watts. The Tegra K1 GPU has 192 cores and consumes an average of under two watts; that figure is for the GPU, not the processor as a whole. The Tegra K1 GPU has one graphics processing cluster (GPC), one streaming multiprocessor (SMX) unit containing the 192 cores, a memory interface and a 128 Kbyte L2 cache. This unified L2 cache is important in reducing off-chip memory accesses and keeping power consumption down.

In many mobile and embedded systems, high-end interactive graphics is becoming increasingly important for such things as gesture recognition, facial recognition and a host of automotive applications that affect safety. But at the same time, as noted with physics calculations for gaming, the ability to handle numerically complex and intensive computational tasks is equally important for such things as visualizing plaque in arteries, analyzing traffic flow or visualizing molecules, to name a few.

The Tegra K1 is designed to support the latest graphics APIs such as OpenGL 4.x and DirectX 11.x. But the inherent floating-point performance can also be harnessed for a vast number of other tasks. NVIDIA has developed a parallel computing platform called the Compute Unified Device Architecture (CUDA), a set of libraries, compiler directives and language extensions that allow programmers to use C and C++ to execute code in parallel on the GPU cores. A GPU used in this way has been dubbed a general-purpose graphics processing unit (GPGPU). While CUDA was developed and is supported by NVIDIA, there is another framework called OpenCL, which, like the graphics-oriented OpenGL, is maintained by the non-profit Khronos Group. NVIDIA has stated that it is willing to support OpenCL 1.2 for the Tegra K1 “based on customer needs.”
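To make the idea of spreading ordinary code across many GPU cores more concrete, here is a minimal sketch of the one-thread-per-element pattern commonly used in CUDA programs, applied to the classic SAXPY operation (out = a*x + y). It is written in plain Python and runs sequentially on the CPU purely to illustrate the indexing scheme; real CUDA code is C or C++ compiled for the GPU, where the per-thread bodies run concurrently. The names saxpy_kernel, launch and saxpy, and the block size of 192, are illustrative assumptions and not part of NVIDIA's API.

```python
# Sequential CPU emulation of a CUDA-style launch. Illustrative only:
# the function names and launch mechanism here are hypothetical.


def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """Work done by ONE logical thread: compute a single output element."""
    i = block_idx * block_dim + thread_idx  # global element index
    if i < n:  # guard: the last block may contain surplus threads
        out[i] = a * x[i] + y[i]


def launch(kernel, n, block_dim, *args):
    """Visit every (block, thread) pair in turn and return the block count.

    On a GPU these invocations would execute in parallel across the
    cores; here they simply run one after another.
    """
    grid_dim = (n + block_dim - 1) // block_dim  # blocks needed, rounded up
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, n, *args)
    return grid_dim


def saxpy(a, x, y, block_dim=192):
    """Compute a*x + y element by element using the emulated launch."""
    n = len(x)
    out = [0.0] * n
    launch(saxpy_kernel, n, block_dim, a, x, y, out)
    return out


if __name__ == "__main__":
    print(saxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

The point of the pattern is that the kernel body contains no loop over the data: each thread derives its own index and handles one element, so the same code scales from 192 cores to thousands without change.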

The availability of a processor such as the Tegra K1 in the mobile space opens up possibilities for speech recognition, gesture recognition, computer vision and live video processing in small, even handheld, devices. It will certainly not be the last such processor. The availability of compatible language platforms that can move applications among different models, and even different vendors, of graphics and GPGPU engines seems destined to accelerate the development of such devices and the expansion of high-performance embedded computing as well. And disguise it as the work of space aliens if you will, the implications for human ingenuity are immense.


Santa Clara, CA
(408) 486-2000
