Pushing the Envelope on Graphics Processors and Opening the Way for Neural Nets

NVIDIA targets the autonomous vehicle with processor and neural net learning advances and partnerships with multiple auto manufacturers.

BY TOM WILLIAMS, EDITOR-IN-CHIEF

NVIDIA, long a key player in the graphics processor (GPU) arena, is coming out with ever more powerful processors built on its well-known and widely used CUDA architecture and is now making a major push into the quest for the driverless car. In doing so, the company is leveraging the inherent processing power of its newest GPUs as well as enabling neural network processing on its architecture, quite probably giving a big boost to the use of neural nets and artificial intelligence in a wide range of application areas.

At its recent GPU Technology Conference in San Jose, NVIDIA unveiled a number of new products with advanced graphics, intensive numeric processing and neural network capabilities, in a surrounding context of advanced automotive technologies that will one day soon culminate in the driverless car. In addition, it became clear that, while the initial focus is on automating the automobile, the type of “deep learning” of which these new processors are capable will be able to address a wide range of heretofore intractable problems.

Sharing the stage for part of the keynote address with NVIDIA CEO Jen-Hsun Huang was the CEO of Tesla Motors, Elon Musk. Tesla is, of course, known for its pioneering work in electric vehicles and most recently for its Tesla Model S, which at a mere $80,000 is pushing the price tag for EVs lower, with a new model coming out that is supposed to land in the range of around $50,000. Musk’s message for the day, however, was to reinforce NVIDIA’s push for the autonomous vehicle.

One of Musk’s points is that there is a need to establish a hardware platform that can be used for continuous software updates. He notes, “The car will get smarter and smarter with the current hardware suite. Even with just what we have, we’ll make huge progress in autonomy. We can make car steer itself on freeway, do lane changes. Autonomy is about what level of reliability and safety we want.” With current processors, he says, autonomy in the 5-10 MPH range is relatively easy because a car can stop within the range of its sonic sensors. The big hurdle is the 10-50 MPH range, where situations can be quite complex. In the freeway environment over 50 MPH it gets easier again because the range of possibilities gets smaller, and that environment, he notes, could be handled with today’s processors.

Now, surprise, surprise, NVIDIA is announcing the Drive PX, a single-board computer targeted specifically at developing autonomous vehicles. The Drive PX uses dual Tegra X1 processors. The X1, the 64-bit version of the Tegra announced last year, has eight 64-bit ARM cores and a GPU with 256 CUDA cores for processing power of one TeraFLOPS and a video throughput of 1.3 gigapixels/second. It can handle the input from 12 two-megapixel cameras at 60 fps. The Drive PX, however, is intended for use within the automobile, where it will execute the driving software developed on systems using neural networks and deep learning software.

NVIDIA has been actively partnering with Tesla on developing such software as well as with Audi and BMW. Developing such code requires much more processing horsepower than would currently run conveniently on even the dual-Tegra Drive PX. In that context, NVIDIA is announcing the new GeForce Titan X processor, which, for starters, boasts 8 billion transistors (Figure 1).

Figure 1

NVIDIA’s new GeForce Titan X boasts 3072 CUDA cores for only 9.

While the Titan X is being promoted for its truly powerful capabilities in the gaming arena and for implementing virtual reality, its 3072 CUDA cores make it a powerful tool for neural networks and the deep learning used in advancing automotive automation, as well as for other areas of science and engineering. It comes with 12 Gbytes of on-board frame buffer memory and a memory speed of 7 Gbits/s. The Titan X appears to be the latest and perhaps final processor that will be based on NVIDIA’s Maxwell architecture.

Still in the future is the Pascal, which was discussed last year as becoming available this year. It would appear that the schedule has been revised a bit, as the Pascal is now slated for availability in 2016. Pascal will represent another large advance as it is projected to use NVIDIA’s new NVLink technology. NVLink is expected to increase data rates by 5 to 12 times over those of current PCIe 3.0. Putting this fatter pipe between the CPU and GPU will thus allow data to flow at more than 80 GBytes/s, compared to the 16 GBytes/s available now. With this faster link between GPU and CPU, the GPU will be able to access system memory at close to the memory’s own bandwidth. The NVLink model will also implement unified memory, in which the developer can treat GPU and CPU memory as a single block.
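To put those link speeds in perspective, the following is a rough back-of-the-envelope sketch rather than a benchmark. It assumes the quoted peak rates are actually sustained, and it uses a hypothetical 12-Gbyte payload (the size of the Titan X frame buffer mentioned above) purely for illustration. The function names are made up for this example.

```python
# Back-of-the-envelope transfer times over the links quoted in the article.
# Assumptions: peak rates are sustained end to end, and the payload size is
# a hypothetical figure chosen only for illustration.

PCIE3_GBYTES_PER_S = 16.0        # PCIe 3.0 rate quoted in the text
NVLINK_SPEEDUP_RANGE = (5, 12)   # "5 to 12 times" quoted in the text


def nvlink_rates(pcie_rate=PCIE3_GBYTES_PER_S, speedups=NVLINK_SPEEDUP_RANGE):
    """Return the (low, high) projected NVLink rates in Gbytes/s."""
    low, high = speedups
    return pcie_rate * low, pcie_rate * high


def transfer_seconds(payload_gbytes, rate_gbytes_per_s):
    """Time in seconds to move a payload at a given sustained rate."""
    return payload_gbytes / rate_gbytes_per_s


if __name__ == "__main__":
    payload = 12.0  # hypothetical payload in Gbytes
    low, high = nvlink_rates()
    print(f"Projected NVLink range: {low:.0f} to {high:.0f} Gbytes/s")
    for name, rate in (("PCIe 3.0", PCIE3_GBYTES_PER_S),
                       ("NVLink, low end", low),
                       ("NVLink, high end", high)):
        print(f"{name}: {transfer_seconds(payload, rate):.4f} s")
```

The low end of the projected range works out to the 80 GBytes/s figure cited above. Real transfers would be slower than these ideal numbers because of protocol overhead and memory system limits.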

The new Pascal GPUs will also feature 3D memory, which stacks DRAM chips into dense modules with wide interfaces and brings them inside the same package as the GPU. The new memory is expected to deliver several times the bandwidth, about 2.5 times the capacity and 4 times the energy efficiency of today’s memory. This makes possible more compact GPUs that put more power into smaller devices.

The Maxwell architecture is inherently parallel, a quality that has long been known to lend itself to advanced high-speed graphics processing. It is also an appropriate architecture for implementing neural networks. Neural nets have been implemented to a certain extent on traditional CPU architectures, and those implementations have served well to advance the knowledge about neural nets. Now, however, massively parallel architectures are becoming available that enable truly deep learning at the speed and density needed for such things as object recognition and classification in real-time environments like that of the autonomous vehicle.
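Why a parallel architecture suits neural nets can be seen in a minimal sketch like the one below. It is an illustration only, with made-up weights and function names, and it is not how CUDA code is written. The point is that each neuron in a layer depends only on the shared inputs and its own weights, so all neurons in the layer can be computed independently. Python threads are used here merely to show that independence; they will not deliver a GPU-style speedup.

```python
# Each neuron in a layer is an independent weighted sum of the same inputs,
# so the work can be split across many processing elements.
# Weights and names here are hypothetical, chosen only for illustration.
from concurrent.futures import ThreadPoolExecutor


def neuron_output(weights, bias, inputs):
    """Weighted sum plus bias, passed through a ReLU nonlinearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return max(0.0, total)


def layer_sequential(weight_rows, biases, inputs):
    """Compute the layer one neuron at a time, as a single CPU core would."""
    return [neuron_output(w, b, inputs) for w, b in zip(weight_rows, biases)]


def layer_parallel(weight_rows, biases, inputs, workers=4):
    """Compute the same layer with every neuron handed to a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda wb: neuron_output(wb[0], wb[1], inputs),
            zip(weight_rows, biases),
        ))


WEIGHTS = [[1.0, -1.0], [0.5, 0.5], [-1.0, -1.0]]
BIASES = [0.0, 1.0, 0.0]
INPUTS = [2.0, 1.0]

if __name__ == "__main__":
    print(layer_sequential(WEIGHTS, BIASES, INPUTS))
    print(layer_parallel(WEIGHTS, BIASES, INPUTS))
```

Both functions return the same values in the same order, which is what allows the work to be spread across thousands of cores without changing the result.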

Software for such object recognition and vehicle control applications simply cannot be mastered using an “if-then” structure, which would have to expressly define all cases and variations in objects. In the deep learning approach, neural nets learn many levels of abstraction, ranging from simple concepts to complex ones. Each layer categorizes some kind of information, refines it and passes it on to the next, hence the term “deep learning.” Thus the first layer might detect simple edges, the next simple shapes composed of edges, and the next features like eyes or noses. At deeper layers these might be composed into faces or individual objects.
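The layering idea can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions: the filters are hand-picked rather than learned, the image is a made-up 4x4 grid, and the function names are hypothetical. A real deep network learns its filter weights from training data. The first layer responds to edges, and the second combines edge responses into a slightly more abstract feature, a corner.

```python
# A toy two-layer feature stack in plain Python.
# The kernels are hand-picked for illustration; a real deep network
# learns its filter weights from training data.

def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses and zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# Layer 1 filters: respond to dark-to-bright transitions.
VERTICAL_EDGE = [[-1, 1], [-1, 1]]      # bright region to the right
HORIZONTAL_EDGE = [[-1, -1], [1, 1]]    # bright region below


def layer1_edges(image):
    """Layer 1: produce one feature map per edge orientation."""
    return (relu(convolve2d(image, VERTICAL_EDGE)),
            relu(convolve2d(image, HORIZONTAL_EDGE)))


def layer2_corners(vertical, horizontal):
    """Layer 2: a corner is where both edge types respond at one location."""
    return [[min(v, h) for v, h in zip(v_row, h_row)]
            for v_row, h_row in zip(vertical, horizontal)]


# A bright square in the lower right of a dark background.
IMAGE = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]

if __name__ == "__main__":
    vertical, horizontal = layer1_edges(IMAGE)
    print("vertical edges:  ", vertical)
    print("horizontal edges:", horizontal)
    print("corners:         ", layer2_corners(vertical, horizontal))
```

The corner map is nonzero only where the square's left and top edges meet. Stacking more such layers, with learned rather than hand-written filters, is what lets a deep network work its way up from edges to faces or road signs.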

Applications in facial recognition, genetic analysis, speech recognition and translation and many more are then possible. In the case of automotive automation, the ability to read street and speed signs and to distinguish pedestrians and all manner of other objects is essential and now possible. NVIDIA is offering two new products to help create such deep learning neural network-based systems. The first is a software tool called the DIGITS Deep Learning GPU Training System, which lets users get started quickly implementing and developing neural nets.

The second is a system designed specifically for developing deep learning neural nets called the DIGITS Devbox, a desktop system that comes with four Titan X processors and 64 Gbytes of DDR4 memory and is preloaded with the DIGITS software (Figure 2). The Devbox sells for $15,000 and is available only to qualified customers. The idea is that applications created by training the neural networks on the Devbox can be loaded onto a more compact embedded computer such as the Drive PX.

Figure 2

The DIGITS Devbox is specifically designed for developing deep learning neural net applications with four Titan X processors.

DIGITS guides users through the process of setting up, configuring and training deep neural networks, handling the underlying details so that scientists can focus on the research and the results. Preparing and loading training data sets with DIGITS, whether on a local system or from the web, is simplified by an intuitive user interface and workflow management capabilities. The system provides real-time monitoring and visualization so users can fine-tune their work. It also supports the GPU-accelerated version of Caffe, a framework used by many scientists and researchers to build neural nets.

At the keynote address, which was delivered on a Tuesday, Tesla’s Elon Musk hinted at an upcoming announcement from his company. He remarked that we are very close to a self-driving car from a technology standpoint. However, the social and regulatory factors that must be overcome will put it off for some time. The following Thursday, Tesla revealed that it will be sending software upgrades to users that will enable them to drive hands-free on the Interstate and operate the car autonomously on private property. It would seem that this corresponds to the low-speed and over 50 MPH ranges referred to above. It notably does not include the intermediate range up to 50 MPH that is still problematic.

Regulators are said to be scrambling to deal with the Interstate capability because there are actually many states with no laws against it. California, however, where most Tesla Model S cars are sold, does have laws requiring specially trained drivers in autonomously controlled vehicles. That would allow autonomous operation on private property but would be risky otherwise. Still, it does look like we are on our way to fully autonomous cars (for those who would actually want them) far sooner than anticipated. Even so, Elon Musk predicts that with some two billion cars on the road, any transition will take years. However, he says, “In the distant future, (legislators) may outlaw driven cars because they’re too dangerous.”