New GPGPU modules expand Curtiss-Wright’s family of high performance embedded computing (HPEC) processors for artificial intelligence (AI) applications
Curtiss-Wright’s Defense Solutions division announced that, through its Reseller Agreement with WOLF Advanced Technology, it has expanded its family of open architecture high performance embedded computing (HPEC) processors designed for demanding ISR applications with the addition of three new NVIDIA Quadro Turing (TU104/TU106) GPU/inference engine-based OpenVPX™ modules. Curtiss-Wright also announced the availability of a new AMD Radeon™ (E9171) based XMC graphics engine card. Designed to support compute-intensive ISR and EW systems, the fully rugged VPX3-4925, VPX3-4935, and VPX6-4955 modules feature Tensor Cores (288, 384, and 768, respectively) that are ideal for accelerating the tensor/matrix computation used for deep learning neural network training and inference in deployed artificial intelligence (AI) applications requiring TFLOPS of accelerated processing. These applications include high-performance radar, SIGINT, EO/IR, data fusion ingest, processing and display, and autonomous vehicles.
The size, weight, and power (SWaP) optimized VPX3-4925 module, a 3U OpenVPX GPGPU processor, features an NVIDIA Quadro Turing TU106 (RTX3000E) GPU that delivers 6.4 TFLOPS/TIPS performance. It provides 2304 CUDA® cores, 288 Tensor Cores, and 36 ray-tracing (RT) cores. For higher performance in SWaP-constrained applications, the 3U VPX3-4935 module features an NVIDIA Quadro Turing TU104 (RTX5000E) GPU that delivers 11.2 TFLOPS/TIPS. The VPX3-4935’s higher core count includes 3072 CUDA cores, 384 Tensor Cores, and 48 RT cores. For more demanding applications, the 6U form factor VPX6-4955 (6144 CUDA cores, 768 Tensor Cores, 96 RT cores) hosts dual TU104 GPUs for 22 TFLOPS/TIPS performance. Designed to work in conjunction with NVIDIA TensorRT™ and CUDA, the modules’ Turing Tensor Cores add INT8 and INT4 matrix operations while continuing to support high-precision workloads. These state-of-the-art GPGPU modules further extend Curtiss-Wright’s proven leadership as a supplier of the most advanced computing solutions for embedded ISR applications. To meet demanding rugged military and aerospace specifications, these GPGPU boards feature a chip-down design.
“The introduction of these three new embedded AI-engines brings NVIDIA’s industry-leading Turing GPU architecture to deployed defense solutions,” said Lynn Bamford, Senior Vice President and General Manager, Defense and Power. “These newest additions to our family of COTS-based modules further strengthen our commitment to being the embedded industry’s proven and trusted HPEC solution provider.”
Complete System-Level HPEC Solutions
The VPX3-4925, VPX3-4935, and VPX6-4955 modules are fully interoperable with Curtiss-Wright’s broad family of 3U and 6U system-level C4ISR/EW OpenVPX solutions. SWaP-constrained systems can pair the single-GPGPU VPX3-4925 or VPX3-4935 module with the Intel® Xeon®-D processor-based CHAMP-XD1 DSP engine; together, they can augment the powerful sensor processing capabilities of our Xilinx® FPGA-based VPX3-534/535 transceiver modules. For higher-performance 6U systems, the dual-GPGPU VPX6-4955 can pair with our dual Intel Xeon-D processor-based CHAMP-XD2 DSP module and the powerful CHAMP-FX4 triple-FPGA processor module, along with many other modules in Curtiss-Wright’s Fabric40™ family of 40 Gbps products.
Numerous system configurations can be formed using combinations of these boards. For example, a single CHAMP-XD2 node can be used to control both Turing TU104 GPUs on a VPX6-4955, or each of the two nodes on a CHAMP-XD2 can be mapped to one of the GPUs. Taking this design one step further, a single Xeon-D on the CHAMP-XD2 can control one VPX6-4955 board upstream, while the other Xeon-D maps to a second VPX6-4955 module downstream. This processing slice, consisting of a CHAMP-XD2 and two VPX6-4955 modules (four TU104 GPUs at 11.2 TFLOPS each), delivers approximately 45 TFLOPS of performance and can be connected to the rest of the system via multiple 40 Gbps Ethernet/InfiniBand™ ports available on the OpenVPX data plane, and through a Curtiss-Wright VPX6-6802 central fabric switch.
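As a rough illustration of the arithmetic behind such configurations, the following Python sketch sums the per-GPU peak figures quoted in this announcement and assigns boards to CPU nodes. The table, the function names, the node labels, and the one-board-per-node assignment policy are all hypothetical and exist only for this example; they are not part of any Curtiss-Wright or WOLF software.

```python
# Peak per-GPU throughput as quoted in this announcement (TFLOPS).
TU106_TFLOPS = 6.4   # GPU on the VPX3-4925
TU104_TFLOPS = 11.2  # GPU on the VPX3-4935; the VPX6-4955 carries two

# Hypothetical lookup table: module name -> per-GPU peak TFLOPS.
MODULE_GPUS = {
    "VPX3-4925": [TU106_TFLOPS],
    "VPX3-4935": [TU104_TFLOPS],
    "VPX6-4955": [TU104_TFLOPS, TU104_TFLOPS],
}


def slice_peak_tflops(modules):
    """Sum the quoted peak TFLOPS over every GPU on the listed modules."""
    return sum(sum(MODULE_GPUS[name]) for name in modules)


def map_boards_to_nodes(cpu_nodes, modules):
    """Assign whole boards to CPU nodes in turn (illustrative policy only).

    Returns {node: [(board_index, module_name), ...]}.
    """
    mapping = {node: [] for node in cpu_nodes}
    for index, name in enumerate(modules):
        node = cpu_nodes[index % len(cpu_nodes)]
        mapping[node].append((index, name))
    return mapping


if __name__ == "__main__":
    # The processing slice described above: one CHAMP-XD2 (two Xeon-D
    # nodes, labeled arbitrarily here) plus two VPX6-4955 boards.
    boards = ["VPX6-4955", "VPX6-4955"]
    nodes = ["xeon-d-a", "xeon-d-b"]
    print(round(slice_peak_tflops(boards), 1), "TFLOPS peak")
    print(map_boards_to_nodes(nodes, boards))
```

For the two-board slice, the four TU104 GPUs sum to 44.8 TFLOPS, which is consistent with the rounded figure quoted for the slice. These are peak numbers taken from the text; sustained application throughput will depend on the workload.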
OpenHPEC™ Accelerator Suite™ Support
Curtiss-Wright HPEC modules and systems are supported by the OpenHPEC Accelerator Suite of best-in-class software development tools. These powerful tools enable system developers to develop their software faster. For example, the suite includes the powerful Bright Cluster Manager from Bright Computing, an NVIDIA partner. Bright Cluster Manager provisions and monitors both the CPU and GPU boards, and includes a fully configurable module environment. The OpenHPEC tool suite includes Bright’s deep learning libraries and tools from both Intel and NVIDIA, including Caffe and TensorFlow. It also provides the Arm® Forge suite, which enables true system-level debugging and profiling for both CPUs and GPUs, and supports MPI, OpenMP, and OpenACC.
For high-speed, low-latency, peer-to-peer communications, the OpenHPEC Accelerator Suite also includes Dolphin’s PCIe communication library, which hides the complexities of directly programming the system’s PCIe devices. In addition to supporting GPU sharing between the CPUs, the Dolphin library also supports both CPU direct and remote direct memory access (RDMA).
Use of the OpenHPEC Accelerator Suite simplifies, speeds, and lowers the cost of ISR application development. These tools deliver the benefits of open standard High Performance Computing (HPC) software to the COTS market to effectively remove risk when developing large-scale embedded computer clusters.
Offered through Curtiss-Wright’s Reseller Agreement with WOLF Advanced Technology, the VPX3-4925, VPX3-4935, and VPX6-4955 HPEC modules have been pre-validated. They complement Curtiss-Wright’s previously announced family of NVIDIA Pascal™ GPGPU modules by speeding and easing the integration of HPEC solutions into deployed systems.