AMCs Make Modular Multiprocessing a Reality

Several standards have emerged to service the modular computing market. The Common Mezzanine Card (CMC) standard provides a small form-factor capable of supporting a CPU, system bridge, memory and flash. The PCI Mezzanine Card (PMC), a derivative of the CMC standard, added a PCI interface, allowing modules to communicate with one another and peripherals through a high-speed data bus.

The PMC provided a good platform for many types of computing systems. However, engineers were not satisfied with a single processor per PMC module. Designers soon realized that there was just enough PCB real estate to support dual processors in a symmetric multiprocessing (SMP) architecture. In this environment, two processors share the same memory and I/O resources. For applications that ran mostly out of the CPU caches, this architecture provided significant price/performance improvements.

The PMC standard, however, did not account for the large power and cooling requirements of multiprocessor boards. PMCs were originally intended to host peripheral devices, such as SCSI and Ethernet controllers, not power-hungry CPUs. Additionally, features that were advantageous to high-performance computing, such as hot swap and board management, were not requirements when the PMC standard was created.

The Advanced Mezzanine Card (AMC) standard is the most recent addition to the suite of modular standards. AMCs increase the total amount of power allocated to each module, offer a larger selection of high-bandwidth interconnects, and support modular computing features such as hot swap and board management. Figure 1 shows a comparison of the two form-factors.


Multiprocessors

A multiprocessing system consists of multiple processing units tied to a single system controller bridge. This bridge usually consists of CPU interfaces, a memory controller and various I/O interfaces. As an example, consider the Marvell Discovery III PowerPC system controller. This embedded device consists of a 60x CPU bus interface, a DDR SDRAM controller, multiple gigabit Ethernet controllers, as well as two PCI bus interfaces (Figure 2).

The key advantage of a symmetric multiprocessing (SMP) system is the cost-effective sharing of resources such as memory and interconnects. While it is true that each CPU in a dual-processor multiprocessing system only has half the potential memory bandwidth, there are a large number of computationally intensive applications that benefit from the reduced costs and PCB real estate of the multiprocessor approach. These applications typically are able to reside mostly within the caches of a processor. However, there are memory-intensive processor applications such as pattern searching that would be better suited to having a separate memory subsystem for each processor. One must perform a careful analysis of the application before considering an SMP platform over a uniprocessor approach.
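That analysis can be roughed out numerically before any hardware is chosen. The sketch below is a toy model, not a measurement: the bus bandwidth, per-CPU demand and cache hit rates are hypothetical values picked only to show how cache residency decides whether two processors fit on one shared memory interface.

```python
# Toy model: does the combined cache-miss traffic of several CPUs fit on
# one shared memory interface? All numbers are illustrative assumptions.

def memory_traffic(demand_mbytes_s, cache_hit_rate):
    """Traffic that misses the cache and must go to shared memory."""
    return demand_mbytes_s * (1.0 - cache_hit_rate)

def fits_shared_bus(bus_mbytes_s, demand_mbytes_s, cache_hit_rate, cpus=2):
    """True if all CPUs together stay within the shared bus bandwidth."""
    total = cpus * memory_traffic(demand_mbytes_s, cache_hit_rate)
    return total <= bus_mbytes_s

BUS = 1000.0     # hypothetical shared memory bandwidth, Mbytes/s
DEMAND = 4000.0  # hypothetical load/store demand per CPU, Mbytes/s

cache_friendly = fits_shared_bus(BUS, DEMAND, cache_hit_rate=0.95)
memory_bound = fits_shared_bus(BUS, DEMAND, cache_hit_rate=0.50)
print(cache_friendly, memory_bound)
```

With these made-up inputs the cache-resident workload fits comfortably on the shared interface, while the memory-bound one would need several times the available bandwidth, which is the case where a separate memory subsystem per processor is the better choice.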

SMP architectures also provide performance enhancements without the software headaches of clustering separate computer systems. In SMP, a single operating system is run on the processors. This places the majority of the load-balancing responsibility on the operating system and allows the programmer to follow a single processor model. The ease of sharing data structures is also a significant performance advantage of SMP. Shared data structures can be examined by any processor without the overhead associated with interconnect transactions or message-passing protocols.
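The shared-data programming model can be illustrated with a few lines of threaded code. This is a minimal sketch using Python threads as a stand-in for SMP worker threads; the word lists are made-up input, and the Python interpreter may not execute the threads in true parallel, so it shows the programming model rather than the performance.

```python
import threading

# A shared structure that every worker reads and updates directly,
# with no message passing between workers.
histogram = {}
lock = threading.Lock()

def count_words(words):
    """Tally words into the shared histogram, one locked update at a time."""
    for word in words:
        with lock:
            histogram[word] = histogram.get(word, 0) + 1

chunks = [["amc", "pmc", "amc"], ["pmc", "amc", "smp"]]  # made-up input
workers = [threading.Thread(target=count_words, args=(chunk,)) for chunk in chunks]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()

print(histogram)
```

Each worker touches the same dictionary in place; the only coordination is a lock around the update. In a clustered design the same tally would require serializing partial results and sending them over an interconnect.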


PMCs

The PMC form-factor has several limitations when used in a multiprocessor system. These limitations include power dissipation and interconnect bandwidth, as well as lack of high-availability features.

PMCs are limited by the total amount of power they can draw from the 3.3V and 5.0V power rails provided. The power pins are limited to 1A per pin. This works out to 29.7W on the 3.3V rail and 30W on the 5V rail. While in theory this would yield around 60 watts of power, the CMC standard limits total power dissipation to 7.5 watts. The PrPMC (Processor PMC – VITA 32) specification extends this to a maximum of 25 watts.
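The arithmetic behind these figures can be checked with a short sketch. The pin counts below are not quoted from the specification; they are inferred from the wattages above (29.7W at 3.3V implies 9A, and 30W at 5V implies 6A, at 1A per pin) and should be treated as assumptions.

```python
# Pin counts are inferred from the article's wattage figures and are
# assumptions, not values taken from the PMC specification.
AMPS_PER_PIN = 1.0
RAILS = {"3.3V": (3.3, 9), "5V": (5.0, 6)}  # rail: (volts, assumed pin count)

def rail_watts(volts, pins, amps_per_pin=AMPS_PER_PIN):
    """Maximum power a rail can deliver through its pins."""
    return volts * pins * amps_per_pin

connector_limit = sum(rail_watts(v, p) for v, p in RAILS.values())
CMC_LIMIT_W = 7.5     # CMC dissipation limit, from the text
PRPMC_LIMIT_W = 25.0  # PrPMC dissipation limit, from the text

print(f"connector limit: {connector_limit:.1f} W")
print(f"usable on a PrPMC: {min(connector_limit, PRPMC_LIMIT_W):.1f} W")
```

The connector could carry roughly 60 watts, but the usable budget is set by the dissipation limits, not by the pins.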

Cooling a PMC can present additional challenges. The standard 10 mm and 15 mm module heights provide a limited amount of cross-sectional area for dissipating heat. For the 15 mm case, a cross-sectional area of 14 cm² is available for components, heat sinks and airflow. With a standard 200 linear feet per minute of airflow, it is difficult, if not impossible, to dissipate more than 25 watts of power in a commercial 0° to 55°C operating environment.

PMCs utilize a PCI/PCI-X bus for interconnect with a carrier and other modules. PCI is a shared bus architecture, where each device presents an additional load on the common bus signals. The ability to meet signal integrity requirements limits the number of devices that can directly talk to one another at any given frequency. Transparent bridges can be used to create clusters of devices that operate at higher frequencies, with a tradeoff of higher transaction latency. When a point-to-point architecture can be used, PCI-X can be operated at 133 MHz. This yields a theoretical maximum data rate of around 1 Gbyte/s over a 64-bit bus.
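The 1 Gbyte/s figure follows directly from the clock rate and bus width. The helper below is a small sketch with a hypothetical function name; it computes the theoretical peak only and ignores arbitration, wait states and protocol overhead.

```python
def bus_peak_mbytes_s(clock_mhz, width_bits):
    """Peak rate of a parallel bus that moves one word per clock, in Mbytes/s."""
    return clock_mhz * width_bits / 8

pci_x_peak = bus_peak_mbytes_s(133, 64)
print(f"PCI-X 133 MHz, 64-bit: {pci_x_peak:.0f} Mbytes/s")
```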

PMCs are not hot-swap devices, making them unsuitable for the kind of high-availability systems that are beneficial to modular computing. They are directly mounted into the host carrier. However, in the case of a CompactPCI PMC carrier card, the entire assembly can be hot-swapped. If the hot-swap carrier card is populated with more than one active module, then all modules must be taken out of service to replace the one faulty module. Clearly, this is not advantageous in a clustered modular computing environment.

As an example, consider a PrPMC card consisting of dual Freescale 7448 1 GHz G4 PowerPC processors, a system controller, 1 Gbyte of DDR DRAM, 64 Mbytes of flash and two Gigabit Ethernet ports (Figure 3).

The first obstacle beyond fitting all the components in the PMC form-factor was satisfying the power requirements. The design was required to draw all of its power from the 3.3V rail, limiting the total power available to less than 25 watts. While a 25-watt maximum is allowed for a PrPMC, most carriers are designed to accept only PMCs, limiting the selection of carriers on which this PrPMC could be placed. Additionally, this power ceiling restricted the speed at which the CPUs could operate, forcing one customer to spend resources optimizing software to reach the original performance goals.

The next challenge was cooling the CPUs, bridge, DDR SDRAM and Ethernet PHYs. Fortunately, a 15 mm stacking height form-factor was allowed, which yielded much more room than the typical 10 mm for a heat sink. This challenge required the creation of a novel heat sink, which covered both CPUs and system controller, allowing maximum surface area for cooling. However, the custom nature of the heat sink added unforeseen complexity to the final product.

Another PMC design issue was a requirement to have a watchdog timer perform a full hardware reset of the module, including the system controller. The PCI architecture includes a global reset asserted by the carrier card. This made it difficult to allow the module to act as a separate entity, capable of completely resetting itself without disrupting other members of the PCI bus.

Making the design fit into a PMC required several tradeoffs. These included specific PMC carrier card requirements, performance limitations due to power budgets and sacrifices in functionality. Had the design been based on the AMC standard, several of these tradeoffs could have been addressed.


AMC

As a more recent standard, the AMC architecture is able to provide several key advantages over PMC while maintaining a similar form-factor. Advantages include improved power capability, heat distribution, interconnect technology and support for high availability.

The maximum power consumption of an AMC is 60 watts. Instead of providing several separate power rails, AMCs consolidate all power onto a single +12V rail. Today’s designs need to support ICs that operate at many different voltages. A typical design may need 2.5, 1.8, 1.5 and even 1.0V. Utilizing a single supply to derive all onboard voltages eliminates the need for board and system designers to balance power across multiple voltage rails, making power distribution much cleaner and easier to design.

The AMC standard also improves the ability to cool high-power components such as CPUs and bridges. The AMC specification provides several options for module height. A full height AMC module provides a cross-sectional area of 22 cm². This is a 57% increase in area compared with a 15 mm PMC module. This additional area provides the necessary room for airflow and allows larger fins for greater heat sink surface area.
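The percentage quoted above can be verified from the two cross-sectional areas given in the text. The constant names in this short check are illustrative only.

```python
PMC_15MM_AREA_CM2 = 14.0  # 15 mm PMC cross-section, from the text
AMC_FULL_AREA_CM2 = 22.0  # full height AMC cross-section, from the text

# Relative gain in cross-sectional area when moving from PMC to AMC.
increase = (AMC_FULL_AREA_CM2 - PMC_15MM_AREA_CM2) / PMC_15MM_AREA_CM2
print(f"{increase:.0%}")
```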

As processing performance increases, so does the demand for I/O bandwidth. The AMC architecture supports three high-speed interconnect options: PCI Express, Gigabit Ethernet and RapidIO. Having three scalable options gives system designers much greater flexibility in meeting bandwidth requirements. As an example, PCI Express is a serial, lane-based interconnect that can support 1, 2, 4, 8, 16 or 32 lanes, each providing approximately 200 Mbytes/s of bandwidth.
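Lane scaling makes link sizing a simple lookup. The sketch below uses the approximate per-lane figure and the lane widths given above; the function names are hypothetical, and real throughput depends on protocol overhead that this model ignores.

```python
PER_LANE_MBYTES_S = 200              # approximate per-lane figure from the text
LANE_WIDTHS = (1, 2, 4, 8, 16, 32)   # lane widths listed in the text

def link_bandwidth(lanes, per_lane=PER_LANE_MBYTES_S):
    """Approximate bandwidth of a link with the given lane count."""
    if lanes not in LANE_WIDTHS:
        raise ValueError(f"unsupported lane width: {lanes}")
    return lanes * per_lane

def narrowest_link(required_mbytes_s):
    """Smallest listed lane width that meets the requirement, or None."""
    for lanes in LANE_WIDTHS:
        if link_bandwidth(lanes) >= required_mbytes_s:
            return lanes
    return None

# Lanes needed to match the 1064 Mbytes/s peak of 64-bit, 133 MHz PCI-X.
print(narrowest_link(1064))
```

Under these assumptions, a four-lane link falls short of the PCI-X peak and an eight-lane link exceeds it.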

AMCs also provide an important feature that PMCs lack: the ability to isolate and hot swap faulty modules. A malfunctioning PMC module requires removal from its PMC carrier card for servicing, and could even misbehave on the shared PCI bus, causing other modules or the entire system to malfunction. The point-to-point nature of the AMC interconnect standard allows a faulty module to be contained and removed for servicing without interrupting other system functions.

The AMC standard also includes an IPMI manager running on an auxiliary low-cost, low-performance microprocessor, which allows the AMC carrier to access information such as module power requirements, power supply health, temperature and other status. With this information, the AMC carrier can decide to shut down modules that are overheating or about to fail.

As an example, consider migration of the previous design to the AMC equivalent. The major changes include the migration from PCI/PCI-X to PCI Express and the addition of an IPMI controller.
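The carrier-side shutdown decision that IPMI status makes possible can be sketched as a simple policy. Everything here is hypothetical: the `ModuleStatus` record, the field names and the temperature threshold are illustrative assumptions and do not represent IPMI commands or any real management API. Only the 60-watt module maximum comes from the text.

```python
from dataclasses import dataclass

@dataclass
class ModuleStatus:
    """Hypothetical snapshot a carrier might collect from one module."""
    slot: int
    requested_watts: float
    temperature_c: float
    supply_ok: bool

AMC_MAX_WATTS = 60.0  # per-module maximum cited in the text
TEMP_LIMIT_C = 85.0   # assumed shutdown threshold, not from any specification

def modules_to_shut_down(statuses):
    """Return the slots whose reported status suggests powering them off."""
    return [
        s.slot
        for s in statuses
        if s.temperature_c > TEMP_LIMIT_C
        or not s.supply_ok
        or s.requested_watts > AMC_MAX_WATTS
    ]

# Made-up readings for three modules.
readings = [
    ModuleStatus(slot=1, requested_watts=45.0, temperature_c=60.0, supply_ok=True),
    ModuleStatus(slot=2, requested_watts=55.0, temperature_c=92.0, supply_ok=True),
    ModuleStatus(slot=3, requested_watts=30.0, temperature_c=50.0, supply_ok=False),
]
print(modules_to_shut_down(readings))
```

Because each module sits on a point-to-point link, acting on this list affects only the listed slots; the healthy module in slot 1 keeps running.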

Processor performance no longer needs to be sacrificed due to power and heat dissipation constraints. The 60-watt maximum AMC power easily allows the processors to run at maximum speed. The additional cross-sectional height also allows the use of standard heat sinks, saving the costs and complexity associated with exotic custom heat sinks.

The watchdog timer reset challenge that was identified in the development of the PrPMC module is no longer an issue. Since PCI Express is hot-swappable and point-to-point, the module can be locally reset, disabled or removed without disturbing the host and other modules. The AMC specification allows the full performance of the embedded SMP system to be realized. In addition, it adds features that improve the module’s ability to satisfy the requirements of modular computing.

AMCs provide a feature-rich solution to today’s multiprocessor needs. As the market continues to demand higher power, higher bandwidth and higher availability, the AMC standard provides room for SMP designs to grow.

While AMC modules are currently being deployed on AdvancedTCA carrier cards, PICMG, the developer of the AMC standard, is working toward a chassis and backplane standard. This system, called MicroTCA, will expand the possibilities of the AMC module and its ability to function as a building block to the next generation of cost-effective multiprocessing platforms.

Extreme Engineering Solutions
Middleton, WI.
(608) 833-1155.
[www.xes-inc.com].