SOLUTIONS ENGINEERING
CompactPCI 2.16 and Beyond
Cluster Computing and I/O: What Blade Servers Miss, CompactPCI Can Hit
CompactPCI came into being primarily to solve I/O platform needs back in the mid-1990s, with a single (or sometimes redundant) control processor board connected to many I/O boards. The inventors were focused on industrial automation and control applications, but it soon became clear that this was a good foundation for telecommunications systems.
BOB EHLERS, PERFORMANCE TECHNOLOGIES
The first blade server was a pre-PICMG 2.16 product, designed to port the CompactPCI form-factor into a blade server architecture. This early predecessor to today’s proprietary blade server architectures was ahead of its time and set the standard for many of the features that are commonplace among blade servers today, including Ethernet switch blades, comprehensive hardware management and processor boards. Shortly after this first blade server hit the market, a variety of server manufacturers announced initiatives to produce their own versions of blade servers, and a new market category was created.
PICMG 2.16 was a bellwether for the migration of application processing into the telecommunications network. With IP now readily available to interconnect I/O and computing/control elements in a system, CompactPCI became an ideal platform for third-generation networks. Lightweight applications could now be placed on x86, SPARC and PPC, directly adjacent to I/O interconnects such as TDM trunk and line cards, serial cards and DSPs.
Thermal Demands of Computing
Expectations in the market regarding computing architectures have traditionally been driven by Moore’s Law: “Computing capacity (transistors per chip) doubles every couple of years.”

Unfortunately for system designers, the relentless push for greater computing capacity has also resulted in greater heat being concentrated in a much smaller space. As a result, a system that was designed to support the thermal demands of a processor two years ago may deliver less than half of the cooling required by today’s processors (Figure 1).
And this is where the road forks for blade servers and CompactPCI.
The blade server market has, for the most part, taken the approach of accepting the computing architectures as they roll out of the silicon manufacturers. In order to maximize computing performance, blade server vendors use a single, monolithic processor with the highest density of transistors, the largest amount of internal cache and the fastest clock speed available. If a processor requires more cooling, they reduce the number of slots and increase the airflow in the system. This paradigm has worked well, up to this point.
Recently, however, CPU manufacturers have started to run into trouble. Back in December of 2002, Intel’s Andy Grove commented that the end of Moore’s Law might be near, pointing to issues such as power leakage from inactive processors as a significant impairment to continuing on with more dense chips. There could be another approach.
What if processors were slightly less performance-oriented but much more thermally efficient? What if you could put two moderately performing processors into the same thermal envelope where a single monolithic processor sits today? What if you could put four or even eight processors in that same space?
CompactPCI systems started as I/O-centric platforms and have gradually migrated toward being more compute-intensive. As a result of being used as a computing platform, CompactPCI has suffered from the thermal demands made by computing silicon. It has also suffered from the form-factor itself. The slot pitch, the connectors for power delivery, the power bus and the ability to implement airflow solutions all contribute to a current limitation of about 80W per slot in CompactPCI systems.
Given this limitation, using the traditional approach of more integration and greater heat concentration, it looks like CompactPCI runs out of steam with a single 2GHz x86 processor per slot running at full clock speed with memory and support chips. With this limitation, can CompactPCI be effectively used as a computing platform?
Third-Generation Networks
The advantage that CompactPCI has over blade server architectures, and even over AdvancedTCA, is that it is a much more granular form-factor. The 6U Eurocard design allows for great density of slots (up to 21 across in a standard 19-inch rack). Each slot can have a discrete purpose, allowing for system configurations that incorporate a broad array of functions into a single platform. Blade servers and AdvancedTCA are both larger form-factors that have opted for a reduced slot count in favor of larger board space and/or increased airflow per slot.

As we consider the needs of the telecommunications market in particular, having the ability to combine features such as high-density DSPs (2000 channels or more in a PMC), high-density TDM access (24 E1 spans in a single slot), reasonable storage (600 Gbytes per slot) and reasonable computing (2 GHz x86) into a single platform is very attractive, particularly when the standard assures the availability of a broad array of component options from many vendors.
As mentioned earlier, blade server architectures tend to be more compute-centric and lack many of the discrete components of CompactPCI. In fact, most blade server implementations require I/O chassis external to the compute chassis, and the array of available I/O components is extremely limited. Even when a manufacturer opts to open their architecture to the industry, there are still fewer suppliers of these semi-proprietary components than suppliers in the truly open CompactPCI space. Customer choice will always be limited in proprietary architectures. Most importantly, these architectures are compute-centric, not I/O-centric.
So what about AdvancedTCA? Isn’t it a better heterogeneous platform for telecommunications than CompactPCI?
AdvancedTCA was specifically designed to accommodate the high wattage environments that are created by today’s leading x86 and network processors. The wider slot pitch allows for greater airflow over these massive processors, and the real estate available on AdvancedTCA allows for more flexibility in board design. CompactPCI cannot, in its current form, compete with AdvancedTCA on support for high wattage processing architectures or where lots of component space is required. AdvancedTCA is very well suited for telecommunication core applications, where thermal densities can be very high.
However, CompactPCI does have some advantages. First is its maturity. To a large extent, CompactPCI has achieved vendor-independent interoperability. The 2.16/2.9 backplane does a pretty good job of defining pins for interoperability, and manufacturers are now fairly familiar with what point specs have been adopted by other manufacturers and what traditional combinations of point specs they are likely to encounter as their customers assemble systems. Whether or not the CompactTCA specification will be ratified and all vendors will agree on a rigid system configuration specification, the industry knows how to deliver interoperable systems to customers.
AdvancedTCA is a very new specification and has not gone through this maturing process. It has taken CompactPCI almost 10 years to achieve the level of stability where interoperability is possible. With the emerging large array of point specs, the ability for vendors to define their fabrics and the virtually limitless permutations of backplanes that result from combinations of the “layered spec” approach, it could be some time before AdvancedTCA approaches the normalization point where true vendor-independent interoperability can exist.
Second are the design specifications to which existing telecommunication facilities have been built. Most central office buildings, telecom racks, power distribution architectures, air conditioning systems, etc. were built with assumptions of systems that are similar in design to CompactPCI. The racks are 19” and 42U high (just enough space for three 12U chassis, a pair of 2U routers and a pair of 2U switches). The racks can generally accommodate two 20-amp circuits (4 to 5 kW), which is just about what CompactPCI requires for fully loaded systems. AdvancedTCA, because of its ability to support higher levels of thermal density, is looking for more input power, more ambient air conditioning and more rack space. AdvancedTCA is very appropriate for core network applications, but is not universal in its fit and could be problematic at the edge where facilities are less robust.
Lastly, coming back to the original point, CompactPCI is a more granular architecture, which is what makes it attractive for so many applications. For example, let’s say a customer wants to design a system that has one large compute element board in it for some type of application such as a database, combined with some traditional I/O, which would otherwise not require lots of real estate and cooling. An AdvancedTCA or blade server architecture would require the customer to put in place a large chassis, a large fabric switch and a large form-factor I/O blade, all to accommodate the needs of one board in the system. With CompactPCI, on the other hand, the customer might opt to cluster together a pair of compute boards to handle the application and thereby significantly reduce the total cost of their architecture by not “overbuilding” all of the other elements of the system just because the architecture is not granular enough.
FCC Unbundling
Recently, the FCC announced the UNE-P rules, which basically free incumbent local exchange carriers (ILECs) from having to provide facilities to competitors at cost. Once implemented, the rules will force competitive carriers (CLECs) to find replacements for the facilities the ILECs were providing them. In many cases this will mean that the competitive carriers will be installing their own facilities.

So, where the large ILECs once had a single piece of equipment servicing the needs of several CLECs, we will now see the need for many smaller, more granular systems, each operated by a CLEC who previously only represented a fraction of the utilization of the entire system. Will each CLEC want to invest in large scale infrastructure from the beginning, assuming that their customer base will grow in a particular market? Or would each CLEC prefer to start with a smaller level of granularity and build up as their market expands?
In a market where a competitive carrier has a small but growing subscriber base, they will need to deliver a small service platform that can scale up as they grow. CompactPCI is ideal in this regard in that there are 1U, 4U, 7U and 12U platforms that all conform to the spec. So a competitive carrier could choose to install a 1U media gateway now, and later opt to install a 7U softswitch that integrates the boards from the 1U media gateway together with call control application processing boards, SS7 signaling boards and trunking boards. Later, they may want to move to a 12U system that integrates the softswitch with a feature server or media server to provide value added services. This may only mean adding a DSP board and some storage elements.
CompactPCI offers a roadmap for growth to the CLEC, whereas the investment in blade server or AdvancedTCA architectures may require the CLEC to purchase in advance the full chassis infrastructure that they think they will ultimately require, in order to house the first blade (Figure 2).
Computing Performance
One of the driving assumptions in moving to larger form-factors was the premise that application processing could best be accomplished in large, monolithic processors as opposed to clusters of smaller processors. This premise is quickly fading.
The world’s largest supercomputer, BlueGene, which IBM built for the U.S. Department of Energy, uses a PowerPC architecture and incorporates approximately 32,000 CPUs in a clustered environment. Table 1 indicates the characteristics of various clustered supercomputing architectures.
If you average out computing performance for the cluster on a per-CPU-core basis, you get some baseline numbers for predicting performance of clustered processors. The PowerPC 44GX delivers about 2.8 GFLOPS per CPU, the Pentium 4 3.06 GHz Xeon delivers about 6.12 GFLOPS and the Opteron 2.2 GHz single core delivers about 4.4 GFLOPS.
Today, tools such as OpenSSI (Single System Image) make it simple to replicate the clustered computing environments used in supercomputers on CompactPCI, AdvancedTCA and blade server architectures. OpenSSI allows any application that is designed to use process threading to run in a cluster with minimal porting activity.
Now, this is where the slot density, or granularity, of CompactPCI becomes meaningful. Compare a typical telecommunications blade server architecture (like that of the IBM BladeCenter T using large Intel Pentium 4 processors) with a CompactPCI-based computing architecture using a thermally optimized processor such as the AMD Opteron 64 dual core, and slot density makes a meaningful difference in the GFLOPS of performance per 42U rack.
For instance, a BladeCenter T chassis can support eight blade slots and is 8U high, meaning that a 42U rack can hold five chassis, or 40 slots. If you populate each slot with a BladeCenter HS40 dual 2.7 GHz Pentium 4 blade, you will have 80 CPUs, each with performance of about 3.47 GFLOPS, for a total rack computing performance of 277.24 GFLOPS. This would cost approximately $440,895 or about $1,590.29 per GFLOP.
Now compare this to a CompactPCI architecture using a 12U chassis with 18 slots per chassis or 54 slots per rack. Each slot is populated with a Dual Core AMD Opteron 64 2.2 GHz blade, with each core having a GFLOP performance of 3.15. The total number of cores in the CompactPCI rack would be 108 with a total computing capacity of 340.07 GFLOPS and a cost of $293,583 or just $863.29 per GFLOP.
On a simple cost-per-GFLOP basis, the CompactPCI architecture is roughly half as expensive as the BladeCenter T architecture, and it delivers denser computing performance in the same 42U rack space. This is a function of granularity and the number of smaller processors that are supported in the architecture. So there is a simple economic argument that can be made in favor of CompactPCI-based clustered computing architectures.
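The rack comparison above can be reproduced with a few lines of arithmetic. The following Python sketch is only an illustration: the class and field names are hypothetical, and the inputs are the rounded per-CPU GFLOPS figures and rack costs quoted in the text, so the totals it computes differ slightly from the article's figures, which were presumably derived from unrounded values.

```python
from dataclasses import dataclass

RACK_UNITS = 42  # rack height assumed throughout the comparison


@dataclass
class RackConfig:
    """Hypothetical container for one rack-level configuration."""
    name: str
    chassis_height_u: int    # chassis height in rack units
    slots_per_chassis: int   # compute slots per chassis
    cpus_per_slot: int       # CPUs (or cores) per blade
    gflops_per_cpu: float    # rounded per-CPU figure from the text
    rack_cost_usd: float     # approximate cost of the populated rack

    def slots_per_rack(self) -> int:
        # Only whole chassis fit in the rack.
        chassis = RACK_UNITS // self.chassis_height_u
        return chassis * self.slots_per_chassis

    def cpus_per_rack(self) -> int:
        return self.slots_per_rack() * self.cpus_per_slot

    def rack_gflops(self) -> float:
        return self.cpus_per_rack() * self.gflops_per_cpu

    def cost_per_gflop(self) -> float:
        return self.rack_cost_usd / self.rack_gflops()


blade = RackConfig("BladeCenter T", 8, 8, 2, 3.47, 440_895)
cpci = RackConfig("CompactPCI", 12, 18, 2, 3.15, 293_583)

for cfg in (blade, cpci):
    print(f"{cfg.name}: {cfg.slots_per_rack()} slots, "
          f"{cfg.cpus_per_rack()} CPUs, "
          f"{cfg.rack_gflops():.1f} GFLOPS, "
          f"${cfg.cost_per_gflop():,.2f} per GFLOP")

# Ratio of CompactPCI cost per GFLOP to BladeCenter T cost per GFLOP
ratio = cpci.cost_per_gflop() / blade.cost_per_gflop()
print(f"Cost ratio: {ratio:.2f}")
```

With these inputs the slot and CPU counts match the text (40 slots and 80 CPUs versus 54 slots and 108 cores), and the cost ratio comes out a little above one half. Changing the chassis height or slots per chassis shows how strongly the result depends on granularity.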
Additionally, the granular, open standard architecture also supports a broad array of I/O options that the BladeCenter T architecture does not. Then consider the other benefits of a clustered architecture such as application availability, load balancing and scalability, and the argument in favor of a clustered computing architecture becomes very compelling.
One thing that has hampered CompactPCI has been the lack of storage in the system. Both AdvancedTCA and blade server architectures offer high-capacity, highly reliable, high-performance storage as part of their designs. But CompactPCI has a narrow slot pitch, which has precluded it from using anything other than 2.5” drives. These drives have been slow, relatively unreliable and lacking in capacity. Additionally, there has not been a shared storage mechanism that the industry could agree on.
In the last year we have seen new SATA and SCSI storage media, which not only are reliable, but also have good capacity and performance. Add to this the availability of IP storage, such as iSCSI, and storage is no longer an issue for CompactPCI.
Performance Technologies
San Luis Obispo, CA.
(805) 783-6153.
[www.pt.com].

