RTC Magazine Digital Edition

TECHNOLOGY IN CONTEXT

ATCA: Telecom and Beyond

An Architect's Checklist: Designing a Carrier-Grade, Application-Ready ATCA Platform

A network element constructed with COTS components can save development effort upfront, as well as reduce maintenance and upgrade headaches downstream. Here is a checklist of considerations for creating a carrier-grade, network-ready system.

DR. ASIF NASEEM, GOAHEAD SOFTWARE AND SIMON STANLEY, HEAVY READING

In an era of service-oriented architectures that deliver converged data, voice, video and mobile services over the same IP-based network, continuous service is an increasingly important requirement. However, designing a network element for a network that is expected to provide uninterrupted service is a complex task. It requires specialized expertise and a significant commitment of resources and time. While the availability of commercial off-the-shelf (COTS) building blocks makes the task easier, designers must follow a disciplined approach to ensure that the resulting system meets the most critical requirements.

#1: COTS or Not?

The first decision is where to use COTS components in the system. A vibrant COTS ecosystem, including standards-based hardware (shelves, blades, components, etc.) and high-availability middleware, makes it very compelling to assemble a system that accelerates time-to-market and revenue for even the most complex telecommunications applications. Emerging industry data reinforces this assertion (Figure 1). The data shows that in-house development of a network element takes anywhere from two and a half to four years from concept to commercial deployment. Before any revenue-generating applications are developed, significant R&D time and effort go into designing the platform. Cost and effort can be saved if the platform is instead acquired from the ecosystem and resources are applied quickly to the development of applications and services. A side benefit of this approach is that, because the platform has already been assembled from pre-tested and pre-integrated COTS components, the overall testing and quality assurance effort is also reduced.

Choosing a standard platform such as ATCA does not, however, mandate the use of COTS components throughout. Equipment providers can opt to use in-house solutions for any or all of the system, taking advantage of the standard platform with custom or cost-optimized components as required. Many companies use a mix of COTS and in-house designs, working with suppliers to integrate and test a common platform for multiple products.

#2: Choosing a Platform

Once the COTS question has been answered, it’s time to choose a hardware platform. While the market offers many compelling choices, traction around ATCA is increasing rapidly. According to a recent survey conducted by Light Reading, 50 percent of network equipment providers (NEPs) reported that they are developing systems using ATCA hardware. The survey data also shows that a growing number of telecommunications applications are beginning to use ATCA systems. When choosing ATCA hardware, designers must carefully consider several factors, chief among them switching options, fabric capacity, configurability, and thermal and NEBS (Network Equipment-Building System) validation.

The ATCA specifications provide many options for configuring interfaces and switching across the backplane. Options for the fabric interface include Gigabit Ethernet, Gigabit Fibre Channel, 10 Gigabit Ethernet and RapidIO. When choosing a particular switching option, the designer must keep future upgradability in mind. Over the past five years, the performance of Ethernet-based ATCA systems has grown dramatically. In 2004, the first ATCA systems, with a single Gigabit Ethernet interface per blade, had a total chassis capacity of less than 70 Gbits/s. With 10 Gigabit Ethernet switching, the capacity of second-generation ATCA systems has grown to almost 160 Gbits/s, and it will grow further to 600 Gbits/s for third-generation systems with 40 Gigabit Ethernet (Figure 2).
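One way to weigh future upgradability is to ask how many years of traffic growth each fabric generation can absorb. The sketch below uses the approximate chassis capacities quoted above; the starting demand and the annual growth rate are hypothetical inputs, and `years_of_headroom` is an illustrative helper, not part of any ATCA specification.

```python
# Approximate aggregate chassis capacities (Gbits/s) as quoted in the text.
# The first-generation figure is an upper bound ("less than 70 Gbits/s").
GENERATION_CAPACITY_GBPS = {
    "1st gen (1 GbE)": 70.0,
    "2nd gen (10 GbE)": 160.0,
    "3rd gen (40 GbE)": 600.0,
}


def years_of_headroom(capacity_gbps: float, demand_gbps: float,
                      annual_growth: float) -> int:
    """Count the whole years a chassis can absorb demand that grows by
    `annual_growth` per year (0.5 means 50 percent) before exceeding
    capacity. Returns 0 if next year's demand would already not fit."""
    if annual_growth <= 0:
        raise ValueError("annual_growth must be positive")
    years = 0
    # Step demand forward one year at a time while next year's demand fits.
    while demand_gbps * (1 + annual_growth) <= capacity_gbps:
        demand_gbps *= 1 + annual_growth
        years += 1
    return years


if __name__ == "__main__":
    # Hypothetical load: 50 Gbits/s today, growing 50 percent per year.
    for name, capacity in GENERATION_CAPACITY_GBPS.items():
        print(name, years_of_headroom(capacity, 50.0, 0.5))
```

With those hypothetical inputs the three generations give 0, 2 and 6 years of headroom respectively. Stepping year by year, rather than solving a logarithm, keeps the answer in whole years and avoids rounding surprises at the boundary.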

Beyond offering many different ATCA blade options, ATCA systems also support advanced mezzanine cards (AMCs). AMCs are hot-swappable and have integrated system management functions, and they can be used in ATCA, MicroTCA and other platforms. AMCs come in multiple configurations, including single or double width and full-size, mid-size or compact modules.

Thermal and NEBS validation are a key part of ATCA-based system development. Quad-core CPUs, 16-core NPUs and multicore DSPs are pushing the limits of power delivery and cooling. Chassis airflow varies from slot to slot, cooling efficiency depends on airflow direction and distribution, airflow depends on board topology, and theoretical airflow data is typically based on unrealistic laminar-flow tests. To ensure compliance, operating limits should be verified across the full range of target conditions and environments.
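Because airflow differs by slot and by environment, verification has to cover every slot and condition combination rather than a single nominal case. The sketch below assumes a hypothetical table of measured airflow per slot at each ambient condition and a hypothetical per-blade minimum airflow; the numbers are placeholders, not NEBS limits.

```python
from itertools import product

# Hypothetical measured airflow (CFM) per (slot, ambient condition).
MEASURED_CFM = {
    ("slot1", "25C"): 32.0, ("slot1", "40C"): 30.0, ("slot1", "55C"): 28.0,
    ("slot2", "25C"): 30.0, ("slot2", "40C"): 27.0, ("slot2", "55C"): 24.0,
}


def find_thermal_violations(measured, slots, conditions, required_cfm,
                            margin=0.1):
    """Return every (slot, condition) pair whose measured airflow falls
    below the blade's required airflow plus a safety margin, or which was
    never measured at all."""
    violations = []
    for slot, condition in product(slots, conditions):
        threshold = required_cfm[slot] * (1 + margin)
        value = measured.get((slot, condition))
        # An untested combination counts as a failure: unmeasured is unverified.
        if value is None or value < threshold:
            violations.append((slot, condition))
    return violations


if __name__ == "__main__":
    required = {"slot1": 25.0, "slot2": 25.0}  # hypothetical blade needs
    print(find_thermal_violations(MEASURED_CFM, ["slot1", "slot2"],
                                  ["25C", "40C", "55C"], required))
```

Here slot2 passes at 25C but fails at 40C and 55C, illustrating why a check at room temperature alone would miss the problem.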

Finally, NEBS compliance is usually a key requirement for carrier-grade systems. Systems integrators and blade suppliers have already completed significant testing of both individual components and application-ready platforms. Designers can build on this existing test work and plan their own compliance effort accordingly.

#3: Platform Independence and Portability

The next step is to ensure that changes can be made down the road. This flexibility must be designed in not only at the hardware layer but at the middleware and application layers as well. A key advantage of using accepted industry standards is that the functional layers can be abstracted from one another, making it possible to create or modify one layer without affecting the others. The commercial acceptance of ATCA, Carrier Grade Linux (CGL) and service availability middleware based on the Service Availability Forum (SA Forum) open interfaces has enabled system designs in which the hardware, the operating system and the middleware can each be acquired from a different supplier. NEPs no longer have to be locked into a vertically integrated, proprietary architecture from a single vendor. Furthermore, because each layer is based on open standards, it can be replaced should the NEP decide to switch from one supplier to another. A high-level reference architecture enabled by these industry standards is depicted in Figure 3.

When choosing a hardware platform, the designer should check whether it provides the libraries defined by the SA Forum Hardware Platform Interface (HPI) specification. HPI lets the designer build the functional layers above the hardware in a platform-agnostic way, preserving the ability to replace, change or add hardware platforms without modifying other system components. Similarly, the SA Forum Application Interface Specification (AIS) lets a designer write applications that are abstracted from the underlying layers, making them portable to other platforms based on the AIS standard.
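The benefit of a platform interface is that management code depends only on the interface, never on one vendor's hardware. The Python sketch below illustrates that layering with a hypothetical `HardwarePlatform` abstract class and two made-up vendor back ends. It shows the design principle only; it does not reproduce the actual HPI C API.

```python
from abc import ABC, abstractmethod


class HardwarePlatform(ABC):
    """Hypothetical abstraction playing the role HPI plays: upper layers
    call these methods and never touch vendor-specific details."""

    @abstractmethod
    def list_blades(self) -> list[str]: ...

    @abstractmethod
    def read_temperature(self, blade: str) -> float: ...


class VendorAShelf(HardwarePlatform):
    # Stand-in for one vendor's implementation (fixed readings, in deg C).
    def __init__(self):
        self._temps = {"blade1": 41.0, "blade2": 47.5}

    def list_blades(self):
        return sorted(self._temps)

    def read_temperature(self, blade):
        return self._temps[blade]


class VendorBShelf(HardwarePlatform):
    # A second vendor that reports tenths of a degree internally;
    # the adapter hides that difference from callers.
    def __init__(self):
        self._raw = {"slot-3": 520, "slot-4": 395}

    def list_blades(self):
        return sorted(self._raw)

    def read_temperature(self, blade):
        return self._raw[blade] / 10.0


def overheated_blades(platform: HardwarePlatform, limit_c: float) -> list[str]:
    """Platform-agnostic management logic: runs unchanged on any back end."""
    return [b for b in platform.list_blades()
            if platform.read_temperature(b) > limit_c]


if __name__ == "__main__":
    for shelf in (VendorAShelf(), VendorBShelf()):
        print(type(shelf).__name__, overheated_blades(shelf, 45.0))
```

Swapping `VendorAShelf` for `VendorBShelf` requires no change to `overheated_blades`, which is the property that lets a hardware supplier be replaced without touching the layers above.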

#4: Performance

Performance is the next key factor. In telecom devices, the performance requirements, especially failover times, depend on whether the processing takes place in the management, control or user plane. In terms of high-availability (HA) requirements, a device at the edge of the network (e.g., a home router) has no failover requirements, because an outage there has little or no consequence: only a single user (or end point) is affected, and a simple reset resolves the issue. At the core of the network, however, an outage affects many more users and has a much greater impact.

As the functional density of a device or system increases, HA requirements can emerge where none existed before. For example, in an integrated security solution for telecom, significant HA requirements emerge when the various stand-alone devices that previously performed that function are consolidated into one large system. A cable TV universal edge device is another example: the number of subscribers it supports grows to the point where an outage would affect a large number of customers.

In general, allowable recovery times, specifically failover times, are budgeted to fit the needs of each plane. Management plane recovery times (e.g., for SNMP) are relatively generous, often measured in seconds, since the device can typically continue to perform its function even when management actions are unavailable. Control plane recovery times (e.g., for signaling protocols) are more demanding and can range from seconds down to milliseconds, since the device must accept, set up and tear down connections within strict timeframes.

User or data plane recovery times are more demanding still. For real-time data, such as an in-progress phone call, recovery must be fast enough that the interruption is imperceptible, which requires failover times in the millisecond to microsecond range. Allocating failover time budgets up front, during the design phase, is critical to achieving the desired HA performance in the deployed system.
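A budget is easier to enforce when each plane's failover is broken into phases (for example fault detection, switchover and service restoration) whose times must add up to no more than the plane's allowance. The sketch below makes that check explicit. The phase breakdown is an assumption, and the budgets and phase timings are hypothetical values chosen to sit within the ranges described above.

```python
# Hypothetical per-plane failover budgets in microseconds, chosen to sit
# within the ranges described in the text.
PLANE_BUDGET_US = {
    "management": 5_000_000,  # seconds range
    "control": 200_000,       # between seconds and milliseconds
    "user": 1_000,            # millisecond to microsecond range
}


def check_failover_budget(plane: str,
                          phases_us: dict[str, int]) -> tuple[bool, int]:
    """Sum the time of each failover phase and compare it with the plane's
    budget. Returns (fits, slack_us); slack is negative on an overrun."""
    total = sum(phases_us.values())
    slack = PLANE_BUDGET_US[plane] - total
    return slack >= 0, slack


if __name__ == "__main__":
    # Hypothetical phase timings for a user-plane and a control-plane function.
    print(check_failover_budget(
        "user", {"detect": 400, "switchover": 300, "restore": 200}))
    print(check_failover_budget(
        "control", {"detect": 150_000, "switchover": 80_000}))
```

The user-plane case fits with 100 µs to spare, while the control-plane case overruns by 30 ms, flagging at design time that fault detection needs to be faster.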

#5: No Single Point of Failure

The most critical requirement of a system expected to provide uninterrupted service is the elimination of all single points of failure (SPOFs). Some hardware or software components will inevitably fail at some point. The designer must identify the possible failures and ensure that none of them can interrupt service. Redundancy is a widely used technique for maintaining continuous service availability in the presence of failures. A reference architecture can help a designer identify and address possible points of failure within a system. Consider a reference design built around an ATCA chassis, multiple operating systems and the SA Forum framework (Figure 4). This five-blade system provides redundancy that includes the following:

• Redundant base and fabric backplane interfaces

• Redundant platform support including power, cooling and shelf manager

• Redundant HPI interface available to the service availability middleware

• Redundant service availability middleware managers and client processes
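A first-pass SPOF review can be as simple as checking that every function the service depends on runs on at least two distinct physical resources. The sketch below uses the functions from the list above as a hypothetical inventory, plus a deliberately unprotected "timing source" entry to show what a finding looks like. All resource names are made up, and a real analysis must also catch shared dependencies, such as two "redundant" blades fed by the same power source.

```python
# Hypothetical inventory: (function, physical resource hosting an instance).
INVENTORY = [
    ("base interface", "hub-A"), ("base interface", "hub-B"),
    ("fabric interface", "hub-A"), ("fabric interface", "hub-B"),
    ("shelf manager", "shmc-1"), ("shelf manager", "shmc-2"),
    ("HPI", "blade-1"), ("HPI", "blade-2"),
    ("HA middleware manager", "blade-1"), ("HA middleware manager", "blade-2"),
    ("timing source", "blade-3"),  # deliberately unprotected example
]


def single_points_of_failure(inventory):
    """Return, sorted, the functions hosted on fewer than two distinct
    physical resources. Two instances on the same blade still count as
    one point of failure, because that blade can take both down."""
    hosts = {}
    for function, resource in inventory:
        hosts.setdefault(function, set()).add(resource)
    return sorted(f for f, resources in hosts.items() if len(resources) < 2)


if __name__ == "__main__":
    print(single_points_of_failure(INVENTORY))
```

Counting distinct hosts rather than instances is the key design choice: it exposes "redundant" processes that are co-located and would fail together.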

In this example, for each active component there is a standby that is ready to take over the processing on a redundant physical resource, should something happen to the active resource. A 1+1 redundancy model is employed for resources such as the management capabilities, HPI, operating systems and various platform support resources. Both 1+1 and 2+1 redundancy models are applied to multiple applications.
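The difference between the 1+1 and 2+1 models is how many active units share a standby. The sketch below models a protection group as a list of active units and a list of spare standbys, and promotes a standby when an active unit fails. It is a simplified illustration with hypothetical names, not the SA Forum Availability Management Framework's state model.

```python
from typing import Optional


class RedundancyGroup:
    """Simplified N+M protection group: N active units share M standbys.
    1+1 is RedundancyGroup(["a"], ["b"]); 2+1 is two actives, one standby."""

    def __init__(self, active: list[str], standby: list[str]):
        self.active = list(active)
        self.standby = list(standby)
        self.failed: list[str] = []

    def fail(self, unit: str) -> Optional[str]:
        """Take a failed active unit out of service and promote a standby.
        Returns the promoted unit, or None if no standby remains, in which
        case no unit takes over the failed unit's work."""
        self.active.remove(unit)
        self.failed.append(unit)
        if not self.standby:
            return None
        replacement = self.standby.pop(0)
        self.active.append(replacement)
        return replacement


if __name__ == "__main__":
    group = RedundancyGroup(["app-1", "app-2"], ["spare"])  # 2+1 model
    print(group.fail("app-1"))  # spare
    print(group.fail("app-2"))  # None: the single standby is already in use
```

The second failure in the 2+1 group goes unprotected, which is the cost-versus-coverage trade-off behind choosing 2+1 over 1+1 for a given application.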

#6: Upgradability

Last but not least, designing for upgradability ensures that an operational system can be upgraded as new versions of software or hardware become available, without affecting service. The system designer must be clear about the properties today’s code base must exhibit in order to participate successfully in a future in-service upgrade. For example, the designer must consider the impact of potential future changes to messaging, protocols, APIs, checkpoint and configuration data layouts, and so on.
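One common way to prepare today's code for a future in-service upgrade is to tag every message or checkpoint record with a version number and make readers convert older versions they understand. The sketch below assumes a hypothetical JSON checkpoint for a call session in which version 2 added a `codec` field. The v2 reader fills in a default for v1 records and refuses versions newer than it understands, so a new node can take over state checkpointed by an old one.

```python
import json

CURRENT_VERSION = 2
DEFAULT_CODEC = "G.711"  # hypothetical default for the field added in v2


def encode_session(session_id: str, peer: str,
                   codec: str = DEFAULT_CODEC) -> str:
    """Serialize a call-session checkpoint in the current (v2) layout."""
    return json.dumps({"version": CURRENT_VERSION, "session_id": session_id,
                       "peer": peer, "codec": codec})


def decode_session(record: str) -> dict:
    """Accept v1 or v2 records and return them in the v2 layout.
    Versions newer than this code understands are rejected, not guessed at."""
    data = json.loads(record)
    version = data.get("version", 1)  # hypothetical: v1 had no version field
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported checkpoint version {version}")
    if version < 2:
        data.setdefault("codec", DEFAULT_CODEC)  # upgrade v1 to v2 layout
    data["version"] = CURRENT_VERSION
    return data


if __name__ == "__main__":
    print(decode_session('{"session_id": "s1", "peer": "10.0.0.2"}'))
```

Rejecting an unknown future version outright is deliberate: silently misreading state during an upgrade is worse than a clean, detectable failure.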

The most common in-service upgrade is the rolling upgrade. All active processes on a node migrate elsewhere as the node is taken down; the node is upgraded with the new version of the software, then rebooted and brought back into the system as the standby node. The process is repeated until every node in the system has been upgraded.
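The rolling sequence can be simulated to check the invariant that matters most: at every step, some node is active. The sketch below assumes a hypothetical active/standby node pair and models each step as switchover (if the node is active), upgrade and rejoin as standby. It ignores timing and state transfer, and the names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    role: str  # "active", "standby" or "down"


def rolling_upgrade(nodes: list[Node], new_version: str) -> list[str]:
    """Upgrade nodes one at a time, returning a log of steps.
    Raises RuntimeError if any step would leave no active node."""
    log = []
    for node in nodes:
        if node.version == new_version:
            continue
        if node.role == "active":
            # Migrate work off this node before taking it down.
            standby = next((n for n in nodes if n.role == "standby"), None)
            if standby is None:
                raise RuntimeError(f"no standby to take over from {node.name}")
            standby.role = "active"
            log.append(f"switchover {node.name} -> {standby.name}")
        node.role = "down"
        if not any(n.role == "active" for n in nodes):
            raise RuntimeError("service outage: no active node")
        node.version = new_version
        log.append(f"upgraded {node.name} to {new_version}")
        node.role = "standby"  # rejoins the system as the standby
    return log


if __name__ == "__main__":
    pair = [Node("A", "v1", "active"), Node("B", "v1", "standby")]
    for step in rolling_upgrade(pair, "v2"):
        print(step)
```

Ordering the list so that standby nodes are upgraded first saves one switchover. Note that mid-upgrade an old-version active node runs alongside a new-version standby, which is exactly why the message and checkpoint compatibility discussed above matters.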

Several key issues must be thought through in the design phase to ensure upgradability. First, determine the nature of the intended upgrade: upgrading for functional improvements is relatively straightforward, whereas upgrading for structural changes can be challenging. Designing for upgradability means that service must be maintained throughout the upgrade process. Most systems have one or more single points of failure that should be identified and addressed so that their impact is eliminated or minimized during an upgrade. Downgrade requirements must also be considered. Upgrades do go wrong, and the design must include a plan to roll back to the previous known-good versions. Although COTS building blocks help the designer of a highly available network element get solutions to market more quickly, they also raise additional considerations that must be thought through during the design phase.
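The roll-back requirement can be expressed as a simple rule: keep the last known-good version available until the new one has proven itself. The sketch below assumes hypothetical `install` and `health_check` callables supplied by the caller; what counts as healthy, and how long to observe before deciding, are system-specific choices left to the designer.

```python
from typing import Callable


def upgrade_with_rollback(current: str, candidate: str,
                          install: Callable[[str], None],
                          health_check: Callable[[str], bool]) -> str:
    """Install `candidate`; if installation fails or its health check fails,
    reinstall `current` (the last known-good version). Returns the version
    left running."""
    try:
        install(candidate)
        healthy = health_check(candidate)
    except Exception:
        healthy = False  # a failed install is treated like a failed check
    if healthy:
        return candidate
    install(current)  # downgrade path: restore the previous known-good version
    return current


if __name__ == "__main__":
    installed = []
    # Hypothetical health check that rejects the new version.
    print(upgrade_with_rollback("v1", "v2", installed.append,
                                lambda version: version != "v2"))
    print(installed)
```

Treating an exception during installation the same as a failed health check keeps a single, well-tested downgrade path instead of several ad hoc ones.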

GoAhead Software
Bellevue, WA.
(425) 453-1900.
[www.goahead.com].