INDUSTRY WATCH

SYSTEM MANAGEMENT

Integrating High Availability with Network Management

Building a highly available and manageable network element is quickly becoming a reality. However, the integration and testing of these building blocks still pose a significant challenge on the road to COTS adoption in the telecommunications market.

ASIF NASEEM & HAKAN MILLROTH, GOAHEAD SOFTWARE & TAIL-F SYSTEMS

During the last decade, standardization empowered the enterprise computing industry to put together systems using Commercial Off-the-Shelf (COTS) building blocks designed by a variety of hardware, operating system and applications providers. Enterprises were relieved to have an alternative to vertically integrated systems that locked them into proprietary architectures from individual vendors. This successful transition from a vertical to a horizontal business model is now showing up in other markets as well, especially in the telecommunications industry.

Building a system using COTS building blocks still has its challenges. System developers may have to deal with issues like inadequate software APIs, overlapping functionality, duplicate data stores and memory footprint constraints. To unleash the benefits of a COTS ecosystem, pre-integration and testing among suppliers and adherence to industry standards and open specifications are required.

The encouraging news is that the industry is recognizing this need and key alliances are being formed in response. Let us first look at the anatomy of a highly available and managed network element. The essential set of services required to create such a system can be broadly classified into four categories (Figure 1).

High-Availability Services

Forming the centerpiece of any highly available and managed platform, high-availability services typically comprise a sophisticated availability management framework and key functionality that, in combination, ensure continuous service availability in the presence of hardware or software failures. The Availability Management Service (AMS) implements two key functional areas: a comprehensive system model of all managed resources, and an associated state model that defines and governs the state of these resources. Managed resources can include applications, operating system, chassis, I/O cards, redundant CPUs, networks, peripherals, clusters and other software.

The availability system model represents each system resource to be managed as a managed object. It also captures resource dependencies, including the critical relationships that form a given service. Each managed object carries attributes for health, operational state, administrative state, role, availability status and dependencies, along with methods for access/control, monitoring and configuration. The AMS is also responsible for implementing recovery policies such as 2N, N+1, N+M and Active/Active. The AMS state model determines the state of each object, such as healthy/failed/shutting down, active/standby, locked/unlocked and enabled/disabled. The AMS makes its recovery decisions based on detailed information about each resource’s attributes and the methods to apply during a failure.
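The system model and state model described above can be illustrated with a small sketch. It is not the AMS API: the class, field and function names are hypothetical, only a few of the listed attributes are modeled, and the recovery function is a toy version of a 2N policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Health(Enum):
    HEALTHY = "healthy"
    FAILED = "failed"


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


@dataclass
class ManagedObject:
    """Hypothetical managed object: one resource in the system model."""
    name: str
    health: Health = Health.HEALTHY
    role: Role = Role.STANDBY
    locked: bool = False                      # administrative state
    depends_on: list = field(default_factory=list)

    def is_available(self, model):
        """Available only if unlocked, healthy and every dependency is available."""
        if self.locked or self.health is not Health.HEALTHY:
            return False
        return all(model[dep].is_available(model) for dep in self.depends_on)


def recover_2n(model, active, standby):
    """Toy 2N policy: promote the standby when the active object is unavailable."""
    if not model[active].is_available(model) and model[standby].is_available(model):
        model[active].role = Role.STANDBY
        model[standby].role = Role.ACTIVE
        return standby
    return active


if __name__ == "__main__":
    model = {
        "cpu-a": ManagedObject("cpu-a"),
        "cpu-b": ManagedObject("cpu-b"),
        "app-a": ManagedObject("app-a", role=Role.ACTIVE, depends_on=["cpu-a"]),
        "app-b": ManagedObject("app-b", depends_on=["cpu-b"]),
    }
    model["cpu-a"].health = Health.FAILED     # a hardware failure under app-a
    print(recover_2n(model, "app-a", "app-b"))
```

Because availability is computed through the dependency list, a failure of the CPU makes the application that depends on it unavailable, which is what triggers the switch to the standby.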

The presence of redundant resources is a critical requirement for highly available systems. Redundant resources can be configured in various ways to provide standby for active resources in case of a failure. Clustering is responsible for such a configuration. The AMS manages the service availability of hardware and software resources in the cluster. The clustering service is responsible for discovering, incorporating and monitoring the nodes within the cluster along with their associated network interfaces. An efficient messaging engine communicates the addition or failure of nodes and their network interfaces to AMS and any other relevant management applications. The clustering service also works with AMS to provide manager node redundancy to eliminate the manager node as a possible single point of failure.

Stateful failover is a key attribute of highly available systems, especially those required to provide seamless, often real-time, processing failover. Checkpointing capabilities allow such systems to collect state information about the applications (e.g., any in-progress calls on an active node) and replicate it to a standby node, which can then take over call processing should the active node fail.
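As a rough illustration of checkpointing, the sketch below replicates per-call state from an active node to a standby node on every update, so that the standby already holds the in-progress calls when it takes over. The `Node` and `CheckpointedPair` names are hypothetical, and replication is modeled as an in-memory copy rather than a network transfer.

```python
import copy


class Node:
    """Hypothetical processing node holding per-call state."""
    def __init__(self, name):
        self.name = name
        self.calls = {}                       # call_id -> state dictionary


class CheckpointedPair:
    """Active/standby pair in which every state change is checkpointed."""
    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def update_call(self, call_id, state):
        # Apply the change on the active node, then replicate a copy.
        self.active.calls[call_id] = state
        self.standby.calls[call_id] = copy.deepcopy(state)

    def end_call(self, call_id):
        self.active.calls.pop(call_id, None)
        self.standby.calls.pop(call_id, None)

    def failover(self):
        # The standby becomes active and already holds the replicated state.
        self.active, self.standby = self.standby, self.active
        return self.active


if __name__ == "__main__":
    pair = CheckpointedPair(Node("node-1"), Node("node-2"))
    pair.update_call("call-17", {"state": "connected"})
    survivor = pair.failover()
    print(survivor.name, survivor.calls)
```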

Management Services

Effective monitoring, access and control of various hardware and software resources in a highly available platform require sophisticated management. The SA Forum’s Hardware Platform Interface (HPI) specification has greatly facilitated the standard implementation of hardware platform resource management capabilities. The HPI provides a set of APIs for discovering, monitoring and managing hardware resources on compliant platforms. Implemented by hardware platform providers and used by the developers of management middleware, the HPI significantly reduces the time required to design and develop the system model representation of hardware resources via default resource management capabilities. Additionally, the Platform Resource Management Service (PRMS) provides a framework for developers to create custom alarm and hot swap management policies to enhance manageability and availability of their target platform.

Many telecom applications demand advanced alarm management and hot swap capabilities, which provide fault detection of system hardware resources and a set of pre-defined action policies. Alarm management is the process of monitoring a system for conditions that may jeopardize healthy operations, and implementing policies to take appropriate action. The Alarm Management Service (ALMS) provides the ability to deal with both hardware and software alarm conditions. The Hot Swap Management Service (HSMS) manages the hot swap insertion and extraction sequences for the field-replaceable hardware resources in the system. Such hardware resources, commonly known as field-replaceable units (FRUs), can be inserted into or extracted from the system by an operator while power is applied. Hardware platforms that support this capability provide flexibility and reduce service interruptions, because the service provider does not need to power off the equipment to replace failed hardware components or to expand the capacity of the system.
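The idea of pairing monitored conditions with pre-defined action policies can be sketched in a few lines. This is an assumption-laden illustration, not the ALMS interface: the sensor names, thresholds and action names are invented for the example.

```python
def evaluate_alarms(readings, policies):
    """Return (sensor, action) for every reading above its policy threshold.

    readings: sensor name -> current value
    policies: sensor name -> {"threshold": number, "action": str}
    Both structures are hypothetical and exist only for this sketch.
    """
    actions = []
    for sensor, value in readings.items():
        policy = policies.get(sensor)
        if policy is not None and value > policy["threshold"]:
            actions.append((sensor, policy["action"]))
    return actions


if __name__ == "__main__":
    policies = {
        "temp_c": {"threshold": 75, "action": "raise_fan_speed"},
        "ecc_errors": {"threshold": 10, "action": "fail_over_board"},
    }
    readings = {"temp_c": 85, "ecc_errors": 2, "fan_rpm": 4000}
    print(evaluate_alarms(readings, policies))
```

Sensors without a policy are ignored, so new policies can be added without touching the evaluation loop.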

A management information base (MIB) module is a process that enables access to management information and thus facilitates management of various applications and system components. A MIB module interacts with an SNMP agent rather than directly interacting with an SNMP manager. All communications occur with the agent, which relays messages as appropriate.

Configuration Management Services

The need for more sophisticated management interfaces developed as network elements became more complex. A command line interface (CLI) and an SNMP agent are no longer sufficient as northbound interfaces. Network administrators expect to find Web and NETCONF interfaces available as well. As carriers leverage the power of chassis-based platforms to deploy applications such as IMS and security, they need to manage multiple applications with a common management interface.

Since CLI and Web interfaces face the customer, it is important that they can be customized to meet the branding and presentation requirements of their user community. For example, CLIs are often styled after the Cisco and Juniper command line environments, with all the well-known features and idioms of those CLIs. Web UIs need to be modern and dynamic and benefit from the ability to refresh data on a real-time basis.

Management interfaces must be in a framework that ensures data consistency, provides session management, and delivers the requisite level of security for all users across all interfaces. A stovepipe approach to development is likely to put considerable strain on developers, as recoding is required for each interface whenever a managed object is changed. In contrast, using a shared software backplane and common data model can save significant development time that otherwise would have been invested in duplicate coding on separate northbound interfaces.

Modern network applications require multiple managed objects to be configured at one time to provision services like MPLS, VPNs and VoIP. Unless all of the configuration changes are made correctly and consistently, the service will not be provisioned and the network may be disrupted. The industry has responded to these issues by working on a new standard for automated configuration management called NETCONF. The NETCONF protocol (RFCs 4741 and 4742) provides mechanisms to install, manipulate and delete configuration data on network devices. It uses XML-based data encoding for the configuration data as well as for protocol messages. NETCONF is increasingly required by network operators to configure and provision large networks in an efficient and reliable fashion.
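To make the XML encoding concrete, the following sketch builds a NETCONF `edit-config` request aimed at the candidate datastore. The `rpc`, `edit-config`, `target`, `candidate` and `config` elements and the base namespace come from the NETCONF specification; the `vpn` payload and its `http://example.com/ns/vpn` namespace are hypothetical stand-ins for a device data model, and the function name is ours.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
EXAMPLE_NS = "http://example.com/ns/vpn"      # hypothetical device data model


def build_edit_config(message_id, vpn_name, vlan_id):
    """Return an <edit-config> RPC, as a string, targeting the candidate datastore."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": str(message_id)})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}candidate")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")

    # Payload in the (hypothetical) device namespace.
    vpn = ET.SubElement(config, f"{{{EXAMPLE_NS}}}vpn")
    ET.SubElement(vpn, f"{{{EXAMPLE_NS}}}name").text = vpn_name
    ET.SubElement(vpn, f"{{{EXAMPLE_NS}}}vlan-id").text = str(vlan_id)
    return ET.tostring(rpc, encoding="unicode")


if __name__ == "__main__":
    print(build_edit_config(101, "customer-blue", 42))
```

ElementTree generates its own namespace prefixes when serializing, which is valid XML; a real client would also handle the session setup and message framing that this sketch leaves out.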

NETCONF capabilities include transaction management, validations and rollbacks. For example, a configuration change is first written to a candidate configuration and then committed. Unless that commit is followed by a second, confirming commit within a configured interval, the device automatically reverts to its original configuration. Administrators can use this capability to test configurations that may potentially degrade or disable connectivity. Transaction management has become a critical component of configuration management services used in carrier networks and data centers. It is recommended that this capability be part of Web and command line interfaces.
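The confirmed-commit behavior can be modeled with a small in-memory sketch. It is a simplification under stated assumptions: the `ConfigStore` class and its method names are hypothetical, configurations are flat dictionaries, and time is passed in explicitly so that the rollback logic is easy to follow and to test.

```python
class ConfigStore:
    """Toy model of candidate/running configurations with confirmed commit."""

    def __init__(self, running):
        self.running = dict(running)
        self.candidate = dict(running)
        self._rollback = None                 # configuration to restore on timeout
        self._deadline = None                 # time by which confirmation must arrive

    def commit_confirmed(self, now, timeout):
        """Apply the candidate, keeping the old configuration until confirmed."""
        self._rollback = dict(self.running)
        self.running = dict(self.candidate)
        self._deadline = now + timeout

    def confirm(self, now):
        """Confirming commit: returns True if it arrived before the deadline."""
        self.tick(now)
        if self._deadline is None:
            return False                      # nothing pending, or already reverted
        self._rollback = None
        self._deadline = None
        return True

    def tick(self, now):
        """Revert to the saved configuration once the deadline has passed."""
        if self._deadline is not None and now >= self._deadline:
            self.running = self._rollback
            self.candidate = dict(self.running)
            self._rollback = None
            self._deadline = None


if __name__ == "__main__":
    store = ConfigStore({"mtu": 1500})
    store.candidate["mtu"] = 9000
    store.commit_confirmed(now=0, timeout=600)
    store.tick(now=601)                       # no confirmation arrived in time
    print(store.running)
```

If the new configuration cuts the administrator off from the device, the confirming commit never arrives and the device returns to the last known working configuration on its own.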

System Services

Finally, key services are required to facilitate the design and integration of various building blocks to construct a highly available, easily manageable and network-ready platform. Examples include an efficient (i.e., small footprint) and fast distributed message engine (DMS) used as the main infrastructure for intra-node and inter-node communication. A cluster management service (CMS) implements functionality such as automatic node discovery, health monitoring, custom cluster policies and virtual IP address management. A browser-based console is used as a development tool to facilitate implementation and testing, and can be extended to serve as an operational console during deployment. Quite often, a fast and efficient in-core data store can facilitate quick storage and retrieval of system data, such as configuration and checkpoint information.

The Integrated Platform

Tail-f Systems and GoAhead Software have teamed to provide an integrated application-ready platform that provides state-of-the-art high availability, configuration management and platform management capabilities.

Tail-f’s ConfD provides an XML-based application for developing on-device network management systems. ConfD includes key northbound interfaces, a shared transaction-based software backplane, and a management infrastructure delivering carrier-grade performance and security. GoAhead’s SelfReliant provides high-availability management capabilities to equipment manufacturers who design and develop continuously available systems and applications. A catalyst for the standards-based COTS ecosystem, SelfReliant enables rapid development of new products based on proven HA infrastructure and standardized underlying components.

The two applications are integrated as a managed object whose availability is represented and managed through AMS in the system model. Pre-defined components manage the availability of ConfD through SelfReliant. All relevant physical resources are represented as managed objects in the AMS system model. Each resource has a remote adapter that resides in a proxy application (Figure 2). The remote adapter defines the relevant methods, which invoke functions to instruct ConfD when and how to react to high-availability events (e.g., activate and switchover). These custom functions are specific to the SelfReliant ConfD Service and serve as the interface between the remote adapter and a ConfD instance. The ConfD library defines this custom interface for the relevant functions.

The proxy applications house remote adapters for ConfD instances. The proxy application contains two remote adapters, one for the proxy application and the other for the local data store. The data store remote adapter interfaces with the availability manager to receive events, upon which the remote adapter calls specific corresponding methods in the ConfD library. For example, when AMS assigns the active role to a ConfD instance, the activate method is invoked for ConfD’s remote adapter, located in the proxy application. The proxy then uses the functions provided by the ConfD interface to inform the local ConfD instance to start performing active service. The proxy application remote adapter registers with the availability manager, but does not receive any events from the availability manager.
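The event flow described above, from availability manager to remote adapter to ConfD instance, can be sketched as follows. None of these names come from the SelfReliant or ConfD libraries: `ConfDStub`, `DataStoreRemoteAdapter` and their methods are hypothetical, and treating a switchover as a demotion to standby is an assumption made for the example.

```python
class ConfDStub:
    """Stand-in for a local ConfD instance (not the real ConfD library)."""
    def __init__(self):
        self.mode = "standby"

    def become_active(self):
        self.mode = "active"

    def become_standby(self):
        self.mode = "standby"


class DataStoreRemoteAdapter:
    """Hypothetical remote adapter living in the proxy application.

    It receives high-availability events and calls the matching function
    on the local ConfD instance.
    """
    def __init__(self, instance):
        self._handlers = {
            "activate": instance.become_active,
            # Assumption: on switchover this instance gives up the active role.
            "switchover": instance.become_standby,
        }

    def on_event(self, event):
        """Dispatch one event; returns False for events the adapter ignores."""
        handler = self._handlers.get(event)
        if handler is None:
            return False
        handler()
        return True


if __name__ == "__main__":
    confd = ConfDStub()
    adapter = DataStoreRemoteAdapter(confd)
    adapter.on_event("activate")              # AMS assigns the active role
    print(confd.mode)
```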

It is important that the configuration management services allow developers to describe their networking application once and then automatically render all northbound interfaces (e.g., NETCONF, CLI, SNMP and Web UI) from that single underlying model. In other words, for coherent operation it is highly desirable that all northbound interfaces implemented within the system coexist with each other and operate through the Confspec adaptation layer. This allows communication with the built-in configuration database and the data provider API and instrumentation. The configuration database (CDB) maintains application configuration in RAM, as well as on a file system. The Data Provider API (DPAPI) is used as a pass-through to allow SelfReliant to store data required for the bootstrap process and essential HA operations (Figure 3). The file system may reside on disk or flash memory. However, users may choose to keep the configuration on persistent storage only.
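The benefit of describing the application once can be shown with a toy model. This is not Confspec syntax or the ConfD API: the `MODEL` dictionary and the two render functions are hypothetical, and they only demonstrate that a single definition can drive validation and output for more than one interface.

```python
import xml.etree.ElementTree as ET

# Hypothetical single data model: leaf name -> type used to validate values.
MODEL = {"hostname": str, "mtu": int}


def render_cli(leaf, value):
    """Render a CLI-style command for one leaf of the model."""
    return f"set {leaf} {MODEL[leaf](value)}"


def render_xml(leaf, value):
    """Render the same leaf as an XML element, as an XML-based interface might."""
    elem = ET.Element(leaf)
    elem.text = str(MODEL[leaf](value))
    return ET.tostring(elem, encoding="unicode")


if __name__ == "__main__":
    print(render_cli("mtu", "1500"))
    print(render_xml("mtu", "1500"))
```

Adding a leaf to `MODEL` makes it available to both renderers at once, and an unknown leaf or a badly typed value is rejected in one place rather than once per interface.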

GoAhead Software
Bellevue, WA.
(425) 453-1900.
[www.goahead.com].

Tail-f Systems
Leesburg, VA.
(703) 777-1936.
[www.tail-f.com].