AdvancedTCA Lifts HA to a New Level

For critical systems such as those comprising a telco central office, the importance
of high-availability operation keeps going up. Demands on these systems include
not only ensuring the availability of services and important data, but also sharing
relatively expensive components efficiently. A high-availability environment, one
requiring 99.999% failure-free operation or greater, needs a robust and efficient
solution for availability and application failover.
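
To put the 99.999% figure in perspective, the short calculation below converts an availability level into a yearly downtime budget. It is only a sketch: it assumes "failure-free operation" is measured as a fraction of total time over an average 365.25-day year, and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap days included


def allowed_downtime_minutes(availability: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability fraction."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for label, fraction in [("99.9%", 0.999), ("99.99%", 0.9999), ("99.999%", 0.99999)]:
        minutes = allowed_downtime_minutes(fraction)
        print(f"{label}: {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, "five nines" leaves a little over five minutes of total downtime per year, which is why failover times are counted in milliseconds rather than seconds.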

Building that level of functionality calls for an HA hardware platform with features
such as redundancy and hot swap, along with efficient, fast HA middleware. Also critical
are automated provisioning and deployment of hardware to support scaling system
resources to meet increased demands for customer services. The emerging AdvancedTCA
system architecture was developed to be just such a platform. The hardware features
of an AdvancedTCA platform help HA middleware to achieve resource scaling along
with fast failover—less than 30 msec.

Fault Management Cycle

HA middleware achieves application failover by performing a number of tasks. This
set of tasks, called the Fault Management Cycle, includes detection, prediction,
diagnosis, isolation, recovery and fault repair. In the detection phase, the HA
middleware identifies an undesirable condition by checking the health of every
entity on a blade: the applications, the OS running on the blade, and hardware
entities such as network ports and memory. The Intelligent Platform
Management (IPM) device or Baseboard Management Controller (BMC) on each blade
monitors several hardware and physical parameters such as temperature, voltages and
the state of the processor(s) on the blade.

In the event of a fault, the BMC logs the events and faults in its local database
and informs the Shelf Manager. There may also be watchdog timers that monitor
certain events and, on timeout, inform the BMC about them.
The HA middleware can directly access the BMC’s event logs to find out
about such events, thus avoiding the need to navigate through layers
of software applications and operating systems (Figure 1).

In the diagnosis phase, the HA middleware analyzes faults to determine their nature
and location. It will also perform root cause analysis to identify the underlying
issue. Next, the isolation phase contains the problem or prevents the fault from
resulting in a system failure.

The recovery phase restores the system to expected behavior through
policy-based actions such as restarting the failed application or switching over
to a standby node. The Shelf Manager and the board’s BMC can also assist by
rebooting or restarting a node. Finally, the repair phase restores
the system to full capability by utilizing redundant boards or by replacing hardware
or software components.

Performance Management Cycle

Just as with hardware faults, when performance of one of a system’s components
becomes taxed to its limits, availability of customer services may be affected.
HA middleware predicts, diagnoses and helps to avoid performance degradation.

As part of the Performance Management Cycle, the HA middleware stack (Figure 1)
should perform the following tasks to provide high availability of services: prediction,
diagnosis, isolation, prevention and administration. At a minimum, it should
initiate a separate provisioning and deployment infrastructure to take action.

The prediction task is where the HA middleware identifies an undesirable condition
by checking the performance of various entities on a blade, such as the applications
and the OS running on the blade, along with hardware metrics such as CPU utilization,
memory utilization, and network port or storage I/O bandwidth utilization.

During diagnosis, the HA middleware analyzes instances of early degradation
of system performance by monitoring the modules or components that
may affect system-level performance, and determines how many additional resources
to bring online, and by what method, to maintain service levels.
Next, the isolation task contains the problem with the goal of preventing it
from resulting in a system failure. Prevention typically means scaling system
resources to maintain required performance levels through policy-based actions
such as provisioning and deploying additional boards to handle the load.
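
The prevention step can be pictured as a simple sizing rule. The sketch below is hypothetical and not taken from any particular HA middleware: it assumes load is expressed as a per-board utilization fraction, that load can be spread evenly across boards, and that the policy is to keep average utilization at or below a target value (0.70 here is an arbitrary example).

```python
import math


def additional_boards_needed(utilizations, target=0.70):
    """Estimate how many standby boards to deploy.

    utilizations: utilization fraction (0.0 to 1.0) of each active board.
    target: hypothetical policy ceiling for average utilization per board.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in the range (0, 1]")
    total_load = sum(utilizations)
    boards_required = math.ceil(total_load / target)
    # Never return a negative number; scaling down is a separate decision.
    return max(0, boards_required - len(utilizations))


if __name__ == "__main__":
    print(additional_boards_needed([0.90, 0.85, 0.95]))  # heavily loaded shelf
    print(additional_boards_needed([0.50, 0.50]))        # comfortable headroom
```

A real policy would also weigh memory, network and storage metrics, and would confirm with the Shelf Manager that standby boards are actually available before deploying them.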

When restoring a system to full performance capability including redundancy, the
Shelf Manager is expected to be an active participant in automatic provisioning
and deployment of additional standby boards. When standby resources are utilized
to scale system resources, an administrative task of System Management should be
to automatically initiate a request for installation of additional standby hardware
resources (boards) based on predetermined routine service schedules for the facility.

Two Types of Application Failover

Failover can take place in either of two ways: to a standby application running
on the same board (case one), or to a standby application running on a different
board (case two). In the first case, the data and state of the active application
are checkpointed to the standby application on the same board. If the active
application fails, the standby application takes over.

In the second case, a standby application is running on a redundant standby board
that is identical to the primary board. The update ports can be used to checkpoint
the data and state of the active application to the standby application on the
redundant board. If the active application fails, the standby application on the
redundant board takes over.
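
Both cases follow the same checkpoint-then-take-over pattern; only the location of the standby differs. The sketch below models that pattern in memory. The class names and slot labels are hypothetical, and the copy operation stands in for whatever transport carries the checkpoint (local memory in case one, the update ports in case two).

```python
import copy


class AppInstance:
    """One copy of an application; `board` names the slot it runs on."""

    def __init__(self, name, board):
        self.name = name
        self.board = board
        self.state = {}


class FailoverPair:
    """An active application instance and its standby."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    @property
    def same_board(self):
        # True for case one (same board), False for case two (redundant board).
        return self.active.board == self.standby.board

    def checkpoint(self):
        # Copy the active state so later changes do not leak into the standby.
        self.standby.state = copy.deepcopy(self.active.state)

    def fail_over(self):
        # The standby becomes active and resumes from the last checkpoint.
        self.active, self.standby = self.standby, self.active
        return self.active


if __name__ == "__main__":
    pair = FailoverPair(AppInstance("ss7", "slot-3"), AppInstance("ss7", "slot-4"))
    pair.active.state["calls"] = 42
    pair.checkpoint()
    pair.active.state["calls"] = 43  # changed after the last checkpoint
    survivor = pair.fail_over()
    print(survivor.board, survivor.state)
```

The example also shows the cost of infrequent checkpoints: any state changed after the last checkpoint is not visible to the standby when it takes over.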

Every application registers its health status with a Failure Detection Agent
(FDA) that is a part of HA management middleware (Figure 2). The FDA keeps a
timer corresponding to each application for which it is responsible. As soon
as the application informs the middleware of its health status, the FDA resets
the timer to a pre-specified value and starts to decrement it at a fixed interval.
Note that the timer value is a tunable parameter and will be different for each
application depending on the criticality of the application.

For example, an SS7 application is deemed critical and its timer value
might be less than 30 milliseconds, whereas a non-critical application may have
a larger timer value. If the watchdog timer for an application times out,
the FDA takes an action that depends on the criticality of the application.
For a critical application, a recovery procedure needs to be initiated immediately,
whereas for a non-critical application, the FDA may ping the application to check
on its heartbeat. The heartbeat is a periodic check by middleware between an application
running on a primary board and the same application running on a standby board.
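
The timer logic described above can be sketched as follows. This is a simplified model rather than an actual middleware interface: the class and method names are hypothetical, time advances in explicit ticks so the behavior is easy to follow, and the two actions are returned as labels instead of being carried out.

```python
from dataclasses import dataclass


@dataclass
class WatchedApp:
    name: str
    timeout_ms: int       # tunable, per-application timer value
    critical: bool
    remaining_ms: int = 0

    def __post_init__(self):
        self.remaining_ms = self.timeout_ms


class FailureDetectionAgent:
    def __init__(self, tick_ms=5):
        self.tick_ms = tick_ms  # fixed decrement interval
        self.apps = {}

    def register(self, app):
        self.apps[app.name] = app

    def report_health(self, name):
        """Called when an application reports in; resets its timer."""
        app = self.apps[name]
        app.remaining_ms = app.timeout_ms

    def tick(self):
        """Advance one interval and return (name, action) for expired timers."""
        actions = []
        for app in self.apps.values():
            app.remaining_ms -= self.tick_ms
            if app.remaining_ms <= 0:
                action = "start_recovery" if app.critical else "ping_application"
                actions.append((app.name, action))
                app.remaining_ms = app.timeout_ms  # re-arm after acting
        return actions


if __name__ == "__main__":
    fda = FailureDetectionAgent(tick_ms=10)
    fda.register(WatchedApp("ss7", timeout_ms=30, critical=True))
    fda.register(WatchedApp("stats", timeout_ms=40, critical=False))
    for step in range(1, 7):
        print(step, fda.tick())
```

In this run neither application ever reports its health, so the critical application triggers recovery after 30 ms while the non-critical one is merely pinged when its longer timer expires.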

AdvancedTCA as an HA Platform

Developed specifically for next-generation network platforms by more than 100
companies in the telecommunications industry, the AdvancedTCA (ATCA) platform
spec features a rich set of mechanical and electrical improvements over existing
slot-board system platforms.

ATCA’s platform mechanicals were developed around NEBS and ETSI standard
telecommunications equipment practices, with a forward-looking perspective on
equipment that will use next-generation fabric interconnects for boards used in
constructing new systems. The platform specification was developed to provide
a robust power distribution system enabling required redundancy, while also providing
redundant cooling systems capable of supporting individual boards with power
requirements of up to 200 W.

The ATCA specification also calls out requirements for a backplane that supports
the redundant power distribution, redundant IPMB buses for management and fabric
interconnects. The first fabric interconnect is called the Base Interface, which
is 10/100/1000 BaseT Ethernet and provides, at minimum, IP layer 3 communications
between installed boards in the shelf that support this interface. Another fabric
interconnect using a SERDES interface supports a variety of implementations for
high-speed communications between boards. This fabric interconnect may be implemented
as a star using a fabric switch board, or alternatively as a full mesh with no switch
board on the fabric interface. Redundant communications are supported on the mesh
and on a dual star using redundant fabric switch boards.

Management: A System’s View

The platform management features of ATCA were developed with a system view
in mind. They include specific platform requirements such as Intelligent Platform
Management Interface (IPMI) management with extensions to support a modular
platform, dual Intelligent Platform Management Bus (IPMB) buses for redundancy
and Shelf Management Controller (ShMC) functionality, as well as specified functionality
to enable system management. Included in the ATCA platform management requirements
is a prescribed functionality for communications between the Shelf Manager and
a System Manager using Remote Management Control Protocol (RMCP) to encapsulate
IPMI over a LAN interface. Figure 3 illustrates a logical view of AdvancedTCA
system management showing the relative interfaces.

There is some flexibility in the implementation of the AdvancedTCA system management,
provided the required functionality is present. For example, a dedicated (redundant)
Shelf Management Module (SMM) may be implemented that combines the functionality
of the ShMC, which links IPMI between shelf modules and boards, with that of the
Shelf Manager.

Alternatively, shelf management can be componentized with the ShMC on one of the
boards installed in the shelf, such as a fabric switch board or a node board,
while the Shelf Manager functionality may reside elsewhere, including external
to the shelf. Communications between the Shelf Manager and a System Manager are
defined as RMCP (Remote Management Control Protocol), that is, encapsulation of IPMI
over a network. This encapsulation protocol defines a common IP Layer 3 interface
for managing an ATCA shelf from a System Manager.
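
As a rough illustration of what this encapsulation looks like on the wire, the sketch below prepends the four-byte RMCP header to an IPMI message. The header values follow the RMCP message format as published in the IPMI specification (version 0x06, a reserved byte, sequence number 0xFF meaning no RMCP acknowledgment is requested, and message class 0x07 for IPMI); the function name is hypothetical, and the IPMI session wrapper and message body are treated as opaque bytes supplied by the caller.

```python
import struct

RMCP_VERSION_1_0 = 0x06   # RMCP version 1.0
RMCP_RESERVED = 0x00
RMCP_SEQ_NO_ACK = 0xFF    # 0xFF: no RMCP-level acknowledgment requested
RMCP_CLASS_IPMI = 0x07    # message class: IPMI
RMCP_UDP_PORT = 623       # primary RMCP port


def rmcp_encapsulate(ipmi_session_bytes: bytes) -> bytes:
    """Prefix an already-built IPMI session message with an RMCP header."""
    header = struct.pack(
        "BBBB", RMCP_VERSION_1_0, RMCP_RESERVED, RMCP_SEQ_NO_ACK, RMCP_CLASS_IPMI
    )
    return header + ipmi_session_bytes


if __name__ == "__main__":
    # Placeholder payload only; a real message carries session and IPMI fields.
    datagram = rmcp_encapsulate(b"\x00" * 8)
    print(datagram[:4].hex(), "-> UDP port", RMCP_UDP_PORT)
```

The resulting bytes would be sent to the Shelf Manager as a UDP datagram; session setup and authentication are deliberately left out of this sketch.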

A number of ATCA components can be used for fast application failover (Figure 4).
Among these are the Update Channel, the IPMB bus, the Shelf Manager and the fabric
interfaces. The Update Channel provides a low-latency board-to-board channel that is
used to check the heartbeat of a board. It provides a bandwidth
of 2.5 Gbits/s and can also be used for checkpointing purposes. The HA middleware
can use this channel to synchronize with the neighboring node. The switch fabric
bypass interface of the update port can be used for sending checkpointed application
data and state information to a standby application running on another board,
thus bypassing the latency of the switching fabric.

The HA middleware can use the IPMB bus to communicate to the Shelf Manager about
the status of applications or other system components on the board. The IPMB bus provides
bandwidths from 100 to 400 Kbits/s. The transfer time for a 16-byte heartbeat message
is 320 microseconds at the 400 Kbits/s rate. The Shelf Manager uses the health data provided by the HA
middleware to take corrective action, including generating alerts and emails,
and reconfiguring a warm standby board. The Shelf Manager can also interact
with HA middleware performance monitoring and be used to identify available standby
hardware resources (boards) for provisioning and deployment in order
to scale such hardware resources and maintain system performance levels.
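
The 320-microsecond figure follows directly from the message size and the bit rate: 16 bytes is 128 bits, and 128 bits at 400 Kbits/s takes 320 microseconds. The sketch below reproduces that arithmetic for the IPMB rates and, for comparison, the 2.5 Gbits/s Update Channel. It counts payload bits only and assumes no bus overhead such as addressing, acknowledgment bits or arbitration, so real transfers will take somewhat longer.

```python
def transfer_time_us(payload_bytes: int, bit_rate_kbps: float) -> float:
    """Time in microseconds to clock out a payload at the given rate.

    1 Kbit/s equals 1 bit per millisecond, so bits / Kbits/s yields
    milliseconds; multiplying by 1000 converts to microseconds.
    """
    if bit_rate_kbps <= 0:
        raise ValueError("bit rate must be positive")
    bits = payload_bytes * 8
    return bits * 1000 / bit_rate_kbps


if __name__ == "__main__":
    heartbeat_bytes = 16
    for label, rate_kbps in [
        ("IPMB at 100 Kbits/s", 100),
        ("IPMB at 400 Kbits/s", 400),
        ("Update Channel at 2.5 Gbits/s", 2_500_000),
    ]:
        t = transfer_time_us(heartbeat_bytes, rate_kbps)
        print(f"{label}: {t:.4f} microseconds")
```

The gap of several orders of magnitude between the two interconnects is the reason checkpoint data belongs on the Update Channel while the IPMB bus is reserved for small management messages.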

Implementations of the Base Interface and fabric interface also provide an avenue
for selected communications. The Base Interface is an Ethernet BaseT link supporting
speeds of 10, 100 and 1000 Mbits/s. The fabric interface can be PCI Express or
1 and 10 Gbit Ethernet and can be used entirely for application data. Each interconnect
(Figure 4) has characteristics that will determine the appropriate usage for a
particular HA system functionality. The IPMB bus has limited bandwidth
but serves critical needs for managing a shelf, and therefore a system. It must not
be burdened with unnecessary traffic, which would risk inhibiting the ability
to manage the platform.

Applications supporting HA systems as well as routine provisioning or deployment
applications must be carefully designed to limit usage of the IPMB bus to shelf
and system-critical functions. Configuration of a board is a valid use of the
IPMB bus, as is delivering firmware upgrades to a shelf module, such as an intelligent
fan tray, that has no other communications path available. Deploying
software over the IPMB bus to a board that has, or could have, a connection to the
backplane Base Interface, however, would likely be an improper use of the IPMB
interface.
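
One way to enforce this discipline in software is a fixed mapping from traffic type to interconnect, so that bulk traffic can never be routed onto the management bus by accident. The table below is a sketch that restates the usage guidance given in this article; the traffic-type keys and the function name are hypothetical.

```python
# Traffic type -> interconnect, following the usage described in the article.
INTERCONNECT_FOR_TRAFFIC = {
    "board_heartbeat": "Update Channel",
    "checkpoint": "Update Channel",
    "health_status_to_shelf_manager": "IPMB",
    "board_configuration": "IPMB",
    "shelf_module_firmware": "IPMB",       # e.g. a fan tray with no other path
    "software_deployment": "Base Interface",
    "application_data": "Fabric Interface",
}


def choose_interconnect(traffic_type: str) -> str:
    """Return the interconnect a given kind of traffic should use."""
    try:
        return INTERCONNECT_FOR_TRAFFIC[traffic_type]
    except KeyError:
        # Unknown traffic is rejected rather than defaulting to any bus.
        raise ValueError(f"no interconnect policy for {traffic_type!r}") from None


if __name__ == "__main__":
    for kind in ("checkpoint", "software_deployment", "shelf_module_firmware"):
        print(f"{kind}: {choose_interconnect(kind)}")
```

Rejecting unknown traffic types, rather than falling back to a default path, keeps new kinds of traffic off the IPMB bus until someone has decided where they belong.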

Tomorrow’s Platform Today

The AdvancedTCA platform’s robust mechanicals, power and thermals in a modular
form, coupled with management functionality previously only available in expensive
proprietary system platforms, make this an ideal platform for HA systems. The
capabilities of this platform, a rich supply of commercial off-the-shelf boards
built by multiple vendors and a host of industry standard interfaces and management
middleware add up to a unique opportunity to accelerate development and deployment
of HA systems for next-generation networks.

Intel
Santa Clara, CA.
(408) 765-8080.
[www.intel.com].