TECHNOLOGY IN SYSTEMS
Rugged, Hot-Swappable and Reliable
Lessons in Performance, Ruggedness and Reliability for Commercial Embedded Markets
Rugged, hot-swappable features required by telecommunications systems are compatible solutions for military and mission-critical commercial applications.
NANCY PANTONE & KEITH TAYLOR, KONTRON
Embedded systems are all about performance and reliability; they are expected to handle their assigned tasks efficiently and without fail. Secure network infrastructure applications in particular, such as those found in telecommunication central offices, are often expected to perform non-stop even in remote locations. Central office systems require the utmost in performance and overall reliability since failure is just not an option. Rugged, hot-swappable design features characterize these applications, and must meet not only their mission-critical requirements but also their high-bandwidth performance and grueling environmental demands.
In fact, strict telecommunications standards and requirements have advanced the design of systems to the point that they directly benefit technically adjacent markets as well. Key design considerations that affect telecommunications’ high availability include designing in Network Equipment Building System (NEBS) standards for certification, and compliance with European Telecommunications Standards Institute (ETSI) standards. They also include Reliability, Availability and Serviceability (RAS) features, evaluating and managing shock and vibration issues, and planning for overall system manageability. Further, advanced condition monitoring and in- and out-of-band system management functions are part of this evaluation, ensuring proper notification and identification of potentially degrading or failed components with the least amount of system disruption and the most cost-effective service schedule. Military, and in turn, rugged commercial embedded markets such as transportation, energy and manufacturing, stand to benefit greatly from the proven design theories behind extremely reliable, standards-based performance.
Telecommunications system designers routinely work with rugged, NEBS-certified solutions for high-performance, high-bandwidth applications in extreme environments. In compliance with NEBS-3/ETSI requirements, telecommunications systems are designed and tested to withstand extreme heat, humidity, altitude and zone 4 earthquake shock (7.0 Richter scale and higher) and multiple other extreme environmental conditions. These rigid requirements also include advanced server management and telco alarm management, which are important features that provide visual, audible and/or remote network event indications of faults.
As an example, communications rackmount servers used in the central office are specifically designed for space-constrained, thermally and mechanically rugged environments that also require very high availability. These systems are characterized by extended temperature components, robust mechanical design including redundant, hot-swappable features, and world class vibration suppression technologies that set them apart from conventional enterprise and industrial servers.
Balancing Reliability Features with Environmental Demands
Military designers are making the most of proven telecommunications design tenets, developing systems that must perform under many of the same high-bandwidth, increased security and sophisticated data processing pressures in punishing physical and environmental conditions.
Rugged military applications integrate many of the same RAS features found in the “always on” telecom infrastructure. Non-stop performance is ensured through the use of redundant systems that in turn enable hot-swappability. The components that are most likely to fail are power supplies, drives and fans. As a result, these are considered critical RAS features and the most important priorities in terms of designing in redundancy.
Redundant power supplies—probably the most basic yet critical component in the hot swap arsenal—are required for NEBS certification, and are commonly found in telecom applications to ensure continued non-stop system operation. It is important to note that redundant power supplies also permit the use of redundant external power sources. Central offices all have two separate power feeds—and as a result, they are protected not only against power failure but also from the failure of external devices that might cause the loss of an entire power rail.
Hard disk drives are a close second: not quite as critical as a power supply, but generally more prone to failure. The effect of a drive failure can be mitigated through a redundant storage configuration. Hardware or software RAID is often implemented to manage data redundancy across the drives, so that valuable data is retained on the remaining drives until a service event can be scheduled and the failed drive replaced without system interruption. Redundant, hot-swappable fans are also included as part of NEBS certification, illustrating the importance of thermal management in keeping rugged systems operational. Overheating and improper airflow can degrade system performance so dramatically that the resulting service interruption is equivalent to a full system failure.
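The benefit of duplicating power supplies, drives and fans can be illustrated with basic availability arithmetic. The following is a minimal sketch, assuming independent failures and purely hypothetical per-unit availability figures (they are not taken from any datasheet or from NEBS requirements). It compares a system with one of each component against a system with a redundant pair of each.

```python
from math import prod

HOURS_PER_YEAR = 8760


def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of a redundant group that stays up while at least
    one unit works. Assumes unit failures are independent."""
    return 1.0 - (1.0 - unit_availability) ** units


def series_availability(availabilities) -> float:
    """Availability of a chain in which every element must work."""
    return prod(availabilities)


def downtime_hours(availability: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


# Hypothetical per-unit availabilities, chosen only for illustration.
UNIT = {"power_supply": 0.999, "drive": 0.995, "fan": 0.998}

# One of each component: any single failure takes the system down.
single = series_availability(UNIT.values())

# A redundant pair of each component: both units of a pair must fail.
redundant = series_availability(
    parallel_availability(a, 2) for a in UNIT.values()
)

print(f"single:    {single:.6f} ({downtime_hours(single):.1f} h/year down)")
print(f"redundant: {redundant:.6f} ({downtime_hours(redundant):.1f} h/year down)")
```

The model ignores repair time and common-cause failures, such as a shared power feed. That is one reason separate external power sources matter as much as the redundant supplies themselves.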
Remote monitoring systems provide a Web-based means of observing critical components, tracking system malfunctions in real time and notifying network administrators of impending issues. This is essentially a form of preventive maintenance—avoiding downtime by constantly monitoring the system’s vital operating parameters, such as processor temperature, fan speeds, power supplies and hard drive status. Service events are recorded so they can be effectively managed and also archived for future review.
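The monitoring logic described above can be sketched as a simple threshold check. This is an illustration only: the sensor names, the threshold values and the `evaluate` helper are hypothetical, and a real system would read its values and limits from the platform's management controller rather than from a hard-coded table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warn: float
    critical: float
    high_is_bad: bool = True  # False for sensors where low readings are the fault


# Hypothetical limits, chosen only for illustration.
THRESHOLDS = {
    "cpu_temp_c": Threshold(warn=75.0, critical=90.0),
    "fan_rpm": Threshold(warn=3000.0, critical=1500.0, high_is_bad=False),
}


def classify(value: float, t: Threshold) -> str:
    """Map one reading to 'ok', 'warning' or 'critical'."""
    if t.high_is_bad:
        if value >= t.critical:
            return "critical"
        if value >= t.warn:
            return "warning"
    else:
        if value <= t.critical:
            return "critical"
        if value <= t.warn:
            return "warning"
    return "ok"


def evaluate(readings: dict) -> list:
    """Return (sensor, value, status) for every non-ok reading,
    with critical events listed first. Unknown sensors are skipped."""
    events = []
    for name, value in readings.items():
        t = THRESHOLDS.get(name)
        if t is None:
            continue
        status = classify(value, t)
        if status != "ok":
            events.append((name, value, status))
    # The sort is stable, so events of equal severity keep their input order.
    events.sort(key=lambda e: e[2] != "critical")
    return events


# Example: a warm processor and a fan that has nearly stopped.
for event in evaluate({"cpu_temp_c": 80.0, "fan_rpm": 1200.0}):
    print(event)
```

A warning band below the critical limit is what turns monitoring into preventive maintenance, because a service event can be scheduled before the component actually fails.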
Extreme, space-constrained environments such as military aircraft, ships and field datacenters are subject to wide temperature variations, high altitude, and shock and vibration. Rugged chassis design in these instances involves minimal use of plastic, or use of higher grade burn-resistant plastic with a UL 94 V-0 flammability rating. Thicker sheet metal is added for improved rigidity and is coated with zinc chromate to avoid rust on shear edges. Even the server cables are viewed as an opportunity to design in additional rugged reliability, incorporating high-quality locking mechanisms, shrouds and thicker 30-microinch gold contacts. Some systems further improve reliability and serviceability by reducing or eliminating cables, through integration of multiple functions on one board or direct physical docking of boards.
Uncontrolled vibration in high-performance environments can render a system non-operational, and can at the same time be extremely difficult to diagnose. As hard disk drives become more and more sensitive to vibration, their magnetic heads have greater difficulty staying on track. As an affected drive continually tries to correct itself, its performance degrades accordingly. Isolating both vibration-generating devices and vibration-sensitive devices is proving to be a highly viable approach. As systems become more powerful, innovations in vibration suppression will continue as manufacturers persist in their ongoing evaluation of both fan and disk drive products.
Modern Military Design
With an unmatched diversity of application requirements, the modern military relies heavily on secure communications and networked systems, sharing vital information in real time and linking command centers to individual soldiers mobilized over land, sea and air. Driven by technology initiatives such as Warfighter Information Network-Tactical (WIN-T) and Brigade Combat Team Modernization, designers are building in rugged, hot-swappable system features integral to ensuring real-time situational awareness across military command centers and the rank and file (Figure 1).
Figure 1: Secure network communications for military applications are being deployed incrementally in the WIN-T program, bringing greater levels of networking capabilities to field units and ground commands.
For example, secure network communications for military applications are being deployed incrementally in the WIN-T program, bringing greater levels of networking capabilities to field units and ground commands. “Networking-at-the-halt” is the first phase of the WIN-T program, and provides roll-on/roll-off mobility and Internet-based connectivity to the warfighter, satellite and line-of-sight connectivity, and Defense Information Systems Network (DISN) services down to the Battalion level. Phase two is “networking-on-the-move,” enabling a mobile infrastructure on the battlefield and extending a broadband communication network down to the Company level.
Earlier military systems generally relied on system redundancy using purpose-built proprietary solutions, which increased overall system cost and hampered their transportability. Maximum system uptime is still a critical requirement for integrated battlefield management, and system designers can now take advantage of the high-availability features that commercial off-the-shelf (COTS) technology offers. Many command centers need to process huge amounts of high-bandwidth data and communicate in real time with troops at every level. Utilizing standards-based COTS technologies, these command centers can now function in a very similar fashion to the telecommunications central office. Size, weight and power (SWaP) will always be a top military design issue; however, these secure network applications also require computing bandwidth and high availability that cannot be sacrificed as a design trade-off.
Reliable Design Options
While battlefield applications have unique requirements, they share many with telecom, and the same platform options apply. These include carrier grade servers: NEBS-3/ETSI-compliant standard building blocks that meet stringent environmental requirements. IP network and industrial servers are additional options, optimized for high I/O throughput and compute performance, and well-suited for data network applications with large I/O requirements (Figure 2).
Figure 2: The Kontron CG2100 Carrier Grade Server combines performance, ruggedness, reliability and long life in a NEBS-3 and ETSI-compliant 2U chassis. It provides dual socket support for the Intel Xeon Processor 5600 series, coupling high performance with power efficiency to provide improved performance-per-watt over previous-generation rackmount servers.
Carrier grade servers are differentiated by NEBS certification, validated to handle power management, electrical shielding, disaster preparedness, environmental safety and specific application-defined hardware interfaces. IP network and industrial servers (Figure 3) are not NEBS rated; however, they implement much of the ruggedness and reliability of carrier grade systems. For example, extreme industrial systems handle an operating temperature range of 0° to 50°C and an operating humidity range of 10-95%, and offer all-around IP 20 protection (configurable to IP 52 at the front) as well as high shock and vibration protection. By implementing an Intelligent Platform Management Interface (IPMI) over LAN, network users have an OS-independent, cross-platform interface for monitoring the server system’s temperature, voltage and fan status, including out-of-band management even when the main processors are not powered on.
Figure 3: The Kontron KISS 4U KTC5520’s rugged design, manufacturing, high MTBF classification and manageability all contribute to lower total cost of ownership. Standard applications for the rugged and extremely silent embedded server include industrial imaging and military applications, as well as high-end data processing, storage and simulation applications.
Many traffic-intensive applications demand the unique capabilities of specialized network processors coupled with general purpose CPUs. However, the majority of conventional enterprise class servers cannot provide the extra power or cooling required to support these network acceleration cards, which typically draw two to three times the maximum power allowed by standard PC slots. Carrier grade servers, IP network and industrial servers solve this challenge with dedicated power rails and auxiliary power connectors that deliver additional power directly to the adapter’s auxiliary power ports. Improved cooling capacity is designed into the I/O’s thermal zone to specifically accommodate these high-power I/O cards. Further, I/O can be supported in the front of selected IP network servers, providing simplicity in connecting network ports from the same side of the rack.
Designing Reliability for the Long Term
Designers of military market applications are recognizing the requirement similarities and proven successes of telecom designs. The incorporation of RAS features, including rugged, hot-swappable components, and the consideration of physical design characteristics, such as altitude, limited space and extreme thermal requirements, are proven principles, enabling new applications and meeting the needs of modern battlefield initiatives. However, high-availability, mission-critical applications require a much deeper look at current and future environmental conditions, as well as careful longevity planning.
Not only do military designers face environmental impact from sources like heavy equipment, vehicles, generators, engines or other types of industrial machinery operating within or very near network installations, they must manage their successful design for long-term exposure to these elements. Beyond ruggedness, designs must have an extended lifecycle with product availability and manufacturer support as an ongoing requirement. Conventional enterprise-class servers have an expected lifespan of 18 months before EOL (end-of-life) from the supplier; however, telecommunication service providers require equipment to be in production for three to five years or even longer. This assurance is very attractive to military and other embedded markets, increasing stability and reducing maintenance and qualification costs with fewer product releases and validation cycles. Further, manufacturer service and support must be continued for another two to three years after production has ended, allowing end-users more time to scale operations and remain with the same products longer.
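The effect of production lifespan on qualification effort can be shown with simple arithmetic. The sketch below assumes a hypothetical 10-year deployment program and assumes that each new platform generation triggers one requalification. Only the 18-month and three-to-five-year production lifespans come from the figures above.

```python
from math import ceil


def platform_generations(program_months: int, production_months: int) -> int:
    """Number of successive platform generations needed to keep buying
    in-production hardware for the whole program."""
    return ceil(program_months / production_months)


PROGRAM_MONTHS = 120  # hypothetical 10-year program, for illustration only

for label, production_months in [
    ("enterprise server, 18 months", 18),
    ("telecom-grade server, 3 years", 36),
    ("telecom-grade server, 5 years", 60),
]:
    generations = platform_generations(PROGRAM_MONTHS, production_months)
    # The first generation is the initial qualification; each later one is a requalification.
    print(f"{label}: {generations} generations, {generations - 1} requalifications")
```

The post-production support window is left out of this count, since it extends how long deployed units can be serviced rather than how long new units can be purchased.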
Military designers are among the first to utilize telecom methodologies in rugged environments and mission-critical situations where nothing less than infallible performance will do. The networking achievements of WIN-T and Brigade Combat Team modernization demonstrate this influence and the proven success of rugged, reliable telecommunications design. This high bar for guaranteed non-stop performance is also likely to illustrate the optimal solution for other rigorous commercial applications such as transportation management, smart grid energy systems, chemical plants or other severe manufacturing settings.
System reliability is a dynamic equation that changes under varying operational circumstances, and designers must treat high-availability computing as “more than the sum of its parts.” This forces designers to fully understand and evaluate potential environmental issues early in the design process, ideally leveraging the design lessons of the telecommunications central office. Designers can expect these proven concepts to provide significant competitive value in the evolving range of unique and demanding physical computing environments.