SOFTWARE & DEVELOPMENT TOOLS
Linux and Java Team Up to Address High-Availability Needs
The combination of appropriate Linux and Java technologies provides a powerful platform for the development of cost-effective high-availability systems. Exploiting these technologies, however, requires careful consideration in selecting the Linux distribution and the Java virtual machine.
GEOFF BAYSINGER AND KELVIN NILSEN, MONTAVISTA SOFTWARE AND AONIX
High-availability computer systems provide non-stop operation for mission-critical business, telecommunication and defense applications. To tolerate failure of both hardware and software components, high-availability architectures build in various forms of redundancy. When one part of the system fails, redundant parts take over, and new redundancies are introduced to accommodate further component failures. When a failed component is repaired, it is restored into the system, where it quickly configures itself as a primary or backup participant in the ongoing computations.
In the past, high-availability (HA) systems were the exclusive domain of very expensive and proprietary hardware and operating systems. But recently, increasing reliance on computer systems for mission-critical activities has moved HA requirements into mainstream markets. The concurrent maturation of both Linux and Java has resulted in both technologies offering the specialized capabilities required in HA applications.
The appeal of using Linux and Java together for the implementation of HA systems derives largely from the fact that both technologies are widely recognized as open industry standards and both technologies have now proven themselves to deliver the breadth of features and the reliability required in HA applications.
For its part, the more recent Linux distributions include support for networked and journaling file systems, more robust driver and kernel coding to minimize panics, additional kernel preemption modes, hot-swap PCI devices, bonded Ethernet connections, backplane messaging and hardware monitoring. This support builds on open standards such as OpenAIS, OpenHPI, OpenIPMI, and the Carrier Grade Linux (CGL) specification produced by the Open Source Development Labs (OSDL).
Java builds on this foundation, providing high-level programming language support to facilitate the development of HA application software. The strengths that Java brings to the HA community include easier and less costly software development and maintenance, improved security through built-in language features like byte-code verification and array subscript checking, and dynamic class loading to support no-downtime software upgrades and on-the-fly system reconfiguration.
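As a rough illustration of how dynamic class loading can support an upgrade without a restart, the following sketch loads an implementation class by name and publishes it atomically. The `HotSwap` class and its method names are hypothetical and only standard Java reflection is used. A real deployment would also need to drain in-flight work and manage class loader lifetimes.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for a service implementation that can be replaced at run time.
final class HotSwap<T> {
    private final Class<T> type;
    private final AtomicReference<T> current = new AtomicReference<>();

    HotSwap(Class<T> type) {
        this.type = type;
    }

    // Load the named class through the given loader, instantiate it with its
    // no-argument constructor, and make it the live implementation.
    T upgrade(String className, ClassLoader loader) {
        try {
            Class<?> loaded = Class.forName(className, true, loader);
            T impl = type.cast(loaded.getDeclaredConstructor().newInstance());
            current.set(impl); // callers see the new implementation on their next get()
            return impl;
        } catch (ReflectiveOperationException e) {
            // The previous implementation stays live if the upgrade fails.
            throw new IllegalStateException("cannot load " + className, e);
        }
    }

    T get() {
        return current.get();
    }
}
```

In practice the loader passed to `upgrade` might be a `java.net.URLClassLoader` pointing at a newly delivered JAR file, so that the replacement code does not need to be present when the system starts.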
As a high-level programming environment, the Java platform generally hides its implementation details, including its dependencies on the underlying operating system, from Java software developers. Abstraction of this detail encourages software portability and simplifies development and maintenance activities. However, software engineers responsible for delivering HA must tunnel through the layers of abstraction in order to analyze and address availability vulnerabilities that might exist in the underlying implementation.
For example, Java provides standardized libraries for file input and output operations, network socket communication, and interaction with commercial database implementations. When a Java virtual machine is running on a typical desktop operating system, the implementation of the file, network and database services is unlikely to support high-availability operation.
However, when the Java virtual machine is properly integrated within a Linux distribution that is configured for HA operation, the same portable Java code will support HA operation. In this configuration, the implementation of the java.io libraries is likely to support journaling, data mirroring and/or redundant distribution; the implementation of the java.net communication libraries is likely to exploit redundant network interface controllers and multiple wired connections to the network; and the implementation of the java.sql database library would probably be based on a commercial HA database implementation.
Many HA systems must comply with application-specific real-time constraints. Network infrastructure nodes, for example, are expected to report their status regularly and frequently. If companion nodes do not receive particular status reports within certain 50 ms timing windows, the network may conclude (erroneously) that certain nodes have failed. This will trigger fault recovery activities, adding to the network load, and increasing the likelihood that additional nodes will subsequently miss their status report timing windows. Reliable message delivery and overall message throughput will both suffer as a result of missed real-time deadlines. Other HA applications have similar timing constraints.
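The status-report check described above can be sketched as a small monitor that records when each node last reported and flags nodes whose report is older than the timing window. The `HeartbeatMonitor` class and its method names are hypothetical. Timestamps are passed in explicitly so the logic can be exercised without real clocks; a caller would normally supply `System.nanoTime()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks the most recent status report from each node.
final class HeartbeatMonitor {
    private final long windowNanos;
    private final Map<String, Long> lastReport = new HashMap<>();

    HeartbeatMonitor(long windowMillis) {
        this.windowNanos = windowMillis * 1_000_000L;
    }

    // Record a status report that arrived at the given monotonic time.
    synchronized void report(String node, long nowNanos) {
        lastReport.put(node, nowNanos);
    }

    // Return the nodes whose latest report is older than the timing window.
    synchronized List<String> suspects(long nowNanos) {
        List<String> late = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastReport.entrySet()) {
            if (nowNanos - entry.getValue() > windowNanos) {
                late.add(entry.getKey());
            }
        }
        return late;
    }
}
```

If the thread that calls `report` is itself delayed by scheduling or garbage collection, a healthy node appears in `suspects`, which is exactly the erroneous failure detection the text warns about.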
To meet HA timing constraints, both the Java virtual machine and the underlying real-time operating system must cooperate. Commercially supported Linux distributions have demonstrated consistent interrupt response latencies of less than 50 microseconds and thread preemption latencies of less than 65 microseconds. Distributions differ markedly, however, depending on the markets they serve and on which software packages they include. Linux's ability to satisfy real-time constraints is an essential part of a complete HA solution. The implementation of the Java virtual machine must support similar timing guarantees (Figure 1).
An important consideration in configuring a Java virtual machine for HA applications is support for predictable thread scheduling and synchronization behavior. According to the Java language specification, Java thread priorities are treated by the Java scheduler as mere suggestions; the scheduler is not required to strictly honor thread priorities when dispatching time slices to each thread. Most Java virtual machines allow underlying OS heuristics (such as priority aging) to violate the thread priorities specified by Java programmers. Since developers of real-time systems use thread priorities to control the responsiveness of particular real-time components, it is important that a virtual machine designed to support deployment of HA real-time software honor strict priority scheduling of threads.
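For reference, this is how priorities are assigned with the standard `java.lang.Thread` API. The thread names and the `PriorityDemo` class are illustrative. On a conventional virtual machine the values set here are hints only, so nothing in this sketch guarantees that the high-priority thread runs first.

```java
// Minimal sketch of assigning priorities with the standard Thread API.
final class PriorityDemo {
    // Create (but do not start) a named thread with the requested priority.
    static Thread worker(String name, int priority, Runnable body) {
        Thread t = new Thread(body, name);
        t.setPriority(priority); // treated as a suggestion by most virtual machines
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread status = worker("status-report", Thread.MAX_PRIORITY,
                () -> System.out.println("status report sent"));
        Thread logging = worker("logging", Thread.MIN_PRIORITY,
                () -> System.out.println("log flushed"));
        status.start();
        logging.start();
        status.join();
        logging.join();
    }
}
```

The same source code runs unchanged on a virtual machine that enforces strict priority scheduling; only the run-time guarantee differs.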
When HA systems are built from the combination of Java and non-Java components, it is essential that system engineers carefully account for the deadlines of each Java and non-Java thread and assign priorities to the respective threads so as to guarantee that each thread satisfies its timing constraints. In the most general case, it is necessary to create a carefully tailored interleaving of Java and non-Java thread priorities to address this requirement.
Another important consideration when integrating independently developed Java and non-Java HA software is a requirement to support efficient coordination and sharing of information between components written in the different languages. The Java Native Interface (JNI) is a well-defined mechanism that allows Java to call C and C to call Java. A weakness of the JNI protocol is the high CPU-time overhead that is consumed each time the JNI boundary between languages is crossed. Another weakness is the loss of information hiding. As HA software systems grow in size and complexity, a powerful engineering approach is to divide large systems into many smaller components, and conquer each of those components independently. Unfortunately, the JNI boundary between C and Java allows the C code to gain full access to information that ought to be hidden from C programmers.
Over the years, we have found that many customers describe this fragile JNI boundary as the single most common source of development errors. Since nearly all HA systems that include Java technologies also make extensive use of non-Java components, consider alternatives to the JNI protocol for implementation of the glue software that connects the two worlds. An approach that has been successfully demonstrated in several commercially deployed telecommunications products establishes shared buffers external to the Java virtual machine and provides mechanisms to allow both the Java code and the non-Java code to very efficiently monitor and modify the contents of these shared buffers without the use of any JNI services.
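The article does not say how the deployed products implement their shared buffers. One way to approximate the idea with standard Java is a memory-mapped file, which a C process can map as well using `mmap`. The sketch below rests on that assumption; the `SharedBuffer` class and the file location are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: maps a file into memory so that Java and non-Java
// code can exchange data without crossing the JNI boundary.
final class SharedBuffer {
    static MappedByteBuffer open(Path file, int size) {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, size);
            // Use the platform byte order so C code reads integers natively.
            buffer.order(ByteOrder.nativeOrder());
            return buffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A Java thread might then publish a value with `buffer.putInt(0, value)` while the C side reads the first four bytes of its own mapping. Any layout, versioning or synchronization protocol for the buffer contents is left to the application and is not shown here.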
To support HA and real-time requirements, one popular real-time virtual machine offers a variety of Linux-specific configuration options. Among the configuration choices that can be specified at start-up time, system integrators can arrange for:
• All currently loaded virtual memory pages and all subsequently loaded virtual memory pages to be locked into physical memory.
• Optional use of an extended range (from 1 to 32) of Java thread priorities.
• A specific mapping that associates a particular native operating system priority with each of the allowed Java thread priorities.
• A choice to use FIFO vs. Round-Robin scheduling. If FIFO is specified, threads run to blocked or complete status before relinquishing to another thread at the same priority.
• Selection between use of the normal priority range and the real-time priority range. Threads running in the real-time priority range are not subject to priority aging.
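The priority mapping option can be pictured as a lookup table from the extended Java range to a band of native priorities. The sketch below is hypothetical and does not reflect the configuration syntax of any particular virtual machine. It spreads Java priorities 1 to 32 linearly across a chosen native band, such as part of the 1 to 99 range that Linux uses for its FIFO and round-robin real-time policies.

```java
// Hypothetical mapping table from Java priorities (1..32) to native priorities.
final class PriorityMap {
    static final int JAVA_MIN = 1;
    static final int JAVA_MAX = 32;

    private final int[] nativePriority = new int[JAVA_MAX + 1];

    // Spread the Java range linearly across [nativeLow, nativeHigh].
    PriorityMap(int nativeLow, int nativeHigh) {
        if (nativeLow > nativeHigh) {
            throw new IllegalArgumentException("nativeLow exceeds nativeHigh");
        }
        for (int p = JAVA_MIN; p <= JAVA_MAX; p++) {
            nativePriority[p] = nativeLow
                    + (p - JAVA_MIN) * (nativeHigh - nativeLow) / (JAVA_MAX - JAVA_MIN);
        }
    }

    // Look up the native priority assigned to a Java priority.
    int toNative(int javaPriority) {
        if (javaPriority < JAVA_MIN || javaPriority > JAVA_MAX) {
            throw new IllegalArgumentException("Java priority out of range: " + javaPriority);
        }
        return nativePriority[javaPriority];
    }
}
```

When the native band is narrower than the Java range, several Java priorities share one native priority. Confining the Java threads to a band also leaves native priorities above and below it free for non-Java threads, which is one way to interleave the two sets of priorities.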
This same virtual machine also takes full responsibility for implementing the Java thread scheduling and synchronization algorithms, thereby establishing a portable threading model that behaves the same across Linux, LynxOS, OSE, VxWorks, and a variety of other real-time operating systems. Multiple Java thread priorities may map to a single real-time operating system priority. This Java virtual machine ensures that the ready Java thread with highest priority is the only Java thread that is eligible to run. Thus, Java threads demonstrate consistent and predictable scheduling behavior relative to other Java threads, even though the underlying operating system might manipulate thread priorities in an attempt to improve system throughput or interactive response times.
Priority inversion occurs in Java software when multiple threads with different priorities access shared synchronization locks. If a low-priority thread acquires a lock and is preempted by a medium-priority thread while it still holds the lock, higher-priority threads that need to acquire the lock will be blocked until the medium-priority thread relinquishes the CPU so that the low-priority thread can release the lock. This priority inversion problem is addressed in real-time virtual machines by implementing the priority inheritance protocol on all synchronization locks. With this protocol, a low-priority thread's priority is automatically boosted to that of the highest-priority thread requesting access to the same synchronization lock. At least one popular virtual machine designed for real-time HA operation guarantees that all synchronization implements priority inheritance as part of its portable soft real-time API definition (Figure 2).
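The bookkeeping behind priority inheritance can be shown with a toy model. This is not a working lock and not how any particular virtual machine implements the protocol. The hypothetical `InheritanceLock` class only tracks which priority the lock owner should run at while higher-priority threads are waiting.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of priority inheritance; priorities are plain positive integers.
final class InheritanceLock {
    private static final int UNLOCKED = -1;

    private int ownerBasePriority = UNLOCKED;
    private final List<Integer> waiterPriorities = new ArrayList<>();

    // A thread requests the lock: it becomes the owner if the lock is free,
    // otherwise it joins the waiters.
    void request(int priority) {
        if (ownerBasePriority == UNLOCKED) {
            ownerBasePriority = priority;
        } else {
            waiterPriorities.add(priority);
        }
    }

    // The owner runs at the higher of its own priority and that of its
    // highest-priority waiter.
    int ownerEffectivePriority() {
        if (ownerBasePriority == UNLOCKED) {
            throw new IllegalStateException("lock is not held");
        }
        int highestWaiter = waiterPriorities.isEmpty()
                ? ownerBasePriority
                : Collections.max(waiterPriorities);
        return Math.max(ownerBasePriority, highestWaiter);
    }

    // The owner releases the lock; the highest-priority waiter takes over.
    void release() {
        if (ownerBasePriority == UNLOCKED) {
            throw new IllegalStateException("lock is not held");
        }
        if (waiterPriorities.isEmpty()) {
            ownerBasePriority = UNLOCKED;
            return;
        }
        Integer next = Collections.max(waiterPriorities);
        waiterPriorities.remove(next); // removes by value because next is an Integer
        ownerBasePriority = next;
    }
}
```

If a priority-2 thread holds the lock and a priority-9 thread requests it, the owner's effective priority becomes 9, so a priority-5 thread can no longer preempt it and delay the release.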
Among the key real-time issues that must be addressed in an HA Java implementation is the behavior of the garbage collector. Traditional Java virtual machines may introduce occasional processing delays ranging from hundreds of milliseconds to tens of seconds due to garbage collection interference. Special real-time garbage collection techniques support the necessary timing constraints. These real-time garbage collectors can be paced to ensure that the garbage collector replenishes the free pool faster than the application code consumes memory, while defragmenting the free pool to ensure reliable allocator performance, and supporting preemption of the garbage collector within approximately 100 microseconds.
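The pacing requirement reduces to simple arithmetic under a steady-state assumption: a collection cycle must finish before the application exhausts the memory that was free when the cycle began. The sketch below uses hypothetical names and a deliberately simplified model that ignores allocation bursts and fragmentation.

```java
// Simplified pacing check, assuming a constant allocation rate.
final class GcPacing {
    // Seconds until the free pool is exhausted at the given allocation rate.
    static double secondsUntilExhausted(long freeBytes, double allocBytesPerSecond) {
        return freeBytes / allocBytesPerSecond;
    }

    // True if a collection cycle of the given duration completes before the
    // application runs out of free memory.
    static boolean keepsUp(long freeBytes, double allocBytesPerSecond, double gcCycleSeconds) {
        return gcCycleSeconds < secondsUntilExhausted(freeBytes, allocBytesPerSecond);
    }
}
```

For example, with 64 MB free and an allocation rate of 16 MB per second, the free pool lasts four seconds, so a collector that completes a cycle in two seconds keeps up while one that needs five seconds does not.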
On the rare occasion that it is necessary to reboot an HA system, an important goal of an HA engineer is minimizing the mean time to repair (MTTR). This translates into a desire to minimize the time required for system restart. One technology that significantly reduces the time required to start up a virtual machine is ahead-of-time compilation. With traditional Java virtual machines, the boot-up process is slowed because all of the application code must be dynamically loaded, verified and JIT-compiled before the application can begin to run. Virtual machines designed to support HA operation generally offer the option of compiling and linking the entire application as a static operation performed prior to run-time.
As with Linux itself, Java for HA systems must also be able to handle low-level interactions with fault-tolerant hardware subsystems. In HA systems, it is common to allow hot replacement of failed hardware components. Traditional Java lacks the ability to interface directly to hardware devices. Enhanced real-time virtual machines offer the ability to implement device drivers, including interrupt handlers, as portable real-time Java components, using conventions that allow the device drivers to be replaced dynamically when the corresponding hardware components fail.
MontaVista Software, Santa Clara, CA. (408) 572-8000.
Aonix, San Diego, CA. (858) 824-0212.