RTC Magazine Digital Edition

SOFTWARE & DEVELOPMENT TOOLS

Carrier Grade Linux

Runtime Application Patching for High Availability with Carrier-Grade Linux

Runtime patching allows a system maintainer to apply patches to applications in a running system without having to halt and restart either the system or the applications

JOHN MEHAFFEY, MONTAVISTA SOFTWARE

Systems that must run continuously (five nines systems) present special obstacles to system maintenance. By definition, a five nines system is available 99.999 percent of the time, which allows only a little over 5 minutes of downtime per year. Such systems cannot be routinely shut down for maintenance, either to upgrade software or to fix defects. Runtime patching helps keep these systems running by allowing bug fixes, software upgrades, and even temporary debugging routines to be placed into applications without stopping the system.
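
As a quick check of the downtime figure, the budget follows directly from the availability target. The sketch below assumes a 365-day year; the function name is illustrative and not part of any MontaVista tool.

```c
#include <assert.h>

/* Minutes in a non-leap year: 365 days * 24 hours * 60 minutes. */
#define MINUTES_PER_YEAR (365.0 * 24.0 * 60.0)

/* Downtime budget, in minutes per year, for an availability given as a
 * fraction (for example 0.99999 for five nines). */
double downtime_budget_minutes(double availability)
{
    assert(availability >= 0.0 && availability <= 1.0);
    return (1.0 - availability) * MINUTES_PER_YEAR;
}
```

Calling downtime_budget_minutes(0.99999) gives a little under 5.3 minutes, while four nines (0.9999) allows roughly ten times as much.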

A patch is a shared object that has some special information added to it that describes the patch. A patch starts out as the source file for the new or repaired functions and methods and is then compiled into a shared object file. Other information about the patch is placed into a patch information file, which is then added to the .patchinfo section of the shared object file to create the patch object.

Why Patch?

For systems that can be brought down to perform maintenance, patching doesn’t always make sense. It is easier and safer to just bring the system down, back it up, upgrade the software, test it, and bring the system back into service. If the new system doesn’t work properly, you can roll back to the old system and try again later.

Patching is ideal for systems that need to run “24 x 7 x 365”, and need to meet five nines, such as carrier-grade systems. These types of highly available systems must undergo all maintenance while operating. Patching reduces the need to upgrade the entire system in order to fix software defects. Upgrading in a redundant system usually means that one of the redundant parts of the system must be brought down, upgraded, brought back up, and synchronized with the rest of the system. This exposes the system to possible downtime due to a single hardware failure during the upgrade procedure. To meet five nines, the system can have only a very limited number of upgrades because of this exposure.

Using patching to fix bugs does not risk downtime due to hardware failure, since no parts have to be brought down to apply the patch. Patches may be prepared ahead of time, extensively tested in offline systems, and applied in a matter of seconds.

Sometimes bug fixes have hidden side effects that are even more undesirable than the original defect, and are only found after the system is back in operation. If the system was backed up and upgraded, it may not be possible to restore the old system without losing all the information collected since the upgrade. Runtime patches can be removed as easily and quickly as they are applied.

Upgrade problems have probably happened to everyone, including those upgrading five nines systems. Some strange problem in the existing system may make upgrading to a new system very difficult due to differences in the deployed system versus what was developed and tested in the lab. With patching, you can fix the problem in the old system instead of trying to work around it in the new. It can make the upgrade job much simpler.

Lastly, patches can help with debugging. A difficult problem may hinge on one small piece of information that is only available under certain circumstances. You can deliver a patch that captures this information, apply the patch, collect the information, and then remove the patch. Patching doesn’t replace upgrades, of course, but it adds another tool to the system maintainer’s kit to make the system easier to maintain in the field.

Patching doesn’t necessarily work well for every application. The following examples are types of applications that probably can’t make use of patching:

  • Applications requiring hard real-time. These applications probably cannot take the latency hit when all threads are suspended during the patching operation.
  • Applications with severely limited memory space. These applications may not be able to accommodate the additional overhead of the symbol tables required to patch.
  • Applications that are life-critical. These applications, such as certain medical equipment or air traffic control systems, cannot take the risks associated with patching.

How Does a Patch Work?

A patch can replace pieces of code in the system, add new code to the system, provide new global data for itself and future patches, and run code upon loading, activation, deactivation and unloading. Not all kinds of code are possible to patch.

A patch can replace any function or method, add new functions and methods, and add new data items. A patch cannot directly make existing data items larger (you cannot add new items to a structure), automatically change existing data items, or update a code flow that never leaves a procedure.

A thread that is already executing the old version of a function or method keeps running the old version until it returns; it picks up the new version only the next time the function is called. For this reason, the body of an infinite loop can never be patched. For instance, in the following loop:

    for (;;) {
        <do something>
    }

The “do something” can never be patched. Instead, code it as:

    void do_something(void)
    {
        <do something>
    }

    for (;;) {
        do_something();
    }

As long as do_something() is not inlined, it can be patched. If you actually need to leave the loop, a post-patch-activation procedure can set a global variable that causes the loop to terminate.
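
A minimal, compilable sketch of this pattern follows. It assumes GCC or Clang (for the noinline attribute); keep_running, request_loop_exit() and main_loop() are hypothetical names, and the third-pass exit only simulates what a post-activation routine might do so that the example terminates.

```c
#include <assert.h>

/* Global flag that a post-activation routine could clear to end the loop.
 * volatile forces the loop to re-read it on every pass. */
volatile int keep_running = 1;

/* Counts loop passes; stands in for the real work. */
int passes = 0;

/* What a hypothetical post-activation routine would call. */
void request_loop_exit(void)
{
    keep_running = 0;
}

/* noinline keeps this a real call, so a patcher that redirects function
 * entry points has an entry point to redirect. */
__attribute__((noinline)) void do_something(void)
{
    assert(keep_running);
    passes++;
    if (passes == 3)
        request_loop_exit(); /* simulated here so the demo stops */
}

/* The loop itself is never replaced; only do_something() can be. */
void main_loop(void)
{
    while (keep_running)
        do_something();
}
```

Because the loop re-enters do_something() on every pass, a replacement would take effect on the next pass after activation.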

The patcher requires the symbol table to work properly. Programs must be linked with -rdynamic so that the program’s symbols are available to the patch. Tools that remove symbol table information from the application, such as strip or MontaVista’s Library Optimizer Tool (LOT), will prevent you from using the patcher on those applications.

Patches apply in two phases: loading and activation. The system maintainer first loads the patch using a runtime patcher utility called fsadcon (see Figure 1). This utility makes sure all other required patches are loaded into memory, loads the patch into memory, and builds a table of places to patch. It does not replace the functions at this point in time. Any new functions and data provided by the patch are available, but the old ones will still be in use by the application.

The patch loader works with the standard glibc loader to load the code, and the code is available as a standard loaded shared object. The patch loader takes standard glibc flags. If you want the patch’s functions to be usable by other patches or new code, you must pass in the proper flags.

To replace the functions, the patch must be activated. When the patch is activated, the patching system suspends all threads, makes sure they are not running in the areas of code being patched, inserts code at the beginning of each old function that jumps to the new function code, and then resumes the suspended threads. Patches can also be deactivated and unloaded.
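
The effect of activation and deactivation can be illustrated with a portable analogy. The sketch below does not rewrite machine code or suspend threads as the real patcher does; it uses a function-pointer slot to stand in for the jump placed at the start of the old function. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*compute_fn)(int);

/* Original function, as shipped. */
int compute_v1(int x) { return x + 1; }

/* Replacement supplied by the patch. */
int compute_v2(int x) { return x + 2; }

/* Entry slot: stands in for the first instructions of the old function. */
compute_fn compute_entry = compute_v1;

/* Remembers the pre-patch entry so the patch can be removed again. */
compute_fn saved_entry = NULL;

/* Callers always go through the entry slot. */
int compute(int x)
{
    assert(compute_entry != NULL);
    return compute_entry(x);
}

/* Redirect the entry to the new code. Returns -1 if already active. */
int activate(void)
{
    if (saved_entry != NULL)
        return -1;
    saved_entry = compute_entry;
    compute_entry = compute_v2;
    return 0;
}

/* Restore the old entry. Returns -1 if the patch is not active. */
int deactivate(void)
{
    if (saved_entry == NULL)
        return -1;
    compute_entry = saved_entry;
    saved_entry = NULL;
    return 0;
}
```

The old code stays in memory throughout, which is why deactivation is as quick as activation: only the redirect is removed.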

Activate and Deactivate Code

To allow the patch to prepare for activation or deactivation, you can add special functions that run before the patch is activated, after it is activated, before it is deactivated and/or after it is deactivated. There are six special functions that run before activation or deactivation and may return an error code to prevent the activation or deactivation. Each of these six routines receives the patch as a parameter so that it can access the PATCHID (which is used as the file name in the symbol table) to look up symbols in the patch, if desired.

Practical Considerations of Patching

Patches can depend on other patches both implicitly and explicitly. An explicit dependency comes from a list of patches specified by the “Requires” keyword in the patch info file. An implicit dependency comes from applying a patch to a symbol that another patch has already patched.

For example, suppose patch1 and patch2 both patch the function test. If patch2 is loaded after patch1, patch2 implicitly depends on patch1; that is, you cannot deactivate patch1 without also deactivating patch2.

If a patch explicitly depends on another patch, then it cannot be loaded until the required patch is loaded. A patch cannot be activated until all the patches it depends on are activated. Also, a patch cannot be deactivated or unloaded until all the patches that depend on it are deactivated or unloaded.
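
These ordering rules can be sketched as simple state checks. The model below is an illustration only: the struct layout, state names and function names are assumptions rather than the patcher's real data structures, and it treats the patch1/patch2 example as a recorded dependency.

```c
#include <assert.h>
#include <stddef.h>

enum patch_state { PATCH_UNLOADED, PATCH_LOADED, PATCH_ACTIVE };

#define MAX_DEPS 4

struct patch {
    const char *id;
    enum patch_state state;
    size_t ndeps;                  /* number of entries used in deps[] */
    struct patch *deps[MAX_DEPS];  /* patches this one depends on */
};

/* Returns 1 if q lists p among its direct dependencies. */
int depends_on(const struct patch *q, const struct patch *p)
{
    assert(q->ndeps <= MAX_DEPS);
    for (size_t i = 0; i < q->ndeps; i++)
        if (q->deps[i] == p)
            return 1;
    return 0;
}

/* Loadable only once every required patch is at least loaded. */
int can_load(const struct patch *p)
{
    if (p->state != PATCH_UNLOADED)
        return 0;
    for (size_t i = 0; i < p->ndeps; i++)
        if (p->deps[i]->state == PATCH_UNLOADED)
            return 0;
    return 1;
}

/* Activatable only once every patch it depends on is active. */
int can_activate(const struct patch *p)
{
    if (p->state != PATCH_LOADED)
        return 0;
    for (size_t i = 0; i < p->ndeps; i++)
        if (p->deps[i]->state != PATCH_ACTIVE)
            return 0;
    return 1;
}

/* Deactivatable only if no active patch still depends on it. */
int can_deactivate(const struct patch *p, struct patch *all[], size_t n)
{
    if (p->state != PATCH_ACTIVE)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (all[i]->state == PATCH_ACTIVE && depends_on(all[i], p))
            return 0;
    return 1;
}
```

With patch2 depending on patch1, can_deactivate() refuses to deactivate patch1 while patch2 is still active, matching the patch1/patch2 example above.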

Changing code while it is running can lead to problems unless the code and the patch are well constructed. A thread running in the unpatched version of a function might call another function and get the patched version. Two threads can be running, one in the old code and another in the new code. A function running in old code can use data that the patch has changed, which might cause it to go into an infinite loop or SEGV. The programmer can do a number of things to help avoid these problems. Design choices can make a big difference here, and a robust patching solution can be designed.

The selection of programming language can make a big difference as well. If you program in a language such as Ada95 or compiled Java that can catch errors such as array overruns and properly recover, even if a patch causes a problem it will only cause one operation to fail. The system can recover and continue to operate. Using such languages has many other benefits beyond patching, and any designer should consider using them.

The threads in the system should be designed to catch errors or signals and recover properly. Use of a language that supports exceptions can help. Exception handlers in the code can clean up if something goes wrong. This can also help recover from programmer errors that don’t have anything to do with patching.

Adding one or more unused fields at the end of strategic data structures can allow future upgrades to use the existing data structures, perhaps by defining the unused field as a pointer to an extension. The _patcher_post_load() routine can then use malloc() to allocate memory for each of the extensions, patch in the pointers to the existing structures, and populate the extensions, saving the trouble of having to redo all of the data structures.
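
A minimal sketch of the spare-field technique follows. The session structure, its extension and the helper names are hypothetical, and the signature of _patcher_post_load() is not shown in this article, so the helper is written as an ordinary function that such a routine might call.

```c
#include <assert.h>
#include <stdlib.h>

/* Structure as originally shipped, with one spare field at the end. */
struct session {
    int id;
    int bytes_sent;
    void *reserved;   /* unused by the original code; kept for patches */
};

/* New fields introduced later by a hypothetical patch. */
struct session_ext {
    int retry_count;
};

/* Gives every session that lacks one a zeroed extension.
 * Returns 0 on success, or -1 if an allocation fails (extensions
 * attached before the failure are left in place). */
int attach_extensions(struct session *sessions, size_t count)
{
    assert(sessions != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (sessions[i].reserved != NULL)
            continue;               /* already extended */
        struct session_ext *ext = malloc(sizeof *ext);
        if (ext == NULL)
            return -1;
        ext->retry_count = 0;
        sessions[i].reserved = ext;
    }
    return 0;
}

/* Patched code reaches the new fields through the spare pointer. */
struct session_ext *session_ext_of(struct session *s)
{
    return (struct session_ext *)s->reserved;
}

/* Counterpart for removing the patch: frees and clears every extension. */
void detach_extensions(struct session *sessions, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        free(sessions[i].reserved);
        sessions[i].reserved = NULL;
    }
}
```

The size and layout of struct session never change, so unpatched code that is still running keeps working with the same structures.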

Embedded systems generally get driven by events; event-driven programming makes designing such systems much easier. Using an event-driven system (also called state machine programming) can make patching quite safe.

In an event-driven system, a small number of threads run all the operations of the system. To schedule something to be done, instead of allocating a thread for it, you put it into an event queue. The event threads then take operations out of the queues and run them. For patching, the event system could be modified to halt all the operations, activate/deactivate the patch, then continue. Only the patches to code that doesn’t run under the event framework will need careful design to account for concurrency.
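
The idea of halting between operations can be sketched with a single-threaded event loop that applies a pending patch action only when no handler is running. This is an illustration under simplifying assumptions (one event thread, a fixed-size queue); every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 8

typedef void (*event_fn)(void);

/* Fixed-size ring buffer of pending events. */
event_fn ev_queue[QUEUE_CAP];
size_t ev_head = 0;
size_t ev_count = 0;

/* Patch action waiting for a safe point, or NULL if none. */
event_fn pending_patch_action = NULL;

/* Queue an event. Returns -1 if the queue is full. */
int post_event(event_fn fn)
{
    assert(fn != NULL);
    if (ev_count == QUEUE_CAP)
        return -1;
    ev_queue[(ev_head + ev_count) % QUEUE_CAP] = fn;
    ev_count++;
    return 0;
}

/* Safe point: no handler is executing, so code can be swapped here. */
void apply_pending_patch(void)
{
    if (pending_patch_action != NULL) {
        pending_patch_action();
        pending_patch_action = NULL;
    }
}

/* Run events until the queue is empty, checking for a pending patch
 * before each event and once more after the last one. */
void run_events(void)
{
    while (ev_count > 0) {
        apply_pending_patch();
        event_fn fn = ev_queue[ev_head];
        ev_head = (ev_head + 1) % QUEUE_CAP;
        ev_count--;
        fn();
    }
    apply_pending_patch();
}

/* Demo handlers: the "patch" swaps handle_v1 for handle_v2. */
int work_total = 0;
void handle_v1(void) { work_total += 1; }
void handle_v2(void) { work_total += 10; }
event_fn handle_impl = handle_v1;
void handle_event(void) { handle_impl(); }
void activate_patch(void) { handle_impl = handle_v2; }
void request_patch_event(void) { pending_patch_action = activate_patch; }
```

Posting handle_event, request_patch_event and handle_event in that order runs the first event with the old handler and the last with the new one, with the swap happening between events rather than in the middle of one.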

MontaVista Software
Sunnyvale, CA.
(408) 328-9200.
[www.mvista.com].