SOFTWARE & DEVELOPMENT TOOLS
Carrier Grade Linux
Runtime Application Patching for High Availability with Carrier-Grade Linux
Runtime patching allows a system maintainer to apply patches to applications in a running system without having to halt and restart either the system or the applications.
JOHN MEHAFFEY, MONTAVISTA SOFTWARE
Systems that must run continuously, often called five nines systems because they must be available 99.999% of the time, present special obstacles to system maintenance. By definition, a five nines system can be down for no more than about 5 minutes per year. Such systems cannot be routinely shut down for maintenance, whether to upgrade software or to fix defects. Runtime patching helps keep these systems running by allowing bug fixes, software upgrades, and even temporary debugging routines to be placed into applications without stopping the system.
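The five nines figure follows directly from the availability percentage. As a quick check, the sketch below converts a number of nines into an annual downtime budget. The function name `downtime_budget_minutes` and the 365.25-day year are assumptions chosen for illustration.

```python
# Assumption: an average year of 365.25 days, so leap years are included.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(nines: int) -> float:
    """Return the allowed downtime per year, in minutes, for N nines of availability."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** -nines  # e.g. 5 nines -> 0.00001 (0.001%)
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

For five nines this works out to roughly 5.26 minutes per year, which is the source of the 5-minute figure.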
A patch is a shared object that carries additional information describing the patch. A patch starts out as the source file for the new or repaired functions and methods, which is compiled into a shared object file. Other information about the patch is placed into a patch information file, which is then added to the .patchinfo section of the shared object file to create the patch object.
Why Patch?
For systems that can be brought down for maintenance, patching doesn’t always make sense. It is easier and safer to bring the system down, back it up, upgrade the software, test it, and bring the system back into service. If the new system doesn’t work properly, you can revert to the old system and try again later.
Patching is ideal for systems that must run “24 x 7 x 365” and meet five nines availability, such as carrier-grade systems. These highly available systems must undergo all maintenance while operating. Patching reduces the need to upgrade the entire system in order to fix software defects. Upgrading a redundant system usually means that one of the redundant parts must be brought down, upgraded, brought back up, and synchronized with the rest of the system. During that procedure, a single hardware failure can cause downtime. To meet five nines, the system can therefore tolerate only a very limited number of upgrades.
Fixing bugs with a patch avoids this exposure to hardware failure, since no part of the system has to be brought down to apply the patch. Patches may be prepared ahead of time, extensively tested in offline systems, and applied in a matter of seconds.
Sometimes a bug fix has hidden side effects that are even more undesirable than the original defect and that are found only after the system is back in operation. If the system was backed up and upgraded, it may not be possible to restore the old system without losing all the information collected since the upgrade. Runtime patches, by contrast, can be removed as easily and quickly as they are applied.
Upgrade problems have probably happened to everyone, including those upgrading five nines systems. Some strange problem in the existing system may make upgrading to a new system very difficult because the deployed system differs from what was developed and tested in the lab. With patching, you can fix the problem in the old system instead of trying to work around it in the new one, which can make the upgrade job much simpler.
Lastly, patches can help with debugging. A single piece of information captured under a specific circumstance may be all you need to debug a difficult problem. You can deliver a patch that gathers this information, apply the patch, collect the information, and then remove the patch. Patching doesn’t replace upgrades, of course, but it adds another tool to the system maintainer’s kit to make the system easier to maintain in the field.
Patching doesn’t necessarily work well for every application. The following types of applications probably cannot make use of patching:
- Applications with hard real-time requirements. These applications probably cannot take the latency hit when all threads are suspended during the patching operation.
- Applications with severely limited memory space. These applications may not be able to accommodate the additional overhead of the symbol tables required to patch.
- Applications that are life-critical. These applications, such as certain medical equipment or air traffic control systems, cannot take the risks associated with patching.
