Analysis Tools Get to the Heart of Software Performance

Getting code to run requires programming skill. Making sure that it runs correctly
requires that it be exercised and debugged. Assuring that it runs efficiently,
optimally, reliably and safely requires in-depth analysis. The tools that achieve
these latter goals start from the concepts of debugging but are used to examine
code from myriad aspects. For embedded projects that must work, and on which
human life and safety often depend, such tools are not optional.

When applying analysis to running code, the developer is very often confronted
with a kind of “Heisenberg dilemma.” That is, “What level of
intrusiveness into the actual execution can I tolerate and still be confident
that what I’m seeing is what will actually happen in the deployed system?”
That question applies mostly to timing issues. Other issues include assessing
overall performance, making sure that memory is being used efficiently and
reliably, and finding intermittent glitches that may not always show up in a
standard debugging session.

In addition to the level of intrusiveness, one must also consider the specificity
of the tool. Generally speaking, and of necessity, the deeper a tool delves into
the inner workings of a system, the more intrusive it tends to become and the
more specific it becomes to the underlying operating system. There was a time
when tools loaded instrumentation tags into the source code, which produced
instructions that sent analysis data off to a connected development host or an
attached probe. That approach has since largely fallen out of favor, at least
for embedded development, although a certain amount of instrumentation may
still be needed in some cases.
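
As a rough illustration of that older tag-based style, the sketch below uses
hypothetical names (tool_send_event and read_cycle_counter stand in for whatever
the tool's target-side library actually provided) to show how a tag compiles
into a few instructions that ship an event record to the host:

    #include <stdint.h>

    typedef struct {
        uint16_t tag_id;       /* which instrumentation point fired */
        uint32_t timestamp;    /* cycle counter or tick count on the target */
    } trace_event;

    /* Assumed to come from the tool's target-side library. */
    extern void tool_send_event(const trace_event *ev);
    extern uint32_t read_cycle_counter(void);

    /* Each tag compiles to a short call that ships one event record. */
    #define TRACE_TAG(id)                                    \
        do {                                                 \
            trace_event ev = { (id), read_cycle_counter() }; \
            tool_send_event(&ev);                            \
        } while (0)

    void motor_control_step(void)
    {
        TRACE_TAG(1);                 /* entry */
        /* ... control computation ... */
        TRACE_TAG(2);                 /* exit */
    }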

Generally, however, the overhead on the target is caused by a relatively non-intrusive
monitor or interface that buffers and sends execution data to the host. The least
intrusive and deepest analysis is possible using tools that have a hardware assist,
such as a JTAG probe, and take advantage of on-chip debug facilities such as the
trace ports or embedded trace macrocells (ETMs). When not relying on instrumentation,
a tool must be able to reproduce the timing characteristics as they would be without
the overhead introduced by the tool.

Thus, when running a profiler, for example, the code may run slower overall than
in the deployed system, but the results displayed must reflect the actual execution
times of the various functions. Most profilers can generate accurate results whether
the code is running on the target or under an instruction set simulator on the
host. In the latter case, the code certainly runs slower, but the simulator is
able to get an accurate instruction count and deliver performance measurements
that can be used to identify
areas that may be bottlenecks. It is these routines that the developer will want
to zero in on to try to make more efficient.
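
A minimal sketch of the idea, assuming a hypothetical read_cycle_counter() in
place of the target's real timer, shows how time gets charged to individual
functions so that a per-function breakdown can be built:

    #include <stdint.h>
    #include <stdio.h>

    extern uint32_t read_cycle_counter(void);   /* hypothetical target timer */

    enum { FN_FILTER, FN_FFT, FN_COUNT };       /* functions being profiled */
    static uint64_t total_cycles[FN_COUNT];
    static uint32_t call_count[FN_COUNT];

    /* Charge the cost of one call to its profiling bucket. */
    #define PROFILE_CALL(fn_id, call)                            \
        do {                                                     \
            uint32_t t0 = read_cycle_counter();                  \
            call;                                                \
            total_cycles[(fn_id)] += read_cycle_counter() - t0;  \
            call_count[(fn_id)]++;                               \
        } while (0)

    /* After a run, the buckets feed the histogram of where time went. */
    void profile_report(void)
    {
        for (int i = 0; i < FN_COUNT; i++)
            printf("function %d: %llu cycles over %u calls\n",
                   i, (unsigned long long)total_cycles[i], call_count[i]);
    }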

A rich set of analysis tools, known as ScopeTools, is available from both Wind
River Systems and Real-Time Innovations. Targeted at Wind River platforms
using the Tornado development environment, the suite consists of five tools that
are representative of the kinds of analysis tasks that need to be done for quality
embedded software. They examine the code and its behavior from different aspects.

StethoScope is a tool that can monitor a running system and watch a set of variables
or any memory location. It lets you trigger data collection on specific events,
change variables in a running program, see peak values and save all the data to
disk. The purpose of StethoScope is to provide live data analysis on a running
system without interfering with the code.
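
A generic sketch of the approach (not StethoScope's actual API) might look like
the following: a periodic hook samples the watched address into a ring buffer
that the host drains asynchronously, so the application itself is barely
disturbed:

    #include <stdint.h>

    #define RING_SIZE 1024

    /* Address under observation; set by the host-side tool. */
    volatile int32_t *watched;
    static int32_t ring[RING_SIZE];
    static volatile unsigned head;

    /* Called from a timer hook or a low-priority monitor task. */
    void sample_watched(void)
    {
        if (watched) {
            ring[head % RING_SIZE] = *watched;
            head++;           /* host reads ring and head asynchronously */
        }
    }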

ProfileScope, on the other hand, is used to diagnose the execution speed of
a program on a function-by-function basis. The profiler produces histograms
of the execution times of the various routines so that you can zoom in on those
that represent bottlenecks and concentrate on the areas that appear to be taking
the most CPU resources in an effort to improve the overall performance of the
application (Figure 1).

MemScope is a visual memory analysis tool that helps manage memory use efficiently
by identifying memory leaks as they occur, checking memory consistency and finding
errors that may occur in the memory pool. It offers Aggregate, Tree, Time and
Fragmentation views and can track the allocation and deallocation of memory,
including a view of the full allocation call stack to help figure out why memory
was allocated. The tool can be used on a running system with no need for
instrumentation or special compilation.
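
One common technique behind such leak finders, sketched generically here rather
than as MemScope's actual implementation, is to wrap allocation so that every
live block carries a record of the file and line that allocated it:

    #include <stdlib.h>

    typedef struct alloc_rec {
        void             *ptr;
        size_t            size;
        const char       *file;
        int               line;
        struct alloc_rec *next;
    } alloc_rec;

    static alloc_rec *live_list;    /* every block not yet freed */

    void *tracked_malloc(size_t size, const char *file, int line)
    {
        void *p = malloc(size);
        if (p) {
            alloc_rec *r = malloc(sizeof *r);
            if (r) {
                r->ptr = p; r->size = size; r->file = file; r->line = line;
                r->next = live_list; live_list = r;
            }
        }
        return p;
    }

    void tracked_free(void *p)
    {
        /* Unlink the matching record, then release the block itself. */
        for (alloc_rec **rp = &live_list; *rp; rp = &(*rp)->next) {
            if ((*rp)->ptr == p) {
                alloc_rec *dead = *rp;
                *rp = dead->next;
                free(dead);
                break;
            }
        }
        free(p);
    }

    /* Anything still on live_list at a checkpoint is a leak candidate,
     * reported with the file and line that allocated it. */
    #define MALLOC(n) tracked_malloc((n), __FILE__, __LINE__)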

TraceScope lets the developer follow program execution by recording any calls
to a user-specified set of functions in the running system. Every time a specified
routine is traced, the tool records what routine was called, what task called
it and what arguments were used. This tool does not record the execution of every
instruction, but lets the user zero in on routines of interest and see them in
the context of calling sequences.
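
The kind of record such a tracer keeps might look like the sketch below; the
names are illustrative rather than TraceScope's interface, and current_task_id()
stands in for whatever query the RTOS provides:

    #include <stdint.h>

    typedef struct {
        const char *func_name;   /* routine that was called */
        uint32_t    task_id;     /* task it was called from */
        uint32_t    timestamp;
        uint32_t    args[4];     /* first few arguments, captured raw */
    } call_record;

    extern uint32_t current_task_id(void);     /* assumed RTOS query */
    extern uint32_t read_cycle_counter(void);  /* assumed target timer */

    #define TRACE_BUF 4096
    static call_record trace_buf[TRACE_BUF];
    static unsigned trace_idx;

    void trace_call(const char *name, uint32_t a0, uint32_t a1,
                    uint32_t a2, uint32_t a3)
    {
        call_record *r = &trace_buf[trace_idx++ % TRACE_BUF];
        r->func_name = name;
        r->task_id   = current_task_id();
        r->timestamp = read_cycle_counter();
        r->args[0] = a0; r->args[1] = a1; r->args[2] = a2; r->args[3] = a3;
    }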

CoverageScope is used in conjunction with testing to show what sections of code
have actually been executed during testing and, therefore, what areas have
yet to be exercised. The tool provides a color-coded scheme that can indicate
different levels of coverage: function, block, decision and condition. While viewing
results, the user can also use the source window to browse the corresponding sections
of the source code. A coverage tool lets a developer establish a level of
confidence even if 100 percent of the code hasn’t been covered. In some cases,
cost and time considerations may indicate that further testing is turning up no
more errors and that the code can be released at, say, 90 percent coverage. At
such a point, at least one knows where one stands.
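
A toy sketch makes the block-coverage idea concrete; real tools plant the
counters automatically rather than through hand-written macros like the
hypothetical COVER() here:

    #include <stdio.h>

    #define NUM_BLOCKS 3
    static unsigned hits[NUM_BLOCKS];
    #define COVER(n) (hits[(n)]++)

    int clamp(int v, int lo, int hi)
    {
        COVER(0);                            /* function entry */
        if (v < lo) { COVER(1); return lo; } /* low branch */
        if (v > hi) { COVER(2); return hi; } /* high branch */
        return v;
    }

    /* Any counter still at zero marks code the tests never reached. */
    void coverage_report(void)
    {
        for (int i = 0; i < NUM_BLOCKS; i++)
            if (hits[i] == 0)
                printf("block %d never executed\n", i);
    }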

Massive Tracing

In some cases, developers need to delve even more deeply into the workings of
timing relationships, intermittent glitches and the interaction of the applications
with both the hardware and the operating system. In the past, in-circuit emulators
were used to gather execution trace data of every cycle. With today’s highly
integrated CPUs, tracing requires on-chip support. Green Hills Software has recently
introduced a trace probe and analysis tool that work in tandem on up to a
full gigabyte of recorded trace data.

The SuperTrace probe works with ARM processors, such as the ARM7, ARM9 and
ARM10 that have the embedded trace macrocell (ETM), and with processors with
a more generic JTAG-like trace port such as the PowerPC 405/440. With a full
gigabyte of trace memory, it is able to record up to 1.7 billion cycles from
a PowerPC 405, for example, running at 600 MHz. Additional target support is
under development. This allows the probe to record several seconds of execution,
or if using software-assisted branch analysis, up to minutes of program execution
(Figure 2).

In addition, when used with Green Hills’ Integrity RTOS, the probe can support
virtual memory. It does this, according to Green Hills’ VP David Kleidermacher,
by keeping track of where all the mapping tables are and how addresses are translated.
“We can detect whenever there’s an address switch so we can tell for
any point in the code what address space we’re running in. The mapping information
gives us exactly what’s running at any time.”

With such massive trace capability, it was possible to apply a new kind of analysis
tool called the TimeMachine, which, in effect, lets you run the program backward
and forward as many times as you wish. What is actually happening, of course,
is that the tool is following the recorded instruction sequences back and forth—and
displaying the source code—as if the actual code were running, and stepping
backward and forward as quickly or as slowly as the developer likes.

What such a combination of deep trace and analysis enables is an improved ability
to catch and find the causes of intermittent glitches that may occur only under
unusual circumstances. Such events are difficult to reproduce and may be caused
by an error that occurred much earlier in the program, such as a corrupted pointer.
Using the TimeMachine, one can simply let the program run until it hits the glitch,
set a watchpoint on a suspected variable and run the program backward, set to
break when the variable changes. This is what Kleidermacher calls “catching
the bug in the box.” It means you don’t have to try to reproduce the
possibly rare conditions that caused the bug but have a better chance of finding
out what they were.
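
Developers who want a feel for this workflow without the hardware can try GDB’s
software-based process record, which supports the same backward watchpoint
trick; the session below is a rough analogue, not Green Hills’ interface:

    (gdb) record                    # start recording execution
    (gdb) continue                  # run until the glitch manifests
    (gdb) watch suspect_variable    # watchpoint on a suspected variable
    (gdb) reverse-continue          # run backward to where it last changed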

A tool like the TimeMachine can be used for non-kernel-specific diagnosis,
but can also be linked with tools that are very specific to the Integrity RTOS,
such as the EventAnalyzer. The EventAnalyzer reveals interactions with the kernel
through graphical displays of operating system events, including interrupts,
context switches, service calls and exceptions, in a display that resembles
a traditional logic analyzer (Figure 3). This lets you look at the virtual-to-physical
mappings, even in systems that switch between multiple threads.

Kernel Awareness

In order to verify real-time operation, it is necessary to drill down to the
specifics of the individual operating system. While this does yield vital information,
it also limits a tool to that particular OS. If you need to verify the
schedulability of code, for example, a tool such as the RTXC Quadros kernel
awareness tool from Quadros Systems fits the bill. The tool provides profiles
of all execution entities: threads, interrupt service routines and the kernel
itself. It measures
execution times, preemption times and latency to give you a worst-case timing
for each entity, including how often and how long it was preempted or interfered
with. This information allows you to set priorities for deadline monotonic scheduling
(Figure 4).

Under deadline monotonic scheduling, the tasks with the shortest deadlines are
scheduled first. It is necessary to measure under real-world conditions because,
as Quadros president Tom Barrett says, “Things are not always what they
seem. For example, was a task’s release time stable or did it hang around
a long time before it got control?” Once you have the information, the theory
is that if you are sure that all the tasks can meet their deadlines, then the
system is schedulable by definition.
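
A compact sketch in C of the response-time test that underlies such a check:
for each task, taking index 0 as the highest priority (shortest deadline),
iterate R = C + sum over higher-priority tasks of ceil(R/T_j)*C_j until R
settles, then compare it against the deadline. The numbers in main() are made
up for illustration:

    #include <stdio.h>

    typedef struct { long C, T, D; } task;  /* exec time, period, deadline */

    /* Tasks must be sorted by priority, index 0 highest. Returns 1 if every
     * task's worst-case response time fits within its deadline. */
    int schedulable(const task *ts, int n)
    {
        for (int i = 0; i < n; i++) {
            long R = ts[i].C, prev = -1;
            while (R != prev && R <= ts[i].D) {
                prev = R;
                R = ts[i].C;
                for (int j = 0; j < i; j++)  /* higher-priority interference */
                    R += ((prev + ts[j].T - 1) / ts[j].T) * ts[j].C;
            }
            if (R > ts[i].D)
                return 0;                    /* task i can miss its deadline */
        }
        return 1;
    }

    int main(void)
    {
        /* Made-up task set, ordered shortest deadline first. */
        task set[] = { {1, 10, 4}, {2, 15, 8}, {5, 30, 25} };
        printf("schedulable: %d\n", schedulable(set, 3));
        return 0;
    }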

Another scheduling tool, called Rapid RMA from Tri-Pacific Software, supports
rate monotonic scheduling as well as deadline monotonic. With rate monotonic scheduling,
you set the priorities of your tasks according to the rate at which they have
to run, that is, the frequency of their duty cycle. It works with Wind River’s Tornado
environment and interfaces with that company’s WindView tool, which is similar
to the Green Hills EventAnalyzer. Other versions of Rapid RMA interface with object-oriented
UML-based graphical development tools such as Rational Rose RealTime and the
Rhapsody tool from I-Logix.
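
For rate monotonic priorities there is also the classic utilization bound: a
set of n periodic tasks is guaranteed schedulable if the total utilization
sum(C_i/T_i) stays under n*(2^(1/n) - 1), roughly 69 percent for large n. The
bound is sufficient but not necessary; the response-time test above gives an
exact answer. A quick check might look like this, with invented example numbers:

    #include <math.h>
    #include <stdio.h>

    /* 1 if total utilization is within the rate monotonic bound. */
    int rms_bound_ok(const double *C, const double *T, int n)
    {
        double U = 0.0;
        for (int i = 0; i < n; i++)
            U += C[i] / T[i];               /* per-task utilization */
        return U <= n * (pow(2.0, 1.0 / n) - 1.0);
    }

    int main(void)
    {
        double C[] = {1, 2, 5}, T[] = {10, 15, 40};
        printf("within RMS bound: %d\n", rms_bound_ok(C, T, 3));
        return 0;
    }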

Looking at Java

With Java moving ever deeper into embedded systems, there is a need to analyze
and verify its performance. To that end, Aonix has adopted OptimizeIt, a
profiling tool for Java that comes from the desktop environment. It helps
identify which pieces of the system are consuming the most CPU time. It gives
a breakdown of how much time is spent in each method and can also narrow its view
to blocks within a method and identify which pieces of the system are allocating
memory.

OptimizeIt works with Aonix’ PERC real-time Java implementation, but can
theoretically work with any Java virtual machine that supports the profiling interface
API defined by Sun. The tool runs mostly on the host system and works at the VM
level, communicating with the VM in terms of ranges of byte code and mapping
those ranges back to the source code. This makes
it fairly intrusive in terms of the speed at which the system runs while OptimizeIt
is gathering data. It does not, however, alter any of the code by inserting instrumentation.

Keeping track of memory is vital for getting Java to perform in embedded applications.
Programmers are encouraged to think at a high level of abstraction and often do
not adequately consider performance issues. In Java, memory, once allocated, is
not specifically deallocated, but is returned to the system via the garbage collector.
In embedded systems, it is important to keep track of how much memory is allocated
and to ensure that, when garbage collection does run, it does not interfere with
vital tasks.

To help decide when to run garbage collection, PERC 4.1 has a pacing agent, a
thread that runs on top of the VM management API and schedules garbage collection.
Rate monotonic analysis helps characterize the real-time workload so that a certain
amount of CPU time can be reserved for certain priorities that will not be interfered
with by the garbage collector. The garbage collector can then only run in the
leftover CPU time and thus does not interfere with the application’s deadlines.
The pacing agent is used during development to characterize the schedulability
of both priority tasks and the garbage collector. It also runs in the deployed
system and looks specifically at how to add garbage collection to the rate monotonic
workload.

One effect of applying analysis tools to Java, according to Aonix’ Kelvin
Nilsen, is that programmers learn to develop a discipline about the effects of
certain kinds of programming on the system’s resources. For example, they
may learn that it’s not a good idea to allocate a lot of objects in time-critical
loops or that there are certain things, like string concatenation, that you can
do in Java without realizing that you’re allocating memory. The result,
according to Nilsen, is that, “having paid the price of education, the next
time they write code, they will be thinking, ‘Oh, I know what happens when
I write this kind of code.’”

That effect can probably be expanded to apply to the discipline of analyzing software
in general. After all, the goal of analysis is to understand. Once we understand
what’s going on (and realize how often we haven’t), we can take action
to improve the code and, perhaps most important, gain confidence in it. That then
feeds back into coding practice, and in that sense, investment in good software
analysis tools, detailed and time-consuming as they may be to use, has rewards
not only in programs that run better but also in programmers who program better.

Aonix
San Diego, CA.
(800) 972-6649.
[www.aonix.com].

Green Hills Software
Santa Barbara, CA.
(805) 965-6044.
[www.ghs.com].

Quadros Systems
Houston, TX.
(832) 351-2830.
[www.quadros.com].

Real-Time Innovations
Sunnyvale, CA.
(408) 734-4200.
[www.rti.com].

Tri-Pacific Software
Alameda, CA.
(510) 814-1770.
[www.tripac.com].

Wind River Systems
Alameda, CA.
(510) 748-4100.
[www.windriver.com].