TECHNOLOGY DEPLOYED

Code Requirements and Verification

Transforming Code Analysis with Visualization

As the volume of code becomes ever larger and more complex, more efficient methods and tools are needed to analyze it to find and correct defects. A newly emerging approach of graphical navigation can help engineers find their way through the thicket of complex interdependencies.

PAUL ANDERSON, GRAMMATECH

Code analysis tools for finding programming defects in large code bases have proven very popular in recent years because they are effective at improving software quality. These tools work by finding paths through the code that may trigger risky, undefined, or unwanted behavior. For a large code base, tools may generate many warnings, so it is important to be able to process these efficiently. Each report must be inspected by an engineer to determine if it constitutes a real problem and whether it should be corrected. If a fix is proposed, then the engineer will want to know what other parts of the code may be affected by the change.

The process of inspecting a warning to determine if it warrants action is known as triage. Some studies have shown that it takes an average of ten minutes to triage a warning report, but there is a large variance. Many reports can be dealt with in a few seconds, but the more complex ones can take significant effort. They may involve unusual control flow along paths that go through several procedures located in different compilation units, and can depend subtly on the values of different variables. The path may be feasible and risky in some contexts, but infeasible or harmless in others. Consequently, it can be tricky and time-consuming for engineers to fully understand reports.

The process of remediation can be similarly difficult because a proposed fix can have wide-ranging and unexpected consequences. A small change to a single procedure can potentially affect all functionality that depends on calls to that procedure. To be efficient at deploying the fix, an engineer will want to understand the other components of the software that are most strongly dependent on the change, so that validation activities can be prioritized to focus on those components first. The process of identifying the affected components is sometimes referred to as ripple-effect analysis or impact analysis.

To deal with complexity, programs are usually designed so they can be thought about at different levels of abstraction, and implemented so that those levels are apparent in the source code. This is usually helpful but can sometimes be misleading because the implementation may diverge from the design and the boundaries of the clean abstractions may be violated.

The essence of the issue is that programs can be large, complicated beasts with subtle dependencies between their components. New tools are emerging that help engineers penetrate this fog of complexity. Program visualization tools are proving especially useful at helping engineers gain insight into the subtleties of their programs. When used appropriately, they can amplify the effectiveness of a code analysis tool.

An important property of these tools that makes them effective is that the visualization is completely and automatically generated directly from the code itself. Thus the engineer can see exactly what is in the code instead of an idealized representation that may hide too many essential details. The code can be shown at different levels of abstraction from high-level modules down through compilation units, then individual procedures and finally as the text of the code itself.

Until fairly recently, code visualization tools were limited in the amount of information they could display. However, two trends have converged to make possible tools that show very large quantities of information yet remain responsive to user actions. First, new techniques have emerged for automatically eliding information depending on the zoom level. Second, powerful video cards with hardware acceleration features for rendering detailed scenes have become ubiquitous, and the tools are now able to take advantage of them. Together, these factors make powerful new visualization techniques feasible. Let’s look at some examples of how visualization can be used to help an engineer interpret the results of a code analysis tool.
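As a rough illustration of zoom-dependent elision, the sketch below picks the deepest level of the hierarchy that should be drawn at a given zoom factor. The four levels follow the hierarchy described earlier (modules, compilation units, procedures, source text); the enum names and the zoom thresholds are assumptions made for this example, not values taken from any particular tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels of the hierarchy, coarsest first. */
enum detail_level { LEVEL_MODULE = 0, LEVEL_FILE, LEVEL_PROCEDURE, LEVEL_SOURCE_TEXT };

/* Assumed thresholds: a level becomes visible once the zoom factor reaches its entry. */
static const double min_zoom_for_level[] = { 0.0, 2.0, 8.0, 32.0 };

/* Return the deepest level that is visible at the given zoom factor. */
enum detail_level deepest_visible_level(double zoom)
{
    enum detail_level level = LEVEL_MODULE;
    size_t n = sizeof min_zoom_for_level / sizeof min_zoom_for_level[0];

    assert(zoom >= 0.0);  /* precondition assumed for this sketch */
    for (size_t i = 0; i < n; i++) {
        if (zoom >= min_zoom_for_level[i])
            level = (enum detail_level)i;
    }
    return level;
}

/* A node is elided (not drawn) when it sits deeper than the deepest visible level. */
int is_elided(enum detail_level node_level, double zoom)
{
    return node_level > deepest_visible_level(zoom);
}
```

A renderer built along these lines would call is_elided for each node before drawing it, so the amount of work per frame depends on what is visible rather than on the size of the program.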

Bottom-Up Visualization

Imagine a static analysis tool has reported a potential buffer overrun. The engineer responsible for triaging this warning must ask the following questions:

• Is the warning a real defect? Static analysis tools make approximations that can cause false positives, so it is important to determine this first.

• Is the bug likely to show up in the field?

• What are the consequences of this bug being triggered? Some buffer overruns are harmless, but others may cause crashes or may be critical security vulnerabilities.

• Where was the error that gave rise to this bug? The point at which the buffer overrun occurs is seldom the exact point where the programmer erred. The error may be where the buffer was allocated or where an index into the buffer was calculated.

• How should the defect be fixed?

• Finally, are there other defects like this in other parts of the code?

These questions are all best answered by starting from the point where the error occurs and working backward and forward through the low-level components of the code. Take for example a buffer overrun found in an open-source project. The offending code appears in a function named return_append_str as shown here:

    if (!dest) {
        newloc = (char *) malloc(strlen(s))+1;
        strcpy(newloc, s);
        return newloc;
    }

In this case it is easy to confirm that this is unquestionably a real bug: the +1 is in the wrong place. It should be inside the parentheses of the call to malloc. As written, malloc reserves only strlen(s) bytes and the returned pointer is then advanced by one, so newloc points at a region with just strlen(s)-1 usable bytes. The call to strcpy on the following line writes strlen(s)+1 bytes, including the terminating null character, so it will always overflow the buffer by two bytes. The next question is to determine if the defect is likely to show up in the field. Note that the defect is only triggered if the true branch of the conditional is taken. Perhaps this code is never deployed in an environment where that can happen. To answer this question, it is important to consider the many ways in which the function can be called. This is where a visualization tool can begin to be helpful. Figure 1 shows a visualization of the subset of the call graph in the vicinity of the defect. In this figure, functions on the left call functions on the right. From this it can be seen that the only call to return_append_str is from the function named append_str.
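For comparison, here is a minimal sketch of the allocation with the +1 moved inside the call to malloc. The helper name copy_str is hypothetical and stands in for the relevant branch of return_append_str; the NULL check on the result of malloc is an addition made for this sketch and is not part of the original snippet.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a freshly allocated copy of s, or NULL on failure. */
char *copy_str(const char *s)
{
    char *newloc;

    assert(s != NULL);  /* precondition assumed for this sketch */

    /* The +1 is part of the size argument, reserving room for the '\0'. */
    newloc = malloc(strlen(s) + 1);
    if (newloc == NULL)
        return NULL;

    strcpy(newloc, s);  /* writes strlen(s) + 1 bytes, exactly what was reserved */
    return newloc;      /* the same pointer malloc returned, so free() is valid */
}
```

Unlike the original, this version also returns the exact pointer obtained from malloc, so the caller can later pass it to free.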

The user can then expand the view by working backward in the call tree to show more detail. Once enough detail has been revealed to understand the context, it becomes evident that there are several different ways in which the function containing the bug can be called. The next question is whether some or all of these are dangerous. Figure 2 shows how this can be seen in the visualization.

In this case the user has asked the analysis engine to determine which of the paths leading to return_append_str are dangerous. The red path indicates that the defect is likely to be triggered if that sequence of calls occurs. From here it is possible to show a textual representation of the call path from which it is easy to find the point of error and begin to plan a fix.
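The backward walk through the call graph can be sketched in a few lines. The example below counts the distinct acyclic call paths that end at a given function by following call edges from callee to caller until it reaches a function that nothing calls. The edge representation, the integer function identifiers, and the fixed MAX_FUNCS limit are assumptions for illustration. Deciding which of those paths is dangerous is the job of the analysis engine and is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 16  /* assumed upper bound on function identifiers */

struct call_edge { int caller; int callee; };

/* Count acyclic call paths ending at `target`, walking edges backward.
   A path is complete when it reaches a function with no callers (a root). */
static int count_paths_back(const struct call_edge *edges, size_t n_edges,
                            int target, int *on_path)
{
    int n_callers = 0;
    int total = 0;

    on_path[target] = 1;
    for (size_t i = 0; i < n_edges; i++) {
        if (edges[i].callee != target)
            continue;
        n_callers++;
        if (!on_path[edges[i].caller])  /* skip recursive cycles */
            total += count_paths_back(edges, n_edges, edges[i].caller, on_path);
    }
    on_path[target] = 0;

    return n_callers == 0 ? 1 : total;
}

/* All identifiers in `edges` are assumed to be in the range 0..MAX_FUNCS-1. */
int count_call_paths(const struct call_edge *edges, size_t n_edges, int target)
{
    int on_path[MAX_FUNCS] = { 0 };

    assert(target >= 0 && target < MAX_FUNCS);
    return count_paths_back(edges, n_edges, target, on_path);
}
```

With a hypothetical graph in which main calls two functions that each call append_str, and append_str calls return_append_str, count_call_paths reports two paths leading to return_append_str.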

Top-Down Visualization

Not all code analysis tasks are suited to a bottom-up approach. Sometimes engineers want to take a high-level view of the code. Large programs can contain hundreds of thousands of procedures, and there may be millions of calls between procedures. Clearly it is infeasible to display all of these at once, so visualization tool designers have developed representations that summarize that information when the program is viewed from a high level, yet allow more detail to be revealed as the user drills down to lower levels.

From a code analysis point of view, a common use case is for a manager to see which high-level modules have the highest density of warnings, and to be able to drill down through sub-modules to low-level components and finally to the code itself.

Figure 3 shows a sequence of screenshots from a visualization tool (CodeSonar, in this case) that demonstrates top-down visualization. Here the module hierarchy of the code is derived automatically from the file and directory structure. The leftmost part shows a fully zoomed-out view of the program. When zoomed out, the low-level calling relationships between procedures are projected onto the higher levels. As the user zooms in, more details start to emerge: first subdirectories, then source files, then individual procedures. The rightmost part shows how the visualization can lead directly to the textual representation of the code.
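One plausible way to project procedure-level calls onto a higher level is to count, for every pair of modules, how many calls cross from one to the other. The sketch below does this under assumed data structures: procedures and modules are identified by small integers, module_of maps each procedure to its module, and MAX_MODULES is an arbitrary limit. It is not a description of how CodeSonar computes its views.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MODULES 8  /* assumed upper bound on module identifiers */

struct proc_call { int caller; int callee; };

/* Fill counts[a][b] with the number of procedure-level calls from module a
   to module b. module_of[p] is the module index of procedure p. */
void project_calls(const struct proc_call *calls, size_t n_calls,
                   const int *module_of,
                   unsigned counts[MAX_MODULES][MAX_MODULES])
{
    memset(counts, 0, sizeof(unsigned) * MAX_MODULES * MAX_MODULES);

    for (size_t i = 0; i < n_calls; i++) {
        int from = module_of[calls[i].caller];
        int to = module_of[calls[i].callee];

        assert(from >= 0 && from < MAX_MODULES);
        assert(to >= 0 && to < MAX_MODULES);

        if (from != to)  /* calls inside one module are not drawn as links */
            counts[from][to]++;
    }
}
```

The resulting counts could drive the thickness of the links drawn between modules in a zoomed-out view. Applying the same projection with a directory-level or file-level mapping gives the intermediate views.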

Here the layout of nodes is chosen automatically by a graph-layout algorithm. Tools usually offer users a choice of different layout strategies. For the top-level view, a “cluster” layout where link direction is indicated by tapered lines, as in Figure 3, is often the most appropriate. A left-to-right layout is commonly more useful when showing a small number of nodes, such as when operating in a bottom-up mode.

Operations on the elements of these views can be used to help an engineer plan a fix to a bug. The user can select a function and with a single command can select all other functions that are transitively reachable from that function. These will be in components that may be affected by the proposed fix, so testing activities should prioritize those parts first.
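The selection of transitively reachable functions amounts to a graph traversal. The following sketch marks everything reachable from a starting function using an explicit stack. The edge list, the integer identifiers, and MAX_FUNCS are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_FUNCS 16  /* assumed upper bound on function identifiers */

struct call_edge { int caller; int callee; };

/* Set reached[f] to 1 for every function transitively callable from `start`
   (including `start` itself) and to 0 for all others.
   All identifiers in `edges` are assumed to be in the range 0..MAX_FUNCS-1. */
void mark_reachable(const struct call_edge *edges, size_t n_edges,
                    int start, int reached[MAX_FUNCS])
{
    int stack[MAX_FUNCS];  /* each function is pushed at most once */
    int top = 0;

    assert(start >= 0 && start < MAX_FUNCS);

    for (int i = 0; i < MAX_FUNCS; i++)
        reached[i] = 0;

    reached[start] = 1;
    stack[top++] = start;

    while (top > 0) {
        int f = stack[--top];
        for (size_t i = 0; i < n_edges; i++) {
            int callee = edges[i].callee;
            if (edges[i].caller == f && !reached[callee]) {
                reached[callee] = 1;
                stack[top++] = callee;
            }
        }
    }
}
```

Swapping the roles of caller and callee in the inner loop walks the graph in the opposite direction and yields the set of functions that can transitively call the starting function.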

Additional data can be overlaid on the visualization to help users understand what parts of the code warrant attention. The warning density metric mentioned above is appropriate, but standard source code metrics may be useful too. Figure 4 shows a visualization of part of a small program where components containing functions with increasing cyclomatic complexity are highlighted in deeper shades of red. This helps users quickly see risky parts of the code.
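A metric overlay of this kind can be as simple as mapping each component's metric value onto a color ramp. The sketch below maps a cyclomatic complexity value to a shade of red, from white at or below a lower bound to fully saturated red at or above an upper bound. The struct, the function name, and the choice of bounds are assumptions for illustration.

```c
#include <assert.h>

struct rgb { unsigned char r, g, b; };

/* Map a complexity value to a shade of red: white at or below `low`,
   saturated red at or above `high`, and a linear ramp in between. */
struct rgb complexity_shade(int complexity, int low, int high)
{
    struct rgb color;
    int t;

    assert(low < high);

    if (complexity < low)
        complexity = low;
    if (complexity > high)
        complexity = high;

    /* t runs from 0 (at low) to 255 (at high). */
    t = (complexity - low) * 255 / (high - low);

    color.r = 255;
    color.g = (unsigned char)(255 - t);  /* less green and blue gives deeper red */
    color.b = (unsigned char)(255 - t);
    return color;
}
```

The same ramp could be driven by warning density instead of cyclomatic complexity by passing in a different metric and bounds.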

Visual representations of structures and relationships are well known to be helpful for users wishing to gain an understanding of complex systems. Tools that generate visualizations of software systems are particularly useful when used in conjunction with code analysis tools. Together they can make the process of improving software quality much more efficient.  

GrammaTech
Davis, CA
(800) 329-4932
www.grammatech.com