Locating Matching Method Calls by Mining Revision History Data

Developing an appropriate fix for a software bug often requires a detailed examination of the code as well as generation of appropriate test cases. However, certain categories of bugs are usually easy to fix. In this paper we focus on bugs that can be corrected with a one-line code change. As it turns out, one-line source code changes very often represent bug fixes. Moreover, a significant fraction of previously known bug categories can be addressed with one-line fixes. Careless use of file manipulation routines, failing to call free to deallocate a data structure, failing to use strncpy instead of strcpy for safer string manipulation, and using tainted character arrays as the format argument of fprintf calls are all well-known types of bugs that can typically be corrected with a one-line change of the program source.

This paper proposes an analysis of software revision histories to find highly correlated pairs of method calls that naturally form application-specific useful coding patterns. Potential patterns discovered through revision history mining are passed to a runtime analysis tool that looks for pattern violations. We focus our pattern discovery efforts on matching method pairs. Matching pairs such as fopen, fclose, malloc, free, as well as lock, unlock-function calls require exact matching: failing to call the second function in the pair or calling one of the two functions twice in a row is an error. We use common bug fixes as a heuristic that allows us to focus on patterns that caused bugs in the past. The user is presented with a choice of patterns to validate at runtime. Dynamically obtained information about which patterns were violated and which ones held at runtime is presented to the user. This combination of revision history mining and dynamic analysis techniques proves effective for both discovering new application-specific patterns and for finding errors when applied to very large programs with many man-years of development and debugging effort behind them.

To validate our approach, we analyzed Eclipse, a widelyused, mature Java application consisting of more than 2,900,000 lines of code. By mining revision histories, we have discovered a total of 32 previously unknown highly application-specific matching method pairs. Out of these, 10 were dynamically confirmed as valid patterns and a total of 107 previously unknown bugs were found as a result of pattern violations.