20th IEEE International Conference on Software Maintenance, 2004. Proceedings.

Abstract

With the widespread adoption of object-oriented technologies, the lack of computationally efficient and scalable approaches limits the ability to model and analyze the history of large object-oriented software systems. This paper proposes an approximate representation of object-oriented code characteristics, inspired by the centroids used for clustering in pattern recognition. An interesting application of such a representation is a linear-time algorithm to detect duplicated or nearly duplicated code in object-oriented systems. The algorithm's accuracy and time complexity were assessed on 11 releases of a large software system, the Eclipse Framework.