Book Description
Duplicated passages of source code, so-called code clones, are a common property of software systems. While clones are beneficial in some situations, their presence causes various problems for software maintenance. Most of these problems are closely related to change; they include, for example, the need to propagate changes across duplicated code fragments and the risk of inconsistent changes to clones that are meant to evolve identically. Hence, we need a sophisticated analysis of clone evolution to better understand, assess, and manage duplication in practice.

This thesis introduces Clone Evolution Graphs as a technique to model clone relations and their evolution within the history of a system. We present an incremental algorithm for the efficient, automated extraction of Clone Evolution Graphs from a system's history. The approach is shown to scale even to large systems with long histories, making it applicable both to retroactive analysis of clone evolution and to live tracking of clones during software maintenance.

We have used Clone Evolution Graphs in several studies to analyze various aspects of clone evolution in open-source as well as industrial systems. Our results show that the characteristics of clone evolution differ considerably between systems, highlighting the need for a sophisticated technique like Clone Evolution Graphs to track clones and analyze their evolution on a per-system basis. We have also shown that Clone Evolution Graphs are well suited to analyzing the change behavior of individual clones and can be used to identify problematic clones within a system. In general, the results of our studies provide new insights into how clones evolve, how they are changed, and how they are removed.