Clone Evolution


Book Description

Duplicated passages of source code - code clones - are a common property of software systems. While clones are beneficial in some situations, their presence causes various problems for software maintenance. Most of these problems are strongly related to change and include, for example, the need to propagate changes across duplicated code fragments and the risk of inconsistent changes to clones that are meant to evolve identically. Hence, we need a sophisticated analysis of clone evolution to better understand, assess, and manage duplication in practice. This thesis introduces Clone Evolution Graphs as a technique to model clone relations and their evolution within the history of a system. We present our incremental algorithm for efficient and automated extraction of Clone Evolution Graphs from a system's history. The approach is shown to scale even for large systems with long histories, making it applicable to retroactive analysis of clone evolution as well as live tracking of clones during software maintenance. We have used Clone Evolution Graphs in several studies to analyze versatile aspects of clone evolution in open-source as well as industrial systems. Our results show that the characteristics of clone evolution are quite different between systems, highlighting the need for a sophisticated technique like Clone Evolution Graphs to track clones and analyze their evolution on a per-system basis. We have also shown that Clone Evolution Graphs are well-suited to analyze the change behavior of individual clones and can be used to identify problematic clones within a system. In general, the results of our studies provide new insights into how clones evolve, how they are changed, and how they are removed.







Code Clone Analysis


Book Description

This is the first book organized around code clone analysis. To cover the broad studies of code clone analysis, this book selects past research results that are important to the progress of the field and updates them with new results and future directions. The first chapter provides an introduction for readers who are new to the foundations of code clone analysis, defines clones and related terms, and discusses the classification of clones. The chapters that follow are categorized into three main parts to present 1) major tools for code clone analysis, 2) fundamental topics such as evaluation benchmarks, clone visualization, code clone searches, and code similarities, and 3) applications to actual problems. Each chapter includes a valuable reference list that will help readers to achieve a comprehensive understanding of this diverse field and to catch up with the latest research results. Code clone analysis relies heavily on computer science theories such as pattern matching algorithms, computer language, and software metrics. Consequently, code clone analysis can be applied to a variety of real-world tasks in software development and maintenance such as bug finding and program refactoring. This book will also be useful in designing an effective curriculum that combines theory and application of code clone analysis in university software engineering courses.







Evolving Code Clones


Book Description




An Empirical Study for the Impact of Maintenance Activities in Clone Evolution


Book Description

Code clones are duplicated code fragments that are copied to re-use functionality and speed up development. However, due to the duplicate nature of code clones, inconsistent updates can lead to bugs in the software system. Existing research investigates the inconsistent updates through analysis of the updates to code clones and the bug fixes used to fix the inconsistent updates. We extend the work by investigating other factors that affect clone evolution, such as the number of developers. We conduct an empirical study on clone evolution at two levels of analysis: the method level and the clone class level. We analyze the factors affecting bug fixes and co-change (i.e., updating cloned methods at the same time) using our new metrics. Our metrics are related to the developers, code complexity, and stages of development. We use these metrics to find ways to improve the maintenance of cloned code. We discover that one way to improve maintenance of code clones is to decrease code complexity. We find that increased code complexity leads to a decrease in co-change, which can lead to bugs in the software. We perform our study on 6 applications. To maximize the number of clones detected, we use two existing code clone detection tools: SimScan and Simian. SimScan was used to find clones in 5 of the applications due to its versatility in finding code clones. Simian was used to detect clones due to its reliability in finding code clones regardless of language or compilation problems. To analyze and determine the significance of the metrics, we use the R Statistical Toolkit.




Language Evolution to Reduce Code Cloning


Book Description

Domain-specific languages can significantly speed up the development of software applications. However, it usually takes a few iterations of the language design before it achieves such power. At the same time, many domains tend to evolve quite often today, which implies that domain-specific languages have to evolve accordingly. Thus, being able to evolve a language in a painless manner is crucial. Unfortunately, current state-of-the-art research does not provide enough answers on how to efficiently evolve domain-specific languages. We present an approach to evolving a language in order to reduce the amount of code cloning it introduces. The approach specifically targets those languages whose design causes users to create many duplicated code segments. We target domain-specific languages as they tend to be more challenging to evolve due to their specifics, but the approach may be applicable to general purpose programming languages as well. The approach was tested on a real-world domain-specific language that is used in a financial domain. We proposed three improvements and current users helped us evaluate them. We found that the proposed improvements would reduce code cloning, which provides evidence that the approach can be used in a real-world environment. Furthermore, this work provides a solid basis for further research in the area of application of code cloning detection results. In particular, code cloning detection results and the ideas we presented show potential to be extended and used to facilitate domain analysis.




Empirical Research towards a Relevance Assessment of Software Clones


Book Description

Redundancies in program source code - software clones - are a common phenomenon. Although it is often claimed that software clones decrease the maintainability of software systems and need to be managed, research in the last couple of years showed that not all clones can be considered harmful. A sophisticated assessment of the relevance of software clones and a cost-benefit analysis of clone management is needed to gain a better understanding of cloning and whether it is truly a harmful phenomenon. This thesis introduces techniques to model, analyze, and evaluate versatile aspects of software clone evolution within the history of a system. We present a mapping of non-identical clones across multiple versions of a system that avoids possible ambiguities of previous approaches. Although it processes more data to determine the context of each clone and thereby avoid an ambiguous mapping, the approach is shown to be efficient and applicable to large systems for a retrospective analysis of software clone evolution. The approach has been used in several studies to gain insights into the phenomenon of cloning in open-source as well as industrial software systems. Our results show that non-identical clones require more attention regarding clone management compared to identical clones as they are the dominating clone type for the main share of our subject systems. Using the evolution model to investigate costs and benefits of refactorings that remove clones, we conclude that clone removals could not reduce maintenance costs for most systems under study.




Software Evolution


Book Description

This book focuses on novel trends in software evolution research and its relations with other emerging disciplines. Mens and Demeyer, both authorities in the field of software evolution, do not restrict themselves to the evolution of source code but also address the evolution of other, equally important software artifacts. This book is an indispensable source for researchers and professionals looking for an introduction and comprehensive overview of the state of the art.




Fundamental Approaches to Software Engineering


Book Description

This book constitutes the refereed proceedings of the 13th International Conference on Fundamental Approaches to Software Engineering, FASE 2010, held in Paphos, Cyprus, in March 2010, as part of ETAPS 2010, the European Joint Conferences on Theory and Practice of Software. The 25 papers presented were carefully reviewed and selected from 103 submissions. The volume also contains one invited talk. The topics covered are model transformation, software evolution, graph transformation, modeling concepts, verification, program analysis, testing and debugging, and performance modeling and analysis.