Clustering in Relational Data and Ontologies


Book Description

This dissertation studies the problem of clustering objects represented by relational data. The problem is pertinent because many real-world data sets can be represented only by relational data, for which object-based clustering algorithms are not designed. Relational data are encountered in many fields, including biology, management, industrial engineering, and the social sciences. Unlike numerical object data, which represent each object by a set of feature values (e.g., height, weight, shoe size), relational object data are the numerical values of (dis)similarity between pairs of objects. For this reason, conventional cluster analysis methods such as k-means and fuzzy c-means cannot be applied directly to relational data. I focus on three main problems of cluster analysis of relational data: (i) tendency prior to clustering (how many clusters are there?); (ii) partitioning of objects (which objects belong to which cluster?); and (iii) validity of the resultant clusters (are the partitions "good"?). This dissertation includes analyses proving that the Visual Assessment of cluster Tendency (VAT) algorithm is directly related to single-linkage hierarchical clustering and to Dunn's cluster validity index. These analyses are important to the development of two novel clustering algorithms, CLODD (CLustering in Ordered Dissimilarity Data) and ReSL (Rectangular Single-Linkage clustering). Finally, this dissertation addresses clustering in ontologies; examples include the Gene Ontology, the MeSH ontology, patient medical records, and web documents. I apply an extension of the Self-Organizing Map (SOM) to produce a new algorithm, the OSOM (Ontological Self-Organizing Map). OSOM provides visualization and linguistic summarization of ontology-based data.
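The link between VAT and single linkage is easiest to see in code: VAT reorders a dissimilarity matrix by visiting objects in the order of a Prim-style minimum-spanning-tree traversal, so that groups of mutually similar objects end up as blocks along the diagonal. The following is a minimal Python sketch assuming a symmetric dissimilarity matrix given as nested lists; the function names `vat` and `reorder` and the toy matrix are hypothetical illustrations, not code from the dissertation.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]

def vat(d: Matrix) -> Tuple[List[int], List[float]]:
    """Sketch of VAT ordering for a symmetric dissimilarity matrix d.

    Returns (order, links): the visiting order of the objects and, for every
    object after the first, the dissimilarity of the edge that attached it to
    the already-ordered set (a minimum-spanning-tree edge weight).
    """
    n = len(d)
    # Begin at an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(d[i]))
    order, links = [start], []
    remaining = set(range(n)) - {start}
    # best[k]: smallest dissimilarity from k to any object already ordered.
    best = {k: d[start][k] for k in remaining}
    while remaining:
        # Prim-style step: take the unordered object nearest the ordered set
        # (ties broken by the smaller index).
        j = min(remaining, key=lambda k: (best[k], k))
        order.append(j)
        links.append(best[j])
        remaining.remove(j)
        for k in remaining:
            best[k] = min(best[k], d[j][k])
    return order, links

def reorder(d: Matrix, order: List[int]) -> List[List[float]]:
    """Permute the rows and columns of d into the given order."""
    return [[d[i][j] for j in order] for i in order]

# Hypothetical toy data: objects 0 and 2 are close, as are 1 and 3.
d = [
    [0, 5, 1, 6],
    [5, 0, 6, 1],
    [1, 6, 0, 5],
    [6, 1, 5, 0],
]
order, links = vat(d)
print(order)  # [0, 2, 1, 3]
print(links)  # [1, 5, 1]
for row in reorder(d, order):
    print(row)
```

In the reordered matrix the small dissimilarities form two blocks along the diagonal ({0, 2} and {1, 3}), and the single large link (5) marks the boundary between them; removing that largest spanning-tree edge yields the same two groups that single-linkage clustering produces for this example.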




Semantic Data Mining


Book Description

Ontologies are now increasingly used to integrate and organize data and knowledge, particularly in data- and knowledge-intensive applications in both research and industry. The book is devoted to semantic data mining, a data mining approach in which domain ontologies are used as background knowledge and in which the new challenge is to mine knowledge encoded in domain ontologies and knowledge graphs, rather than only purely empirical data. The introductory chapters of the book provide theoretical foundations of both data mining and ontology representation. Taking a unified perspective, the book then covers several methods for semantic data mining, addressing tasks such as pattern mining, classification, and similarity-based approaches. It attempts to provide state-of-the-art answers to specific challenges and peculiarities of data mining with the use of ontologies, in particular: How should one deal with the incompleteness of knowledge and the so-called Open World Assumption? What is a truly "semantic" similarity measure? The book contains several chapters with examples of applications of semantic data mining. The examples start from a scenario with moderate use of lightweight ontologies for knowledge graph enrichment and end with a full-fledged scenario of an intelligent knowledge discovery assistant that uses complex domain ontologies for meta-mining, i.e., an ontology-based meta-learning approach to full data mining processes. The book is intended for researchers in the fields of semantic technologies, knowledge engineering, data science, and data mining, and for developers of knowledge-based systems and applications.




Data Mining in Biomedicine Using Ontologies


Book Description

A growing number of ontologies are being built and used for annotating data in biomedical research. Because of the tremendous amount of data being generated, ontologies are now used in numerous ways, including connecting different databases, refining search capabilities, interpreting experimental and clinical data, and inferring knowledge. This cutting-edge resource introduces you to the latest developments in bio-ontologies. The book provides the theoretical foundations and examples of ontologies, as well as applications of ontologies in biomedicine, from the molecular level to the clinical level. You will also find details on the technological infrastructure for bio-ontologies. This comprehensive, one-stop volume presents a wide range of practical bio-ontology information, offering detailed guidance on the clustering of biological data, protein classification, gene and pathway prediction, and text mining. More than 160 illustrations support key topics throughout the book.




Relational Data Clustering


Book Description

A culmination of the authors' years of extensive research on this topic, Relational Data Clustering: Models, Algorithms, and Applications addresses the fundamentals and applications of relational data clustering. It describes theoretical models and algorithms and, through examples, shows how to apply them to solve real-world problems. After defining the field, the book introduces different types of model formulations for relational data clustering, presents various algorithms for the corresponding models, and demonstrates applications of the models and algorithms through extensive experimental results. The authors cover six topics of relational data clustering: clustering on bi-type heterogeneous relational data; multi-type heterogeneous relational data; homogeneous relational data clustering; clustering on the most general case of relational data; an individual relational clustering framework; and recent research on evolutionary clustering. The book focuses on both practical algorithm derivation and theoretical framework construction for relational data clustering, and it provides a complete, self-contained introduction to advances in the field.




Growing Information: Part 2


Book Description




Ontologies and Databases


Book Description

Ontologies and Databases brings together in one place important contributions and up-to-date research results in this fast-moving area. It serves as an excellent reference, providing insight into some of the most challenging research issues in the field.




Bitemporal Data


Book Description

Bitemporal data has always been important, but it was not until 2011 that the ISO released a SQL standard that supported it. Among major DBMS vendors, Oracle, IBM, and Teradata now provide at least some bitemporal functionality in their flagship products. But to use these products effectively, someone in your IT organization needs to know more than how to code bitemporal SQL statements. Perhaps, in your organization, that person is you. To correctly interpret business requests for temporal data, to correctly specify requirements to your IT development staff, and to correctly design bitemporal databases and applications, someone in your enterprise needs a deep understanding of both the theory and the practice of managing bitemporal data. Someone also needs to understand what the future may bring in the way of additional temporal functionality, so that the enterprise can plan for it. Perhaps, in your organization, that person is you. This book will show the do-it-yourself IT professional how to design and build bitemporal databases and how to write bitemporal transactions and queries, and will show those who direct the use of vendor-provided bitemporal DBMSs exactly what is going on "under the covers" of that software. The book:
- Explains the business value of bitemporal data in terms of the information that bitemporal tables can provide and that no other form of temporal data can, including history tables, version tables, snapshot tables, and slowly-changing dimensions.
- Provides an integrated account of the mathematics, logic, ontology, and semantics of relational theory and relational databases, in terms of which current relational theory and practice can be seen as unnecessarily constrained to the management of nontemporal and incompletely temporal data.
- Explains how bitemporal tables can provide the time-variance and nonvolatility hitherto lacking in Inmon historical data warehouses.
- Explains how bitemporal dimensions can replace slowly-changing dimensions in Kimball star schemas, and why they should do so.
- Describes several extensions to the current theory and practice of bitemporal data, including the use of episodes, "whenever" temporal transactions and queries, and future transaction time.
- Points out a basic error in the ISO's bitemporal SQL standard and warns practitioners against using that faulty functionality.
- Recommends six extensions to the ISO standard that will increase the business value of bitemporal data.
- Points toward a tritemporal future for bitemporal data, in which an Aristotelian ontology and a speech-act semantics support the direct management of the statements inscribed in the rows of relational tables and add the ability to track the provenance of database content to existing bitemporal databases.
This book also provides the background needed to become a business ontologist, and explains why an IT data management person, deeply familiar with corporate databases, is best suited to play that role. Perhaps, in your organization, that person is you.
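To illustrate the information a bitemporal table carries that a single-timeline history table does not, here is a minimal Python sketch. It assumes a hypothetical customer-status table in which every row has a half-open valid-time period (when the fact holds in the world) and a half-open transaction-time period (when the database asserted the row); the field names, the `END_OF_TIME` sentinel, the `as_of` helper, and the data are illustrative assumptions, not the book's design or SQL:2011 syntax.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical sentinel meaning "until further notice".
END_OF_TIME = date(9999, 12, 31)

@dataclass(frozen=True)
class Row:
    key: str
    status: str
    valid_from: date  # valid time: when the fact starts to hold (inclusive)
    valid_to: date    # valid time end (exclusive)
    tx_from: date     # transaction time: when the row started being asserted
    tx_to: date       # transaction time end (exclusive); END_OF_TIME = current

def as_of(rows: List[Row], key: str, valid_at: date, tx_at: date) -> Optional[str]:
    """What did the database assert at tx_at about key's status at valid_at?"""
    for r in rows:
        if (r.key == key
                and r.valid_from <= valid_at < r.valid_to
                and r.tx_from <= tx_at < r.tx_to):
            return r.status
    return None

# Hypothetical history: C1 was recorded as Silver on 2020-01-01. On 2021-03-01
# we learned that C1 had become Gold on 2021-01-01, so the original row's
# transaction period is closed and two corrected rows are asserted instead.
rows = [
    Row("C1", "Silver", date(2020, 1, 1), END_OF_TIME, date(2020, 1, 1), date(2021, 3, 1)),
    Row("C1", "Silver", date(2020, 1, 1), date(2021, 1, 1), date(2021, 3, 1), END_OF_TIME),
    Row("C1", "Gold", date(2021, 1, 1), END_OF_TIME, date(2021, 3, 1), END_OF_TIME),
]

print(as_of(rows, "C1", date(2021, 2, 1), date(2021, 2, 15)))  # Silver
print(as_of(rows, "C1", date(2021, 2, 1), date(2021, 4, 1)))   # Gold
```

Both queries ask about the same valid time (1 February 2021) but at different transaction times, so they return what the database asserted before and after the retroactive correction. A table that records only one of the two timelines cannot answer both questions.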




Advanced Data Mining and Applications


Book Description

The two-volume set LNAI 7120 and LNAI 7121 constitutes the refereed proceedings of the 7th International Conference on Advanced Data Mining and Applications, ADMA 2011, held in Beijing, China, in December 2011. The 35 revised full papers and 29 short papers presented together with 3 keynote speeches were carefully reviewed and selected from 191 submissions. The papers cover a wide range of topics presenting original research findings in data mining, spanning applications, algorithms, software and systems, and applied disciplines.