From People to Entities: New Semantic Search Paradigms for the Web


Book Description

The exponential growth of digital information available in companies and on the Web creates the need for search tools that can respond to the most sophisticated information needs. Many user tasks would be simplified if search engines supported typed search and returned entities instead of just Web documents. For example, an executive who tries to solve a problem needs to find people in the company who are knowledgeable about a certain topic. In the first part of the book, we propose a model for expert finding based on the well-established vector space model for Information Retrieval and investigate its effectiveness. In the second part of the book, we investigate different methods based on Semantic Web and Natural Language Processing techniques for ranking entities of different types, both in Wikipedia and, more generally, on the Web. In the third part of the book, we study the problem of Entity Retrieval for news applications and the importance of the news trail history (i.e., past related articles) in determining the relevant entities in current articles. We also study opinion evolution about entities: we propose a method for automatically extracting public opinion about political candidates from the blogosphere.
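
As a rough illustration of the first part's approach, the sketch below ranks candidate experts by the cosine similarity between a query and a TF-IDF profile built from the documents associated with each candidate. It is a minimal vector-space baseline with made-up data, not the book's exact expert-finding model.

```python
# Minimal vector-space expert finding sketch: each candidate is represented by
# the text they are associated with, and candidates are ranked by cosine
# similarity between the query and their TF-IDF profile. Toy data only.
import math
from collections import Counter

def tf_idf_vectors(profiles):
    """Build TF-IDF vectors for {candidate: text} profiles."""
    tokenized = {c: text.lower().split() for c, text in profiles.items()}
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))
    n = len(tokenized)
    vectors = {}
    for c, tokens in tokenized.items():
        tf = Counter(tokens)
        vectors[c] = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return vectors

def cosine(u, v):
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_experts(query, profiles):
    vectors = tf_idf_vectors(profiles)
    q = {t: 1.0 for t in query.lower().split()}
    return sorted(profiles, key=lambda c: cosine(q, vectors[c]), reverse=True)

# Illustrative candidate profiles (assumed data).
profiles = {
    "alice": "semantic web ontologies linked data integration",
    "bob": "databases query optimisation indexing",
}
print(rank_experts("linked data integration", profiles))  # ['alice', 'bob']
```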




Semantic Search for Novel Information


Book Description

In this book, new approaches are presented for detecting and extracting information that is simultaneously relevant and novel from unstructured text documents. A major contribution of these approaches is that both the information already provided and the extracted information are modeled semantically. This leads to the following benefits: (a) ambiguities in the language can be resolved; (b) the exact information needs regarding relevance and novelty can be specified; and (c) knowledge graphs can be incorporated. More specifically, this book presents the following scientific contributions: 1. An assessment of the suitability of existing large knowledge graphs (namely, DBpedia, Freebase, OpenCyc, Wikidata, and YAGO) for the task of detecting novel information in text documents. 2. An approach by which emerging entities that are missing in a knowledge graph are detected in a stream of text documents. 3. An approach to extracting novel, relevant, semantically structured statements from text documents. The developed approaches are suitable for recommending emerging entities and novel statements, respectively, for the purpose of knowledge graph population, and for assisting users who require novel information, such as journalists and technology scouts.
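
As a toy illustration of the novelty-detection setting, the sketch below treats an extracted statement as novel when the knowledge graph contains neither the statement nor an alias-resolved equivalent. The triples, prefixes, and alias table are illustrative assumptions, not the book's method or data.

```python
# A statement extracted from text is treated as "novel" if the knowledge graph
# contains neither the statement itself nor an equivalent one once known
# identifier aliases (e.g., owl:sameAs links) are resolved. Toy data only.

KG_TRIPLES = {
    ("dbr:Berlin", "dbo:country", "dbr:Germany"),
    ("dbr:Berlin", "dbo:populationTotal", "3644826"),
}
SAME_AS = {"wd:Q64": "dbr:Berlin"}  # map external identifiers to KG identifiers

def normalise(term):
    return SAME_AS.get(term, term)

def is_novel(statement, kg=KG_TRIPLES):
    s, p, o = (normalise(t) for t in statement)
    return (s, p, o) not in kg

print(is_novel(("wd:Q64", "dbo:country", "dbr:Germany")))      # False: known once aliases are resolved
print(is_novel(("dbr:Berlin", "dbo:mayor", "dbr:Kai_Wegner")))  # True: candidate novel statement
```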




Populating a Linked Data Entity Name System


Book Description

Resource Description Framework (RDF) is a graph-based data model used to publish data as a Web of Linked Data. RDF is an emergent foundation for large-scale data integration, the problem of providing a unified view over multiple data sources. An Entity Name System (ENS) is a thesaurus for entities, and is a crucial component in a data integration architecture. Populating a Linked Data ENS is equivalent to solving an Artificial Intelligence problem called instance matching, which concerns identifying pairs of entities referring to the same underlying entity. This publication presents an instance matcher with 4 properties, namely automation, heterogeneity, scalability and domain independence. Automation is addressed by employing inexpensive but well-performing heuristics to automatically generate a training set, which is employed by other machine learning algorithms in the pipeline. Data-driven alignment algorithms are adapted to deal with structural heterogeneity in RDF graphs. Domain independence is established by actively avoiding prior assumptions about input domains, and through evaluations on 10 RDF test cases. The full system is scaled by implementing it on cloud infrastructure using MapReduce algorithms.
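
The following sketch illustrates the "inexpensive heuristic generates a training set" idea in miniature: entity pairs whose label similarity is very high or very low are auto-labelled as matches or non-matches, leaving the ambiguous middle for a downstream classifier. The thresholds and toy records are assumptions; the actual system operates on RDF graphs with richer features and runs as MapReduce jobs.

```python
# Auto-label confidently similar / dissimilar entity pairs as noisy training
# examples for an instance-matching classifier. Toy labels and thresholds.
from difflib import SequenceMatcher

def label_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def heuristic_training_set(pairs, pos_threshold=0.85, neg_threshold=0.3):
    training = []
    for left, right in pairs:
        sim = label_similarity(left["label"], right["label"])
        if sim >= pos_threshold:
            training.append((left, right, 1))   # confident match
        elif sim <= neg_threshold:
            training.append((left, right, 0))   # confident non-match
        # pairs in between are left for the trained classifier to decide
    return training

pairs = [
    ({"label": "Barack Obama"}, {"label": "Barack H. Obama"}),
    ({"label": "Barack Obama"}, {"label": "Eiffel Tower"}),
]
print(heuristic_training_set(pairs))
```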




Probabilistic Semantic Web


Book Description

The management of uncertainty in the Semantic Web is of foremost importance given the nature and origin of the available data. This book presents a probabilistic semantics for knowledge bases, DISPONTE, which is inspired by the distribution semantics of Probabilistic Logic Programming. The book also describes approaches for inference and learning. In particular, it discusses three reasoners and two learning algorithms. BUNDLE and TRILL are able to find explanations for queries and compute their probability with respect to DISPONTE KBs, while TRILLP compactly represents explanations using a Boolean formula and computes the probability of queries. The system EDGE learns the parameters of the axioms of DISPONTE KBs. To reduce the computational cost, EDGEMR performs distributed parameter learning. LEAP learns both the structure and the parameters of KBs, with LEAPMR using EDGEMR to reduce the computational cost. The algorithms provide effective techniques for dealing with uncertain KBs and have been widely tested on various datasets and compared with state-of-the-art systems.
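
To make the distribution semantics concrete, the brute-force sketch below computes the probability of a query by summing the probabilities of the worlds (independent axiom choices) that contain at least one complete explanation. The actual reasoners use Binary Decision Diagrams or Boolean pinpointing formulas rather than enumeration; the axioms, probabilities, and explanation here are illustrative.

```python
# Distribution-semantics sketch: every probabilistic axiom is an independent
# Boolean choice; the probability of a query is the total probability of the
# worlds containing at least one explanation. Toy axioms and probabilities.
from itertools import product

def query_probability(axiom_probs, explanations):
    """axiom_probs: {axiom: probability}; explanations: list of axiom sets."""
    axioms = sorted(axiom_probs)
    total = 0.0
    for choices in product([True, False], repeat=len(axioms)):
        world = {a for a, chosen in zip(axioms, choices) if chosen}
        p_world = 1.0
        for a, chosen in zip(axioms, choices):
            p_world *= axiom_probs[a] if chosen else 1.0 - axiom_probs[a]
        if any(expl <= world for expl in explanations):
            total += p_world
    return total

axiom_probs = {"Cat subClassOf Pet": 0.6, "Pet subClassOf Animal": 0.9,
               "fluffy type Cat": 0.8}
# One explanation for the query "fluffy type Animal" (illustrative):
explanations = [{"Cat subClassOf Pet", "Pet subClassOf Animal", "fluffy type Cat"}]
print(query_probability(axiom_probs, explanations))  # 0.6 * 0.9 * 0.8 = 0.432
```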




Advances in Ontology Design and Patterns


Book Description

The study of patterns in the context of ontology engineering for the Semantic Web was pioneered more than a decade ago by Blomqvist, Sandkuhl and Gangemi. Since then, this line of research has flourished and led to the development of ontology design patterns, knowledge patterns, and linked data patterns, as these patterns are known to ontology designers, knowledge engineers, and linked data publishers, respectively. A key characteristic of these patterns is that they are modular and reusable solutions to recurrent problems in ontology engineering and linked data publishing. This book contains recent contributions that advance the state of the art in the theory and use of ontology design patterns. The papers collected here cover a range of topics, from a method for instantiating content patterns and a proposal for documenting content patterns, to a number of patterns that emerge in ontology modeling in various situations.
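
As a small, non-normative illustration of pattern instantiation, the sketch below specialises a generic "Participation"-style content pattern by mapping its classes and properties onto domain terms, emitting subclass and subproperty axioms. Both the pattern and the domain mapping are assumptions made for illustration only, not a method from the book.

```python
# Instantiate a content ontology design pattern by specialising its classes
# and properties for a target domain. Pattern and mappings are illustrative.

PARTICIPATION_PATTERN = {
    "classes": ["Event", "Object"],
    "properties": [("hasParticipant", "Event", "Object")],
}

def instantiate(pattern, class_map, property_map):
    axioms = []
    for cls in pattern["classes"]:
        axioms.append((class_map[cls], "rdfs:subClassOf", cls))
    for prop, domain, rng in pattern["properties"]:
        axioms.append((property_map[prop], "rdfs:subPropertyOf", prop))
        axioms.append((property_map[prop], "rdfs:domain", class_map[domain]))
        axioms.append((property_map[prop], "rdfs:range", class_map[rng]))
    return axioms

# Specialise the pattern for a conference-modelling ontology (illustrative).
axioms = instantiate(
    PARTICIPATION_PATTERN,
    class_map={"Event": "ConferenceTalk", "Object": "Speaker"},
    property_map={"hasParticipant": "hasSpeaker"},
)
for a in axioms:
    print(a)
```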




Semantic Sentiment Analysis in Social Streams


Book Description

Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people’s opinions and attitudes is published and shared among users on a daily basis. This has recently created great opportunities for companies interested in tracking and monitoring the reputation of their brands and businesses, and for policy makers and politicians seeking to assess public opinion about their policies or about political issues. A wide range of approaches to sentiment analysis on social media have recently been developed. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment. However, these approaches are semantically weak; that is, they do not account for the semantics of words when detecting their sentiment in text. To address this problem, the author investigates the role of word semantics in sentiment analysis of microblogs. Specifically, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, the author proposes several approaches in this book for extracting and incorporating two types of word semantics into sentiment analysis: contextual semantics (i.e., semantics captured from word co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources). Experiments are conducted with both types of semantics by assessing their impact on three popular sentiment analysis tasks on Twitter: entity-level sentiment analysis, tweet-level sentiment analysis, and context-sensitive sentiment lexicon adaptation. The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter: the proposed approaches, which consider word semantics for sentiment analysis at both the entity and tweet levels, surpass non-semantic approaches in most evaluation scenarios. This book will be of interest to students, researchers and practitioners in the field of semantic sentiment analysis.
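
As a simplified illustration of contextual semantics, the sketch below estimates a word's sentiment from the prior sentiment of the words it co-occurs with in tweets, rather than from a fixed lexicon entry alone. It is a toy co-occurrence average, not the methods developed in the book; the lexicon and tweets are made up.

```python
# Contextual sentiment sketch: a target word's score is the average prior
# sentiment of the lexicon words it co-occurs with. Toy lexicon and tweets.
PRIOR = {"great": 1.0, "love": 1.0, "terrible": -1.0, "delays": -0.5}

def contextual_sentiment(target, tweets, prior=PRIOR):
    score, count = 0.0, 0
    for tweet in tweets:
        tokens = tweet.lower().split()
        if target in tokens:
            for tok in tokens:
                if tok != target and tok in prior:
                    score += prior[tok]
                    count += 1
    return score / count if count else prior.get(target, 0.0)

tweets = [
    "love the new service great experience",
    "the service had terrible delays again",
]
# "service" has no prior sentiment; its contextual score comes from co-occurring words.
print(contextual_sentiment("service", tweets))  # 0.125
```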




Semantic Web Enabled Software Engineering


Book Description

Over the last decade, ontologies have become an important modeling component in software engineering. Semantic Web Enabled Software Engineering presents critical findings that open a new direction for Software Engineering research by exploiting Semantic Web technologies. Most of these findings come from selected papers from the Semantic Web Enabled Software Engineering (SWESE) series of workshops, which began in 2005. Edited by two leading researchers, this advanced text presents a unifying and contemporary perspective on the field. The book integrates, in one volume, a unified perspective on the concepts and theories connecting Software Engineering and the Semantic Web. It presents state-of-the-art techniques for using Semantic Web technologies in Software Engineering and introduces techniques for designing ontologies for Software Engineering.




The Semantic Web in Earth and Space Science. Current Status and Future Directions


Book Description

The geosciences are one of the fields leading the way in advancing semantic technologies. This book continues the dialogue and feedback between the geoscience and Semantic Web communities. Increasing data volumes within the geosciences make it no longer practical to copy data and perform analysis locally. Hypotheses are now being tested through online tools that combine and mine pools of data. This evolution in the way research is conducted is commonly referred to as e-Science. As e-Science has flourished, the barriers to free and open access to data have been lowered and the need for semantics has been heightened. As the volume, complexity, and heterogeneity of data resources grow, geoscientists are creating new capabilities that rely on semantic approaches. Geoscience researchers are actively working toward a research environment of software tools and interfaces to data archives and services, with the goal of full-scale semantic integration beginning to take shape. The members of this emerging semantic e-Science community are increasingly in need of semantic-based methodologies, tools and infrastructure. A feedback loop between the geosciences and the computational sciences is forming. Advances in knowledge modeling, logic-based hypothesis checking, semantic data integration, and knowledge discovery are leading to advances in scientific domains, which in turn are validating semantic approaches and pointing to new research directions. We present mature semantic applications within the geosciences and aim to stimulate discussion on emerging challenges and new research directions.




Querying a Web of Linked Data


Book Description

In recent years, an increasing number of organizations and individuals have contributed to the Semantic Web by publishing data according to the Linked Data principles. In addition, a significant body of Semantic Web research studies various aspects of knowledge representation and automated reasoning over collections of such data. However, a challenge that is crucial for achieving the vision of a Semantic Web, but that has not yet been studied to a comparable extent, is to enable automated software agents to operate directly on decentralized Linked Data that is distributed over the WWW. In particular, fundamental questions related to querying this data on the WWW have received very limited research attention. This book contributes towards filling this gap by studying the foundations of declarative queries over Linked Data on the WWW. Our particular focus is on approaches that use the SPARQL query language and execute queries by traversing Linked Data live during the query execution process. More specifically, we first provide formal foundations for adapting SPARQL to this context. Thereafter, we use an abstract machine model to formally show the computational feasibility and related properties of the resulting types of SPARQL queries. Additionally, we investigate fundamental properties of a traversal-based approach to query execution that is tailored to the use case of querying Linked Data directly on the WWW.
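
A minimal sketch of the traversal-based idea: starting from URIs mentioned in the query, documents are dereferenced, their triples collected, newly discovered URIs followed, and the query pattern matched over everything retrieved so far. Here dereferencing is simulated with an in-memory table rather than HTTP lookups, and the data and prefixes are illustrative.

```python
# Traversal-based query execution sketch over a simulated Web of Linked Data.

WEB = {  # assumed toy Web: URI -> triples served by the document describing it
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob":   [("ex:bob", "foaf:knows", "ex:carol"),
                 ("ex:bob", "foaf:name", '"Bob"')],
    "ex:carol": [("ex:carol", "foaf:name", '"Carol"')],
}

def dereference(uri):
    # Stand-in for an HTTP lookup of the document describing `uri`.
    return WEB.get(uri, [])

def traverse_and_match(seed_uris, pattern, max_lookups=100):
    """pattern: (s, p, o) where None is a variable; returns matching triples."""
    seen, frontier, triples = set(), list(seed_uris), []
    while frontier and len(seen) < max_lookups:
        uri = frontier.pop()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in dereference(uri):
            triples.append(triple)
            for term in triple:
                if term.startswith("ex:") and term not in seen:
                    frontier.append(term)  # follow newly discovered URIs
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Who does anyone reachable from ex:alice know? (?x foaf:knows ?y)
print(traverse_and_match(["ex:alice"], (None, "foaf:knows", None)))
```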




Reasoning Techniques for the Web of Data


Book Description

Linked Data publishing has brought about a novel “Web of Data”: a wealth of diverse, interlinked, structured data published on the Web. These Linked Datasets are described using Semantic Web standards and are openly available to all, produced by governments, businesses, communities and academia alike. However, the heterogeneity of such data, in terms of how resources are described and identified, poses major challenges to potential consumers. Herein, we examine use cases for pragmatic, lightweight reasoning techniques that leverage Web vocabularies (described in RDFS and OWL) to better integrate large-scale, diverse Linked Data corpora. We take a test corpus of 1.1 billion RDF statements collected from 4 million RDF Web documents and analyse the use of RDFS and OWL therein. We then detail and evaluate scalable and distributed techniques for applying rule-based materialisation to translate data between different vocabularies, and to resolve coreferent resources that refer to the same real-world entity. We show how such techniques can be made robust in the face of noisy and often imprudently published Web data. We also examine a use case for incorporating a PageRank-style algorithm to rank the trustworthiness of facts produced by reasoning, subsequently using those ranks to fix formal contradictions in the data. All of our methods are validated against our real-world, large-scale, open-domain Linked Data evaluation corpus.
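
For a flavour of rule-based materialisation, the single-machine sketch below applies two RDFS rules (subclass transitivity and type propagation) until a fixpoint is reached. The book's techniques apply a much larger rule set in a distributed setting over billions of triples; the triples here are illustrative.

```python
# Naive fixpoint materialisation with two RDFS rules. Toy triples only.

def materialise(triples):
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for s, p, o in inferred:
            if p == "rdfs:subClassOf":
                # rdfs11: subclass transitivity
                for s2, p2, o2 in inferred:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
            if p == "rdf:type":
                # rdfs9: propagate instances up the class hierarchy
                for s2, p2, o2 in inferred:
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdf:type", o2))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

triples = {
    ("ex:Cat", "rdfs:subClassOf", "ex:Mammal"),
    ("ex:Mammal", "rdfs:subClassOf", "ex:Animal"),
    ("ex:felix", "rdf:type", "ex:Cat"),
}
for t in sorted(materialise(triples) - triples):
    print(t)  # the newly inferred triples
```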