Knowledge Graphs and Big Data Processing


Book Description

This open access book is part of the LAMBDA Project (Learning, Applying, Multiplying Big Data Analytics), funded by the European Union, GA No. 809965. Data Analytics involves applying algorithmic processes to derive insights. Nowadays it is used in many industries to allow organizations and companies to make better decisions as well as to verify or disprove existing theories or models. The term data analytics is often used interchangeably with intelligence, statistics, reasoning, data mining, knowledge discovery, and others. The goal of this book is to introduce some of the definitions, methods, tools, frameworks, and solutions for big data processing, starting from the process of information extraction and knowledge representation, via knowledge processing and analytics to visualization, sense-making, and practical applications. Each chapter in this book addresses some pertinent aspect of the data processing chain, with a specific focus on understanding Enterprise Knowledge Graphs, Semantic Big Data Architectures, and Smart Data Analytics solutions. This book is addressed to graduate students from technical disciplines, to professional audiences following continuous education short courses, and to researchers from diverse areas following self-study courses. Basic skills in computer science, mathematics, and statistics are required.




Exploiting Linked Data and Knowledge Graphs in Large Organisations


Book Description

This book addresses the topic of exploiting enterprise linked data, with a particular focus on knowledge construction and accessibility within enterprises. It identifies the gaps between the requirements of enterprise knowledge consumption and “standard” data consuming technologies by analysing real-world use cases, and proposes the enterprise knowledge graph to fill such gaps. It provides concrete guidelines for effectively deploying linked-data graphs within and across business organizations. It is divided into three parts, focusing on the key technologies for constructing, understanding and employing knowledge graphs. Part 1 introduces basic background information and technologies, and presents a simple architecture to elucidate the main phases and tasks required during the lifecycle of knowledge graphs. Part 2 focuses on technical aspects; it starts with state-of-the-art knowledge-graph construction approaches, and then discusses exploration and exploitation techniques as well as advanced question-answering topics concerning knowledge graphs. Lastly, Part 3 demonstrates examples of successful knowledge graph applications in the media industry, healthcare and cultural heritage, and offers conclusions and future visions.




Knowledge Graphs


Book Description

A rigorous and comprehensive textbook covering the major approaches to knowledge graphs, an active and interdisciplinary area within artificial intelligence. The field of knowledge graphs, which allows us to model, process, and derive insights from complex real-world data, has emerged as an active and interdisciplinary area of artificial intelligence over the last decade, drawing on such fields as natural language processing, data mining, and the semantic web. Current projects involve predicting cyberattacks, recommending products, and even gleaning insights from thousands of papers on COVID-19. This textbook offers rigorous and comprehensive coverage of the field. It focuses systematically on the major approaches, both those that have stood the test of time and the latest deep learning methods.




Knowledge Graphs


Book Description

This book provides a comprehensive and accessible introduction to knowledge graphs, which have recently garnered notable attention from both industry and academia. Knowledge graphs are founded on the principle of applying a graph-based abstraction to data, and are now broadly deployed in scenarios that require integrating and extracting value from multiple, diverse sources of data at large scale. The book defines knowledge graphs and provides a high-level overview of how they are used. It presents and contrasts popular graph models that are commonly used to represent data as graphs, and the languages by which they can be queried before describing how the resulting data graph can be enhanced with notions of schema, identity, and context. The book discusses how ontologies and rules can be used to encode knowledge as well as how inductive techniques—based on statistics, graph analytics, machine learning, etc.—can be used to encode and extract knowledge. It covers techniques for the creation, enrichment, assessment, and refinement of knowledge graphs and surveys recent open and enterprise knowledge graphs and the industries or applications within which they have been most widely adopted. The book closes by discussing the current limitations and future directions along which knowledge graphs are likely to evolve. This book is aimed at students, researchers, and practitioners who wish to learn more about knowledge graphs and how they facilitate extracting value from diverse data at large scale. To make the book accessible for newcomers, running examples and graphical notation are used throughout. Formal definitions and extensive references are also provided for those who opt to delve more deeply into specific topics.




Big Data Analytics for Time-Critical Mobility Forecasting


Book Description

This book provides detailed descriptions of big data solutions for activity detection and forecasting of very large numbers of moving entities spread across large geographical areas. It presents state-of-the-art methods for processing, managing, detecting and predicting trajectories and important events related to moving entities, together with advanced visual analytics methods, over multiple heterogeneous, voluminous, fluctuating and noisy data streams from moving entities, correlating them with data from archived data sources expressing e.g. entities’ characteristics, geographical information, mobility patterns, mobility regulations and intentional data. The book is divided into six parts: Part I discusses the motivation and background of mobility forecasting supported by trajectory-oriented analytics, and includes specific problems and challenges in the aviation (air-traffic management) and the maritime domains. Part II focuses on big data quality assessment and processing, and presents novel technologies suitable for mobility analytics components. Next, Part III describes solutions toward processing and managing big spatio-temporal data, particularly enriching data streams and integrating streamed and archival data to provide coherent views of mobility, and storing of integrated mobility data in large distributed knowledge graphs for efficient query-answering. Part IV focuses on mobility analytics methods exploiting (online) processed, synopsized and enriched data streams as well as (offline) integrated, archived mobility data, and highlights future location and trajectory prediction methods, distinguishing between short-term and more challenging long-term predictions. Part V examines how methods addressing data management, data processing and mobility analytics are integrated in big data architectures with distinctive characteristics compared to other known big data paradigmatic architectures. 
Lastly, Part VI covers important ethical issues that research on mobility analytics should address. Providing novel approaches and methodologies related to mobility detection and forecasting needs based on big data exploration, processing, storage, and analysis, this book will appeal to computer scientists and stakeholders in various application domains.




Graph-Powered Machine Learning


Book Description

Upgrade your machine learning models with graph-based algorithms, the perfect structure for complex and interlinked data.

Summary

In Graph-Powered Machine Learning, you will learn:
· The lifecycle of a machine learning project
· Graphs in big data platforms
· Data source modeling using graphs
· Graph-based natural language processing, recommendations, and fraud detection techniques
· Graph algorithms
· Working with Neo4J

Graph-Powered Machine Learning teaches you to use graph-based algorithms and data organization strategies to develop superior machine learning applications. You’ll dive into the role of graphs in machine learning and big data platforms, and take an in-depth look at data source modeling, algorithm design, recommendations, and fraud detection. Explore end-to-end projects that illustrate architectures and help you optimize with best design practices. Author Alessandro Negro’s extensive experience shines through in every chapter, as you learn from examples and concrete scenarios based on his work with real clients! Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the technology

Identifying relationships is the foundation of machine learning. By recognizing and analyzing the connections in your data, graph-centric algorithms like K-nearest neighbor or PageRank radically improve the effectiveness of ML applications. Graph-based machine learning techniques offer a powerful new perspective for machine learning in social networking, fraud detection, natural language processing, and recommendation systems.

About the book

Graph-Powered Machine Learning teaches you how to exploit the natural relationships in structured and unstructured datasets using graph-oriented machine learning algorithms and tools. In this authoritative book, you’ll master the architectures and design practices of graphs, and avoid common pitfalls. Author Alessandro Negro explores examples from real-world applications that connect GraphML concepts to real-world tasks.

What's inside

· Graphs in big data platforms
· Recommendations, natural language processing, fraud detection
· Graph algorithms
· Working with the Neo4J graph database

About the reader

For readers comfortable with machine learning basics.

About the author

Alessandro Negro is Chief Scientist at GraphAware. He has been a speaker at many conferences, and holds a PhD in Computer Science.

Table of Contents

PART 1 INTRODUCTION
1 Machine learning and graphs: An introduction
2 Graph data engineering
3 Graphs in machine learning applications
PART 2 RECOMMENDATIONS
4 Content-based recommendations
5 Collaborative filtering
6 Session-based recommendations
7 Context-aware and hybrid recommendations
PART 3 FIGHTING FRAUD
8 Basic approaches to graph-powered fraud detection
9 Proximity-based algorithms
10 Social network analysis against fraud
PART 4 TAMING TEXT WITH GRAPHS
11 Graph-based natural language processing
12 Knowledge graphs




Relevant Search


Book Description

Summary

Relevant Search demystifies relevance work. Using Elasticsearch, it teaches you how to return engaging search results to your users, helping you understand and leverage the internals of Lucene-based search engines. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Technology

Users are accustomed to and expect instant, relevant search results. To achieve this, you must master the search engine. Yet for many developers, relevance ranking is mysterious or confusing.

About the Book

Relevant Search demystifies the subject and shows you that a search engine is a programmable relevance framework. You'll learn how to apply Elasticsearch or Solr to your business's unique ranking problems. The book demonstrates how to program relevance and how to incorporate secondary data sources, taxonomies, text analytics, and personalization. In practice, a relevance framework requires softer skills as well, such as collaborating with stakeholders to discover the right relevance requirements for your business. By the end, you'll be able to achieve a virtuous cycle of provable, measurable relevance improvements over a search product's lifetime.

What's Inside

· Techniques for debugging relevance
· Applying search engine features to real problems
· Using the user interface to guide searchers
· A systematic approach to relevance
· A business culture focused on improving search

About the Reader

For developers trying to build smarter search with Elasticsearch or Solr.

About the Authors

Doug Turnbull is lead relevance consultant at OpenSource Connections, where he frequently speaks and blogs. John Berryman is a data engineer at Eventbrite, where he specializes in recommendations and search. Foreword author, Trey Grainger, is a director of engineering at CareerBuilder and author of Solr in Action.

Table of Contents

The search relevance problem
Search under the hood
Debugging your first relevance problem
Taming tokens
Basic multifield search
Term-centric search
Shaping the relevance function
Providing relevance feedback
Designing a relevance-focused search application
The relevance-centered enterprise
Semantic and personalized search




Big Data Processing with Apache Spark


Book Description

Apache Spark is a popular open-source big-data processing framework that's built around speed, ease of use, and a unified distributed computing architecture. Not only does it support developing applications in different languages like Java, Scala, Python, and R, it's also a hundred times faster in memory and ten times faster even when running on disk compared to traditional data processing frameworks. Whether you are currently working on a big data project or interested in learning more about topics like machine learning, streaming data processing, and graph data analytics, this book is for you. You can learn about Apache Spark and develop Spark programs for various use cases in big data analytics using the code examples provided. This book covers all the libraries in the Spark ecosystem: Spark Core, Spark SQL, Spark Streaming, Spark ML, and Spark GraphX.




Systems for Big Graph Analytics


Book Description

There has been a surging interest in developing systems for analyzing big graphs generated by real applications, such as online social networks and knowledge graphs. This book aims to help readers get familiar with the computation models of various graph processing systems with minimal time investment. This book is organized into three parts, addressing three popular computation models for big graph analytics: think-like-a-vertex, think-like-a-graph, and think-like-a-matrix. While vertex-centric systems have gained great popularity, the latter two models are currently being actively studied to solve graph problems that cannot be efficiently solved in the vertex-centric model, and are the promising next-generation models for big graph analytics. For each part, the authors introduce the state-of-the-art systems, emphasizing both their technical novelties and hands-on experiences of using them. The systems introduced include Giraph, Pregel+, Blogel, GraphLab, GraphChi, X-Stream, Quegel, SystemML, etc. Readers will learn how to design graph algorithms in various graph analytics systems, and how to choose the most appropriate system for a particular application at hand. The target audience for this book includes beginners who are interested in using a big graph analytics system, and students, researchers and practitioners who would like to build their own graph analytics systems with new features.




The Elements of Big Data Value


Book Description

This open access book presents the foundations of the Big Data research and innovation ecosystem and the associated enablers that facilitate delivering value from data for business and society. It provides insights into the key elements for research and innovation, technical architectures, business models, skills, and best practices to support the creation of data-driven solutions and organizations. The book is a compilation of selected high-quality chapters covering best practices, technologies, experiences, and practical recommendations on research and innovation for big data. The contributions are grouped into four parts:
· Part I: Ecosystem Elements of Big Data Value focuses on establishing the big data value ecosystem using a holistic approach to make it attractive and valuable to all stakeholders.
· Part II: Research and Innovation Elements of Big Data Value details the key technical and capability challenges to be addressed for delivering big data value.
· Part III: Business, Policy, and Societal Elements of Big Data Value investigates the need to make more efficient use of big data and understanding that data is an asset that has significant potential for the economy and society.
· Part IV: Emerging Elements of Big Data Value explores the critical elements for maximizing the future potential of big data value.
Overall, readers are provided with insights which can support them in creating data-driven solutions, organizations, and productive data ecosystems. The material represents the results of a collective effort undertaken by the European data community as part of the Big Data Value Public-Private Partnership (PPP) between the European Commission and the Big Data Value Association (BDVA) to boost data-driven digital transformation.