Methods for Mining and Summarizing Text Conversations


Book Description

Due to the Internet Revolution, human conversational data -- in written form -- are accumulating at a phenomenal rate. At the same time, improvements in speech technology enable many spoken conversations to be transcribed. Individuals and organizations engage in email exchanges, face-to-face meetings, blogging, texting, and other social media activities. Advances in natural language processing provide ample opportunities for these "informal documents" to be analyzed and mined, creating numerous new and valuable applications. This book presents a set of computational methods to extract information from conversational data and to provide natural language summaries of that data. The book begins with an overview of basic concepts, such as the difference between extractive and abstractive summaries, and metrics for evaluating the effectiveness of summarization and of various extraction tasks. It also describes some of the benchmark corpora used in the literature. The book then introduces extraction and mining methods for subjectivity and sentiment detection, topic segmentation and modeling, and the extraction of conversational structure. It also describes frameworks for dialogue act recognition, decision and action item detection, and the extraction of thread structure. There is a specific focus on performing all these tasks on conversational data such as meeting transcripts (which exemplify synchronous conversations) and emails (which exemplify asynchronous conversations). Very recent approaches to dealing with blogs, discussion forums, and microblogs (e.g., Twitter) are also discussed. The second half of the book focuses on natural language summarization of conversational data. It gives an overview of several extractive and abstractive summarizers developed for emails, meetings, blogs, and forums, and describes attempts at building multi-modal summarizers. Last but not least, the book concludes with thoughts on topics for further development.

Table of Contents: Introduction / Background: Corpora and Evaluation Methods / Mining Text Conversations / Summarizing Text Conversations / Conclusions / Final Thoughts
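
To give a flavor of the extractive side of that distinction, here is a minimal Python sketch (not a method from the book; the stopword list and scoring are invented for illustration) that scores each sentence of a conversation by the frequency of its content words and returns the top-scoring sentences verbatim:

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "an", "is", "are", "to", "of", "and", "in", "we", "i", "you", "it"}

    def extractive_summary(text, n_sentences=2):
        """Score each sentence by length-normalized content-word frequency; return the top n verbatim."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        freq = Counter(w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS)
        def score(s):
            tokens = re.findall(r"[a-z']+", s.lower())
            return sum(freq[t] for t in tokens if t not in STOPWORDS) / (len(tokens) or 1)
        top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
        return " ".join(s for s in sentences if s in top)  # keep original sentence order

An abstractive summarizer, by contrast, would generate new sentences rather than selecting existing ones, which is why the two families are evaluated differently.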




Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications


Book Description

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications brings together all the information, tools, and methods a professional needs to use text mining applications and statistical analysis efficiently. Winner of a 2012 PROSE Award in Computing and Information Sciences from the Association of American Publishers, this book is a comprehensive how-to reference that shows the reader how to conduct text mining and statistically analyze the results. In addition to an in-depth examination of core text mining and link detection tools, methods, and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, it explores current real-world, mission-critical applications of text mining and link detection through example tutorials in fields as varied as corporate finance, business intelligence, genomics research, and counterterrorism. The world contains an unimaginably vast amount of digital information, and it is growing ever more rapidly. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime, and so on. Managed well, textual data can be used to unlock new sources of economic value, provide fresh insights into science, and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text it contains diminishes, the value of text mining for information retrieval and search will increase dramatically.

- Extensive case studies, most in a tutorial format, allow the reader to 'click through' each example using a software program, learning to conduct text mining analyses as rapidly as possible
- Numerous examples, tutorials, PowerPoint slides, and datasets are available via the companion website on Elsevierdirect.com
- A glossary of text mining terms is provided in the appendix




Veracity of Data


Book Description

On the Web, a massive amount of user-generated content is available through various channels (e.g., texts, tweets, Web tables, databases, multimedia-sharing platforms). Conflicting information, rumors, and erroneous and fake content can easily spread across multiple sources, making it hard to distinguish what is true from what is not. This book gives an overview of fundamental issues and recent contributions to ascertaining the veracity of data in the era of Big Data. The text is organized into six chapters, focusing on structured data extracted from texts. Chapter 1 introduces the problem of ascertaining the veracity of data in a multi-source and evolving context. Issues related to information extraction are presented in Chapter 2. Current truth discovery algorithms are presented in detail in Chapter 3, followed by practical techniques for evaluating data source reputation and authoritativeness in Chapter 4. The theoretical foundations and various approaches for modeling the diffusion of misinformation in networked systems are studied in Chapter 5. Finally, truth discovery from extracted data in a dynamic context of misinformation propagation raises interesting challenges that are explored in Chapter 6. This text is intended for a graduate-level seminar course. It is also meant to serve as a useful resource for researchers and practitioners interested in the study of fact-checking, truth discovery, or rumor spreading.
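
Truth discovery algorithms of the kind surveyed in Chapter 3 typically iterate between two estimates: how confident we are in each claimed value, and how trustworthy each source is. The following Python sketch shows that fixpoint pattern in its simplest form (a generic illustration, not a specific algorithm from the book; the initial trust value is arbitrary):

    from collections import defaultdict

    def truth_discovery(claims, iterations=10):
        """claims: (source, object, value) triples; returns (per-object truths, source trust)."""
        trust = {s: 0.8 for s, _, _ in claims}          # arbitrary initial trust
        for _ in range(iterations):
            conf = defaultdict(float)                   # support for each (object, value)
            for s, o, v in claims:
                conf[(o, v)] += trust[s]
            totals = defaultdict(float)                 # normalize per object
            for (o, v), c in conf.items():
                totals[o] += c
            belief = {(o, v): c / totals[o] for (o, v), c in conf.items()}
            votes = defaultdict(list)                   # a source is as good as its claims
            for s, o, v in claims:
                votes[s].append(belief[(o, v)])
            trust = {s: sum(b) / len(b) for s, b in votes.items()}
        truths = {}
        for (o, v), b in belief.items():
            if b > belief.get((o, truths.get(o)), -1.0):
                truths[o] = v
        return truths, trust

Sources that agree with the emerging consensus gain trust, and their claims in turn count for more, which is what separates truth discovery from simple majority voting.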




Data Management in the Cloud


Book Description

Cloud computing has emerged as a successful paradigm of service-oriented computing and has revolutionized the way computing infrastructure is used. This success has led to a proliferation of applications deployed on various cloud platforms, as well as an increase in the scale of the data generated and consumed by such applications. Scalable database management systems form a critical part of the cloud infrastructure, and the attempt to address the challenges posed by the management of big data has produced a plethora of systems. This book aims to clarify some of the important concepts in the design space of scalable data management in cloud computing infrastructures. Among the questions the book aims to answer: which systems are appropriate for a specific set of application requirements, what the research challenges in data management for the cloud are, and what is novel in the cloud for database researchers. We also address one basic question: does cloud computing pose new challenges in scalable data management, or is it just a reincarnation of old problems? We provide a comprehensive background study of state-of-the-art systems for scalable data management and analysis, and we identify important aspects of the design of different systems and the applicability and scope of these systems. A thorough understanding of current solutions and a precise characterization of the design space are essential for clearing the "cloudy skies of data management" and ensuring the success of DBMSs in the cloud, thus emulating the success enjoyed by relational databases in traditional enterprise settings.

Table of Contents: Introduction / Distributed Data Management / Cloud Data Management: Early Trends / Transactions on Co-located Data / Transactions on Distributed Data / Multi-tenant Database Systems / Concluding Remarks




Generating Plans from Proofs


Book Description

Query reformulation refers to the process of translating a source query -- a request for information in some high-level, logic-based language -- into a target plan that abides by certain interface restrictions. Many practical problems in data management can be seen as instances of the reformulation problem: for example, translating an SQL query written over a set of base tables into another query written over a set of views; implementing a query by translating it into a program that calls a set of database APIs; or implementing a query using a collection of web services. In this book we approach query reformulation in a very general setting that encompasses all of the problems above, by relating it to a line of research within mathematical logic. For many decades, logicians have studied the problem of converting "implicit definitions" into "explicit definitions" using an approach known as interpolation. We review the theory of interpolation and explain its close connection with query reformulation. We give a detailed look at how the interpolation-based approach is used to generate translations between logic-based queries over different vocabularies, and also how it can be used to go from logic-based queries to programs.
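
As a toy illustration of the setting (not of the interpolation machinery itself; the relations here are invented), the Python sketch below shows a source query phrased over a base relation R and an equivalent target plan that respects an interface restriction allowing access only to a view V:

    # Base relation R(emp, dept) and a view V(dept) = {d | exists e. R(e, d)}.
    R = {("alice", "sales"), ("bob", "hr"), ("carol", "sales")}
    V = {d for (_, d) in R}                 # materialized view

    def source_query():
        """Q(d) :- R(e, d) -- written over the base vocabulary."""
        return {d for (_, d) in R}

    def target_plan():
        """Q(d) :- V(d) -- the reformulation, written over the view vocabulary only."""
        return set(V)

    assert source_query() == target_plan()  # the target plan answers the source query

The interpolation-based approach described in the book derives such target plans from proofs that the source query is implicitly defined by the target vocabulary.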




Non-Volatile Memory Database Management Systems


Book Description

This book explores the implications of non-volatile memory (NVM) for database management systems (DBMSs). The advent of NVM will fundamentally change the dichotomy between volatile memory and durable storage in DBMSs. These new NVM devices are almost as fast as volatile memory, yet all writes to them persist even after power loss. Existing DBMSs are unable to take full advantage of this technology because their internal architectures are predicated on the assumption that memory is volatile; with NVM, many components of legacy DBMSs become unnecessary and will degrade the performance of data-intensive applications. The book presents the design and implementation of DBMS architectures that are explicitly tailored for NVM, focusing on three aspects of a DBMS: (1) logging and recovery, (2) storage and buffer management, and (3) indexing. First, it presents a logging and recovery protocol that enables the DBMS to support near-instantaneous recovery. Second, it proposes a storage engine architecture and buffer management policy that leverage the durability and byte-addressability of NVM to reduce data duplication and data migration. Third, it presents the design of a range index tailored for NVM that is latch-free yet simple to implement. Altogether, the work described in this book illustrates that rethinking the fundamental algorithms and data structures employed in a DBMS for NVM improves performance and availability, reduces operational cost, and simplifies software development.
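
As a schematic illustration of why persistence changes the logging equation (this is not the protocol from the book; the in-memory dictionary standing in for NVM is a simulation), consider a toy key-value engine in Python:

    # Pretend this dictionary lives in NVM and survives power loss.
    nvm = {"table": {}, "uncommitted": {}}

    def write(txn, key, value):
        nvm["uncommitted"].setdefault(txn, {})[key] = value  # persisted but not committed

    def commit(txn):
        nvm["table"].update(nvm["uncommitted"].pop(txn))     # install the writes

    def recover():
        nvm["uncommitted"].clear()   # near-instantaneous: discard uncommitted writes
        return nvm["table"]          # no redo log to replay -- the data is already there

    write("t1", "x", 1); commit("t1")
    write("t2", "y", 2)              # crash before t2 commits...
    print(recover())                 # {'x': 1}

Because committed data is durable in place, recovery reduces to undoing uncommitted work rather than replaying a redo log, which is the intuition behind near-instantaneous recovery.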




Database Repairs and Consistent Query Answering


Book Description

Integrity constraints are semantic conditions that a database should satisfy in order to be an appropriate model of external reality. In practice, and for many reasons, a database may fail to satisfy its integrity constraints, in which case it is said to be inconsistent. Most likely, however, a large portion of the database is still semantically correct, in a sense that has to be made precise. Once the consistent data in an inconsistent database has been formally characterized, the natural problem of extracting that semantically correct data as query answers emerges. The consistent data in an inconsistent database is usually characterized as the data that persists across all the database instances that are consistent and minimally differ from the inconsistent instance; these are the so-called repairs of the database. In particular, the consistent answers to a query posed to the inconsistent database are those answers that can be simultaneously obtained from all the database repairs. As expected, the notion of repair requires an adequate notion of distance that allows databases to be compared with respect to how much they differ from the inconsistent instance; on this basis, the minimality condition on repairs can be properly formulated. In this monograph we present and discuss these fundamental concepts, different repair semantics, algorithms for computing consistent answers to queries, and complexity-theoretic results related to computing repairs and consistent query answering.

Table of Contents: Introduction / The Notions of Repair and Consistent Answer / Tractable CQA and Query Rewriting / Logically Specifying Repairs / Decision Problems in CQA: Complexity and Algorithms / Repairs and Data Cleaning
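
To make the definitions concrete, the Python sketch below computes the subset repairs of a tiny relation that violates a key constraint and intersects the query answers across repairs (the table, constraint, and query are invented for illustration):

    from itertools import product

    # Key constraint: name is a key, but "alice" has two conflicting salaries.
    employee = [("alice", 50), ("alice", 60), ("bob", 40)]

    def repairs(rows):
        """Subset repairs under the key: keep exactly one tuple per key value."""
        by_key = {}
        for r in rows:
            by_key.setdefault(r[0], []).append(r)
        return [set(choice) for choice in product(*by_key.values())]

    def consistent_answers(rows, query):
        """Answers that hold in every repair of the inconsistent instance."""
        return set.intersection(*[query(rep) for rep in repairs(rows)])

    q = lambda rows: {n for (n, s) in rows if s >= 40}   # Q(n) :- employee(n, s), s >= 40
    print(consistent_answers(employee, q))               # {'alice', 'bob'}

Both repairs keep exactly one salary for "alice", and she earns at least 40 in each, so she is a consistent answer; a query asking for salaries of at least 60 would have no consistent answers, since one repair keeps the salary of 50.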




Managing Event Information


Book Description

With the proliferation of citizen reporting, smart mobile devices, and social media, an increasing number of people are generating information about events they observe and participate in, and a significant fraction of this information includes multimedia data meant to share the experience with an audience. A systematic information modeling and management framework is necessary to capture this widely heterogeneous, schemaless, potentially humongous information produced by many different people. This book examines the modeling, storage, querying, and applications of such an event management system in a holistic manner. It uses a semantic-web-style, graph-based view of events, and shows how this event model, together with its query facility, can be used for emerging applications like semi-automated storytelling.

Table of Contents: Introduction / Event Data Models / Implementing an Event Data Model / Querying Events / Storytelling with Events / An Emerging Application / Conclusion
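
As a minimal sketch of what a graph-based view of events can look like (the attribute and relationship names here are invented, not the book's model), plain Python dictionaries already capture the idea of typed nodes and edges that event queries traverse:

    # Event nodes with attributes, plus typed edges between events.
    events = {
        "e1": {"type": "goal", "time": "19:52", "place": "stadium"},
        "e2": {"type": "celebration", "time": "19:53", "place": "stadium"},
    }
    edges = [("e1", "causes", "e2")]

    def follow(event_id, relation):
        """Follow a typed edge -- the building block of graph-style event queries."""
        return [dst for (src, rel, dst) in edges if src == event_id and rel == relation]

    # A storytelling-style traversal: chain causal links from a seed event.
    story, cur = ["e1"], "e1"
    while follow(cur, "causes"):
        cur = follow(cur, "causes")[0]
        story.append(cur)
    print(story)   # ['e1', 'e2']

Chaining such traversals over a richer relationship vocabulary is the kind of operation the book's query facility and storytelling application build on.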




Declarative Networking


Book Description

Declarative networking is a programming methodology that enables developers to concisely specify network protocols and services, which are then compiled directly to a dataflow framework that executes the specifications. Declarative networking proposes the use of a declarative query language for specifying and implementing network protocols, and employs a dataflow framework at runtime for communication and for the maintenance of network state. The primary goal of declarative networking is to greatly simplify the process of specifying, implementing, deploying, and evolving a network design. In addition, declarative networking serves as an important step toward an extensible, evolvable network architecture that can support flexible, secure, and efficient deployment of new network protocols. This book provides an introduction to basic issues in declarative networking, including language design, optimization, and dataflow execution. The methodology behind declarative programming of networks is presented, including its roots in Datalog, extensions for networked environments, and the semantics of long-running queries over network state. The book focuses on a representative declarative networking language called Network Datalog (NDlog), which is based on extensions to the Datalog recursive query language. An overview of declarative network protocols written in NDlog is provided, and their usage is illustrated with examples from routing protocols and overlay networks. The book also describes the implementation of a declarative networking engine and NDlog execution strategies that provide eventual consistency semantics with significant flexibility in execution. Two representative declarative networking systems, P2 and its successor RapidNet, are presented. Finally, the book highlights recent advances in declarative networking and new declarative approaches to related problems.

Table of Contents: Introduction / Declarative Networking Language / Declarative Networking Overview / Distributed Recursive Query Processing / Declarative Routing / Declarative Overlays / Optimization of NDlog / Recent Advances in Declarative Networking / Conclusion
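
To give a flavor of the recursive-query idea at the heart of NDlog (this sketch shows only the Datalog semantics in plain Python, not NDlog's syntax, location specifiers, or distributed dataflow execution), here is a naive fixpoint evaluation of the classic two-rule reachability program that declarative routing generalizes:

    # reachable(X, Y) :- link(X, Y).
    # reachable(X, Z) :- link(X, Y), reachable(Y, Z).
    link = {("a", "b"), ("b", "c"), ("c", "d")}

    reachable = set(link)                    # base rule
    while True:
        derived = {(x, z) for (x, y) in link
                          for (y2, z) in reachable if y == y2}
        if derived <= reachable:             # fixpoint: nothing new was derived
            break
        reachable |= derived

    print(sorted(reachable))   # includes ('a', 'c'), ('a', 'd'), ('b', 'd'), ...

In a declarative networking engine, rules like these run incrementally as a distributed dataflow, with each node deriving and exchanging tuples until the network state converges, which is where the eventual consistency semantics come in.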




Multimodal Signal Processing


Book Description

A comprehensive synthesis of recent advances in multimodal signal processing for human interaction analysis and meeting support technology. With directly applicable methods and metrics, along with benchmark results, this guide is ideal for anyone interested in multimodal signal processing, its component disciplines, and its application to human interaction analysis.