Language Modeling for Information Retrieval


Book Description

A statisticallanguage model, or more simply a language model, is a prob abilistic mechanism for generating text. Such adefinition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined cat egories. The first statisticallanguage modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how weH simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural 1 text. The ability of language models to be quantitatively evaluated in tbis way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy liInit in terms of their predictive power. However, tbis has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statisticallanguage modeling.




Introduction to Information Retrieval


Book Description

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.




Dynamic Information Retrieval Modeling


Book Description

Big data and human-computer information retrieval (HCIR) are changing IR. They capture the dynamic changes in the data and dynamic interactions of users with IR systems. A dynamic system is one which changes or adapts over time or a sequence of events. Many modern IR systems and data exhibit these characteristics which are largely ignored by conventional techniques. What is missing is an ability for the model to change over time and be responsive to stimulus. Documents, relevance, users and tasks all exhibit dynamic behavior that is captured in data sets typically collected over long time spans and models need to respond to these changes. Additionally, the size of modern datasets enforces limits on the amount of learning a system can achieve. Further to this, advances in IR interface, personalization and ad display demand models that can react to users in real time and in an intelligent, contextual way. In this book we provide a comprehensive and up-to-date introduction to Dynamic Information Retrieval Modeling, the statistical modeling of IR systems that can adapt to change. We define dynamics, what it means within the context of IR and highlight examples of problems where dynamics play an important role. We cover techniques ranging from classic relevance feedback to the latest applications of partially observable Markov decision processes (POMDPs) and a handful of useful algorithms and tools for solving IR problems incorporating dynamics. The theoretical component is based around the Markov Decision Process (MDP), a mathematical framework taken from the field of Artificial Intelligence (AI) that enables us to construct models that change according to sequential inputs. We define the framework and the algorithms commonly used to optimize over it and generalize it to the case where the inputs aren't reliable. We explore the topic of reinforcement learning more broadly and introduce another tool known as a Multi-Armed Bandit which is useful for cases where exploring model parameters is beneficial. Following this we introduce theories and algorithms which can be used to incorporate dynamics into an IR model before presenting an array of state-of-the-art research that already does, such as in the areas of session search and online advertising. Change is at the heart of modern Information Retrieval systems and this book will help equip the reader with the tools and knowledge needed to understand Dynamic Information Retrieval Modeling.




An Introduction to Neural Information Retrieval


Book Description

Efficient Query Processing for Scalable Web Search will be a valuable reference for researchers and developers working on This tutorial provides an accessible, yet comprehensive, overview of the state-of-the-art of Neural Information Retrieval.




Information Retrieval


Book Description

An introduction to information retrieval, the foundation for modern search engines, that emphasizes implementation and experimentation. Information retrieval is the foundation for modern search engines. This textbook offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpus—a multiuser open-source information retrieval system developed by one of the authors and available online—provides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.




Multilingual Information Retrieval


Book Description

We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitates the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make accessible digitally stored materials regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many ‘hands-on details’ that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.




Natural Language Processing and Information Retrieval


Book Description

Natural Language Processing and Information Retrieval is a textbook designed to meet the requirements of engineering students pursuing undergraduate and postgraduate programs in computer science and information technology. The book attempts to bridge the gap between theory and practice and would also serve as a useful reference for professionals and researchers working on language-related projects.




Graph-based Natural Language Processing and Information Retrieval


Book Description

Graph theory and the fields of natural language processing and information retrieval are well-studied disciplines. Traditionally, these areas have been perceived as distinct, with different algorithms, different applications and different potential end-users. However, recent research has shown that these disciplines are intimately connected, with a large variety of natural language processing and information retrieval applications finding efficient solutions within graph-theoretical frameworks. This book extensively covers the use of graph-based algorithms for natural language processing and information retrieval. It brings together topics as diverse as lexical semantics, text summarization, text mining, ontology construction, text classification and information retrieval, which are connected by the common underlying theme of the use of graph-theoretical methods for text and information processing tasks. Readers will come away with a firm understanding of the major methods and applications in natural language processing and information retrieval that rely on graph-based representations and algorithms.




Machine Learning and Statistical Modeling Approaches to Image Retrieval


Book Description

In the early 1990s, the establishment of the Internet brought forth a revolutionary viewpoint of information storage, distribution, and processing: the World Wide Web is becoming an enormous and expanding distributed digital library. Along with the development of the Web, image indexing and retrieval have grown into research areas sharing a vision of intelligent agents. Far beyond Web searching, image indexing and retrieval can potentially be applied to many other areas, including biomedicine, space science, biometric identification, digital libraries, the military, education, commerce, culture and entertainment. Machine Learning and Statistical Modeling Approaches to Image Retrieval describes several approaches of integrating machine learning and statistical modeling into an image retrieval and indexing system that demonstrates promising results. The topics of this book reflect authors' experiences of machine learning and statistical modeling based image indexing and retrieval. This book contains detailed references for further reading and research in this field as well.




Web Semantics for Textual and Visual Information Retrieval


Book Description

Modern society exists in a digital era in which high volumes of multimedia information exists. To optimize the management of this data, new methods are emerging for more efficient information retrieval. Web Semantics for Textual and Visual Information Retrieval is a pivotal reference source for the latest academic research on embedding and associating semantics with multimedia information to improve data retrieval techniques. Highlighting a range of pertinent topics such as automation, knowledge discovery, and social networking, this book is ideally designed for researchers, practitioners, students, and professionals interested in emerging trends in information retrieval.