Cross-Language Information Retrieval


Book Description

Search for information is no longer exclusively limited within the native language of the user, but is more and more extended to other languages. This gives rise to the problem of cross-language information retrieval (CLIR), whose goal is to find relevant information written in a different language to a query. In addition to the problems of monolingual information retrieval (IR), translation is the key problem in CLIR: one should translate either the query or the documents from a language to another. However, this translation problem is not identical to full-text machine translation (MT): the goal is not to produce a human-readable translation, but a translation suitable for finding relevant documents. Specific translation methods are thus required. The goal of this book is to provide a comprehensive description of the specific problems arising in CLIR, the solutions proposed in this area, as well as the remaining problems. The book starts with a general description of the monolingual IR and CLIR problems. Different classes of approaches to translation are then presented: approaches using an MT system, dictionary-based translation and approaches based on parallel and comparable corpora. In addition, the typical retrieval effectiveness using different approaches is compared. It will be shown that translation approaches specifically designed for CLIR can rival and outperform high-quality MT systems. Finally, the book offers a look into the future that draws a strong parallel between query expansion in monolingual IR and query translation in CLIR, suggesting that many approaches developed in monolingual IR can be adapted to CLIR. The book can be used as an introduction to CLIR. Advanced readers can also find more technical details and discussions about the remaining research challenges in the future. It is suitable to new researchers who intend to carry out research on CLIR. Table of Contents: Preface / Introduction / Using Manually Constructed Translation Systems and Resources for CLIR / Translation Based on Parallel and Comparable Corpora / Other Methods to Improve CLIR / A Look into the Future: Toward a Unified View of Monolingual IR and CLIR? / References / Author Biography




Information Retrieval


Book Description

This book is an essential reference to cutting-edge issues and future directions in information retrieval Information retrieval (IR) can be defined as the process of representing, managing, searching, retrieving, and presenting information. Good IR involves understanding information needs and interests, developing an effective search technique, system, presentation, distribution and delivery. The increased use of the Web and wider availability of information in this environment led to the development of Web search engines. This change has brought fresh challenges to a wider variety of users’ needs, tasks, and types of information. Today, search engines are seen in enterprises, on laptops, in individual websites, in library catalogues, and elsewhere. Information Retrieval: Searching in the 21st Century focuses on core concepts, and current trends in the field. This book focuses on: Information Retrieval Models User-centred Evaluation of Information Retrieval Systems Multimedia Resource Discovery Image Users’ Needs and Searching Behaviour Web Information Retrieval Mobile Search Context and Information Retrieval Text Categorisation and Genre in Information Retrieval Semantic Search The Role of Natural Language Processing in Information Retrieval: Search for Meaning and Structure Cross-language Information Retrieval Performance Issues in Parallel Computing for Information Retrieval This book is an invaluable reference for graduate students on IR courses or courses in related disciplines (e.g. computer science, information science, human-computer interaction, and knowledge management), academic and industrial researchers, and industrial personnel tracking information search technology developments to understand the business implications. Intermediate-advanced level undergraduate students on IR or related courses will also find this text insightful. Chapters are supplemented with exercises to stimulate further thinking.




Introduction to Information Retrieval


Book Description

Class-tested and coherent, this textbook teaches classical and web information retrieval, including web search and the related areas of text classification and text clustering from basic concepts. It gives an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections. All the important ideas are explained using examples and figures, making it perfect for introductory courses in information retrieval for advanced undergraduates and graduate students in computer science. Based on feedback from extensive classroom experience, the book has been carefully structured in order to make teaching more natural and effective. Slides and additional exercises (with solutions for lecturers) are also available through the book's supporting website to help course instructors prepare their lectures.




Language Modeling for Information Retrieval


Book Description

A statisticallanguage model, or more simply a language model, is a prob abilistic mechanism for generating text. Such adefinition is general enough to include an endless variety of schemes. However, a distinction should be made between generative models, which can in principle be used to synthesize artificial text, and discriminative techniques to classify text into predefined cat egories. The first statisticallanguage modeler was Claude Shannon. In exploring the application of his newly founded theory of information to human language, Shannon considered language as a statistical source, and measured how weH simple n-gram models predicted or, equivalently, compressed natural text. To do this, he estimated the entropy of English through experiments with human subjects, and also estimated the cross-entropy of the n-gram models on natural 1 text. The ability of language models to be quantitatively evaluated in tbis way is one of their important virtues. Of course, estimating the true entropy of language is an elusive goal, aiming at many moving targets, since language is so varied and evolves so quickly. Yet fifty years after Shannon's study, language models remain, by all measures, far from the Shannon entropy liInit in terms of their predictive power. However, tbis has not kept them from being useful for a variety of text processing tasks, and moreover can be viewed as encouragement that there is still great room for improvement in statisticallanguage modeling.




Multilingual Information Retrieval


Book Description

We are living in a multilingual world and the diversity in languages which are used to interact with information access systems has generated a wide variety of challenges to be addressed by computer and information scientists. The growing amount of non-English information accessible globally and the increased worldwide exposure of enterprises also necessitates the adaptation of Information Retrieval (IR) methods to new, multilingual settings. Peters, Braschler and Clough present a comprehensive description of the technologies involved in designing and developing systems for Multilingual Information Retrieval (MLIR). They provide readers with broad coverage of the various issues involved in creating systems to make accessible digitally stored materials regardless of the language(s) they are written in. Details on Cross-Language Information Retrieval (CLIR) are also covered that help readers to understand how to develop retrieval systems that cross language boundaries. Their work is divided into six chapters and accompanies the reader step-by-step through the various stages involved in building, using and evaluating MLIR systems. The book concludes with some examples of recent applications that utilise MLIR technologies. Some of the techniques described have recently started to appear in commercial search systems, while others have the potential to be part of future incarnations. The book is intended for graduate students, scholars, and practitioners with a basic understanding of classical text retrieval methods. It offers guidelines and information on all aspects that need to be taken into consideration when building MLIR systems, while avoiding too many ‘hands-on details’ that could rapidly become obsolete. Thus it bridges the gap between the material covered by most of the classical IR textbooks and the novel requirements related to the acquisition and dissemination of information in whatever language it is stored.




Advances in Computer and Computational Sciences


Book Description

Exchange of information and innovative ideas are necessary to accelerate the development of technology. With advent of technology, intelligent and soft computing techniques came into existence with a wide scope of implementation in engineering sciences. Keeping this ideology in preference, this book includes the insights that reflect the ‘Advances in Computer and Computational Sciences’ from upcoming researchers and leading academicians across the globe. It contains high-quality peer-reviewed papers of ‘International Conference on Computer, Communication and Computational Sciences (ICCCCS 2016), held during 12-13 August, 2016 in Ajmer, India'. These papers are arranged in the form of chapters. The content of the book is divided into two volumes that cover variety of topics such as intelligent hardware and software design, advanced communications, power and energy optimization, intelligent techniques used in internet of things, intelligent image processing, advanced software engineering, evolutionary and soft computing, security and many more. This book helps the perspective readers’ from computer industry and academia to derive the advances of next generation computer and communication technology and shape them into real life applications.




Artificial Intelligence and Security


Book Description

The 3-volume set CCIS 1252 until CCIS 1254 constitutes the refereed proceedings of the 6th International Conference on Artificial Intelligence and Security, ICAIS 2020, which was held in Hohhot, China, in July 2020. The conference was formerly called “International Conference on Cloud Computing and Security” with the acronym ICCCS. The total of 178 full papers and 8 short papers presented in this 3-volume proceedings was carefully reviewed and selected from 1064 submissions. The papers were organized in topical sections as follows: Part I: artificial intelligence; Part II: artificial intelligence; Internet of things; information security; Part III: information security; big data and cloud computing; information processing.




Evaluating Information Retrieval and Access Tasks


Book Description

This open access book summarizes the first two decades of the NII Testbeds and Community for Information access Research (NTCIR). NTCIR is a series of evaluation forums run by a global team of researchers and hosted by the National Institute of Informatics (NII), Japan. The book is unique in that it discusses not just what was done at NTCIR, but also how it was done and the impact it has achieved. For example, in some chapters the reader sees the early seeds of what eventually grew to be the search engines that provide access to content on the World Wide Web, todays smartphones that can tailor what they show to the needs of their owners, and the smart speakers that enrich our lives at home and on the move. We also get glimpses into how new search engines can be built for mathematical formulae, or for the digital record of a lived human life. Key to the success of the NTCIR endeavor was early recognition that information access research is an empirical discipline and that evaluation therefore lay at the core of the enterprise. Evaluation is thus at the heart of each chapter in this book. They show, for example, how the recognition that some documents are more important than others has shaped thinking about evaluation design. The thirty-three contributors to this volume speak for the many hundreds of researchers from dozens of countries around the world who together shaped NTCIR as organizers and participants. This book is suitable for researchers, practitioners, and students--anyone who wants to learn about past and present evaluation efforts in information retrieval, information access, and natural language processing, as well as those who want to participate in an evaluation task or even to design and organize one.




Advances in Information Retrieval


Book Description

This book constitutes the proceedings of the 35th European Conference on IR Research, ECIR 2013, held in Moscow, Russia, in March 2013. The 55 full papers, 38 poster papers and 10 demonstrations presented in this volume were carefully reviewed and selected from 287 submissions. The papers are organized in the following topical sections: user aspects; multimedia and cross-media IR; data mining; IR theory and formal models; IR system architectures; classification; Web; event detection; temporal IR, and microblog search. Also included are 4 tutorial and 2 workshop presentations.




Information Storage and Retrieval Systems


Book Description

Chapter 1 places into perspective a total Information Storage and Retrieval System. This perspective introduces new challenges to the problems that need to be theoretically addressed and commercially implemented. Ten years ago commercial implementation of the algorithms being developed was not realistic, allowing theoreticians to limit their focus to very specific areas. Bounding a problem is still essential in deriving theoretical results. But the commercialization and insertion of this technology into systems like the Internet that are widely being used changes the way problems are bounded. From a theoretical perspective, efficient scalability of algorithms to systems with gigabytes and terabytes of data, operating with minimal user search statement information, and making maximum use of all functional aspects of an information system need to be considered. The dissemination systems using persistent indexes or mail files to modify ranking algorithms and combining the search of structured information fields and free text into a consolidated weighted output are examples of potential new areas of investigation. The best way for the theoretician or the commercial developer to understand the importance of problems to be solved is to place them in the context of a total vision of a complete system. Understanding the differences between Digital Libraries and Information Retrieval Systems will add an additional dimension to the potential future development of systems. The collaborative aspects of digital libraries can be viewed as a new source of information that dynamically could interact with information retrieval techniques.