Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics


Book Description

This book constitutes the proceedings of the Third International Conference of the CLEF Initiative, CLEF 2012, held in Rome, Italy, in September 2012. The 14 papers and 3 poster abstracts presented were carefully reviewed and selected for inclusion in this volume. Furthermore, the book contains 2 keynote papers. The papers are organized in topical sections named: benchmarking and evaluation initiatives; information access; and evaluation methodologies and infrastructure.




Information Access Evaluation. Multilinguality, Multimodality, and Visualization


Book Description

This book constitutes the refereed proceedings of the 4th International Conference of the CLEF Initiative, CLEF 2013, held in Valencia, Spain, in September 2013. The 32 papers and 2 keynotes presented were carefully reviewed and selected for inclusion in this volume. The papers are organized in topical sections named: evaluation and visualization; multilinguality and less-resourced languages; applications; and Lab overviews.




Information Retrieval Evaluation in a Changing World


Book Description

This volume celebrates the twentieth anniversary of CLEF (the Cross-Language Evaluation Forum for its first ten years, and the Conference and Labs of the Evaluation Forum since) and traces its evolution over these first two decades. CLEF’s main mission is to promote research, innovation and development of information retrieval (IR) systems by anticipating trends in information management in order to stimulate advances in the field of IR system experimentation and evaluation. The book is divided into six parts. Parts I and II provide background and context, with the first part explaining what is meant by experimental evaluation and the underlying theory, and describing how this has been interpreted in CLEF and in other internationally recognized evaluation initiatives. Part II presents research architectures and infrastructures that have been developed to manage experimental data and to provide evaluation services in CLEF and elsewhere. Parts III, IV and V represent the core of the book, presenting some of the most significant evaluation activities in CLEF, ranging from the early multilingual text processing exercises to the later, more sophisticated experiments on multimodal collections in diverse genres and media. In all cases, the focus is not only on describing “what has been achieved”, but above all on “what has been learnt”. The final part examines the impact CLEF has had on the research world and discusses current and future challenges, both academic and industrial, including the relevance of IR benchmarking in industrial settings. Mainly intended for researchers in academia and industry, the book also offers useful insights and tips for practitioners in industry working on the evaluation and performance issues of IR tools, and for graduate students specializing in information retrieval.







Multilingual and Multimodal Information Access Evaluation


Book Description

This book constitutes the refereed proceedings of the Second International Conference on Multilingual and Multimodal Information Access Evaluation, CLEF 2011, held in Amsterdam, The Netherlands, in September 2011, in continuation of the popular CLEF campaigns and workshops that had run for the preceding decade. The 14 revised full papers presented together with 2 keynote talks were carefully reviewed and selected from numerous submissions. The papers accepted for the conference included research on evaluation methods and settings, natural language processing within different domains and languages, multimedia, and reflections on CLEF. Two keynote speakers highlighted important developments in the field of evaluation: the role of users in evaluation and a framework for the use of crowdsourcing experiments in retrieval evaluation.




Information Access Evaluation -- Multilinguality, Multimodality, and Interaction


Book Description

This book constitutes the refereed proceedings of the 5th International Conference of the CLEF Initiative, CLEF 2014, held in Sheffield, UK, in September 2014. The 11 full papers and 5 short papers presented were carefully reviewed and selected from 30 submissions. They cover a broad range of issues in the fields of multilingual and multimodal information access evaluation. Also included is a set of labs and workshops designed to test different aspects of mono- and cross-language information retrieval systems.




Current Challenges in Patent Information Retrieval


Book Description

This second edition provides a systematic introduction to the work and views of the emerging patent-search research and innovation communities, as well as an overview of what has been achieved and, perhaps even more importantly, of what remains to be achieved. It revises many of the contributions of the first edition and adds a significant number of new ones. The first part, “Introduction to Patent Searching”, includes two overview chapters on the peculiarities of patent searching and on contemporary search technology respectively, and thus sets the scene for the subsequent parts. The second part, “Evaluating Patent Retrieval”, begins with two chapters dedicated to patent evaluation campaigns, followed by two chapters discussing complementary issues from the perspective of patent searchers and from the perspective of related domains, notably legal search. “High Recall Search” includes four completely new chapters dealing with the issue of finding all the relevant documents in a reasonable time span. The last (and, with six papers, the largest) part, “Special Topics in Patent Information Retrieval”, covers a broad spectrum of research in the patent field, from classification and image processing to translation. Lastly, the book is completed by an outlook on open issues and future research. Several of the chapters have been jointly written by intellectual property and information retrieval experts. In addition, members of both communities with a background different from that of the primary author have reviewed the chapters, making the book accessible both to the patent search community and to the information retrieval research community. It not only offers the latest findings for academic researchers, but is also a valuable resource for IP professionals wanting to learn about current IR approaches in the patent domain.




Information Science and Applications


Book Description

This book presents selected papers from the 10th International Conference on Information Science and Applications (ICISA 2019), held on December 16–18, 2019, in Seoul, Korea, and provides a snapshot of the latest issues in technical convergence and the convergence of security technologies. It explores how information science is at the core of most current research as well as industrial and commercial activities. The respective chapters cover a broad range of topics, including ubiquitous computing, networks and information systems, multimedia and visualization, middleware and operating systems, security and privacy, data mining and artificial intelligence, software engineering and web technology, as well as applications and problems related to technology convergence, which are reviewed and illustrated with the aid of case studies. Researchers in academia, industry, and at institutes focusing on information science and technology will gain a deeper understanding of the current state of the art in information strategies and technologies for convergence security.




Analyzing Non-Textual Content Elements to Detect Academic Plagiarism


Book Description

Identifying plagiarism is a pressing problem for research institutions, publishers, and funding bodies. Current detection methods focus on textual analysis and find copied, moderately reworded, or translated content. However, detecting more subtle forms of plagiarism, including strong paraphrasing, sense-for-sense translations, or the reuse of non-textual content and ideas, remains a challenge. This book presents a novel approach to address this problem: analyzing non-textual elements in academic documents, such as citations, images, and mathematical content. The proposed detection techniques are validated in five evaluations using confirmed plagiarism cases and exploratory searches for new instances. The results show that non-textual elements contain much semantic information, are language-independent, and are resilient to typical tactics for concealing plagiarism. Incorporating non-textual content analysis complements text-based detection approaches and increases detection effectiveness, particularly for disguised forms of plagiarism. The book introduces the first integrated plagiarism detection system that combines citation, image, math, and text similarity analysis. Its user interface features visual aids that significantly reduce the time and effort users must invest in examining content similarity.
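The citation-based component of such a system can be illustrated with a toy similarity measure: score two documents by the longest common subsequence of their in-order citation sequences, since the order of shared references tends to survive paraphrasing and translation. This is a minimal sketch only; the document data and function names are invented, and the algorithms described in the book are considerably more elaborate.

```python
def lcs_length(a, b):
    """Classic dynamic-programming longest common subsequence length."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def citation_similarity(doc_a, doc_b):
    """Score two documents by the length of the longest common subsequence
    of their cited-source sequences, normalized by the shorter sequence."""
    if not doc_a or not doc_b:
        return 0.0
    return lcs_length(doc_a, doc_b) / min(len(doc_a), len(doc_b))

# Hypothetical citation sequences (identifiers of cited sources, in order).
original = ["smith04", "lee09", "chen11", "roy13", "kim15"]
suspect = ["lee09", "chen11", "kim15", "doe17"]
print(citation_similarity(original, suspect))  # 0.75: shared citation order survives rewording
```

Because no words from the suspect text are compared, a measure of this kind is language-independent, which is one reason citation analysis complements text-based detection.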




Simulating Information Retrieval Test Collections


Book Description

Simulated test collections may find application in situations where real datasets cannot easily be accessed due to confidentiality concerns or practical inconvenience. They can potentially support Information Retrieval (IR) experimentation, tuning, validation, performance prediction, and hardware sizing. Naturally, the accuracy and usefulness of results obtained from a simulation depend upon the fidelity and generality of the models which underpin it. The fidelity of emulation of a real corpus is likely to be limited by the requirement that confidential information in the real corpus cannot be extracted from the emulated version. We present a range of methods exploring trade-offs between emulation fidelity and degree of preservation of privacy. We present three simple types of micro-level text generator: Markov models, neural net models, and substitution ciphers. We also describe macro-level methods for engineering global properties of a corpus, giving a range of models for each of the salient properties: document length distribution, word frequency distribution (for independent and non-independent cases), word length and textual representation, and corpus growth. We present results of emulating existing corpora and of scaling up corpora by two orders of magnitude. We show that simulated collections generated with relatively simple methods are suitable for some purposes and can be generated very quickly. Indeed, it may sometimes be feasible to embed a simple lightweight corpus generator into an indexer for the purpose of efficiency studies. Naturally, a corpus of artificial text cannot support IR experimentation in the absence of a set of compatible queries. We discuss and experiment with published methods for query generation and query log emulation. We present a proof-of-the-pudding study in which we observe the predictive accuracy of efficiency and effectiveness results obtained on emulated versions of TREC corpora. 
The study includes three open-source retrieval systems and several TREC datasets. There is a trade-off between confidentiality and prediction accuracy and there are interesting interactions between retrieval systems and datasets. Our tentative conclusion is that there are emulation methods which achieve useful prediction accuracy while providing a level of confidentiality adequate for many applications. Many of the methods described here have been implemented in the open source project SynthaCorpus, accessible at: https://bitbucket.org/davidhawking/synthacorpus/
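As an illustration of the micro-level generators mentioned above, a word-level order-1 Markov model can be sketched in a few lines. This is a minimal sketch only; the function names and toy corpus are invented, and the generators implemented in SynthaCorpus are considerably more sophisticated.

```python
import random
from collections import defaultdict

def train_markov(text):
    """Build an order-1 word-level Markov model: map each word to the
    list of words observed to follow it in the training text."""
    words = text.split()
    model = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        model[prev].append(nxt)
    return model

def generate(model, length, seed=None):
    """Emit `length` words by repeatedly sampling an observed successor;
    restart from a random word if the current one has no successors."""
    rng = random.Random(seed)
    word = rng.choice(list(model.keys()))
    out = [word]
    for _ in range(length - 1):
        successors = model.get(word)
        word = rng.choice(successors) if successors else rng.choice(list(model.keys()))
        out.append(word)
    return " ".join(out)

corpus = "the cat sat on the mat the dog sat on the log"
model = train_markov(corpus)
print(generate(model, 10, seed=42))
```

A generator of this kind preserves local word-pair statistics of the source corpus while producing new text, which is exactly the fidelity/privacy trade-off discussed above: higher-order models are more faithful but leak longer fragments of the original.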
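Similarly, the macro-level word frequency property can be engineered by drawing tokens from a truncated Zipf-like distribution. The sketch below (with an invented vocabulary of placeholder tokens) shows the general idea rather than the actual models used in the study.

```python
import random
from collections import Counter

def synth_corpus(n_words, vocab_size=1000, s=1.0, seed=0):
    """Draw n_words tokens 'w1'..'w<vocab_size>' such that the frequency of
    the rank-r token is roughly proportional to 1/r**s (a Zipf-like law)."""
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    vocab = ["w%d" % r for r in ranks]
    weights = [1.0 / (r ** s) for r in ranks]
    return rng.choices(vocab, weights=weights, k=n_words)

tokens = synth_corpus(20000)
counts = Counter(tokens)
# Token frequencies fall off with rank, as in natural-language corpora.
print(counts["w1"], counts["w10"], counts["w100"])
```

Tokens generated this way carry no content from any real corpus, which is why such macro-level methods sit at the high-privacy end of the fidelity/privacy trade-off.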