Text Corpora and Multilingual Lexicography


Book Description

The contributions in this volume (first published as a Special Issue of International Journal of Corpus Linguistics 6 (2001)) evolved from the EU-funded project Trans-European Language Resources Infrastructure (TELRI) and deal with various aspects of multilingual corpus linguistics. The topics range from building parallel corpora, annotation issues, and questions of terminology extraction to bilingual and multilingual lexicography, the statistical properties of parallel corpora, the practice of translators, and the role of corpus linguistics in multilingual language technology.




Studies on Multilingual Lexicography


Book Description

Technological advances have left their mark on the design and development of dictionaries and lexicographic resources, so it seems important to bring together a series of publications that address this new situation, dealing in particular with multilingual and electronic lexicography in an increasingly digital, multilingual and multicultural society. This is the main objective of this volume, which is organized around two central themes. The first section discusses the concept of multilingual lexicography with regard to the influence that the Internet and digital technologies have exerted, and continue to exert, on the conception and design of dictionaries and new lexicographic tools, as well as on the emergence of new types of users and forms of consultation. The role of the dictionary must necessarily be related to social development and change. The second section presents different dictionaries and resources that take a multilingual and electronic approach to linguistic data for lexicographical treatment and consultation.




Parallel Corpora for Contrastive and Translation Studies


Book Description

This volume assesses the state of the art of parallel corpus research as a whole, reporting both on recent developments in parallel corpora, with some reference to comparable corpora as well, and on ways of exploiting them for a variety of purposes. The first part of the book is devoted to new roles that parallel corpora can and should assume in translation studies and in contrastive linguistics, to the usefulness and usability of parallel corpora, and to advances in parallel corpus alignment, annotation and retrieval. There follows an up-to-date presentation of a number of parallel corpus projects currently being carried out in Europe, some of them multimodal, with certain chapters illustrating case studies developed on the basis of the corpora at hand. In most of these chapters, attention is paid to specific technical issues of corpus building. The third part of the book reflects on specific applications and on the creation of bilingual resources from parallel corpora. This volume will be welcomed by scholars, postgraduate and PhD students in the fields of contrastive linguistics, translation studies, lexicography, language teaching and learning, machine translation, and natural language processing.




Multilingual Corpora and Multilingual Corpus Analysis


Book Description

This volume deals with different aspects of the creation and use of multilingual corpora. The term 'multilingual corpus' is understood in a comprehensive sense, meaning any systematic collection of empirical language data enabling linguists to carry out analyses of multilingual individuals, multilingual societies or multilingual communication. The individual contributions are thus concerned with a variety of spoken and written corpora ranging from learner and attrition corpora, language contact corpora and interpreting corpora to comparable and parallel corpora. The overarching aim of the volume is first to take stock of the variety of existing multilingual corpora, documenting possible corpus designs and uses, second to discuss methodological and technological challenges in the creation and analysis of multilingual corpora, and third to provide examples of linguistic analyses that were carried out on the basis of multilingual corpora.




Developing Linguistic Corpora


Book Description

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.




Lexis in Contrast


Book Description

This work explores recent trends in cross-linguistic lexical studies. Topics include: lexis and contrastive linguistics; the revival of contrastive linguistics; multilingual corpora; theoretical and methodological issues; and types of cross-linguistic correspondence.




Parallel Text Processing


Book Description

With the rising importance of multilingualism in language industries, brought about by global markets and world-wide information exchange, parallel corpora, i.e. corpora of texts accompanied by their translation, have become key resources in the development of natural language processing tools. The applications based upon parallel corpora are numerous and growing in number: multilingual lexicography and terminology, machine and human translation, cross-language information retrieval, language learning, etc. The book's chapters have been commissioned from major figures in the field of parallel corpus building and exploitation, with the aim of showing the state of the art in parallel text alignment and use ten to fifteen years after the first parallel-text alignment techniques were developed. Within the book, the following broad themes are addressed: (i) techniques for the alignment of parallel texts at various levels such as sentence, clause, and word; (ii) the use of parallel texts in fields as diverse as translation, lexicography, and information retrieval; (iii) available corpus resources and the evaluation of alignment methods. The book will be of interest to researchers and advanced students of computational linguistics, terminology, lexicography and translation, both in academia and industry.




Terms in Context


Book Description

Terms in Context applies the methodology that has been developed over the last two decades in corpus linguistics to the relatively new and still little developed field of corpus-based terminography. While corpora are already being used by some terminologists for the identification of terms and retrieval of contextual fragments, this book describes the first attempt to use corpora for terminography in much the same way as large general reference corpora are already being used for general language lexicography. The author goes beyond the standard problem of identifying terms as opposed to non-terminological lexical items in text and focuses on identifying metalanguage patterns which point to the presence in text of (parts of) reusable definitions of terms. The author examines these patterns and shows how the information which they contain can be retrieved and used as input for terminological entries. Terms in Context should be of interest to ‘traditional’ terminologists who have not previously considered adopting a corpus-based approach to their work or at least not on the scale proposed here; to ‘modern’ terminologists who use text primarily for the identification of terms and the retrieval of contextual examples; to those in the corpus linguistic community who have hitherto used general language corpora for the purposes of lexicography and have not previously considered using special purpose corpora for more specific lexicography studies; and to academics in the ESP/LSP community who are interested in showing students how to use text as a means of ascertaining the meaning of terms.