Corpus Linguistics and Textual History


Book Description

Over the years, the use of computers for research has become increasingly important in Biblical Studies. However, computational linguistics has hardly ever been combined with diachronic text-critical and text-historical approaches. Quite often, there is mutual misunderstanding between computational linguistics and more traditional approaches to linguistics and textual analysis. For example, in computer-assisted research on modern text corpora it is common to treat the text as an unequivocal and unidimensional sequence of characters. In Biblical Studies, however, the text is considered an abstraction: the result of a scholarly reconstruction based on the extant textual witnesses. Here a fundamental difference in approach reveals itself. The present volume tries to overcome the misunderstanding between the various disciplines and to establish how a fruitful interaction of information technology, linguistics and textual criticism can contribute to the analysis of ancient texts. It addresses questions concerning the confrontation between synchronic and diachronic approaches, the role of linguistic analysis in the interpretation of texts, and the interaction of linguistic theory and the analysis of linguistic data. The first section of this volume contains the papers presented at the CALAP seminar 2003. In the second section, different aspects of the interdisciplinary analysis are applied to a selected passage from the Peshitta of Kings.




History, Features, and Typology of Language Corpora


Book Description

This book discusses key issues of corpus linguistics, such as the definition of a corpus, the primary features of a corpus, and the uses and limitations of corpora. It presents a unique classification scheme for language corpora to show how they can be studied from the perspective of genre, nature, text type, purpose, and application. Any discussion of corpus generation must refer to parallel translation corpora, which the authors address thoroughly here, with a focus on Indian language corpora and English. Web text corpora, a new development in corpus linguistics, are also discussed with elaborate reference to Indian web text corpora. The book also presents a short history of corpus generation and describes the scenarios before and after the advent of computer-generated digital corpora. The book has several important features: it discusses many technical issues of the field in a lucid manner; it contains extensive new diagrams and charts for easy comprehension; and it presents discussions in simplified English to cater to the needs of non-native English readers. It is an important resource authored by academics with many years of experience teaching and researching corpus linguistics. Its focus on Indian languages and on English corpora makes it relevant to students in graduate and postgraduate courses in applied linguistics, computational linguistics and language processing in South Asia and in countries where English is spoken as a first or second language.




Corpus linguistics


Book Description

Corpora are used widely in linguistics, but not always wisely. This book attempts to frame corpus linguistics systematically as a variant of the observational method. The first part introduces the reader to the general methodological discussions surrounding corpus data as well as the practice of doing corpus linguistics, including issues such as the scientific research cycle, research design, extraction of corpus data and statistical evaluation. The second part consists of a number of case studies from the main areas of corpus linguistics (lexical associations, morphology, grammar, text and metaphor), surveying the range of issues studied in corpus linguistics while at the same time showing how they fit into the methodology outlined in the first part.




Corpus Linguistics


Book Description

An investigation into the way people use language in speech and writing, this volume introduces the corpus-based approach, which is based on analysis of large databases of real language examples stored on computer.




Corpus Linguistics and 17th-Century Prostitution


Book Description

Corpus linguistics has much to offer history, as both disciplines engage heavily in the analysis of large amounts of textual material. This book demonstrates the opportunities for using corpus linguistics as a method in historiography, and in the humanities and social sciences more generally. Focussing on the topic of prostitution in 17th-century England, it shows how corpus methods can assist in social research and deepen our understanding of the past. McEnery and Baker draw principally on two sources: the newsbook Mercurius Fumigosus and the Early English Books Online Corpus. This scholarship on prostitution and the sex trade offers insight into the social position of women in history.




Corpus Linguistics in Literary Analysis


Book Description

Corpus Linguistics and The Study of Literature provides a theoretical introduction to corpus stylistics and demonstrates its application by presenting corpus stylistic analyses of literary texts and corpora. The first part of the book addresses theoretical issues such as the relationship between subjectivity and objectivity in corpus linguistic analyses and the criteria for evaluating their results, and it also discusses units of meaning in language. The second part applies this theory to Jane Austen's Northanger Abbey and to two corpora: one consisting of Austen's six novels, and one of texts contemporary with Austen. The analyses demonstrate the impact of various textual features on literary meaning and show how corpus tools can open up new critical angles. This book will be a key read for upper-level undergraduates and postgraduates working in corpus linguistics and stylistics on linguistics and language studies courses.

The editorial board includes: Paul Baker (Lancaster), Frantisek Cermak (Prague), Susan Conrad (Portland), Geoffrey Leech (Lancaster), Dominique Maingueneau (Paris XII), Christian Mair (Freiburg), Alan Partington (Bologna), Elena Tognini-Bonelli (Siena and TWC), Ruth Wodak (Lancaster), and Feng Zhiwei (Beijing).

The Corpus and Discourse series consists of two strands. The first, Research in Corpus and Discourse, features innovative contributions to various aspects of corpus linguistics and a wide range of applications, from language technology via the teaching of a second language to a history of mentalities. The second strand, Studies in Corpus and Discourse, comprises key texts bridging the gap between social studies and linguistics. Although equally academically rigorous, this strand is aimed at a wider audience of academics and postgraduate students working in both disciplines.




Developing Linguistic Corpora


Book Description

A linguistic corpus is a collection of texts which have been selected and brought together so that language can be studied on the computer. Today, corpus linguistics offers some of the most powerful new procedures for the analysis of language, and the impact of this dynamic and expanding sub-discipline is making itself felt in many areas of language study. In this volume, a selection of leading experts in various key areas of corpus construction offer advice in a readable and largely non-technical style to help the reader to ensure that their corpus is well designed and fit for the intended purpose. This guide is aimed at those who are at some stage of building a linguistic corpus. Little or no knowledge of corpus linguistics or computational procedures is assumed, although it is hoped that more advanced users will find the guidelines here useful. It is also aimed at those who are not building a corpus, but who need to know something about the issues involved in the design of corpora in order to choose between available resources and to help draw conclusions from their studies.




Corpus-linguistic applications


Book Description

This volume provides an overview of four currently booming areas in the discipline of corpus linguistics. The first section is concerned with studies of the history and development of morphological and syntactic phenomena in English, Spanish, and Mandarin Chinese. The second section contains case studies investigating the functions and contexts of use of different morphological and syntactic forms in English, Spanish, Russian, and Mandarin Chinese. The third section contains studies in the field of genre and register from settings as diverse as health, call center, academic, and legal discourse. The final section features papers refining existing, and exploring new, corpus-linguistic methods: dispersions, text mining, corpus similarity, as well as the development of extraction patterns and the evaluation of tagging methods.




Writing History in Late Modern English


Book Description

This volume focuses on the relationship and interaction of language and science between 1700 and 1900. It pays particular attention to English history writing in Late Modern English as compiled in the Corpus of History English Texts (CHET), a newly released sub-corpus of the Coruña Corpus of English Scientific Writing. The chapters cover methodological issues, the period and the status of the discipline itself, as well as pilot studies for the description of scientific discourse using CHET. They embrace topics in several linguistic fields: discourse analysis, syntax, semantics and morpho-syntax. The studies take into account extralinguistic parameters of texts, such as year of publication, sex of the author, geographical provenance of the author, and the communicative format or genre to which the text sample belongs. In the particular case of CHET, the collected samples can be grouped into eight different categories, and these categories, as well as the above-mentioned metadata, can be used to search the corpus. The book is of interest to scholars specialising in corpus linguistics and historical linguistics, as well as to linguists in general. The metadata used for analysis may also interest historians, and historians of science in particular. The Corpus of History English Texts (CHET), accompanied by the Coruña Corpus Tool (CCT), purpose-designed software by IrLab, is accessible online at the Repositorio Universidade Coruña at http://hdl.handle.net/2183/21849




Corpus Linguistics


Book Description

This handbook provides an up-to-date survey of corpus linguistics. Spoken, written, and multimodal corpora serve as the bases for quantitative and qualitative research on many issues of linguistic interest. The two volumes together comprise 61 articles by renowned experts from around the world. They sketch the history of corpus linguistics and its relationship with neighbouring disciplines, show its potential, discuss its problems, and describe various methods of collecting, annotating, and searching corpora, as well as processing corpus data.