Creating and Digitizing Language Corpora, Volume 2


Book Description

A range of electronic corpora has become increasingly accessible via the WWW and CD-ROM. This development coincided with improvements in the standards governing the collecting, encoding and archiving of such data. Less attention, however, has been paid to making other types of digital data available. This is especially true of that which one might describe as 'unconventional', namely, the fragmentary texts and voices left to us as accidents of history. This book is a first step toward developing similar standards for enriching and preserving these neglected resources.




Creating and Digitizing Language Corpora


Book Description

A range of electronic corpora has become accessible via the WWW and CD-ROM. This coincides with improvements in standards governing the collecting, encoding and archiving of such data. This book develops similar standards for enriching and preserving 'unconventional' data: the fragmentary texts and voices left to us as accidents of history.




Creating and Digitizing Language Corpora


Book Description

This book unites a range of approaches to the collection and digitization of diverse language corpora. Its specific focus is on best practices identified in the exploitation of these resources in landmark impact initiatives across different parts of the globe. The development of increasingly accessible digital corpora has coincided with improvements in the standards governing the collection, encoding and archiving of ‘Big Data’. Less attention has been paid to the importance of developing standards for enriching and preserving other types of corpus data, such as data that captures the nuances of regional dialects. This book takes these best practices another step forward by addressing innovative methods for enhancing and exploiting specialized corpora so that they become accessible to wider audiences beyond the academy.




The Handbook of Language Variation and Change


Book Description

Reflecting a multitude of developments in the study of language change and variation over the last ten years, this extensively updated second edition features a number of new chapters and remains the authoritative reference volume on a core research area in linguistics. It is a fully revised and expanded edition of this acclaimed reference work, which has established its reputation based on its unrivalled scope and depth of analysis in this interdisciplinary field. It includes seven new chapters, while the remainder have undergone thorough revision and updating to incorporate the latest research and reflect numerous developments in the field. It is accessibly structured by theme, covering topics including data collection and evaluation, linguistic structure, language and time, language contact, language domains, and social differentiation. It brings together an experienced, international editorial and contributor team to provide an unrivalled learning, teaching and reference tool for researchers and students in sociolinguistics.




Corpus Linguistics. Volume 2


Book Description

In many areas of linguistics, text corpora, speech corpora or multimodal corpora are now used as an empirical basis. Building on methods of the 19th century, new standards for linguistic annotation and preprocessing, as well as for qualitative and quantitative studies, have developed with the advent of electronic corpora since the 1940s. The handbook offers a comprehensive overview of the history, methods and applications of corpus linguistics. The individual overview and specialist articles are written by experts in their respective fields. Emphasis is placed on clear and comprehensive presentation, good cross-referencing between the articles, and pointers to further reading.







Recent Advances in Corpus Linguistics


Book Description

This book is a selection of studies presented at the 33rd International Conference of the International Computer Archive of Modern and Medieval English (ICAME), hosted by the University of Leuven (30 May - 3 June 2012). The strictly refereed and extensively revised contributions collected here represent recent advances in corpus linguistics, both in the development of specialist corpora and in ways of exploiting them for specific purposes. The first part focuses on “Corpus development and corpus interrogation” and features papers on the compilation of new, highly specialized corpora which aim to fill gaps in historical databases, and on new ways of extracting relevant patterns automatically from computerized datasets. The second part, devoted to “Specialist corpora”, presents detailed descriptive studies on grammatical patterns in World Englishes, on neology, and – using a contrastive approach – on prepositions and cohesive conjunctions. The third and final part on “Second language acquisition” groups together studies situated at the intersection of corpus linguistics and educational linguistics and dealing with markers of relevance and lesser relevance in lectures, deceptive cognates, the automatic annotation of native and non-native uses of demonstrative this and that, and measuring learners’ progress in speech and in writing. Each contribution in its own way reports on novel ways of getting mileage out of specialist corpora, and collectively the contributions attest to the rude health of computerized corpus linguistic studies.




Corpus Linguistics


Book Description

Throughout history, linguists and literary scholars have been impelled by curiosity about particular linguistic or literary phenomena to seek to observe them in action in original texts. The fruits of each earlier enquiry in turn nourish the desire to continue to acquire knowledge, through further observation of newer linguistic facts. As time goes by, the corpus linguist operates increasingly in the awareness of what has gone before. Corpus Linguistics, thirty years on, is less an innocent sortie into corpus territory on the basis of a hunch than an informed, critical reassessment of existing analytical orthodoxy, in the light of new data coming on stream. This volume comprises twenty-two articles penned by members of the ICAME (International Computer Archive of Modern and Mediaeval English) association, which together provide a critical and informed reappraisal of the facts, data, methods and tools of Corpus Linguistics which are available today. Authors reconsider the boundaries of the discipline, exploring its areas of commonality with Sociolinguistics, Language Variation, Discourse Linguistics, and Lexical Statistics and showing how that commonality is potentially of immense benefit to practitioners in the fields concerned. The volume culminates in the report of a timely and novel expert panel discussion on the role of Corpus Linguistics in the study of English as a global language. This encompasses issues such as English as an international lingua franca, ‘norms’ for global English, and the question of ‘ownership’, or who qualifies as a native speaker.




Quantitative Corpus Linguistics with R


Book Description

As in its first edition, the new edition of Quantitative Corpus Linguistics with R demonstrates how to process corpus-linguistic data with the open-source programming language and environment R. Geared in general towards linguists working with observational data, and particularly corpus linguists, it introduces R programming with emphasis on: data processing and manipulation in general; text processing, with and without regular expressions, of large bodies of textual and/or literary data; and basic aspects of statistical analysis and visualization. This book is extremely hands-on and leads the reader through dozens of small applications as well as larger case studies. Along with an array of exercise boxes and separate answer keys, the text features a didactic sequential approach in case studies by way of subsections that zoom in to every programming problem. The companion website to the book contains all relevant R code (amounting to approximately 7,000 lines of heavily commented code), most of the data sets as well as pointers to others, and a dedicated Google newsgroup. This new edition is ideal for both researchers in corpus linguistics and instructors who want to promote hands-on approaches to data in corpus linguistics courses.




The Dynamics of Linguistic Variation


Book Description

Variability is characteristic of any living language. This volume approaches the ‘life cycle’ of linguistic variability in English using data sources that range from electronic corpora to the internet. In the spirit of the 1968 Weinreich, Labov and Herzog classic, the fifteen contributions divide into three sections, each highlighting different stages in the dynamics of English across time and space. They show, first, how increase in variability can be initiated by processes that give rise to new patterns of discourse, which can ultimately crystallize into new grammatical elements. The next phase is the spread of linguistic features and patterns of discourse, both new and well established, through the social and regional varieties of English. The final phase in this ebb and flow of linguistic variability consists of processes promoting some variable features over others across registers and regional and social varieties, thus resulting in reduced variation and increased linguistic homogeneity.