Syntactic Wordclass Tagging


Book Description

In both the linguistic and the language engineering community, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural language processing (NLP) applications. Unfortunately, the annotation of text material, especially more interesting linguistic annotation, is as yet a difficult task and can entail a substantial amount of human involvement. All over the world, work is being done to replace as much as possible of this human effort by computer processing. At the frontier of what can already be done (mostly) automatically we find syntactic wordclass tagging, the annotation of the individual words in a text with an indication of their morphosyntactic classification. This book describes the state of the art in syntactic wordclass tagging. As an attempt to give an overall view of the field, this book is of interest to (at least) two, possibly very different, types of reader. The first type consists of those people who are using, or are planning to use, tagged material and taggers. They will want to know what the possibilities and impossibilities of tagging are, but are not necessarily interested in the internal working of automatic taggers. This, on the other hand, is the main interest of our second type of reader, the builders of automatic taggers and other natural language processing software.




Corpus-based Studies in English


Book Description

Corpus-based Studies in English contains selected papers from the seventeenth International Conference on English Language Research on Computerized Corpora (ICAME 17). The topics include parsing and annotation of corpora, discourse studies, lexicography, translation studies, parallel corpora, language variation and change, national varieties, methodology and English language teaching. The papers on parsing and annotation include discussions of the treatment of irregular forms, semantic/pragmatic labels in air traffic control, a comparison of tagging systems and a presentation of T-tag lexicon construction. The papers on discourse and lexicography include a study of "like" as a discourse marker, thesaural relations and the lexicalisation of NPs. In translation studies one paper discusses explicitness as a universal feature of translation and the paper on parallel corpora contrasts English and Norwegian. Many papers deal with variation and change; here we find a discussion of dialogue vs. non-dialogue in modern English fiction and an account of verbal disputes in adolescent English; the historical studies deal with e.g. text type evolution, multi-word verbs, normalization in Middle English prose and modalities in Early Modern English. The methodology papers discuss the use in corpus analysis of inferential statistics, probabilistic approaches to anaphora resolution and multi-method approaches to data. The ELT paper compares the use of the progressive in native and non-native compositions.




What's in a Word-list?


Book Description

The frequency with which particular words are used in a text can tell us something meaningful both about that text and about its author, because an author's choice of words is seldom random. Focusing on the most frequent lexical items in a number of generated word frequency lists can help us to determine whether all the texts were written by the same author. Alternatively, we might wish to determine whether the most frequent words of a given text (captured by its word frequency list) are suggestive of potentially meaningful patterns that could have been overlooked had the text been read manually. This edited collection brings together cutting-edge research written by leading experts in the field on the construction of word-lists for the analysis of both frequency and keyword usage. Taken together, these papers provide a comprehensive and up-to-date survey of the most exciting research being conducted in this subject.




New Directions in English Language Corpora


Book Description

The future of English linguistics as envisaged by the editors of Topics in English Linguistics lies in empirical studies which integrate work in English linguistics into general and theoretical linguistics on the one hand, and comparative linguistics on the other. The TiEL series features volumes that present interesting new data and analyses, and above all fresh approaches that contribute to the overall aim of the series, which is to further outstanding research in English linguistics.




Multi-word Verbs in Early Modern English


Book Description

In a revision of her doctoral thesis (no date or institution cited), which itself grew out of the project to compile the database Lampeter Corpus of Early Modern English Tracts (1640-1740), Claridge looks at the use of such multi-word verbs as "get clear", "wish for", and "make merry" as they appear in the database. She considers both syntax and semantics, which she shows merge to some extent, but takes semantics to be the primary and thus the more important level because people know what they are going to say before they know how they are going to say it. Annotation copyrighted by Book News Inc., Portland, OR




Teaching and Learning by Doing Corpus Analysis


Book Description

From the contents: Guy ASTON: The learner as corpus designer. - Antoinette RENOUF: The time dimension in modern English corpus linguistics. - Mike SCOTT: Picturing the key words of a very large corpus and their lexical upshots or getting at the Guardian's view of the world. - Lou BURNARD: The BNC: where did we go wrong? Corpus-based teaching material. - Averil COXHEAD: The academic word list: a corpus-based word list for academic purposes.




The Phraseological View of Language


Book Description

This volume presents the results of the international symposium Chunks in Corpus Linguistics and Cognitive Linguistics, held at the University of Erlangen-Nuremberg to honour John Sinclair's contribution to the development of linguistics in the second half of the twentieth century. The main theme of the book, highlighting important aspects of Sinclair's work, is the idiomatic character of language with a focus on chunks (in the sense of prefabricated items) as extended units of meaning. To pay tribute to Sinclair's enormous impact on research in this field, the volume contains two contributions which deal explicitly with his work, including material from unpublished manuscripts. Beyond that, the articles cover different aspects of chunks ranging from more theoretically-oriented to more applied papers, in which foreign language teaching and the computational application of the insights about the nature of language provided by corpus research play an important role. The volume demonstrates the wide applicability and relevance of the notion of chunks by bringing together research from different fields of linguistics such as theoretical linguistics, psycholinguistics, computational linguistics and foreign language teaching, and thus provides an interdisciplinary view on the impact of idiomaticity in language.




Out of Corpora


Book Description

Main headings: Introduction. - I. Representing language use. - II. Grammar and lexis in English corpora. - III. Contrastive and translation studies. - IV. English abroad. - List of Stig Johansson's publications (selection).