Syntactic Wordclass Tagging


Book Description

In both the linguistic and the language engineering community, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural language processing (NLP) applications. Unfortunately, the annotation of text material, especially the more interesting kinds of linguistic annotation, is as yet a difficult task and can entail a substantial amount of human involvement. All over the world, work is being done to replace as much of this human effort as possible with computer processing. At the frontier of what can already be done (mostly) automatically, we find syntactic wordclass tagging: the annotation of the individual words in a text with an indication of their morphosyntactic classification. This book describes the state of the art in syntactic wordclass tagging. As an attempt to give an overall view of the field, it is of interest to (at least) two, possibly very different, types of reader. The first type consists of those who are using, or are planning to use, tagged material and taggers. They will want to know what tagging can and cannot do, but are not necessarily interested in the internal workings of automatic taggers. Those workings are, on the other hand, the main interest of the second type of reader: the builders of automatic taggers and other natural language processing software.
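To make the notion of wordclass tagging concrete, here is a minimal sketch of the simplest baseline, a most-frequent-tag (unigram) tagger: each word receives the tag it carried most often in a hand-tagged training sample. This is not presented as the method of any book listed here; the toy training pairs, the tag labels (DET, NOUN, VERB, ADJ, ADV), and the function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy training corpus: (word, tag) pairs with invented tag labels.
TRAINING = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("bark", "NOUN"), ("is", "VERB"), ("rough", "ADJ"),
    ("dogs", "NOUN"), ("bark", "VERB"), ("loudly", "ADV"),
    ("the", "DET"), ("bark", "NOUN"), ("fell", "VERB"),
]


def train_unigram(pairs):
    """Map each lowercased word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(tokens, lexicon, default="NOUN"):
    """Attach a tag to every token; unseen words get a fallback tag."""
    return [(tok, lexicon.get(tok.lower(), default)) for tok in tokens]


lexicon = train_unigram(TRAINING)
print(tag(["The", "dog", "barks"], lexicon))
# prints [('The', 'DET'), ('dog', 'NOUN'), ('barks', 'VERB')]
```

Because the baseline ignores context, "bark" is tagged NOUN even in "dogs bark", where it is a verb. Resolving such ambiguities from the surrounding words is precisely what more sophisticated automatic taggers attempt.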




A resource-light approach to morpho-syntactic tagging


Book Description

While supervised corpus-based methods are highly accurate for many NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect of morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the data required and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adapt the system to a new language is a fraction of that needed for systems with extensive, manually created resources: days instead of years. This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics, and Natural Language Processing (NLP). Researchers and students interested in these areas, as well as in cross-lingual studies and applications, will benefit greatly from this work. Its prospective readers are scholars and practitioners in computer science and linguistics.




The Oxford Handbook of Word Classes


Book Description

This handbook explores multiple facets of the study of word classes, also known as parts of speech or lexical categories. These categories are of fundamental importance to linguistic theory and description, both formal and functional, and to both language-internal analysis and cross-linguistic comparison. The volume consists of five parts that investigate word classes from different angles. Chapters in the first part address a range of fundamental issues, including diversity and unity in word classes around the world, categorization at different levels of structure, the distinction between lexical and functional words, and hybrid categories. Part II examines the treatment of word classes across a wide range of contemporary linguistic theories, such as Cognitive Grammar, Minimalist Syntax, and Lexical Functional Grammar, while Part III focuses on individual word classes, from major categories such as verb and noun to minor ones such as adpositions and ideophones. Part IV provides a number of cross-linguistic case studies, exploring word classes in language families including Afroasiatic, Sinitic, Mayan, and Austronesian, as well as in sign languages. Chapters in the final part of the book discuss word classes from the perspective of various sub-disciplines of linguistics, ranging from first and second language acquisition to computational and corpus linguistics. Together, the contributions showcase the importance of word classes for the whole discipline of linguistics, while also highlighting the many ongoing debates in the area and outlining fruitful avenues for future research.




Corpus Linguistics and Beyond


Book Description




Language Corpora Annotation and Processing


Book Description

This book addresses the research, analysis, and description of the methods and processes that are used in the annotation and processing of language corpora in advanced, semi-advanced, and non-advanced languages. It provides the background information and empirical data needed to understand the nature and depth of problems related to corpus annotation and text processing and shows readers how the linguistic elements found in texts are analyzed and applied to develop language technology systems and devices. As such, it offers valuable insights for researchers, educators, and students of linguistics and language technology.




Human Language Technology. Challenges for Computer Science and Linguistics


Book Description

This book constitutes the refereed proceedings of the 4th Language and Technology Conference: Challenges for Computer Science and Linguistics, LTC 2009, held in Poznan, Poland, in November 2009. The 52 revised and in many cases substantially extended papers presented in this volume were carefully reviewed and selected from 103 submissions. The contributions are organized in topical sections on speech processing, computational morphology/lexicography, parsing, computational semantics, dialogue modeling and processing, digital language resources, WordNet, document processing, information processing, and machine translation.




Learning Language in Logic


Book Description

The two-volume set LNCS 1842/1843 constitutes the refereed proceedings of the 6th European Conference on Computer Vision, ECCV 2000, held in Dublin, Ireland, in June/July 2000. The 116 revised full papers presented were carefully selected from a total of 266 submissions. The two volumes offer topical sections on recognition and modelling; stereoscopic vision; texture and shading; shape; structure from motion; image features; active, real-time, and robot vision; segmentation and grouping; vision systems engineering and evaluation; calibration; medical image understanding; and visual motion.







Practical Corpus Linguistics


Book Description

This is the first book of its kind to provide a practical and student-friendly guide to corpus linguistics that explains the nature of electronic data and how it can be collected and analyzed.
- Designed to equip readers with the technical skills necessary to analyze and interpret language data, both written and (orthographically) transcribed
- Introduces a number of easy-to-use, yet powerful, free analysis resources consisting of standalone programs and web interfaces for use with Windows, Mac OS X, and Linux
- Each section includes practical exercises, a list of sources and further reading, and illustrated step-by-step introductions to analysis tools
- Requires only a basic knowledge of computer concepts in order to develop the specific linguistic analysis skills required for understanding and analyzing corpus data