A resource-light approach to morpho-syntactic tagging


Book Description

While supervised corpus-based methods are highly accurate for different NLP tasks, including morphological tagging, they are difficult to port to other languages because they require resources that are expensive to create. As a result, many languages have no realistic prospect for morpho-syntactic annotation in the foreseeable future. The method presented in this book aims to overcome this problem by significantly limiting the necessary data and instead extrapolating the relevant information from another, related language. The approach has been tested on Catalan, Portuguese, and Russian. Although these languages are only relatively resource-poor, the same method can in principle be applied to any inflected language, as long as an annotated corpus of a related language is available. The time needed to adjust the system to a new language is a fraction of the time needed for systems with extensive, manually created resources: days instead of years. This book touches upon a number of topics: typology, morphology, corpus linguistics, contrastive linguistics, linguistic annotation, computational linguistics and Natural Language Processing (NLP). Researchers and students who are interested in these scientific areas as well as in cross-lingual studies and applications will greatly benefit from this work. Scholars and practitioners in computer science and linguistics are the prospective readers of this book.




Portable Language Technology


Book Description

This dissertation describes experiments with Russian, Czech, Polish, Spanish, Portuguese, and Catalan. However, the general method proposed can be applied to any fusional language.




Handbook of Linguistic Annotation


Book Description

This handbook offers a thorough treatment of the science of linguistic annotation. Leaders in the field guide the reader through the process of modeling, creating an annotation language, building a corpus and evaluating it for correctness. Essential reading for both computer scientists and linguistic researchers. Linguistic annotation is an increasingly important activity in the field of computational linguistics because of its critical role in the development of language models for natural language processing applications. Part one of this book covers all phases of the linguistic annotation process, from annotation scheme design and choice of representation format through both the manual and automatic annotation process, evaluation, and iterative improvement of annotation accuracy. The second part of the book includes case studies of annotation projects across the spectrum of linguistic annotation types, including morpho-syntactic tagging, syntactic analyses, a range of semantic analyses (semantic roles, named entities, sentiment and opinion), time and event and spatial analyses, and discourse level analyses including discourse structure, co-reference, etc. Each case study addresses the various phases and processes discussed in the chapters of part one.




Routledge Encyclopedia of Translation Technology


Book Description

Routledge Encyclopedia of Translation Technology, second edition, provides a state-of-the-art survey of the field of computer-assisted translation. It is the first definitive reference to provide a comprehensive overview of the general, regional, and topical aspects of this increasingly significant area of study. The Encyclopedia is divided into three parts: Part 1 presents general issues in translation technology, such as its history and development, translator training, and various aspects of machine translation, including a valuable case study of its teaching at a major university; Part 2 discusses national and regional developments in translation technology, offering contributions covering the crucial territories of China, Canada, France, Hong Kong, Japan, South Africa, Taiwan, the Netherlands and Belgium, the United Kingdom, and the United States; Part 3 evaluates specific matters in translation technology, with entries focused on subjects such as alignment, concordancing, localization, online translation, and translation memory. The new edition has five additional chapters, with many chapters updated and revised, drawing on the expertise of over 50 contributors from around the world and an international panel of consultant editors to provide a selection of chapters on the most pertinent topics in the discipline. All the chapters are self-contained, extensively cross-referenced, and include useful and up-to-date references and information for further reading. It will be an invaluable reference work for anyone with a professional or academic interest in the subject.




Language Engineering for Lesser-studied Languages


Book Description

"Technologies enabling computers to process specific languages facilitate economic and political progress of societies where these languages are spoken. Development of methods and systems for language processing is therefore a worthy goal for national governments as well as for business entities and scientific and educational institutions in every country in the world. As work on systems and resources for the 'lower-density' languages becomes more widespread, an important question is how to leverage the results and experience accumulated by the field of computational linguistics for the major languages in the development of resources and systems for lower-density languages. This issue has been at the core of the NATO Advanced Studies Institute on language technologies for middle- and low-density languages held in Georgia in October 2007. This publication is a collection of publication-oriented versions of the lectures presented there and is a useful source of knowledge about many core facets of modern computational-linguistic work. By the same token, it can serve as a reference source for people interested in learning about strategies that are best suited for developing computational-linguistic capabilities for lesser-studied languages, either 'from scratch' or using components developed for other languages. The book should also be quite useful in teaching practical system- and resource-building topics in computational linguistics."--Publisher's website.




Systems and Frameworks for Computational Morphology


Book Description

This book constitutes the refereed proceedings of the 4th International Workshop on Systems and Frameworks for Computational Morphology, SFCM 2015, held in Stuttgart, Germany, in September 2015. The 5 revised full papers and 5 short papers presented were carefully reviewed and selected from 16 submissions. The SFCM Workshops focus on linguistically motivated morphological analysis and generation, computational frameworks for implementing such systems, and linguistic frameworks suitable for computational implementation. SFCM 2015 and the papers presented in this volume aim at broadening the scope to include research on very underresourced languages, interactions between computational morphology and formal, quantitative, and descriptive morphology, as well as applications of computational morphology in the Digital Humanities.




Descriptive Grammar of Bangla


Book Description

Bangla is spoken as the majority language in Bangladesh and the state of West Bengal in India, and as a minority language in several other Indian states. With almost 200 million native speakers, it ranks among the top ten languages in the world in number of speakers. Based on both primary and secondary materials, the CASL Bangla grammar provides comprehensive coverage of the phonology, orthography, morphology, and syntax of Bangla. Plentiful examples of naturally-occurring sentences provide native orthography, Romanization, and morpheme-by-morpheme glossing along with free translations. Unlike many Romanizations of Bangla, our system eschews Sanskritic influence and instead reflects actual Bangla phonology. We also offer comparative information of use to linguists, highlighting features of Bangla shared with the South Asian sprachbund, such as light verb constructions, as well as those that differentiate Bangla from its Indo-Aryan relatives; for example, its unique NP structure. Written in an accessible style from a theory-neutral perspective, this work will be of use to linguistic researchers, language scholars, and students of Bangla. A formal grammar focusing on the morphology is an available companion work.




Descriptive Grammar of Pashto and its Dialects


Book Description

Pashto/Pushto/Pukhto is a group of varieties used by as many as 30 million people in Afghanistan and Pakistan, yet a grammar describing these varieties collectively has not been published. The CASL Pashto grammar originates from extensive use of both primary and secondary materials. It attends to features of both spoken and written forms of Pashto and exemplifies the latter generously with naturally-occurring sentences. Detailed descriptions are provided of the phonology and orthography and of the inflectional and derivational morphology applied to all major word classes, with special attention to the complex morphology of verb formation and descriptions of the multiple pronominal systems. Notes on some of the prominent syntactic constructions are provided as a descriptive basis for learners of Pashto and for those interested in syntactic properties characteristic of South Asian languages. For the first time, the highly distinctive Middle dialects, including Waziri, receive attention next to the other major dialect groups. A formal grammar focusing on the morphology is an available companion work.




Human Language Technology. Challenges for Computer Science and Linguistics


Book Description

This book constitutes the refereed proceedings of the 4th Language and Technology Conference: Challenges for Computer Science and Linguistics, LTC 2009, held in Poznan, Poland, in November 2009. The 52 revised and in many cases substantially extended papers presented in this volume were carefully reviewed and selected from 103 submissions. The contributions are organized in topical sections on speech processing, computational morphology/lexicography, parsing, computational semantics, dialogue modeling and processing, digital language resources, WordNet, document processing, information processing, and machine translation.




Syntactic Wordclass Tagging


Book Description

In both the linguistic and the language engineering community, the creation and use of annotated text collections (or annotated corpora) is currently a hot topic. Annotated texts are of interest for research as well as for the development of natural language processing (NLP) applications. Unfortunately, the annotation of text material, especially more interesting linguistic annotation, is as yet a difficult task and can entail a substantial amount of human involvement. All over the world, work is being done to replace as much as possible of this human effort by computer processing. At the frontier of what can already be done (mostly) automatically we find syntactic wordclass tagging, the annotation of the individual words in a text with an indication of their morpho-syntactic classification. This book describes the state of the art in syntactic wordclass tagging. As an attempt to give an overall view of the field, this book is of interest to (at least) two, possibly very different, types of reader. The first type consists of those people who are using, or are planning to use, tagged material and taggers. They will want to know what the possibilities and impossibilities of tagging are, but are not necessarily interested in the internal working of automatic taggers. This, on the other hand, is the main interest of our second type of reader, the builders of automatic taggers and other natural language processing software.