Regression Modeling for Linguistic Data


Book Description

In the first comprehensive textbook on regression modeling for linguistic data in a frequentist framework, Morgan Sonderegger provides graduate students and researchers with an incisive conceptual overview along with worked examples that teach practical skills for realistic data analysis. The book features extensive treatment of mixed-effects regression models, the most widely used statistical method for analyzing linguistic data. Sonderegger begins with preliminaries to regression modeling: assumptions, inferential statistics, hypothesis testing, power, and other errors. He then covers regression models for non-clustered data: linear regression, model selection and validation, logistic regression, and applied topics such as contrast coding and nonlinear effects. The last three chapters discuss regression models for clustered data: linear and logistic mixed-effects models as well as model predictions, convergence, and model selection. The book’s focused scope and practical emphasis will equip readers to implement these methods and understand how they are used in current work.

The only advanced discussion of modeling for linguists
Uses R throughout, in practical examples using real datasets
Extensive treatment of mixed-effects regression models
Contains detailed, clear guidance on reporting models
Equal emphasis on observational data and data from controlled experiments
Suitable for graduate students and researchers with computational interests across linguistics and cognitive science




Mixed-Effects Regression Models in Linguistics


Book Description

When data consist of grouped observations or clusters, and there is a risk that measurements within the same group are not independent, group-specific random effects can be added to a regression model in order to account for such within-group associations. Regression models that contain such group-specific random effects are called mixed-effects regression models, or simply mixed models. Mixed models are a versatile tool that can handle both balanced and unbalanced datasets and that can also be applied when several layers of grouping are present in the data; these layers can either be nested or crossed. In linguistics, as in many other fields, the use of mixed models has gained ground rapidly over the last decade. This methodological evolution enables us to build more sophisticated and arguably more realistic models, but, due to its technical complexity, also introduces new challenges. This volume brings together a number of promising new evolutions in the use of mixed models in linguistics, but also addresses a number of common complications, misunderstandings, and pitfalls. Topics that are covered include the use of huge datasets, dealing with non-linear relations, issues of cross-validation, and issues of model selection and complex random structures. The volume features examples from various subfields in linguistics. The book also provides R code for a wide range of analyses.




Analyzing Linguistic Data


Book Description

Statistical analysis is a useful skill for linguists and psycholinguists, allowing them to understand the quantitative structure of their data. This textbook provides a straightforward introduction to the statistical analysis of language. Designed for linguists with a non-mathematical background, it clearly introduces the basic principles and methods of statistical analysis, using 'R', the leading computational statistics programme. The reader is guided step-by-step through a range of real data sets, allowing them to analyse acoustic data, construct grammatical trees for a variety of languages, quantify register variation in corpus linguistics, and measure experimental data using state-of-the-art models. The visualization of data plays a key role, both in the initial stages of data exploration and later on when the reader is encouraged to criticize various models. Containing over 40 exercises with model answers, this book will be welcomed by all linguists wishing to learn more about working with and presenting quantitative data.




Supervised Machine Learning for Text Analysis in R


Book Description

Text data is important for many domains, from healthcare to marketing to the digital humanities, but specialized approaches are necessary to create features for machine learning from language. Supervised Machine Learning for Text Analysis in R explains how to preprocess text data for modeling, train models, and evaluate model performance using tools from the tidyverse and tidymodels ecosystem. Models like these can be used to make predictions for new observations, to understand what natural language features or characteristics contribute to differences in the output, and more. If you are already familiar with the basics of predictive modeling, use the comprehensive, detailed examples in this book to extend your skills to the domain of natural language processing. This book provides practical guidance and directly applicable knowledge for data scientists and analysts who want to integrate unstructured text data into their modeling pipelines. Learn how to use text data for both regression and classification tasks, and how to apply more straightforward algorithms like regularized regression or support vector machines as well as deep learning approaches. Natural language must be dramatically transformed to be ready for computation, so we explore typical text preprocessing and feature engineering steps like tokenization and word embeddings from the ground up. These steps influence model results in ways we can measure, both in terms of model metrics and other tangible consequences such as how fair or appropriate model results are.




Statistics for Linguistics with R


Book Description

This book is an introduction to statistics for linguists using the open source software R. It is aimed at students and instructors/professors with little or no statistical background and is written in a non-technical and reader-friendly/accessible style. It first introduces in detail the overall logic underlying quantitative studies: exploration, hypothesis formulation and operationalization, and the notion and meaning of significance tests. It then introduces some basics of the software R relevant to statistical data analysis. A chapter on descriptive statistics explains how summary statistics for frequencies, averages, and correlations are generated with R and how they are best represented graphically. A chapter on analytical statistics explains how statistical tests are performed in R on the basis of many different linguistic case studies: for nearly every single example, it is explained what the structure of the test looks like, how hypotheses are formulated, explored, and tested for statistical significance, how the results are graphically represented, and how one would summarize them in a paper/article. A chapter on selected multifactorial methods introduces how more complex research designs can be studied: methods for the study of multifactorial frequency data, correlations, tests for means, and binary response data are discussed and exemplified step-by-step. Also, the exploratory approach of hierarchical cluster analysis is illustrated in detail. The book comes with many exercises, boxes with short think breaks and warnings, recommendations for further study, and answer keys as well as a statistics for linguists newsgroup on the companion website. The volume is aimed at beginners on every level of linguistic education: undergraduate students, graduate students, and instructors/professors and can be used in any research methods and statistics class for linguists. It presupposes no quantitative/statistical knowledge whatsoever and, unlike most competing books, begins at step 1 for every method and explains everything explicitly.




Statistics for Linguists: An Introduction Using R


Book Description

Statistics for Linguists: An Introduction Using R is the first statistics textbook on linear models for linguistics. The book covers simple uses of linear models through generalized models to more advanced approaches, maintaining its focus on conceptual issues and avoiding excessive mathematical details. It contains many applied examples using the R statistical programming environment. Written in an accessible tone and style, this text is the ideal main resource for graduate and advanced undergraduate students taking statistics courses in Linguistics as well as those in other fields, including Psychology, Cognitive Science, and Data Science.




Classification and Modeling with Linguistic Information Granules


Book Description

Many approaches have already been proposed for classification and modeling in the literature. These approaches are usually based on mathematical models. Computer systems can easily handle mathematical models even when they are complicated and nonlinear (e.g., neural networks). On the other hand, it is not always easy for human users to intuitively understand mathematical models even when they are simple and linear. This is because human information processing is based mainly on linguistic knowledge while computer systems are designed to handle symbolic and numerical information. A large part of our daily communication is based on words. We learn from various media such as books, newspapers, magazines, TV, and the Internet through words. We also communicate with others through words. While words play a central role in human information processing, linguistic models are not often used in the fields of classification and modeling. If there is no goal other than the maximization of accuracy in classification and modeling, mathematical models may always be preferred to linguistic models. On the other hand, linguistic models may be chosen if emphasis is placed on interpretability.




Writing about Quantitative Research in Applied Linguistics


Book Description

With increasing pressure on academics and graduate students to publish in peer-reviewed journals, this book offers a much-needed guide to writing about and publishing quantitative research in applied linguistics. With annotated examples and useful resources, this book will be indispensable to graduate students and seasoned researchers alike.




How to do Linguistics with R


Book Description

This book provides a linguist with a statistical toolkit for exploration and analysis of linguistic data. It employs R, a free software environment for statistical computing, which is increasingly popular among linguists. How to do Linguistics with R: Data exploration and statistical analysis is unique in its scope, as it covers a wide range of classical and cutting-edge statistical methods, including different flavours of regression analysis and ANOVA, random forests and conditional inference trees, as well as specific linguistic approaches, among which are Behavioural Profiles, Vector Space Models and various measures of association between words and constructions. The statistical topics are presented comprehensively, but without too much technical detail, and illustrated with linguistic case studies that answer non-trivial research questions. The book also demonstrates how to visualize linguistic data with the help of attractive informative graphs, including the popular ggplot2 system and Google visualization tools. This book has a companion website: http://doi.org/10.1075/z.195.website