Three Essays in Causal Inference


Book Description

This thesis is a collection of three essays on causal inference. Chapter 1 considers the problem of constructing confidence intervals or bands for the quantiles of treatment effects under settings where point identification is impossible. I show that under settings where selection is only on observables, bounds for the entire quantile function can nonetheless be estimated, and this enables the estimation of confidence bands. I also extend these results to instrumental variable settings. Computational complexity analysis demonstrates that the methodology I propose is computationally attractive. Chapters 2 and 3 consider extending the synthetic control approach of Abadie, Diamond, and Hainmueller (2010) to two different settings where individual-level data is available. In Chapter 2 I consider estimating average treatment effects by constructing, for every subject in the treatment group, a synthetic twin composed of individuals in the control group. I show that the resulting estimator is unbiased when selection is dependent only on observables. I also show that matching estimators and OLS estimators can be viewed as special cases of synthetic control estimators. Furthermore, I demonstrate that the estimator is highly scalable computationally. In Chapter 3, I consider settings where either panel data or repeated cross-sectional data is available. I show that the synthetic control estimator in this setting can yield asymptotically valid standard errors when aggregation is done from individual-level data, unlike the original work of Abadie, Diamond, and Hainmueller (2010). To demonstrate asymptotic properties, two types of asymptotic analysis are carried out: one appropriate when the number of observations at each point in time in each subpopulation tends to infinity, and one suitable for stationary aggregate data in which the number of pre-intervention periods gets large.
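The "synthetic twin" idea from Chapter 2 can be illustrated with a small Python sketch. It is not the thesis's estimator: it assumes the twin is a convex combination of control units (weights nonnegative and summing to one) chosen to match the treated unit's covariates, and it finds those weights by projected gradient descent. The function names, the solver, and the toy numbers are all hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        candidate = (cumulative - 1.0) / i
        if ui - candidate > 0:
            theta = candidate  # keep the threshold from the last index that qualifies
    return [max(x - theta, 0.0) for x in v]


def synthetic_twin_weights(x_treated, x_controls, steps=2000):
    """Weights on control units whose weighted covariates approximate x_treated.

    Minimizes the squared distance between x_treated and the weighted
    combination of control covariates, subject to the simplex constraint
    (an assumption of this sketch).
    """
    n, d = len(x_controls), len(x_treated)
    w = [1.0 / n] * n
    # Step size from a simple upper bound on the gradient's Lipschitz constant.
    frob_sq = sum(val * val for row in x_controls for val in row)
    lr = 1.0 / (2.0 * frob_sq) if frob_sq > 0 else 1.0
    for _ in range(steps):
        fitted = [sum(w[j] * x_controls[j][k] for j in range(n)) for k in range(d)]
        resid = [fitted[k] - x_treated[k] for k in range(d)]
        grad = [2.0 * sum(resid[k] * x_controls[j][k] for k in range(d)) for j in range(n)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(n)])
    return w


# Hypothetical toy data: three control units, one treated unit, two covariates.
x_controls = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
y_controls = [1.0, 2.0, 3.0]
x_treated = [0.2, 0.3]
y_treated = 2.5

weights = synthetic_twin_weights(x_treated, x_controls)
synthetic_outcome = sum(w * y for w, y in zip(weights, y_controls))
effect = y_treated - synthetic_outcome  # unit-level contrast with the synthetic twin
```

Averaging such unit-level contrasts over all treated subjects would give one possible average treatment effect estimate; whether this matches the thesis's construction in detail is not stated in the abstract.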




Three Essays on Causal Inference


Book Description

This thesis describes three research projects in causal inference, all related to the problem of contrasting the average counterfactual outcomes on two sides of a binary decision. In the first project, we discuss estimation of the average causal effect in a randomized controlled trial. Here, statisticians find themselves in a kind of statistical paradise: a simple model-based procedure delivers correct confidence intervals even if the experimental participants are not randomly sampled and mis-specified models are used. In the second project, we consider the problem of testing for a treatment effect using observational data with no hidden confounders. Conceptually, this is no different from a rather complicated RCT, and one might expect that a return to statistical paradise is possible. Unfortunately, this is not the case: we show that even intuitively reasonable uses of correct models may still yield misleading conclusions. The final project looks at observational data with unobserved confounding and gives methods for computing bounds on average causal effects. Here, we discover some never-before-seen robustness properties unique to the partially-identified setting.




Three Essays on Causal Inference in Comparative Political Behavior


Book Description

This dissertation contains three independent essays, each applying statistical methods for causal inference in observational studies to central topics in comparative political behavior.




Three Essays on Causal Inference for Observational Studies


Book Description

Finally, the third paper in this thesis addresses the question of unintended consequences for school segregation due to the introduction of a targeted voucher scheme. I use a difference-in-differences approach, in combination with matching on time-stable covariates, to estimate the effect that the 2008 Chilean voucher policy had on both students' average household income and academic performance at the school level. Results show that even though the policy had a positive effect on schools' standardized test scores, closing the gap between schools that subscribed to the policy and those that did not, there was also an increase in the differences between socioeconomic characteristics at the school level, such as average household income.
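As a rough illustration of the difference-in-differences contrast mentioned above, the following Python sketch computes the basic two-group, two-period estimate. The function name, the data layout, and the school-level scores are hypothetical, and the matching step on time-stable covariates that the paper combines with it is omitted here.

```python
from statistics import mean


def did_estimate(rows):
    """Two-group, two-period difference-in-differences.

    rows: iterable of (treated: bool, post: bool, outcome: float).
    Returns (treated post - treated pre) - (control post - control pre).
    """
    cells = {(t, p): [] for t in (False, True) for p in (False, True)}
    for treated, post, outcome in rows:
        cells[(treated, post)].append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    treated_change = m[(True, True)] - m[(True, False)]
    control_change = m[(False, True)] - m[(False, False)]
    return treated_change - control_change


# Hypothetical school-level test scores (not taken from the thesis).
rows = [
    (True, False, 250.0), (True, False, 254.0),    # subscribing schools, before
    (True, True, 262.0), (True, True, 266.0),      # subscribing schools, after
    (False, False, 260.0), (False, False, 264.0),  # other schools, before
    (False, True, 263.0), (False, True, 267.0),    # other schools, after
]
effect = did_estimate(rows)
```

In this toy data the subscribing schools gain 12 points and the others gain 3, so the estimate is 9 points. The estimate has a causal reading only under a parallel-trends assumption, which matching on covariates is typically meant to make more plausible.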




Three Essays in Robust Causal Inference


Book Description

Economics research often addresses questions with an implicit or explicit policy goal. When such a goal involves an active intervention, such as the assignment of a particular treatment variable to participants, the analysis of its effects requires the tools of causal inference. In such settings, the opportunity to use experimental or observational data to tease out policy parameters of interest requires a combination of statistical and causal assumptions. In reduced form work, where an explicit economic theory is not laid out to allow identification of policy parameters from data, the investigation of the causal assumptions becomes a critical exercise for the credibility of the results. Many robustness exercises evaluate the effect that relaxing and/or modifying assumptions produces on the results of the study. The scope of these exercises is very broad, reflecting the need to tailor specific robustness exercises to whichever assumptions are most likely to be violated in a given domain. This dissertation is a collection of three essays on robust causal inference that share a unifying theme: preserving the nonparametric nature of the robustness exercise. This aspect has both theoretical and practical relevance. First, causal assumptions are usually nonparametric: robustness exercises that restrict to parametric cases might lead to misleading insights. Further, economics research has started to incorporate more flexible nonparametric and semi-parametric techniques, which may call for robustness exercises that are readily applicable to these approaches. Because robustness exercises are context specific, each essay addresses a separate aspect of robustness. Chapter 1 investigates how changes in the distribution of covariates may invalidate given experimental results, with implications for evidence-based policy-making.
It proposes an explicit metric of robustness that measures the distance of the closest distribution of covariates for which experimental results are violated. Chapter 2 analyses the practice of robustness checks as a way to validate a researcher's identification strategy. It details the limitations of these exercises in detecting failure of identification and proposes a nonparametric robustness test that bypasses functional form assumptions. Finally, Chapter 3 focuses on the robustness of Marginal Treatment Effect identification when the instrumental variables fail to incentivize treatment for a subset of the population. It provides two alternative identification results which can be relevant in practice.




Three Essays on Causal Inference with High-dimensional Data and Machine Learning Methods


Book Description

This dissertation consists of three chapters that study causal inference when applying machine learning methods. In Chapter 1, I propose an orthogonal extension of the semiparametric difference-in-differences estimator proposed in Abadie (2005). The proposed estimator enjoys the so-called Neyman-orthogonality (Chernozhukov et al. 2018) and thus allows researchers to flexibly use a rich set of machine learning (ML) methods in the first-step estimation. It is particularly useful when researchers confront a high-dimensional data set in which the number of potential control variables is larger than the sample size and the conventional nonparametric estimation methods, such as kernel and sieve estimators, do not apply. I apply this orthogonal difference-in-differences estimator to evaluate the effect of tariff reduction on corruption. The empirical results show that tariff reduction decreases corruption by a large magnitude. In Chapter 2, I study the estimation and inference of the mode treatment effect. Mean, median, and mode are three essential measures of the centrality of probability distributions. In program evaluation, the average treatment effect (mean) and the quantile treatment effect (median) have been intensively studied in the past decades. The mode treatment effect, however, has long been neglected in program evaluation. This paper fills the gap by discussing both the estimation and inference of the mode treatment effect. I propose both traditional kernel and machine learning methods to estimate the mode treatment effect. I also derive the asymptotic properties of the proposed estimators and find that both estimators are asymptotically normal, but with a rate of convergence slower than the regular rate N^(1/2), which is different from the rates of the classical average and quantile treatment effect estimators.
In Chapter 3 (joint with Liqiang Shi), we study the estimation and inference of the doubly robust extension of the semiparametric quantile treatment effect estimation discussed in Firpo (2007). The proposed estimator allows researchers to use a rich set of machine learning methods in the first-step estimation, while still obtaining valid inferences. Researchers can include as many control variables as they consider necessary, without worrying about the over-fitting problem which frequently happens in the traditional estimation methods. This paper complements Belloni et al. (2017), which provided a very general framework to discuss the estimation and inference of many different treatment effects when researchers apply machine learning methods.
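For readers unfamiliar with the Neyman-orthogonality property invoked in Chapter 1, it is commonly stated along the following lines. The notation is an assumption of this note rather than the dissertation's: \psi is a moment (score) function, W the data, \theta_0 the target parameter, and \eta_0 the true value of the first-step nuisance functions.

```latex
\mathbb{E}\big[\psi(W;\theta_0,\eta_0)\big] = 0,
\qquad
\frac{\partial}{\partial r}\,
\mathbb{E}\big[\psi\big(W;\theta_0,\eta_0 + r(\eta-\eta_0)\big)\big]\Big|_{r=0} = 0
\quad \text{for all admissible } \eta .
```

Informally, small errors in the machine-learned first step have no first-order effect on the moment condition, which is why flexible first-step estimators can be used without invalidating inference on \theta_0.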




Three Essays on Causal Inference for Marketing Applications


Book Description

In my dissertation, consisting of three research projects, I focus on problems of reliably estimating the impact of a policy change in quasi-experimental setups. I utilize cutting-edge methods in econometrics and machine learning to quantify causal effects of policy changes, understand the mechanism behind the effect, and, most importantly, highlight the implications for managers and policy makers. My first research paper, “A Study of the Effects of Legalization of Recreational Marijuana on Sales of Cigarettes”, attempts to establish a causal link between the legalization of recreational marijuana and the sales of cigarettes in retail stores. Recreational marijuana legalization (RML) has been on the rise in recent years, and many arguments have been put forth to support or counter this move. We explore the possibility of RML impacting cigarette consumption. This is important for understanding the impact on health care expenditures related to smoking, which are about $330 billion in the US. Our results show that in states that have passed RML, there is a 7% increase in cigarette sales. This is an important finding since it reverses a decline in cigarette sales in recent years. Therefore, we conclude that states should exercise caution while considering legalization of recreational use of marijuana. My second project, “Effects of Social Media Fights and New Product Launches in the Fast Food Industry”, examines the effects of engaging in ‘Twitter feuds’ with competitors during new product launches. We propose a viable mechanism that explains how seemingly harmless banter on social media could have an unforeseen impact on a firm’s business. Through empirical evidence from recent incidents, we show that Twitter activity has a spillover into traditional media, which leads to a surge in online search. Online search activity is followed by offline sales, as documented in the literature and as evidenced by our unique foot traffic data.
Next, we document the long-term effects of this menu innovation in a causal framework, well beyond the initial frenzy, with a novel synthetic difference-in-differences (SDID) method proposed by Arkhangelsky et al. (2021). Results show that the launch led to a 30% increase in store visits up to six months after the launch. Overall, these findings underscore the importance of a savvy social media presence, especially during a product launch, which could be a driver of piqued interest that affects the overall business. The flip side for competitors is that initiating seemingly harmless banter, unlike in the offline setting, could end up providing free publicity to one’s rivals. Overall, we highlight the enormous potential of social media to affect business and advise caution to brand managers before engaging in any activity. In my third project, “A study of wear out and heterogeneous effects of unlimited shipping program on customer engagement in the online retail industry”, we study the effects of a variation of free shipping promotion in the online retail industry. Free shipping promotions have become popular among online retailers. Most online shoppers expect deliveries without additional costs and cite it as a primary concern while shopping online. Many online retailers across industries have implemented long-term free shipping programs on all purchases with fixed annual fees. In this paper, we analyze the benefits associated with such programs for the retailers and also shed light on the potential pitfalls, using data from a leading online retailer in the UK. Our results indicate that there is a significant decay in customer spending after the initial days and that the effects wear out completely a short way through the promotion period. Moreover, changes in purchase behavior (significantly lower basket size after enrolling in free shipping) could hurt the retailer. Thus, online retailers should be cautious when offering long-term free shipping promotions.
In the next part of the paper, we use pre-promotion engagement as a moderating factor to capture heterogeneous effects of free shipping programs across customers, using the Honest Causal Forests approach. Our results show that free shipping promotions work better (higher revenues, smaller drop in basket size) for customers with relatively lower engagement with the retailer in the pre-promotion period. Online retailers could use these findings to devise their targeting strategy for free shipping promotions.







Essays on Causal Inference and Econometrics


Book Description

This dissertation is a collection of three essays on the econometric analysis of causal inference methods. Chapter 1 examines the identification and estimation of the structural function in fuzzy RD designs with a continuous treatment variable. We show that the nonlinear and nonseparable structural function can be nonparametrically identified at the RD cutoff under shape restrictions, including monotonicity and smoothness conditions. Based on the nonparametric identification equation, we propose a three-step semiparametric estimation procedure and establish the asymptotic normality of the estimator. The semiparametric estimator achieves the same convergence rate as in the case of a binary treatment variable. As an application of the method, we estimate the causal effect of sleep time on health status by using the discontinuity in natural light timing at time zone boundaries. Chapter 2 examines the local linear regression (LLR) estimate of the conditional distribution function F(y|x). We derive three uniform convergence results: the uniform bias expansion, the uniform convergence rate, and the uniform asymptotic linear representation. The uniformity in the above results is with respect to both x and y and therefore has not previously been addressed in the literature on local polynomial regression. Such uniform convergence results are especially useful when the conditional distribution estimator is the first stage of a semiparametric estimator. Chapter 3 studies the estimation of causal parameters in the generalized local average treatment effect model, a generalization of the classical LATE model encompassing multi-valued treatment and instrument. We derive the efficient influence function (EIF) and the semiparametric efficiency bound for two types of parameters: local average structural function (LASF) and local average structural function for the treated (LASF-T). 
The moment condition generated by the EIF satisfies two robustness properties: double robustness and Neyman orthogonality. Based on the robust moment condition, we propose the double/debiased machine learning (DML) estimators for LASF and LASF-T. We also propose null-restricted inference methods that are robust against weak identification issues. As an empirical application, we study the effects across different sources of health insurance by applying the developed methods to the Oregon Health Insurance Experiment.
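The local linear regression (LLR) estimate of F(y|x) studied in Chapter 2 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it uses a scalar covariate, an Epanechnikov kernel, and the textbook closed form for the intercept of a kernel-weighted regression of the indicator 1{Y <= y} on (X - x). Function names and data are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on (-1, 1)."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def llr_conditional_cdf(xs, ys, x0, y0, bandwidth):
    """Local linear estimate of F(y0 | x0).

    Regresses the indicator 1{Y_i <= y0} on (1, X_i - x0) with kernel
    weights and returns the fitted intercept.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        k = epanechnikov(d / bandwidth)
        z = 1.0 if yi <= y0 else 0.0
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * z
        t1 += k * d * z
    det = s0 * s2 - s1 * s1
    if det <= 0.0:
        raise ValueError("not enough distinct observations inside the bandwidth")
    return (s2 * t0 - s1 * t1) / det


# Hypothetical toy sample.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [0.1, 0.2, 0.9, 0.8, 0.7]
f_mid = llr_conditional_cdf(xs, ys, x0=0.0, y0=0.5, bandwidth=2.0)
f_high = llr_conditional_cdf(xs, ys, x0=0.0, y0=10.0, bandwidth=2.0)
f_low = llr_conditional_cdf(xs, ys, x0=0.0, y0=-10.0, bandwidth=2.0)
```

Unlike a local constant (Nadaraya-Watson) fit, the local linear estimate is not guaranteed to lie in [0, 1] or to be monotone in y in finite samples, which is one reason uniform results in both x and y are of interest.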