欢迎投稿(荐稿)计量经济圈,计量相关都行
邮箱:econometrics666@sina.cn
编辑整理: @计量经济圈(ID: econometrics666); 来源: 熊彼特的厨房 公众号
RDD断点回归设计越来越来越流行,在因果识别中的地位很高,因为是在一个断点处的两端进行比较,那么这个treatment效应能够较好的识别,下面这篇文章将会领着计量经济圈的圈友去认识RDD并且知道操作的步骤和Stata代码。想要进一步了解RDD这种方法以及其他因果推断技巧的,进入计量经济圈社群交流(文后)。
目前常见的用于RDD计算的R包分别有rdd, rdrobust, rddtools。Thoemmes等 (2017)的论文比较了这三个包的异同。
Thoemmes(2017)从实践的角度,进一步将RD的步骤分为三类:Assumption Checks,Estimation, Sensitivity Checks。我认为这里的简要归纳要比Lee和Lemieux(2010)更清楚简洁。
Assumption Checks实际上对应了RD的三个前提假设,分别是配置变量存在断点,对应着McCrary(2008)的断点检验;实验效应发生在我们假定的断点处,而不是其他断点处,这对应着安慰剂检验(Placebo Test),在上篇第二部分有展示;实验变量仅仅对我们关心的结果变量有影响,但是对其他前定变量无影响,其实就是检验断点附近前定变量的连续性。
Estimation是对因果效应的估计,又分为参数法(Global Regression)和非参数法(Local Linear Regression),后者对应着不同带宽的选择。
Sensitivity Checks是排除结果对的模型依赖性(Model Dependency)主要是对不同估计方法、带宽选择以及模型中多项式次数的鲁棒性检查,详见上篇推文的第二部分。
当前有三个R语言包可以实现RDD,Thoemmes详细比较了三个包在以上三方面的异同,并归纳如下表。
综合比较而言,rdd的上手成本是最低的也是最基本的选择,如果你熟悉AER,sandwich这几个包的话,那几乎零成本。 rdrobust 在带宽选择和画图方面具备优势,是唯一个输出各类带宽选择及其置信区间的包,当然也是因为这个包CCT开发的。rddtools的强项在于可以自动进行预设与敏感性检查,它可以输出不同带宽选择的估计结果,还可以自动进行安稳剂检验,这是其他包不具备的。如果对于R语言比较陌生的新手,可以选择rddapp这个带有界面的包进行操作。
作者使用一个数据实例展示了三个包联合使用的效果,这里简单归纳一下。
首先展示一下配置变量与结果变量散点图,然后是三步实际操作。
1 第一步:预设检查
(1)断点检验
(2)安慰剂检测
(3)其他前定变量检测
没有画图,文中直接汇报了检验的p值
2 第二步:估计
3 第三步:敏感测试
借着这个图多说一句,带宽的选择实际上是Bias与显著性的权衡取舍,BW越小越无偏,但是也意味着SE会更大,从而导致越发不显著
当然目前还有很多是这几个R包实现不了的,作者列举了两个,一是多个配置变量的估计,二是关于RDD统计解释力(Statistical Power)的估计。
上述步骤实现的R代码如下:
#### PREPERATION ####
library(ggplot2)
library(dplyr)
library(rdd)
library(rddtools)
library(rdrobust)
# set working directory to file location
setwd("C:\\Users\\fjt36-admin\\Dropbox\\WORK\\2016 Winter Cornell\\rdd software jebs")
# read Stata Dataset generated by "./ICPSR_04091/DS0001/04091-0001-Setup.do"
dat = foreign::read.dta(’ICPSR_04091/ICPSR_04091.dta’)
#distribution of treatment, and treatment effect from randomized experiment
xtabs( ~ DC_TRT, dat)
lm_rct1 = lm(SBIQ48 ~ DC_TRT, dat)
summary(lm_rct1)
#treatment effect of randomized experiment interacted with mother’s IQ at time 0
#IQ is centered to provide average treatment effect
lm_rct2 = lm(SBIQ48 ~ DC_TRT * scale(MOMWAIS0), dat)
summary(lm_rct2)
#subset data to create artifical sharp RDD
#here assuming that only mothers with an IQ lower than the median IQ (85) are
#selected for the treament
dat_s = filter(dat,
(MOMWAIS0 >= median(MOMWAIS0) & !DC_TRT) |
(MOMWAIS0 < median(MOMWAIS0) & DC_TRT),!is.na(SBIQ48))
#table to confirm RDD
table(dat_s$DC_TRT, dat_s$MOMWAIS0)
table(dat_s$DC_TRT)
#graphical inspection of all data and then data of only the sharp RDD, here using
#loess smoother with no binning
dat$DC_TRT_LAB <- factor(dat$DC_TRT, labels = c("Control", "Treatment"))
dat_s$DC_TRT_LAB <-
factor(dat_s$DC_TRT, labels = c("Control", "Treatment"))
p1 <- ggplot(dat) +
aes(
x = MOMWAIS0,
y = SBIQ48,
group = DC_TRT_LAB,
color = DC_TRT_LAB,
shape = DC_TRT_LAB
) +
geom_point(aes(shape = DC_TRT_LAB), size = 2) +
geom_smooth(method = ’loess’,
formula = y ~ x,
aes(linetype = DC_TRT_LAB)) + theme_bw() + ylab("Child IQ at age 2") +
xlab("Mother’s IQ\n\n(a)") +
xlim(c(60, 110)) + ylim(c(60, 130)) +
scale_colour_manual(values = c("darkgrey", "black")) +
theme(legend.position = "none")
p2 <- ggplot(dat_s) +
aes(
x = MOMWAIS0,
y = SBIQ48,
group = DC_TRT_LAB,
color = DC_TRT_LAB,
shape = DC_TRT_LAB
) +
geom_point(aes(shape = DC_TRT_LAB), size = 2) +
geom_smooth(method = ’loess’,
formula = y ~ x,
aes(linetype = DC_TRT_LAB)) + theme_bw() + ylab("Child IQ at age 2") +
xlab("Mother’s IQ\n\n(b)") + xlim(c(60, 110)) + ylim(c(60, 130)) +
scale_colour_manual(values = c("darkgrey", "black")) +
theme(legend.title = element_blank()) +
scale_shape_discrete(
name = "DC_TRT",
breaks = c("0", "1"),
labels = c("Treatment", "Control")
)
multiplot(p1, p2, cols = 2)
#naive treatment effect from RDD data
lm_naive = lm(SBIQ48 ~ DC_TRT, dat_s)
summary(lm_naive)
#RDD effect using global, parametric approach "by hand" without any package
lm_rct3 = lm(SBIQ48 ~ DC_TRT * scale(MOMWAIS0), dat_s)
summary(lm_rct3)
#### "rdd" package ####
#McCrary sorting test as implemented in RDD
rdd::DCdensity(dat_s$MOMWAIS0, median(dat$MOMWAIS0), ext.out = TRUE)
# M1, using defaults
lm_rdd = rdd::RDestimate(SBIQ48 ~ MOMWAIS0, dat_s,
cutpoint = median(dat$MOMWAIS0))
summary(lm_rdd)
#plot of RDD as implemented in RDD
plot(lm_rdd)
#single non-outcome covariate test
lm_rdd_nonoutcome = rdd::RDestimate(APGAR5 ~ MOMWAIS0, dat_s,
cutpoint = median(dat$MOMWAIS0))
summary(lm_rdd_nonoutcome)
#### "rdrobust" package ####
# M2
llm_rdrobust1 = rdrobust::rdrobust(dat_s$SBIQ48, dat_s$MOMWAIS0,
c = median(dat$MOMWAIS0))
print(llm_rdrobust1)
#plot of the RDD as implemented in rdrobust, with addition of confidence interval
rdrobust::rdplot(dat_s$SBIQ48,
dat_s$MOMWAIS0,
c = median(dat$MOMWAIS0),
ci = 95)
# M3
bw = rdrobust::rdbwselect_2014(dat_s$SBIQ48,
dat_s$MOMWAIS0,
c = 85,
bwselect = ’IK’)
llm_rdrobust2 = rdrobust::rdrobust(
dat_s$SBIQ48,
dat_s$MOMWAIS0,
c = median(dat$MOMWAIS0),
h = bw$bws[1],
b = bw$bws[2]
)
print(llm_rdrobust2)
# M4
llm_rdrobust3 = rdrobust::rdrobust(dat_s$SBIQ48,
dat_s$MOMWAIS0,
c = median(dat$MOMWAIS0),
h = lm_rdd$bw[1])
print(llm_rdrobust3)
#### "rddtools" package ####
dat_rddtools = rddtools::rdd_data(
y = SBIQ48,
x = MOMWAIS0,
data = dat_s,
cutpoint = median(dat$MOMWAIS0)
)
# M5
llm_rddtools = rddtools::rdd_reg_np(dat_rddtools)
print(llm_rddtools)
# rddtools uses the same density test and plot as rdd
# (call DCDensity from rdd package)
rddtools::dens_test(llm_rddtools)
#default plot of rddtools package
plot(llm_rddtools)
#the default behavior of plotPlacebo() triggers a error, due to NA returned from
# the recalculations of bw by rddtools::rdd_bw_ik()
rddtools::plotPlacebo(llm_rddtools,
same_bw = T,
from = .25,
to = .75)
#sensitivity to bandwidth choice
rddtools::plotSensi(llm_rddtools,
from = 50,
to = 2,
by = -1)
# M6
lm_rddtools1 = rddtools::rdd_reg_lm(dat_rddtools)
print(lm_rddtools1)
# M7
lm_rddtools2 = rddtools::rdd_reg_lm(dat_rddtools,
bw = rddtools::rdd_bw_ik(dat_rddtools))
print(lm_rddtools2)
# M8
lm_rddtools3 = rddtools::rdd_reg_lm(dat_rddtools, bw = lm_rdd$bw[1])
print(lm_rddtools3)
#### "rddapp" package ####
#M9
#this is assuming that the beta version of rddapp is installed - request from authors
lm_rddapp = rddapp::RDestimate(SBIQ48 ~ MOMWAIS0, dat_s,
cutpoint = median(dat$MOMWAIS0))
summary(lm_rddapp)
计量经济圈是中国计量第一大社区,我们致力于推动中国计量理论和实证技能的提升,圈子以海内外高校研究生和教师为主。我们已经建立了计量经济圈社群,受到海内外高校研究生群体的欢迎,大家在里面分享前沿计量资料和热烈讨论各种计量方法。只要能够沉下心来积淀一段时间,我们相信这是对你的研究工作是大有裨益的,欢迎的你加入到咱们这个大家庭(戳这里)。进去之后一定要看“群公告”,不然接收不了群息,也不知道怎么进入咱们的微信群和计量论坛。
![]()
帮点击一下下面的小广告,谢谢支持!
特别声明:以上内容(如有图片或视频亦包括在内)为自媒体平台“网易号”用户上传并发布,本平台仅提供信息存储服务。
Notice: The content above (including the pictures and videos if any) is uploaded and posted by a user of NetEase Hao, which is a social media platform and only provides information storage services.