A Study Guide to Difference-in-Differences
Compiled by 计量经济学服务中心 (Econometrics Service Center)
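I. Introduction
Modern econometrics and statistics provide workable tools for empirical research. The difference-in-differences method originates from panel data models in econometrics and is widely used in policy analysis and project evaluation. It is mainly applied to pooled cross-section data sets to assess the impact of an event or policy. The basic idea is to split the sample into two groups: the units targeted by the policy or project (the "treated group") and the units not targeted (the "control group"). Using information on both groups before and after implementation, one computes the change in an outcome (for example, income) for the treated group, and the change in the same outcome for the control group. The difference between these two changes is the difference-in-differences estimator (Difference in Differences, abbreviated DD or DID), so called because it is the treated group's difference minus the control group's difference. The method was first introduced into economics by Ashenfelter (1978); the earliest application in China is possibly Zhou Li'an and Chen Ye (周黎安、陈烨, 2005).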
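The verbal definition above can be written as a single formula. The notation is an assumption introduced here for illustration, not taken from the original text: \bar{y}_{g,t} is the mean outcome of group g (T for treated, C for control) in period t (0 before, 1 after).

```latex
\hat{\delta}_{\mathrm{DID}}
  = \left(\bar{y}_{T,1} - \bar{y}_{T,0}\right)
  - \left(\bar{y}_{C,1} - \bar{y}_{C,0}\right)
```

The first bracket is the change for the treated group, the second is the change for the control group, and their difference is the estimated policy effect.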
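The commonly used variants are difference-in-differences and triple differences. Difference-in-differences (DID) also goes by other names, such as "double difference". The principle is simple. It requires at least two periods of data, and all units are divided into two groups: the treated group and the control group. The treated group is unaffected by the policy in the first period; the policy then takes effect, so the second period records the post-policy outcome. The control group is never exposed to the policy, so both of its periods record outcomes without the policy. The calculation is equally simple: the effect that remains after differencing twice (over time within each group, then across groups) is the policy effect.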
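As a minimal sketch of the two-group, two-period setup just described, the following Stata lines build a tiny made-up data set and recover the DID estimate in two ways. All numbers and the variable names (treated, post, y) are hypothetical and chosen only for illustration.

```stata
* Hypothetical toy data: two observations per group and period
clear
input byte treated byte post float y
0 0 10
0 0 12
0 1 13
0 1 15
1 0 20
1 0 22
1 1 28
1 1 30
end

* Cell means: control 11 -> 14 (change 3), treated 21 -> 29 (change 8)
tabulate treated post, summarize(y) means

* The same estimate from a regression with a full interaction;
* the coefficient on 1.treated#1.post is the DID estimate, (29-21)-(14-11) = 5
regress y i.treated##i.post
display _b[1.treated#1.post]
```

Because the interacted regression is saturated, it reproduces the four cell means exactly, which is why the interaction coefficient coincides with the hand calculation.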
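Assumptions of DID. For OLS to estimate the model consistently, the following two assumptions are needed.
Assumption 1: the model is correctly specified. In particular, the time-trend term is the same for the treated group and the control group. This is the "parallel trend assumption", also called common trends, and it is the most important precondition for DID.
DID does not require the treated and control groups to be identical. The two groups may differ, but the difference must not change over time; that is, the treated and control groups must follow the same trend before the policy is implemented.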
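Assumption 2: transitory shocks are uncorrelated with the policy dummy. This is an important condition for the two-way fixed effects estimator to be a consistent estimator. Individual fixed effects, by contrast, are allowed to be correlated with the policy dummy: they can be eliminated by double differencing or the within transformation, or controlled for by the LSDV method.
DID allows selection on individual characteristics, as long as those characteristics do not vary over time. This is its main advantage: it can partly mitigate the endogeneity caused by selection bias.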
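The remark that individual fixed effects may be correlated with the policy dummy can be illustrated with simulated data. This is a sketch under stated assumptions: the data-generating process, the true effect of 3, and all variable names (id, alpha, d) are invented for illustration. It compares the within (fixed effects) estimator with LSDV.

```stata
* Simulated two-period panel; everything here is hypothetical
clear
set seed 12345
set obs 50
generate int id = _n
generate byte treated = (id > 25)
* Unit effect that is correlated with treatment status
generate double alpha = rnormal() + 2*treated
expand 2
bysort id: generate byte post = (_n == 2)
* Policy dummy: treated units in the second period
generate byte d = treated*post
generate double y = 1 + alpha + 0.5*post + 3*d + rnormal()

* Within transformation (fixed effects estimator)
xtset id post
xtreg y d i.post, fe

* LSDV: one dummy per unit
regress y d i.post i.id
```

The coefficient on d is the same in both regressions, and the time-invariant unit effect alpha drops out even though it is correlated with treatment.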
II. Command overview
diff is a user-written (external) command. It can be installed or updated from the SSC archive by running the command:
ssc install diff, replace
The diff syntax is detailed as follows:
diff outcome_var [if] [in] [weight], [options]
Required items:
outcome_var: outcome variable
period(varname): period variable
treated(varname): treatment variable
Covariates are supplied through cov(varlist), which is optional.
period(varname) Indicates the binary period variable (0: before; 1: after). Note: if your data contains a periodical frequency (monthly, quarterly, yearly, etc.), it is suggested to specify option period(varname) and include a binary variable for each frequency in option cov(varlist).
treated(varname) Indicates the binary treatment variable (0: controls; 1: treated).
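Both required options expect 0/1 variables, and the note on period(varname) mentions one binary variable per time frequency. A minimal sketch of how such variables might be built is shown below. The variable names (year, region, post, yr_) and the 2005 policy date are hypothetical, not taken from the diff documentation.

```stata
* Hypothetical yearly data for two regions, 2003-2006
clear
set obs 8
generate int year = 2003 + mod(_n - 1, 4)
generate byte region = (_n > 4)

* Binary period variable (0: before; 1: after), assuming a 2005 policy date
generate byte post = (year >= 2005)

* Binary treatment variable (0: controls; 1: treated)
generate byte treated = (region == 1)

* One dummy per year: creates yr_1 ... yr_4
tabulate year, generate(yr_)
```

Here post and treated are the kind of variables that period() and treated() expect, and the yr_ dummies are candidates for cov(varlist) in the sense of the note above.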
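Optional settings:
cov(varlist) Covariates. Combined with kernel, they are used to estimate the propensity score.
kernel Performs the kernel propensity score matching diff-in-diff.
id(varname) Required by option kernel.
bw(#) Bandwidth of the kernel function. The default is 0.06.
ktype(kernel) Type of kernel function. The types are epanechnikov (the default), gaussian, biweight, uniform and tricube.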
rcs Indicates that the kernel is set for repeated cross section. This option does not require option id(varname). Option rcs strongly assumes that covariates in cov(varlist) do not vary over time.
qdid(quantile) Performs the Quantile Difference in Differences estimation at the specified quantile from 0.1 to 0.9 (quantile 0.5 performs the QDID at the median). You may combine this option with kernel and cov. qdid does not support weights or robust standard errors. This option uses [R] qreg and [R] bsqreg for bootstrapped standard errors.
pscore(varname) Supplied Propensity Score.
logit Specifies logit estimation of the Propensity Score. The default is Probit.
support Performs diff on the common support of the propensity score given the option kernel.
addcov(varlist) Indicates additional covariates in addition to those specified in the estimation of the propensity score. Also use this option to specify time fixed-effects in the case of multiple time-frequency data (e.g. monthly, yearly, quarterly, etc.).
ddd(varname) Additional category for triple difference estimation. treated(varname) is deemed as the first category and ddd(varname) the second category. This option is not compatible with options kernel, test or qdid(quantile).
SE/Robust
cluster(varname) Calculates clustered Std. Errors by varname.
robust Calculates robust Std. Errors.
bs Performs a Bootstrap estimation of coefficients and standard errors.
reps(int) Specifies the number of repetitions when bs is selected. The default is 50 repetitions.
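To show what a bootstrapped DID standard error with 50 repetitions looks like in practice, here is a sketch that uses Stata's built-in bootstrap prefix on simulated data. This is not diff's internal implementation, and whether diff resamples in the same way is not stated in the text above. The data and variable names are hypothetical.

```stata
* Simulated repeated cross-section with a true DID effect of 3
clear
set seed 2024
set obs 400
generate byte treated = (_n > 200)
generate byte post = mod(_n, 2)
generate double y = 10 + 2*treated + post + 3*treated*post + rnormal()

* Resample observations 50 times and re-estimate the interaction coefficient
bootstrap did = _b[1.treated#1.post], reps(50) seed(2024): regress y i.treated##i.post
```

The reported standard error is the standard deviation of the 50 bootstrap estimates of the interaction coefficient.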
Balancing test
test Performs a balancing t-test of the difference in the means of the covariates between the control and treated groups in period == 0. The option test combined with kernel performs the balancing t-test with the weighted covariates. See [R] ttest.
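The unweighted version of this check amounts to a two-sample t-test on each covariate in the baseline period. A minimal sketch with the built-in ttest command follows. The covariate x and the data are simulated and hypothetical, and the kernel-weighted variant is not reproduced here.

```stata
* Simulated data with one hypothetical covariate x
clear
set seed 7
set obs 200
generate byte treated = (_n > 100)
generate byte post = mod(_n, 2)
generate double x = rnormal() + 0.1*treated

* Compare baseline (post == 0) covariate means across the two groups
ttest x if post == 0, by(treated)
```

A large t-statistic would indicate that the groups differ on x before the policy, which is the kind of imbalance the test option is meant to reveal.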
Reporting
report Displays the inference of the included covariates or the estimation of the Propensity Score when option kernel is specified.
nostar Removes the inference stars from the p-values.
III. Worked examples
Example data: cardkrueger1994
Background: In this case, the authors study the impact of the increase in the minimum wage in the state of New Jersey (the treated group) on the employment level in the fast food industry. They compare the changes in the number of employees at the restaurants in this treated group to those in the neighboring state, Pennsylvania (the control group). They collect a baseline in February, 1992, and a follow-up in November.
[Figure: structure of the cardkrueger1994 data set]
1. DID with no covariates
diff fte, t(treated) p(t)
Bootstrapped standard errors can be requested by adding the bs option.
2. DID with covariates
diff fte, t(treated) p(t) cov(bk kfc roys)
diff fte, t(treated) p(t) cov(bk kfc roys) report
diff fte, t(treated) p(t) cov(bk kfc roys) report bs
3. Kernel Propensity Score Diff-in-Diff
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support addcov(wendys)
diff fte, t(treated) p(t) kernel rcs ktype(gaussian) pscore(_ps)
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support addcov(wendys) bs reps(50)
4. Quantile Diff-in-Diff
diff fte, t(treated) p(t) qdid(0.25)
diff fte, t(treated) p(t) qdid(0.50)
diff fte, t(treated) p(t) qdid(0.75)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys) kernel id(id)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys) kernel rcs
5. Balancing test of covariates (differences between the control and treated groups)
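The qdid option is documented as relying on [R] qreg. One way to see the idea is to run a quantile regression with the group-by-period interaction directly. This is a hedged sketch on simulated data: it assumes that the interaction coefficient is the quantity of interest, and it is not claimed to reproduce diff's output exactly. Variable names are hypothetical.

```stata
* Simulated data with a true shift of 3 for treated units after the policy
clear
set seed 31
set obs 400
generate byte treated = (_n > 200)
generate byte post = mod(_n, 2)
generate double y = 10 + 2*treated + post + 3*treated*post + rnormal()

* Median regression; the coefficient on 1.treated#1.post plays the DID role
qreg y i.treated##i.post, quantile(0.5)
```

Changing quantile(0.5) to another value between 0 and 1 moves the comparison to a different point of the outcome distribution.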
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test id(id) kernel
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test kernel rcs
6. Triple differences (treating bk as a second treatment category)
diff fte, t(treated) p(t) ddd(bk)
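In regression form, a triple difference corresponds to a three-way interaction between the first category, the period and the second category. The sketch below uses simulated data with hypothetical names (b stands in for the second category) and a true triple-difference effect of 2. It illustrates the general technique rather than diff's exact output.

```stata
* Simulated data in which all eight cells are equally populated
clear
set seed 99
set obs 800
generate byte treated = mod(_n, 2)
generate byte post = mod(floor((_n - 1)/2), 2)
generate byte b = mod(floor((_n - 1)/4), 2)
generate double y = 5 + treated + post + b + 2*treated*post*b + rnormal()

* Fully interacted model; the three-way term is the triple-difference estimate
regress y i.treated##i.post##i.b
display _b[1.treated#1.post#1.b]
```

The three-way coefficient equals the DID estimate within b == 1 minus the DID estimate within b == 0.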