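Essential Difference-in-Differences (DID) Case Studies: A Collection
Compiled from:
2. Difference-in-Differences Study Handbook
Contents
1. Introduction
2. Command overview
1. Introduction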
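Developments in modern econometrics and statistics provide workable tools for policy research. The difference-in-differences method grew out of panel-data models in econometrics and is widely used in policy analysis and project evaluation. It is mainly applied to pooled cross-section data to assess the impact of an event or policy. The basic idea is to split the sample into two groups: the units targeted by the policy or project (the "treatment group") and the units not targeted (the "control group"). Using information on both groups before and after implementation, one computes the change in some indicator (for example, income) for the treatment group and the change in the same indicator for the control group over the same period. The difference between these two changes is the difference-in-differences estimator (Difference in Differences, abbreviated DD or DID), so called because it is the treatment group's difference minus the control group's difference. The method was first introduced into economics by Ashenfelter (1978); the earliest application in China is probably Zhou Li-An and Chen Ye (2005).
The commonly used variants are double differences and triple differences. Double differencing (difference-in-differences, DID) also goes by other names, such as "double difference" or "differencing twice". The principle is simple. It needs at least two periods of data, and every unit belongs to one of two groups: the treatment group or the control group. The treatment group is unaffected by the policy in the first period; the policy then takes effect, so the second period records the post-policy outcome. The control group is never exposed to the policy, so both of its periods record outcomes without intervention. The computation is equally simple: difference once over time within each group, then difference across the two groups, and what remains is the policy effect.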
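The two rounds of differencing described above can be written compactly. The notation is an assumption introduced here for illustration and does not appear in the source: $\bar{y}_{g,t}$ is the mean outcome of group $g$ ($T$ for treatment, $C$ for control) in period $t$ (1 before the policy, 2 after).

```latex
\hat{\delta}_{DID}
  = \left(\bar{y}_{T,2} - \bar{y}_{T,1}\right)
  - \left(\bar{y}_{C,2} - \bar{y}_{C,1}\right)
```

The first bracket is the change for the treatment group and the second is the change for the control group. Subtracting one from the other removes any change that is common to both groups.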
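Assumptions of DID. To estimate the equation consistently by OLS, two assumptions are needed.
Assumption 1: the model is correctly specified. In particular, the treatment group and the control group share the same time-trend term. This is the "parallel trends assumption", also called common trends, and it is the most important precondition for DID.
DID does not require the treatment and control groups to be identical. The two groups may differ, but the difference must not change over time. In other words, the treatment and control groups must follow the same trend before the policy takes effect.
Assumption 2: transitory shocks are uncorrelated with the policy dummy. This is an important condition for the two-way fixed effects estimator to be a consistent estimator. Individual fixed effects, by contrast, are allowed to be correlated with the policy dummy, because they are removed by double differencing or the within transformation, or controlled for with LSDV.
DID therefore allows selection on individual characteristics as long as those characteristics do not vary over time. This is its greatest advantage: it partly mitigates the endogeneity caused by selection bias.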
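The two assumptions are easier to read against an explicit regression equation. The form below is a common textbook way of writing the two-period DID model, given as a sketch; the symbols are assumptions made for illustration: $G_i$ is the treatment-group dummy, $D_t$ is the post-policy period dummy, and $\varepsilon_{it}$ is the transitory shock.

```latex
y_{it} = \beta_0 + \beta_1\, G_i D_t + \beta_2\, G_i + \gamma\, D_t + \varepsilon_{it}
```

In this notation $\beta_1$ is the DID effect. Assumption 1 says both groups share the same period term $\gamma D_t$, and Assumption 2 says $\varepsilon_{it}$ is uncorrelated with the policy dummy $G_i D_t$.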
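2. Command overview
Install the command with:
ssc install diff, replace   // user-written command, installed from SSC
The syntax is:
diff outcome_var [if] [in] [weight], [options]
Required arguments:
outcome_var is the outcome variable
treated(varname), abbreviated t() in the examples below, is required and specifies the treatment-group dummy
period(varname), abbreviated p(), is required and specifies the period dummy (1 = treatment period, 0 = otherwise)
Options:
cov(varlist): covariates; combined with kernel, they are used to estimate the propensity score
kernel: performs kernel propensity score matching DID
id(varname): required by the kernel option (the examples below use rcs instead for repeated cross sections)
bw(#): bandwidth of the kernel function; the default is 0.06
ktype(kernel): type of kernel function
qdid(quantile): performs quantile DID
pscore(varname): supplies a propensity score
logit: estimates the propensity score with a logit model; the default is probit
ddd(varname): triple differences
SE/Robust
cluster(varname): computes clustered standard errors
robust: robust standard errors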
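To show how the options combine on one command line, here is a minimal sketch. It assumes the user-written diff command is installed and borrows the CardKrueger1994 data used in case 2 later in this collection; clustering on id is chosen purely for illustration and is not something the source does.

```stata
* Sketch only: assumes diff is installed (ssc install diff)
use "http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta", clear

* Basic DID: outcome fte, treatment dummy treated, period dummy t
diff fte, treated(treated) period(t) robust

* Same model with brand dummies as covariates and
* standard errors clustered by restaurant id (illustrative choice)
diff fte, treated(treated) period(t) cov(bk kfc roys) cluster(id)
```

The short forms t() and p() used in the rest of this collection are abbreviations of treated() and period().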
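3. Worked example 1: the effect of a garbage incinerator's location on housing prices
This example uses a dataset that accompanies Wooldridge's textbook to study how a new garbage incinerator affects house prices.
The original paper is:
House Prices during Siting Decision Stages: The Case of an Incinerator from Rumor through Operation
Authors: Katherine A. Kiel, Katherine T. McClain
The effect of a garbage incinerator's location on housing prices
Kiel and McClain (1995) studied the effect of a new garbage incinerator on housing values in North Andover, Massachusetts. They used many years of data and a fairly sophisticated econometric analysis. We use only two years of data and some simplified models, but our analysis is similar.
Rumors that an incinerator would be built in North Andover began in 1978, and construction started in 1981. The incinerator was expected to begin operating soon after construction started; in fact it did not start operating until 1985. We use data on prices of houses sold in 1978 and another sample of houses sold in 1981. The hypothesis is that prices of houses near the incinerator fall relative to prices of houses farther away.
For illustration, a house is defined as near the incinerator if it lies within 3 miles.
We start with the dollar effect of distance on house prices. This requires measuring prices in constant dollars: all prices are converted to 1978 dollars using the Boston housing price index, and rprice denotes the real house price.
A naive analyst would use only the 1981 data and estimate a very simple model:
$$rprice = \gamma_0 + \gamma_1\, nearinc + u$$
where nearinc is a binary variable equal to 1 if the house is near the incinerator and 0 otherwise. The equation is estimated with the data in KIELMC.
Because this is a simple regression on a single dummy variable, the intercept is the average sale price of houses far from the incinerator, and the coefficient on nearinc is the difference in average sale price between houses near the incinerator and houses far from it. The estimates show that houses near the incinerator sold for $30,688.27 less on average than houses farther away. The t statistic exceeds 5 in absolute value, so we strongly reject the null hypothesis that houses near and far from the incinerator have the same value.
Running the same regression on the 1978 data shows that houses near the incinerator site sold for $18,824.37 less on average than houses farther away, and this difference is also statistically significant. This fits the view that the incinerator was sited in an area where house prices were already low.
How, then, can we tell whether building the incinerator depressed house prices? The key is how the coefficient on nearinc changed between 1978 and 1981. The gap in average prices was much larger in 1981 than in 1978 ($30,688.27 versus $18,824.37), and it remains sizeable when expressed as a percentage of the average price of houses not near the incinerator. The difference between the two nearinc coefficients is
$$\hat{\delta}_1 = -30688.27 - (-18824.37) = -11863.9$$
This is the estimate of the incinerator's effect on nearby house prices. In empirical economics, $\hat{\delta}_1$ is called the difference-in-differences estimator; this is the DID method.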
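3.1 Load and inspect the data
use "C:\Users\admin\Desktop\KIELMC.DTA", clear
edit      // open the Data Editor
describe  // list variables and labels
Data notes: y81 = 0 marks 1978 and y81 = 1 marks 1981, the period in which the new incinerator is being built. nearinc indicates whether a house is near the new incinerator, and lprice is the log of the house price. The regressions below use lprice, so their coefficients are approximate proportional differences rather than the dollar differences discussed above.
3.2 nearinc equals 1 if dist <= 15840 (15,840 feet is 3 miles)
Next, tabulate nearinc to see how the houses are split between near and far: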
tabulate nearinc
=1 if dist |
<= 15840 | Freq. Percent Cum.
------------+-----------------------------------
0 | 225 70.09 70.09
1 | 96 29.91 100.00
------------+-----------------------------------
Total | 321 100.00
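3.3 Basic regressions
The naive approach runs the regression on the 1981 data alone; for comparison the same regression is also run on the 1978 data: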
regress lprice nearinc if y81
regress lprice nearinc if y81==0
regress lprice nearinc if y81
Source | SS df MS Number of obs = 142
-------------+---------------------------------- F(1, 140) = 38.85
Model | 4.65649365 1 4.65649365 Prob > F = 0.0000
Residual | 16.7808607 140 .119863291 R-squared = 0.2172
-------------+---------------------------------- Adj R-squared = 0.2116
Total | 21.4373543 141 .152037974 Root MSE = .34621
------------------------------------------------------------------------------
lprice | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
nearinc | -.402572 .0645888 -6.23 0.000 -.5302675 -.2748765
_cons | 11.74242 .0342802 342.54 0.000 11.67465 11.81019
------------------------------------------------------------------------------
. regress lprice nearinc if y81==0
Source | SS df MS Number of obs = 179
-------------+---------------------------------- F(1, 177) = 40.31
Model | 4.44632519 1 4.44632519 Prob > F = 0.0000
Residual | 19.5249099 177 .110310226 R-squared = 0.1855
-------------+---------------------------------- Adj R-squared = 0.1809
Total | 23.9712351 178 .13466986 Root MSE = .33213
------------------------------------------------------------------------------
lprice | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
nearinc | -.339923 .0535412 -6.35 0.000 -.4455842 -.2342618
_cons | 11.28542 .0299471 376.84 0.000 11.22632 11.34452
------------------------------------------------------------------------------
.
3.4 Difference-in-differences
regress lprice y81 nearinc y81nrinc
Source | SS df MS Number of obs = 321
-------------+---------------------------------- F(3, 317) = 73.15
Model | 25.1332147 3 8.37773824 Prob > F = 0.0000
Residual | 36.3057706 317 .114529245 R-squared = 0.4091
-------------+---------------------------------- Adj R-squared = 0.4035
Total | 61.4389853 320 .191996829 Root MSE = .33842
------------------------------------------------------------------------------
lprice | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
y81 | .4569953 .0453207 10.08 0.000 .3678279 .5461627
nearinc | -.339923 .0545555 -6.23 0.000 -.4472595 -.2325865
y81nrinc | -.062649 .0834408 -0.75 0.453 -.2268167 .1015187
_cons | 11.28542 .0305145 369.84 0.000 11.22539 11.34546
------------------------------------------------------------------------------
.
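The coefficient on y81nrinc can be cross-checked against the two single-year regressions in 3.3: the logs report nearinc coefficients of -.402572 for 1981 and -.339923 for 1978, and their difference is -.062649, the y81nrinc coefficient shown above. The sketch below reproduces that check; the scalar names b81 and b78 are hypothetical, and the factor-variable form i.y81##i.nearinc is an alternative way of writing the same model that the source does not use.

```stata
* Sketch: assumes KIELMC.DTA is already in memory (see 3.1)
quietly regress lprice nearinc if y81 == 1
scalar b81 = _b[nearinc]          // near-far gap in 1981

quietly regress lprice nearinc if y81 == 0
scalar b78 = _b[nearinc]          // near-far gap in 1978

display "gap81 - gap78 = " %9.6f (b81 - b78)

* Same DID model with factor-variable notation:
* ## adds both main effects and their interaction
regress lprice i.y81##i.nearinc
display "interaction   = " %9.6f _b[1.y81#1.nearinc]
```

Because the model with both dummies and their interaction is saturated, the interaction coefficient equals the difference of the two subsample gaps.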
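The regression above gives the same estimate as the diff command (shown further below). The coefficient on y81nrinc is not statistically significant, so we next add other variables that affect house prices.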
. regress lprice y81 nearinc y81nrinc age agesq lintst lland larea rooms baths
Source | SS df MS Number of obs = 321
-------------+---------------------------------- F(10, 310) = 116.91
Model | 48.5621258 10 4.85621258 Prob > F = 0.0000
Residual | 12.8768595 310 .041538256 R-squared = 0.7904
-------------+---------------------------------- Adj R-squared = 0.7837
Total | 61.4389853 320 .191996829 Root MSE = .20381
------------------------------------------------------------------------------
lprice | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
y81 | .425974 .0284999 14.95 0.000 .3698963 .4820518
nearinc | .032232 .0474876 0.68 0.498 -.0612067 .1256708
y81nrinc | -.1315133 .0519712 -2.53 0.012 -.2337743 -.0292524
age | -.0083591 .0014111 -5.92 0.000 -.0111358 -.0055825
agesq | .0000376 8.67e-06 4.34 0.000 .0000206 .0000547
lintst | -.0614482 .0315075 -1.95 0.052 -.1234438 .0005474
lland | .099845 .024491 4.08 0.000 .0516554 .1480346
larea | .3507722 .0514865 6.81 0.000 .2494649 .4520794
rooms | .0473344 .0173274 2.73 0.007 .0132402 .0814285
baths | .0942767 .0277256 3.40 0.001 .0397225 .1488309
_cons | 7.651756 .4158832 18.40 0.000 6.833445 8.470067
--------------------------------------------------------------------------
Running the analysis with the diff command, first without and then with covariates, gives:
diff lprice,t(nearinc) p(y81 ) robust
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 321
Before After
Control: 123 102 225
Treated: 56 40 96
179 142
--------------------------------------------------------
Outcome var. | lprice | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 11.285 | | |
Treated | 10.946 | | |
Diff (T-C) | -0.340 | 0.062 | -5.45 | 0.000***
After | | | |
Control | 11.742 | | |
Treated | 11.340 | | |
Diff (T-C) | -0.403 | 0.071 | 5.65 | 0.000***
| | | |
Diff-in-Diff | -0.063 | 0.095 | 0.66 | 0.509
--------------------------------------------------------
R-square: 0.41
* Means and Standard Errors are estimated by linear regression
**Robust Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
. diff lprice,t(nearinc) p(y81 ) ///
> cov(age agesq lintst lland larea rooms baths) robust
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 321
Before After
Control: 123 102 225
Treated: 56 40 96
179 142
--------------------------------------------------------
Outcome var. | lprice | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 7.652 | | |
Treated | 7.684 | | |
Diff (T-C) | 0.032 | 0.063 | 0.51 | 0.611
After | | | |
Control | 8.078 | | |
Treated | 7.978 | | |
Diff (T-C) | -0.099 | 0.060 | 1.65 | 0.099*
| | | |
Diff-in-Diff | -0.132 | 0.060 | 2.20 | 0.029**
--------------------------------------------------------
R-square: 0.79
* Means and Standard Errors are estimated by linear regression
**Robust Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1
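4. DID case 2: does a minimum-wage law reduce the demand for low-skilled workers?
Dataset: cardkrueger1994
Background: the authors study the effect of a minimum-wage increase in New Jersey (the treatment group) on employment in the fast-food industry. They compare the change in the number of employees at restaurants in the treated state with the change at restaurants in neighboring Pennsylvania (the control group). Baseline data were collected in February 1992 and follow-up data in November.
In April 1992, New Jersey's minimum-wage law raised the minimum wage from $4.25 to $5.05, while the minimum wage in neighboring Pennsylvania stayed unchanged. Card and Krueger treated this as a natural experiment, with New Jersey as the treatment group and Pennsylvania as the control group. They collected employment data from fast-food restaurants in both states before and after the new law and estimated the effect with DID.
The dataset covers 522 fast-food restaurants and two periods (February 1992 and November 1992, recorded in t as 0 and 1). treated separates the treatment group from the control group, with 1 for New Jersey and 0 for Pennsylvania. The dependent variable is fte (full-time-equivalent employment), which measures employment at each restaurant. The dataset also contains four control variables, all of them brand dummies: bk (Burger King), kfc (Kentucky Fried Chicken), roys (Roy Rogers) and wendys (Wendy's).
We first define the interaction of t and treated and then estimate the DID by OLS regression: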
use "http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta"
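Generate the interaction between the treatment group and the post-law period:
gen gd=t*treated // define the interaction term gd
Estimate the DID by hand, with robust standard errors:
reg fte gd treated t, r
The result is: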
gen gd=t*treated
. reg fte gd treated t, r
Linear regression Number of obs = 801
F(3, 797) = 1.43
Prob > F = 0.2330
R-squared = 0.0080
Root MSE = 9.003
------------------------------------------------------------------------------
| Robust
fte | Coefficient std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
gd | 2.913982 1.736818 1.68 0.094 -.4952963 6.323261
treated | -2.883534 1.403338 -2.05 0.040 -5.638209 -.1288592
t | -2.40651 1.594091 -1.51 0.132 -5.535623 .7226031
_cons | 19.94872 1.317281 15.14 0.000 17.36297 22.53447
------------------------------------------------------------------------------
.
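The results show that the policy effect (the coefficient on gd) is positive (2.914) and significant at the 10% level: after the minimum-wage law took effect, employment at fast-food restaurants did not fall and, if anything, rose somewhat. This conclusion does not yet account for other control variables.
Next we add the brand dummies as controls and rerun the regression: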
reg fte gd treated t bk kfc roys,r
Linear regression Number of obs = 801
F(6, 794) = 57.30
Prob > F = 0.0000
R-squared = 0.1878
Root MSE = 8.1617
------------------------------------------------------------------------------
| Robust
fte | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
gd | 2.93502 1.543422 1.90 0.058 -.0946504 5.96469
treated | -2.323906 1.253701 -1.85 0.064 -4.784867 .1370549
t | -2.402678 1.410265 -1.70 0.089 -5.170966 .3656108
bk | .9168795 .9382545 0.98 0.329 -.9248729 2.758632
kfc | -9.204856 .8991089 -10.24 0.000 -10.96977 -7.439945
roys | -.8970458 1.041071 -0.86 0.389 -2.940623 1.146532
_cons | 21.16069 1.307146 16.19 0.000 18.59482 23.72656
------------------------------------------------------------------------------
Using the diff command gives:
*-2. Difference-in-differences
diff fte, t(treated) p(t) robust
**** Result:
*-----------------------------------result.begin--------------------------------
diff fte, t(treated) p(t) robust
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 801
Before After
Control: 78 77 155
Treated: 326 320 646
404 397
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 19.949 | | |
Treated | 17.065 | | |
Diff (T-C) | -2.884 | 1.403 | -2.05 | 0.040**
After | | | |
Control | 17.542 | | |
Treated | 17.573 | | |
Diff (T-C) | 0.030 | 1.023 | 0.03 | 0.976
| | | |
Diff-in-Diff | 2.914 | 1.737 | 1.68 | 0.094*
--------------------------------------------------------
R-square: 0.01
* Means and Standard Errors are estimated by linear regression
**Robust Std. Errors
**Inference: *** p<0.01; ** p<0.05; * p<0.1
*-----------------------------------result.over--------------------------------
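The Diff-in-Diff row can be recovered by hand from the four cell means in the table above. The worked arithmetic below uses only the rounded means printed in the output, so the last digit differs slightly from the reported 2.914.

```latex
\hat{\delta}_{DID}
  = (17.573 - 17.065) - (17.542 - 19.949)
  = 0.508 - (-2.407)
  = 2.915 \approx 2.914
```

Employment rose slightly in New Jersey and fell in Pennsylvania, so the positive estimate comes mostly from the decline in the control group.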
4.2 DID with covariates
diff fte, t(treated) p(t) cov(bk kfc roys)
diff fte, t(treated) p(t) cov(bk kfc roys) report
diff fte, t(treated) p(t) cov(bk kfc roys) report bs
The result is:
. diff fte, t(treated) p(t) cov(bk kfc roys)
DIFFERENCE-IN-DIFFERENCES WITH COVARIATES
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 801
Before After
Control: 78 77 155
Treated: 326 320 646
404 397
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 21.161 | | |
Treated | 18.837 | | |
Diff (T-C) | -2.324 | 1.031 | -2.25 | 0.024**
After | | | |
Control | 18.758 | | |
Treated | 19.369 | | |
Diff (T-C) | 0.611 | 1.037 | 0.59 | 0.556
| | | |
Diff-in-Diff | 2.935 | 1.460 | 2.01 | 0.045**
--------------------------------------------------------
R-square: 0.19
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
4.3、Kernel Propensity Score Diff-in-Diff
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support addcov(wendys)
diff fte, t(treated) p(t) kernel rcs ktype(gaussian) pscore(_ps)
diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs support addcov(wendys) bs reps(50)
The result is:
. diff fte, t(treated) p(t) cov(bk kfc roys) kernel rcs
KERNEL PROPENSITY SCORE MATCHING DIFFERENCE-IN-DIFFERENCES
Repeated Cross Section - rcs option
Matching iterations: control group at base line...
..............................................................................................
> ............................................................................................
> ............................................................................................
> ................................................
Matching iterations: control group at follow up...
..............................................................................................
> ............................................................................................
> ............................................................................................
> ..........................................
Matching iterations: treated group at baseline...
..............................................................................................
> ............................................................................................
> ............................................................................................
> ................................................
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 801
Before After
Control: 78 77 155
Treated: 326 320 646
404 397
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 20.040 | | |
Treated | 17.405 | | |
Diff (T-C) | -2.636 | 0.939 | -2.81 | 0.005***
After | | | |
Control | 17.341 | | |
Treated | 17.573 | | |
Diff (T-C) | 0.232 | 0.948 | 0.24 | 0.807
| | | |
Diff-in-Diff | 2.867 | 1.334 | 2.15 | 0.032**
--------------------------------------------------------
R-square: 0.01
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
4.4 Quantile Diff-in-Diff
diff fte, t(treated) p(t) qdid(0.25)
diff fte, t(treated) p(t) qdid(0.50)
diff fte, t(treated) p(t) qdid(0.75)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys) kernel id(id)
diff fte, t(treated) p(t) qdid(0.50) cov(bk kfc roys) kernel rcs
The result is:
diff fte, t(treated) p(t) qdid(0.25)
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 801
Before After
Control: 78 77 155
Treated: 326 320 646
404 397
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 12.500 | | |
Treated | 11.000 | | |
Diff (T-C) | -1.500 | 1.584 | -0.95 | 0.344
After | | | |
Control | 11.500 | | |
Treated | 11.500 | | |
Diff (T-C) | -0.000 | 1.658 | 0.00 | 1.000
| | | |
Diff-in-Diff | 1.500 | 2.293 | 0.65 | 0.513
--------------------------------------------------------
R-square: 0.00
* Values are estimated at the .25 quantile
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
4.5 Balancing test of covariates: testing for baseline differences between the control and treatment groups
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test id(id) kernel
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test kernel rcs
diff fte, t(treated) p(t) cov(bk kfc roys wendys) test
TWO-SAMPLE T TEST
Number of observations (baseline): 404
Before After
Control: 78 - 78
Treated: 326 - 326
404 -
t-test at period = 0:
----------------------------------------------------------------------------------------------
Variable(s) | Mean Control | Mean Treated | Diff. | |t| | Pr(|T|>|t|)
---------------------+------------------+--------------+------------+---------+---------------
fte | 19.949 | 17.065 | -2.884 | 2.44 | 0.0150**
bk | 0.443 | 0.411 | -0.032 | 0.52 | 0.6035
kfc | 0.152 | 0.205 | 0.054 | 1.08 | 0.2818
roys | 0.215 | 0.248 | 0.033 | 0.61 | 0.5448
wendys | 0.190 | 0.136 | -0.054 | 1.22 | 0.2241
----------------------------------------------------------------------------------------------
*** p<0.01; ** p<0.05; * p<0.1
.
4.6 Triple differences (treating bk as a second treatment category)
diff fte, t(treated) p(t) ddd(bk)
diff fte, t(treated) p(t) ddd(bk)
TRIPLE DIFFERENCE (DDD) ESTIMATION RESULTS
Notation of DDD:
Control (A) treated = 0 and bk = 1
Control (B) treated = 0 and bk = 0
Treated (A) treated = 1 and bk = 1
Treated (B) treated = 1 and bk = 0
Number of observations in the DDD: 801
Before After
Control (A):34 35 69
Control (B):44 42 86
Treated (A):133 132 265
Treated (B):193 188 381
404 397
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control (A) | 25.654 | | |
Control (B) | 15.540 | | |
Treated (A) | 18.547 | | |
Treated (B) | 16.044 | | |
Diff (T-C) | -7.612 | 2.206 | 3.45 | 0.001***
After | | | |
Control (A) | 22.193 | | |
Control (B) | 13.667 | | |
Treated (A) | 19.913 | | |
Treated (B) | 15.930 | | |
Diff (T-C) | -4.543 | 2.214 | 2.05 | 0.040**
| | | |
DDD | 3.069 | 3.125 | 0.98 | 0.326
--------------------------------------------------------
R-square: 0.09
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
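The DDD row can be traced back to the eight cell means in the table above. The worked arithmetic below uses the rounded means as printed, so intermediate values can differ from the reported ones in the last digit.

```latex
\begin{aligned}
\text{Before:}\quad & (18.547 - 16.044) - (25.654 - 15.540) = 2.503 - 10.114 = -7.611 \approx -7.612 \\
\text{After:}\quad  & (19.913 - 15.930) - (22.193 - 13.667) = 3.983 - 8.526 = -4.543 \\
\text{DDD:}\quad    & -4.543 - (-7.612) = 3.069
\end{aligned}
```

Each Diff (T-C) row is itself a double difference (group A minus group B, treated minus control), and the DDD is the change in that double difference from before to after.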
5. DID case: DID workflow and code in Stata
Compiled by 计量经济学服务中心 (Econometrics Service Center)
Sources include: 社会科学中的数据可视化 (id: SKSJKSH) and the Princeton University tutorial https://dss.princeton.edu/training/DID101.pdf
II. Worked DID example
Difference in differences (DID) estimation step-by-step
First we load the data, create the post-policy dummy and the treatment-group dummy, and multiply them to form the interaction term.
Method 1:
Getting sample data
use "http://dss.princeton.edu/training/Panel101.dta", clear
Create a dummy variable to indicate the time when the treatment started. Let's assume that treatment started in 1994. In this case, years before 1994 will have a value of 0 and 1994+ a 1. If you already have this, skip this step.
gen time = (year>=1994) & !missing(year)
Create a dummy variable to identify the group exposed to the treatment. In this example let's assume that countries with code 5, 6 and 7 were treated (=1). Countries 1-4 were not treated (=0). If you already have this, skip this step.
gen treated = (country>4) & !missing(country)
Create an interaction between time and treated. We will call this interaction 'did'.
gen did = time*treated
Estimating the DID estimator: regress y on these three variables.
reg y time treated did, r
In the results, the coefficient on did is negative and significant at the 10% level, which indicates that the policy had a negative effect on y.
Method 2: diff
The command diff is user-written. To install it, type
ssc install diff
diff y, t(treated) p(time)
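The regression in method 1 can also be written with Stata's factor-variable notation, which builds the interaction on the fly. This is a sketch under the assumption that Panel101.dta is in memory and that time, treated and did were created as above; the scalar names b_manual and b_fv are hypothetical.

```stata
* Method 1 as written above, keeping the did coefficient
quietly reg y time treated did, r
scalar b_manual = _b[did]

* Factor-variable form: ## adds both main effects and their interaction
reg y i.time##i.treated, r
scalar b_fv = _b[1.time#1.treated]

display "manual did = " b_manual "   factor-variable did = " b_fv
```

The two coefficients should coincide, since both commands fit the same model. The factor-variable form avoids creating did by hand and works with postestimation commands such as margins.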
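III. Parallel trends test
First generate interactions between the year dummies and the treatment-group dummy. Here we compare three years on each side of the policy.
gen period = year - 1994
forvalues i = 3(-1)1 {
    gen pre_`i' = (period == -`i' & treated == 1)
}
gen current = (period == 0 & treated == 1)
forvalues j = 1(1)3 {
    gen time_`j' = (period == `j' & treated == 1)
}
Then use these interactions as regressors and store the results under the name reg for the checks that follow. xtreg needs the panel structure to be declared, so xtset is run first.
xtset country year
xtreg y time treated pre_* current time_* i.year, fe
est sto reg
Plot the coefficients with coefplot (a user-written command) and check whether the coefficients before 1994 fluctuate around zero and the coefficients after 1994 are significantly negative. The post-policy interactions were named time_1 to time_3 above, so keep() must list time_* for them to appear in the plot.
coefplot reg, keep(pre_* current time_*) vertical recast(connect) yline(0) xline(3, lp(dash))
The coefficients do fluctuate around zero before the policy. The coefficient one year after the policy is significantly negative but quickly returns to around zero. This suggests that the treatment and control groups are comparable, and that the policy effect may appear one year after enactment and then fade quickly.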
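The visual check can be complemented by a joint test that the pre-policy coefficients are all zero. This is a sketch, not part of the source workflow; it assumes the xtreg model above has been estimated and stored as reg, and that pre_1 to pre_3 were kept in the model.

```stata
* Restore the stored event-study regression
estimates restore reg

* Joint Wald test of H0: pre_3 = pre_2 = pre_1 = 0
test pre_3 pre_2 pre_1
display "p-value for joint pre-trend test: " %6.4f r(p)
```

A large p-value means the data do not reject equal pre-policy trends. It does not prove that the parallel trends assumption holds, so the test is best read alongside the coefficient plot.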