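Propensity Score Matching with Difference-in-Differences (PSM-DID)
Source: 计量经济学; author: 小计量
Introduction to PSM-DID
The PSM-DID model was proposed by Heckman et al. (1997, 1998). Suppose we have two-period panel data, with the period before the experiment denoted t' and the period after it denoted t. In period t', the potential outcome for both the control group and the treatment group is y0t'. In period t there are two potential outcomes: y0t for the control group and y1t for the treatment group.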
The PSM-DID model rests on the following assumption:
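A sketch of how this assumption is commonly written, following the conditional mean independence condition in Heckman et al. The symbols x (covariates) and D (treatment indicator, equal to 1 for the treatment group) are borrowed from the estimation steps in this article; the exact form below is a reconstruction under that notation, not a quotation.

```latex
E\left(y_{0t} - y_{0t'} \mid x,\ D = 1\right) = E\left(y_{0t} - y_{0t'} \mid x,\ D = 0\right)
```

In words: conditional on x, the untreated outcome would on average have changed by the same amount in both groups.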
If this assumption holds, a consistent estimate of the ATT can be obtained:
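A sketch of the estimator in the same notation. Several symbols are assumptions introduced for illustration: I_1 and I_0 are the sets of treated and control individuals, S_p is the set determined in the matching step (the common support, in the usual presentation), N_1 is the number of treated individuals in I_1 ∩ S_p, and w(i,j) is the weight that control individual j receives when matched to treated individual i.

```latex
\widehat{ATT} = \frac{1}{N_1} \sum_{i \in I_1 \cap S_p}
  \left[ \left(y_{1ti} - y_{0t'i}\right)
  - \sum_{j \in I_0 \cap S_p} w(i,j) \left(y_{0tj} - y_{0t'j}\right) \right]
```

The first term is the before-after change of treated individual i; the second is the weighted before-after change of the controls matched to i.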
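Steps
The estimation steps for PSM-DID are roughly as follows:
(1) Estimate the propensity score from the treatment variable D and the covariates X.
(2) For each individual i in the treatment group, identify all control-group individuals matched to it (that is, determine the set Sp).
(3) For each individual i in the treatment group, compute the before-after change in the outcome variable.
(4) For each individual i in the treatment group, compute the before-after change for all control individuals matched to it.
(5) Using the changes from (3) and (4), apply propensity score kernel matching or local linear regression matching to obtain the ATT.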
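For the kernel matching case in step (5), the weights are usually built as sketched below. The symbols are assumptions for illustration: p_i is the estimated propensity score of individual i, K is a kernel function, h is a bandwidth, and I_0 ∩ S_p is the set of control individuals available for matching.

```latex
w(i,j) = \frac{K\!\left(\dfrac{p_j - p_i}{h}\right)}
              {\sum_{k \in I_0 \cap S_p} K\!\left(\dfrac{p_k - p_i}{h}\right)}
```

For each treated individual i the weights sum to one, so controls whose propensity scores are closer to p_i contribute more to the counterfactual change in step (4).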
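Advantages:
The method controls for between-group differences that are unobservable but do not vary over time, for example when the treatment and control groups come from two different regions, or when the two groups were surveyed with two different questionnaires.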
Usage
***PSM_DID
ssc install diff
help diff
Syntax of the diff command
diff outcome_var, treat(varname) period(varname) id(varname) ///
kernel ktype(kernel) cov(varlist) report logit support test
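Explanation of the syntax
Here "outcome_var" is the outcome variable;
"treat(varname)" is required and specifies the treatment variable;
"period(varname)" specifies the period dummy (1 = experimental period, 0 = non-experimental period);
"id(varname)" specifies the individual ID (a prerequisite for matching);
"kernel" requests kernel matching (the diff command offers no other matching method);
"ktype(kernel)" specifies the kernel function used in kernel matching;
"cov(varlist)" specifies the covariates used for the propensity score;
"report" reports the propensity score estimation results;
"logit" estimates the propensity score with logit (the default is probit);
"support" restricts matching to observations on the common support;
"test" tests whether the variables are balanced between the treatment and control groups after propensity score matching.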
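As a hedged illustration of how these options combine, the calls below use the variable names from the Card and Krueger dataset in the demonstration (fte, treated, t, id, bk, kfc, roys). The ktype(gaussian) value is only an example choice; check help diff for the kernels and options your installed version supports.

```stata
* Load the example data (same URL as in the results log)
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear

* Plain DID without matching, as a baseline for comparison
diff fte, t(treated) p(t)

* DID with covariates entered in the regression, still without matching
diff fte, t(treated) p(t) cov(bk kfc roys)

* Kernel PSM-DID with a Gaussian kernel, restricted to the common support
diff fte, t(treated) p(t) kernel id(id) ktype(gaussian) cov(bk kfc roys) support
```

Comparing the Diff-in-Diff row across the three calls gives a rough sense of how much the estimate depends on the covariates and on the matching step.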
Demonstration
***PSM_DID
ssc install diff
help diff
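***Syntax of the diff command (template only, kept as a comment)***
* diff outcome_var, treat(varname) period(varname) id(varname) ///
*     kernel ktype(kernel) cov(varlist) report logit support test
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear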
bro
des
sum
diff fte ,t(treated) p(t) kernel id(id) logit cov(bk kfc roys) ///
report support
diff fte ,t(treated) p(t) kernel id(id) logit cov(bk kfc roys) ///
report support test
The results are:
. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
(Dataset from Card&Krueger (1994))
. bro
.
.
.
. des
Contains data from http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
 Observations:           820                  Dataset from Card&Krueger (1994)
    Variables:             8                  27 May 2011 20:36
--------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
--------------------------------------------------------------------------------
id              int     %8.0g                 Store ID
t               byte    %8.0g                 Feb. 1992 = 0; Nov. 1992 = 1
treated         long    %8.0g      treated    New Jersey = 1; Pennsylvania = 0
fte             float   %9.0g                 Output: Full Time Employment
bk              byte    %8.0g                 Burger King == 1
kfc             byte    %8.0g                 Kentuky Fried Chiken == 1
roys            byte    %8.0g                 Roy Rogers == 1
wendys          byte    %8.0g                 Wendy's == 1
--------------------------------------------------------------------------------
Sorted by: id t
.
.
.
. sum
Variable | Obs Mean Std. dev. Min Max
-------------+---------------------------------------------------------
id | 820 246.5073 148.1413 1 522
t | 820 .5 .5003052 0 1
treated | 820 .8073171 .3946469 0 1
fte | 801 17.59457 9.022517 0 80
bk | 820 .4170732 .4933761 0 1
-------------+---------------------------------------------------------
kfc | 820 .195122 .3965364 0 1
roys | 820 .2414634 .4282318 0 1
wendys | 820 .1463415 .3536639 0 1
. diff fte ,t(treated) p(t) kernel id(id) logit cov(bk kfc roys) report support
KERNEL PROPENSITY SCORE MATCHING DIFFERENCE-IN-DIFFERENCES
Estimation on common support
Report - Propensity score estimation with logit command
Atention: _pscore is estimated at baseline
Iteration 0: log likelihood = -198.21978
Iteration 1: log likelihood = -196.77862
Iteration 2: log likelihood = -196.7636
Iteration 3: log likelihood = -196.7636
Logistic regression Number of obs = 404
LR chi2(3) = 2.91
Prob > chi2 = 0.4053
Log likelihood = -196.7636 Pseudo R2 = 0.0073
------------------------------------------------------------------------------
treated | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
bk | .3108387 .3561643 0.87 0.383 -.3872306 1.008908
kfc | .6814511 .4335455 1.57 0.116 -.1682824 1.531185
roys | .520356 .4011747 1.30 0.195 -.265932 1.306644
_cons | 1.05315 .2998708 3.51 0.000 .465414 1.640886
------------------------------------------------------------------------------
Matching iterations...
.....................................................................................................................................
> ...................................................................................................................................
> ..............................................................
DIFFERENCE-IN-DIFFERENCES ESTIMATION RESULTS
Number of observations in the DIFF-IN-DIFF: 795
            Before         After
   Control: 78             76          154
   Treated: 326            315         641
            404            391
--------------------------------------------------------
Outcome var. | fte | S. Err. | |t| | P>|t|
----------------+---------+---------+---------+---------
Before | | | |
Control | 20.040 | | |
Treated | 17.065 | | |
Diff (T-C) | -2.975 | 0.943 | -3.16 | 0.002***
After | | | |
Control | 17.449 | | |
Treated | 17.499 | | |
Diff (T-C) | 0.050 | 0.955 | 0.05 | 0.958
| | | |
Diff-in-Diff | 3.026 | 1.342 | 2.25 | 0.024**
--------------------------------------------------------
R-square: 0.02
* Means and Standard Errors are estimated by linear regression
**Inference: *** p<0.01; ** p<0.05; * p<0.1
.
. diff fte ,t(treated) p(t) kernel id(id) logit cov(bk kfc roys) report support test
Report - Propensity score estimation with logit command
Atention: _pscore is estimated at baseline
Iteration 0: log likelihood = -198.21978
Iteration 1: log likelihood = -196.77862
Iteration 2: log likelihood = -196.7636
Iteration 3: log likelihood = -196.7636
Logistic regression Number of obs = 404
LR chi2(3) = 2.91
Prob > chi2 = 0.4053
Log likelihood = -196.7636 Pseudo R2 = 0.0073
------------------------------------------------------------------------------
treated | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
bk | .3108387 .3561643 0.87 0.383 -.3872306 1.008908
kfc | .6814511 .4335455 1.57 0.116 -.1682824 1.531185
roys | .520356 .4011747 1.30 0.195 -.265932 1.306644
_cons | 1.05315 .2998708 3.51 0.000 .465414 1.640886
------------------------------------------------------------------------------
Matching iterations...
.....................................................................................................................................
> ...................................................................................................................................
> ..............................................................
TWO-SAMPLE T TEST
Test on common support
Number of observations (baseline): 404
            Before         After
   Control: 78             -           78
   Treated: 326            -           326
            404            -
t-test at period = 0:
----------------------------------------------------------------------------------------------
Weighted Variable(s) | Mean Control | Mean Treated | Diff. | |t| | Pr(|T|>|t|)
---------------------+------------------+--------------+------------+---------+---------------
fte | 20.040 | 17.065 | -2.975 | 2.89 | 0.0041***
bk | 0.468 | 0.408 | -0.060 | 1.21 | 0.2259
kfc | 0.144 | 0.209 | 0.064 | 1.69 | 0.0911*
roys | 0.272 | 0.252 | -0.020 | 0.46 | 0.6462
----------------------------------------------------------------------------------------------
*** p<0.01; ** p<0.05; * p<0.1
Attention: option kernel weighs variables in cov(varlist)
Means and t-test are estimated by linear regression
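As a quick check on how the Diff-in-Diff row in the estimation results is formed, the arithmetic below uses only the group means reported in that table. The small gap to the reported 3.026 is presumably due to rounding of the displayed means.

```latex
\begin{aligned}
\text{Diff}_{\text{before}} &= 17.065 - 20.040 = -2.975 \\
\text{Diff}_{\text{after}}  &= 17.499 - 17.449 = 0.050 \\
\text{Diff-in-Diff}         &= 0.050 - (-2.975) = 3.025 \approx 3.026
\end{aligned}
```

With the reported standard error of 1.342, this gives |t| = 3.026 / 1.342 ≈ 2.25, which matches the table and is significant at the 5% level.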