Propensity Score Matching (PSM): A Worked Example and Its Stata Implementation
Policy background:
The National Supported Work (NSW) demonstration program.
Research objective:
To test how participating in the program (receiving training), as opposed to not participating, affects earnings.
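Basic idea:
We want to compare the earnings of the trained group (the treatment group) when it receives training with the earnings the same group would have had without training. In practice only the first of these is observed: what would have happened to the treatment group had it not been trained can never be observed. This unobserved state is called the counterfactual, and matching is a way to approximate it. In propensity score matching (PSM), the sample is split by the treatment indicator into two groups: the treatment group, here the people who received training under NSW, and the comparison group, here the people who did not.
The core idea of PSM is that once treated and comparison units have been matched so that they are as similar as possible on observed characteristics, the difference in earnings between the trained (treated) and untrained (comparison) units can be used to judge the causal effect of training on earnings.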
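The counterfactual argument above can be written compactly in potential-outcomes notation. This is only a sketch: the symbols $D_i$, $Y_{1i}$, $Y_{0i}$, $X_i$, $p(X_i)$, $N_1$ and $j(i)$ are notation introduced here for illustration and do not appear in the original text, and the last line ignores ties between comparison units.

```latex
% D_i = 1 if unit i received training, 0 otherwise
% Y_{1i}, Y_{0i}: earnings of unit i with and without training
% X_i: observed covariates; j(i): comparison unit matched to treated unit i
\begin{align*}
\text{ATT} &= E[Y_{1i} - Y_{0i} \mid D_i = 1]
            = \underbrace{E[Y_{1i} \mid D_i = 1]}_{\text{observed}}
            - \underbrace{E[Y_{0i} \mid D_i = 1]}_{\text{counterfactual}}, \\
p(X_i) &= \Pr(D_i = 1 \mid X_i), \\
\widehat{\text{ATT}} &= \frac{1}{N_1} \sum_{i:\,D_i = 1} \bigl( Y_i - Y_{j(i)} \bigr).
\end{align*}
```

Here $j(i)$ is the comparison unit whose estimated propensity score is closest to that of treated unit $i$ (1:1 nearest-neighbor matching), and $N_1$ is the number of treated units that are matched.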
. desc
Contains data from C:\Users\Metrics\Desktop\计量经济学\高级\A15-psm\data\ldw_exper.dta
obs: 445
vars: 12 30 Jan 2013 12:47
size: 12,015
--------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
--------------------------------------------------------------------------------------
t byte %8.0g participation in job training program
age byte %8.0g age
educ byte %8.0g years of education
black byte %8.0g indicator for African-American
hisp byte %8.0g indicator for Hispanic
married byte %8.0g indicator for married
nodegree byte %8.0g indicator for more than grade school but
less than high-school education
re74 float %9.0g real earnings in 1974 (in thousands of
1978 $)
re75 float %9.0g real earnings in 1975 (in thousands of
1978 $)
re78 float %9.0g real earnings in 1978 (in thousands of
1978 $)
u74 float %9.0g indicator for unemployed in 1974
u75 float %9.0g indicator for unemployed in 1975
--------------------------------------------------------------------------------------
Sorted by:
Summary statistics by treatment group
bysort t: sum age educ nodegree black hisp married u74 u75
--------------------------------------------------------------------------------------
-> t = 0
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
age | 260 25.05385 7.057745 17 55
educ | 260 10.08846 1.614325 3 14
nodegree | 260 .8346154 .3722439 0 1
black | 260 .8269231 .3790434 0 1
hisp | 260 .1076923 .3105893 0 1
-------------+---------------------------------------------------------
married | 260 .1538462 .3614971 0 1
u74 | 260 .75 .4338478 0 1
u75 | 260 .6846154 .4655651 0 1
--------------------------------------------------------------------------------------
-> t = 1
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
age | 185 25.81622 7.155019 17 48
educ | 185 10.34595 2.01065 4 16
nodegree | 185 .7081081 .4558666 0 1
black | 185 .8432432 .3645579 0 1
hisp | 185 .0594595 .2371244 0 1
-------------+---------------------------------------------------------
married | 185 .1891892 .3927217 0 1
u74 | 185 .7081081 .4558666 0 1
u75 | 185 .6 .4912274 0 1
Descriptive analysis
tabulate t, summarize(re78) means standard
The output is:
. tabulate t, summarize(re78) means standard
participati | Summary of real
on in job | earnings in 1978 (in
training | thousands of 1978 $)
program | Mean Std. Dev.
------------+------------------------
0 | 4.5548023 5.4838368
1 | 6.3491454 7.8674047
------------+------------------------
Total | 5.3007651 6.6314934
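The raw gap between the two group means, 6.349 - 4.555 ≈ 1.794, can also be read off as the slope of a simple regression of re78 on t. The sketch below assumes that ldw_exper.dta sits in the current working directory; that location is an assumption, not the path shown in the desc output.

```stata
* Assumption: ldw_exper.dta is in the current working directory.
use "ldw_exper.dta", clear

* With a single 0/1 regressor, the OLS slope equals the difference in group means.
regress re78 t
display "raw difference in means = " _b[t]
```

This is a purely descriptive difference between trainees and non-trainees; it does not yet adjust for any differences in covariates between the two groups.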
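Set the random seed
set seed 20180105 // set the random-number seed
gen u=runiform() // uniform pseudo-random draws on (0,1)
sort u // sort the observations by u, i.e. put them in random order
Note that order u is not a substitute for sort u: order only moves the variable u to the front of the variable list and leaves the observations where they are.
The commands above generate pseudo-random numbers that are uniformly distributed on (0,1) and sort the data by them, so that the observations are in random order before matching. This matters because psmatch2 results can depend on the sort order when several observations share the same propensity score.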
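The difference between sort and order is easy to check on a toy dataset. The sketch below is illustrative only: the five-observation dataset and the variable id are made up for the demonstration and are not part of the NSW data.

```stata
* Toy data: five observations, with id recording the original row order.
clear
set obs 5
gen id = _n
set seed 20180105
gen u = runiform()

order u      // moves the column u to the front; the rows stay where they were
list

sort u       // reorders the rows so that u is ascending
list
```

After order u the id column still reads 1 to 5 from top to bottom; only after sort u are the rows shuffled. For randomizing the observation order before psmatch2, sort is the command that does the work.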
Define the macros
local v1 "t"
local v2 "age edu black hisp married re74 re75 u74 u75"
global x "`v1' `v2' "
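Before passing a macro to an estimation command, it can help to print what it expands to. The sketch below repeats the definitions above (without the trailing blank inside the global) and displays the result.

```stata
* Build the variable list from two locals, then store it in a global.
local v1 "t"
local v2 "age edu black hisp married re74 re75 u74 u75"
global x "`v1' `v2'"

* Show what $x expands to.
display "$x"
```

Locals only exist within the do-file or interactive session in which they are defined. If these lines are run one selection at a time from the do-file editor, the two local lines and the global line need to be executed together; otherwise the locals are empty by the time the global is defined.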
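Propensity score matching
psmatch2 $x, out(re78) neighbor(1) ate ties logit common // 1:1 matching
$ refers to a global macro, so
psmatch2 $x, out(re78) neighbor(1) ate ties logit common // 1:1 matching
is equivalent to
psmatch2 t age edu black hisp married re74 re75 u74 u75, out(re78) neighbor(1) ate ties logit common
(edu is accepted as an abbreviation of educ under Stata's default variable-abbreviation setting.)
The output is: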
psmatch2 $x, out(re78) neighbor(1) ate ties logit common
Logistic regression Number of obs = 445
LR chi2(9) = 11.70
Prob > chi2 = 0.2308
Log likelihood = -296.25026 Pseudo R2 = 0.0194
------------------------------------------------------------------------------
t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age | .0142619 .0142116 1.00 0.316 -.0135923 .0421162
educ | .0499776 .0564116 0.89 0.376 -.060587 .1605423
black | -.347664 .3606532 -0.96 0.335 -1.054531 .3592032
hisp | -.928485 .50661 -1.83 0.067 -1.921422 .0644523
married | .1760431 .2748817 0.64 0.522 -.3627151 .7148012
re74 | -.0339278 .0292559 -1.16 0.246 -.0912683 .0234127
re75 | .01221 .0471351 0.26 0.796 -.0801731 .1045932
u74 | -.1516037 .3716369 -0.41 0.683 -.8799987 .5767913
u75 | -.3719486 .317728 -1.17 0.242 -.9946841 .2507869
_cons | -.4736308 .8244205 -0.57 0.566 -2.089465 1.142204
------------------------------------------------------------------------------
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched | 6.34914538   4.55480228   1.79434311   .632853552     2.84
                        ATT | 6.40495818   4.99436488    1.4105933   .839875971     1.68
                        ATU | 4.52683013   6.15618973    1.6293596            .        .
                        ATE |                           1.53668776            .        .
----------------------------+-----------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
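The note above says that the psmatch2 standard errors treat the propensity score as known. One alternative is Stata's built-in teffects psmatch, whose standard errors take the estimation of the propensity score into account. The call below is a sketch, not part of the original analysis; it assumes ldw_exper.dta is in the current working directory.

```stata
* Assumption: ldw_exper.dta is in the current working directory.
use "ldw_exper.dta", clear

* ATT by 1:1 nearest-neighbor matching on a logit propensity score.
teffects psmatch (re78) (t age educ black hisp married re74 re75 u74 u75, logit), ///
    atet nneighbor(1)
```

The point estimate need not coincide exactly with the psmatch2 ATT, because this call does not impose the common-support restriction that the common option applies in psmatch2.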
psmatch2: | psmatch2: Common
Treatment | support
assignment | Off suppo On suppor | Total
-----------+----------------------+----------
Untreated | 11 249 | 260
Treated | 2 183 | 185
-----------+----------------------+----------
Total | 13 432 | 445
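The numbers in the two tables above can be cross-checked by hand. Each T-stat is the Difference divided by its S.E., and the reported ATE agrees with the average of ATT and ATU weighted by the on-support counts of treated (183) and untreated (249) observations. The weighting interpretation is inferred from the figures in the output, not stated in the source.

```latex
\begin{align*}
t_{\text{Unmatched}} &= \frac{1.79434311}{0.632853552} \approx 2.84, \qquad
t_{\text{ATT}} = \frac{1.4105933}{0.839875971} \approx 1.68, \\
\text{ATE} &= \frac{183 \times 1.4105933 + 249 \times 1.6293596}{183 + 249}
            = \frac{663.849}{432} \approx 1.5367 .
\end{align*}
```

With a t-statistic of about 1.68, the ATT clears the two-sided 10% critical value (1.645) but not the 5% one (1.96), subject to the caveat in the note about the standard errors.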
Next, use pstest to check whether matching has balanced the covariates well
. pstest age edu black hisp married re74 re75 u74 u75, both graph
----------------------------------------------------------------------------------------
Unmatched | Mean %reduct | t-test | V(T)/
Variable Matched | Treated Control %bias |bias| | t p>|t| | V(C)
--------------------------+----------------------------------+---------------+----------
age U | 25.816 25.054 10.7 | 1.12 0.265 | 1.03
M | 25.781 25.383 5.6 47.7 | 0.52 0.604 | 0.91
| | |
educ U | 10.346 10.088 14.1 | 1.50 0.135 | 1.55*
M | 10.322 10.415 -5.1 63.9 | -0.49 0.627 | 1.52*
| | |
black U | .84324 .82692 4.4 | 0.45 0.649 | .
M | .85246 .86339 -2.9 33.0 | -0.30 0.765 | .
| | |
hisp U | .05946 .10769 -17.5 | -1.78 0.076 | .
M | .06011 .04372 5.9 66.0 | 0.71 0.481 | .
| | |
married U | .18919 .15385 9.4 | 0.98 0.327 | .
M | .18579 .19126 -1.4 84.5 | -0.13 0.894 | .
| | |
re74 U | 2.0956 2.107 -0.2 | -0.02 0.982 | 0.74*
M | 2.0672 1.9222 2.7 -1166.6 | 0.27 0.784 | 0.88
| | |
re75 U | 1.5321 1.2669 8.4 | 0.87 0.382 | 1.08
M | 1.5299 1.6446 -3.6 56.7 | -0.32 0.748 | 0.82
| | |
u74 U | .70811 .75 -9.4 | -0.98 0.326 | .
M | .71038 .75956 -11.1 -17.4 | -1.06 0.288 | .
| | |
u75 U | .6 .68462 -17.7 | -1.85 0.065 | .
M | .60656 .63388 -5.7 67.7 | -0.54 0.591 | .
| | |
----------------------------------------------------------------------------------------
* if variance ratio outside [0.75; 1.34] for U and [0.75; 1.34] for M
-----------------------------------------------------------------------------------
Sample | Ps R2 LR chi2 p>chi2 MeanBias MedBias B R %Var
-----------+-----------------------------------------------------------------------
Unmatched | 0.019 11.75 0.227 10.2 9.4 33.1* 0.82 50
Matched | 0.008 3.87 0.920 4.9 5.1 20.6 1.09 25
-----------------------------------------------------------------------------------
* if B>25%, R outside [0.5; 2]
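To make the %bias column concrete, the standardized bias for age in the unmatched sample can be reproduced from the group means and standard deviations in the bysort t: sum output. The symbols $\bar{x}_T$, $\bar{x}_C$, $s_T$, $s_C$ are notation introduced here; the third line uses the rounded biases from the table, so it is approximate.

```latex
\begin{align*}
\%\text{bias} &= 100 \times \frac{\bar{x}_T - \bar{x}_C}{\sqrt{(s_T^2 + s_C^2)/2}}, \\
\%\text{bias}_{\text{age},\,U} &= 100 \times
    \frac{25.81622 - 25.05385}{\sqrt{(7.155019^2 + 7.057745^2)/2}}
  = 100 \times \frac{0.76237}{7.1066} \approx 10.7, \\
\%\text{reduction in } |\text{bias}|_{\text{age}} &= 100 \times \left(1 - \frac{5.6}{10.7}\right) \approx 47.7 .
\end{align*}
```

The same ratio explains the -1166.6 entry for re74: its unmatched bias was already close to zero (-0.2), so a matched bias of 2.7 is a large relative increase even though it is small in absolute terms. In the summary table, the matched sample has B = 20.6 (below 25) and R = 1.09 (inside [0.5; 2]), so neither value is flagged with an asterisk.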
psgraph
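psgraph draws the distribution of the propensity score by treatment status. As a complement, the overlap can also be inspected with kernel density plots of the _pscore variable that psmatch2 creates. The sketch below assumes ldw_exper.dta is in the current working directory and that psmatch2 is installed (ssc install psmatch2); the graph names before and onsupport are arbitrary.

```stata
* Assumptions: ldw_exper.dta is in the working directory; psmatch2 is installed.
use "ldw_exper.dta", clear
set seed 20180105
gen u = runiform()
sort u
psmatch2 t age educ black hisp married re74 re75 u74 u75, ///
    out(re78) neighbor(1) ate ties logit common

* All observations: treated vs. untreated propensity scores.
twoway (kdensity _pscore if _treated == 1) ///
       (kdensity _pscore if _treated == 0), ///
       legend(order(1 "Treated" 2 "Untreated")) ///
       xtitle("Propensity score") ytitle("Density") ///
       title("All observations") name(before, replace)

* Same comparison, restricted to observations on common support.
twoway (kdensity _pscore if _treated == 1 & _support == 1) ///
       (kdensity _pscore if _treated == 0 & _support == 1), ///
       legend(order(1 "Treated" 2 "Untreated")) ///
       xtitle("Propensity score") ytitle("Density") ///
       title("On common support") name(onsupport, replace)
```

If the two densities cover broadly the same range, most treated units have comparison units with similar scores, which is what the small number of off-support observations (13 of 445) in the table above suggests.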