Propensity Score Matching (PSM): An Example and Its Stata Implementation
The National Supported Work (NSW) demonstration program
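The goal is to measure how job training affects the earnings of the trained group (the treatment group), that is, the difference in earnings between the trained group having received training and the same group not having received it. In reality we can observe only the fact that the treatment group received training. What would have happened to these people had they not been trained can never be observed; this unobserved state is called the counterfactual. Matching methods are designed to deal with this unobservable outcome. In propensity score matching (PSM), the sample is split into two groups according to the treatment indicator: the treatment group, which here consists of the people who received training after NSW was implemented, and the comparison group, which here consists of the people who did not receive training after NSW was implemented. Each treated individual is then matched with comparison individuals whose propensity score (the estimated probability of receiving treatment given the observed covariates) is close to his or her own, and the outcomes of the matched comparison individuals stand in for the unobservable counterfactual.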
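To make the matching idea concrete, here is a minimal Python sketch of 1:1 nearest-neighbor matching with replacement for the average treatment effect on the treated (ATT). It assumes the propensity scores are already known. The scores and outcomes are made-up toy numbers, not the NSW data, and the function name `att_nearest_neighbor` is hypothetical. It is an illustration of the technique, not psmatch2's implementation, which also handles ties, common support and standard errors.

```python
# Minimal sketch: 1:1 nearest-neighbour propensity score matching
# (with replacement) for the ATT. Toy numbers only, not the NSW data.

def att_nearest_neighbor(treated, controls):
    """treated, controls: lists of (pscore, outcome) pairs.

    For each treated unit, the control with the closest propensity score
    supplies the counterfactual outcome; the ATT is the mean of the
    treated-minus-matched-control differences.
    """
    diffs = []
    for p_t, y_t in treated:
        # min() returns the first control among equally close ones
        p_c, y_c = min(controls, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)


# (propensity score, outcome) pairs, invented for illustration
treated = [(0.30, 7.0), (0.55, 9.0), (0.82, 9.5)]
controls = [(0.25, 5.0), (0.50, 6.5), (0.70, 4.0), (0.90, 8.0)]

print(att_nearest_neighbor(treated, controls))
```

Because matching is done with replacement, the same control may serve as the match for several treated units; each treated unit simply takes its closest control.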
. desc
Contains data from C:\Users\Metrics\Desktop\计量经济学\高级\A15-psm\data\ldw_exper.dta
obs: 445
vars: 12 30 Jan 2013 12:47
size: 12,015
storage display value
variable name type format label variable label
t byte %8.0g participation in job training program
age byte %8.0g age
educ byte %8.0g years of education
black byte %8.0g indicator for African-American
hisp byte %8.0g indicator for Hispanic
married byte %8.0g indicator for married
nodegree byte %8.0g indicator for more than grade school but
less than high-school education
re74 float %9.0g real earnings in 1974 (in thousands of
1978 $)
re75 float %9.0g real earnings in 1975 (in thousands of
1978 $)
re78 float %9.0g real earnings in 1978 (in thousands of
1978 $)
u74 float %9.0g indicator for unemployed in 1974
u75 float %9.0g indicator for unemployed in 1975
Sorted by:
. bysort t: sum age educ nodegree black hisp married u74 u75
-> t = 0
Variable | Obs Mean Std. Dev. Min Max
age | 260 25.05385 7.057745 17 55
educ | 260 10.08846 1.614325 3 14
nodegree | 260 .8346154 .3722439 0 1
black | 260 .8269231 .3790434 0 1
hisp | 260 .1076923 .3105893 0 1
married | 260 .1538462 .3614971 0 1
u74 | 260 .75 .4338478 0 1
u75 | 260 .6846154 .4655651 0 1
-> t = 1
Variable | Obs Mean Std. Dev. Min Max
age | 185 25.81622 7.155019 17 48
educ | 185 10.34595 2.01065 4 16
nodegree | 185 .7081081 .4558666 0 1
black | 185 .8432432 .3645579 0 1
hisp | 185 .0594595 .2371244 0 1
married | 185 .1891892 .3927217 0 1
u74 | 185 .7081081 .4558666 0 1
u75 | 185 .6 .4912274 0 1
. tabulate t, summarize(re78) means standard
participati | Summary of real
on in job | earnings in 1978 (in
training | thousands of 1978 $)
program | Mean Std. Dev.
0 | 4.5548023 5.4838368
1 | 6.3491454 7.8674047
Total | 5.3007651 6.6314934
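As a check, the raw (unmatched) difference in mean 1978 earnings and its standard error can be recomputed from the means and standard deviations in this table. The sketch assumes a pooled-variance two-sample t-test. Under that assumption the result agrees with the "Unmatched" row that psmatch2 reports later (difference 1.794, S.E. .633, T-stat 2.84).

```python
import math

# Summary statistics of re78 copied from the tabulate output above
n1, mean1, sd1 = 185, 6.3491454, 7.8674047   # t = 1 (trained)
n0, mean0, sd0 = 260, 4.5548023, 5.4838368   # t = 0 (not trained)

diff = mean1 - mean0

# Pooled (equal-variance) two-sample t-test
pooled_var = ((n1 - 1) * sd1**2 + (n0 - 1) * sd0**2) / (n1 + n0 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n0))
t_stat = diff / se

print(f"difference = {diff:.4f}, S.E. = {se:.4f}, t = {t_stat:.2f}")
```

This difference compares all trained and untrained people without any matching, so it is only a baseline for the matched estimates.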
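Because nearest-neighbor matching can depend on the order of the observations when propensity scores are tied, put the data in random order before calling psmatch2:
set seed 20180105   // set the random-number seed so the results are reproducible
gen u = runiform()  // uniform random number for each observation
sort u              // put the observations in random order
// Note: -order u- would not work here. order only moves the variable u to the front of the variable list; it does not reorder the observations.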
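The reason for the random sort can be shown with a small Python sketch on toy numbers. Two controls have exactly the same propensity score but different outcomes. A rule that keeps the first closest control picks a different match depending on the row order, while averaging over all tied controls does not. The function names `nn_first` and `nn_ties` are hypothetical. The second rule is only an assumed analogue of psmatch2's `ties` option, which its help file describes as also using controls tied with the nearest neighbor.

```python
# Toy illustration of why sort order matters when propensity scores tie.
# Controls are (pscore, outcome) pairs; the numbers are invented.

def nn_first(p_t, controls):
    # keep the first control with the smallest distance (order-dependent)
    return min(controls, key=lambda c: abs(c[0] - p_t))[1]

def nn_ties(p_t, controls):
    # average the outcomes of every control at the smallest distance
    best = min(abs(p - p_t) for p, _ in controls)
    tied = [y for p, y in controls if abs(p - p_t) == best]
    return sum(tied) / len(tied)

order_a = [(0.40, 3.0), (0.40, 7.0), (0.90, 1.0)]
order_b = list(reversed(order_a))   # same data, different sort order

p_treated = 0.42
print(nn_first(p_treated, order_a), nn_first(p_treated, order_b))
print(nn_ties(p_treated, order_a), nn_ties(p_treated, order_b))
```

Randomizing the order with a fixed seed does not remove this dependence. It makes the arbitrary choice random rather than driven by how the file happened to be sorted, and it keeps that choice reproducible.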
local v1 "t"   // treatment indicator
local v2 "age educ black hisp married re74 re75 u74 u75"   // covariates for the propensity score
global x "`v1' `v2'"
psmatch2 $x, out(re78) neighbor(1) ate ties logit common   // 1:1 nearest-neighbor matching
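With the macro expanded, the same command reads:
psmatch2 t age educ black hisp married re74 re75 u74 u75, out(re78) neighbor(1) ate ties logit common
Options: out(re78) names the outcome variable; neighbor(1) matches each observation to its single nearest neighbor; ate also reports the ATU and ATE in addition to the ATT; ties also uses controls whose propensity scores are tied with the nearest neighbor; logit estimates the propensity score with a logit model (psmatch2 uses probit by default); common restricts matching to the region of common support.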
Logistic regression Number of obs = 445
LR chi2(9) = 11.70
Prob > chi2 = 0.2308
Log likelihood = -296.25026 Pseudo R2 = 0.0194
t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
age | .0142619 .0142116 1.00 0.316 -.0135923 .0421162
educ | .0499776 .0564116 0.89 0.376 -.060587 .1605423
black | -.347664 .3606532 -0.96 0.335 -1.054531 .3592032
hisp | -.928485 .50661 -1.83 0.067 -1.921422 .0644523
married | .1760431 .2748817 0.64 0.522 -.3627151 .7148012
re74 | -.0339278 .0292559 -1.16 0.246 -.0912683 .0234127
re75 | .01221 .0471351 0.26 0.796 -.0801731 .1045932
u74 | -.1516037 .3716369 -0.41 0.683 -.8799987 .5767913
u75 | -.3719486 .317728 -1.17 0.242 -.9946841 .2507869
_cons | -.4736308 .8244205 -0.57 0.566 -2.089465 1.142204
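With the logit option, the estimated propensity score is the logistic function of the linear index x'b formed from the coefficients above. The Python sketch below plugs those coefficients into that formula for a hypothetical person. The covariate values are invented for illustration and do not correspond to any observation in the data.

```python
import math

# Logit coefficients copied from the psmatch2 output above
coef = {
    "age": 0.0142619, "educ": 0.0499776, "black": -0.347664,
    "hisp": -0.928485, "married": 0.1760431, "re74": -0.0339278,
    "re75": 0.01221, "u74": -0.1516037, "u75": -0.3719486,
}
cons = -0.4736308

# A hypothetical person (not an observation from the data set)
person = {"age": 25, "educ": 10, "black": 1, "hisp": 0, "married": 0,
          "re74": 0.0, "re75": 0.0, "u74": 1, "u75": 1}

xb = cons + sum(coef[k] * person[k] for k in coef)   # linear index x'b
pscore = 1 / (1 + math.exp(-xb))                      # logistic CDF
print(f"xb = {xb:.4f}, propensity score = {pscore:.4f}")
```

Note that the logit fits poorly here (pseudo R2 = 0.0194, LR test p = 0.23), so the estimated scores of treated and untreated people overlap heavily.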
There are observations with identical propensity score values.
The sort order of the data could affect your results.
Make sure that the sort order is random before calling psmatch2.
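----------------------------------------------------------------------------------------
        Variable     Sample |    Treated     Controls   Difference         S.E.   T-stat
----------------------------+-----------------------------------------------------------
            re78  Unmatched | 6.34914538   4.55480228   1.79434311   .632853552     2.84
                        ATT | 6.40495818   4.99436488    1.4105933   .839875971     1.68
                        ATU | 4.52683013   6.15618973    1.6293596            .        .
                        ATE |                           1.53668776            .        .
----------------------------+-----------------------------------------------------------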
Note: S.E. does not take into account that the propensity score is estimated.
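The ATE row can be checked by hand. It equals the average of the ATT and the ATU, weighted by the numbers of treated (183) and untreated (249) observations on common support reported in the common-support table. A small Python check of that arithmetic, using only numbers copied from the output:

```python
# Values copied from the psmatch2 output and the common-support table
att, atu = 1.4105933, 1.6293596
n_treated_on, n_control_on = 183, 249   # observations on common support

# ATE as the support-size-weighted average of ATT and ATU
ate = (n_treated_on * att + n_control_on * atu) / (n_treated_on + n_control_on)
print(f"ATE = {ate:.6f}")
```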
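  psmatch2: |   psmatch2: Common
  Treatment |        support
 assignment | Off support  On support |     Total
------------+-------------------------+----------
  Untreated |         11         249 |       260
    Treated |          2         183 |       185
------------+-------------------------+----------
      Total |         13         432 |       445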
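The psmatch2 help file describes `common` as dropping treated observations whose propensity score is above the maximum or below the minimum score of the controls. Because the `ate` option also matches controls to treated units, the table shows untreated observations off support as well. The Python sketch below assumes a symmetric min/max rule of this kind on toy scores. It is an illustration of the idea, and psmatch2's exact implementation may differ in details.

```python
# Toy propensity scores (not the NSW data)
treated_ps = [0.15, 0.32, 0.48, 0.71, 0.93]
control_ps = [0.05, 0.20, 0.35, 0.50, 0.66, 0.88]

# Assumed min/max rule: a treated unit is on support if its score lies in
# the range of control scores; for ATU/ATE, a control is on support if its
# score lies in the range of treated scores.
lo_c, hi_c = min(control_ps), max(control_ps)
lo_t, hi_t = min(treated_ps), max(treated_ps)

treated_on = [p for p in treated_ps if lo_c <= p <= hi_c]
control_on = [p for p in control_ps if lo_t <= p <= hi_t]

print("treated on support:", treated_on)
print("controls on support:", control_on)
```

Here the treated unit at 0.93 has no control with a score that high, and the control at 0.05 has no treated unit with a score that low, so both fall off support.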
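Next, check covariate balance with pstest. The both option compares the unmatched (U) and matched (M) samples, and graph plots the standardized bias of each covariate:
. pstest age educ black hisp married re74 re75 u74 u75, both graph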
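----------------------------------------------------------------------------------------
                Unmatched |       Mean               %reduct |     t-test    |  V(T)/
Variable          Matched | Treated Control    %bias  |bias| |    t    p>|t| |  V(C)
--------------------------+----------------------------------+---------------+----------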
age U | 25.816 25.054 10.7 | 1.12 0.265 | 1.03
M | 25.781 25.383 5.6 47.7 | 0.52 0.604 | 0.91
| | |
educ U | 10.346 10.088 14.1 | 1.50 0.135 | 1.55*
M | 10.322 10.415 -5.1 63.9 | -0.49 0.627 | 1.52*
| | |
black U | .84324 .82692 4.4 | 0.45 0.649 | .
M | .85246 .86339 -2.9 33.0 | -0.30 0.765 | .
| | |
hisp U | .05946 .10769 -17.5 | -1.78 0.076 | .
M | .06011 .04372 5.9 66.0 | 0.71 0.481 | .
| | |
married U | .18919 .15385 9.4 | 0.98 0.327 | .
M | .18579 .19126 -1.4 84.5 | -0.13 0.894 | .
| | |
re74 U | 2.0956 2.107 -0.2 | -0.02 0.982 | 0.74*
M | 2.0672 1.9222 2.7 -1166.6 | 0.27 0.784 | 0.88
| | |
re75 U | 1.5321 1.2669 8.4 | 0.87 0.382 | 1.08
M | 1.5299 1.6446 -3.6 56.7 | -0.32 0.748 | 0.82
| | |
u74 U | .70811 .75 -9.4 | -0.98 0.326 | .
M | .71038 .75956 -11.1 -17.4 | -1.06 0.288 | .
| | |
u75 U | .6 .68462 -17.7 | -1.85 0.065 | .
M | .60656 .63388 -5.7 67.7 | -0.54 0.591 | .
| | |
----------------------------------------------------------------------------------------
* if variance ratio outside [0.75; 1.34] for U and [0.75; 1.34] for M
Sample | Ps R2 LR chi2 p>chi2 MeanBias MedBias B R %Var
Unmatched | 0.019 11.75 0.227 10.2 9.4 33.1* 0.82 50
Matched | 0.008 3.87 0.920 4.9 5.1 20.6 1.09 25
* if B>25%, R outside [0.5; 2]
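The unmatched (U) columns of the pstest table can be reproduced from the bysort summary statistics. The standardized %bias is 100 times the difference in means divided by the square root of the average of the two group variances. %reduct is the percentage fall in |bias| after matching, and V(T)/V(C) is the variance ratio. The Python sketch below checks the age and educ rows. The helper names are hypothetical. The %reduct line uses the rounded biases shown in the table, so it only approximates pstest's value.

```python
import math

# Unmatched means and standard deviations copied from the bysort summary
# output (t = 1 treated, t = 0 controls)
stats = {
    #        mean_T     sd_T      mean_C     sd_C
    "age":  (25.81622, 7.155019, 25.05385, 7.057745),
    "educ": (10.34595, 2.01065,  10.08846, 1.614325),
}

def std_bias(mean_t, sd_t, mean_c, sd_c):
    """Standardised % bias: mean difference over the average-variance SD."""
    return 100 * (mean_t - mean_c) / math.sqrt((sd_t**2 + sd_c**2) / 2)

def pct_reduction(bias_unmatched, bias_matched):
    """Percentage reduction in |bias| achieved by matching."""
    return 100 * (1 - abs(bias_matched) / abs(bias_unmatched))

for var, (mt, st, mc, sc) in stats.items():
    # standardised bias and variance ratio V(T)/V(C) in the unmatched sample
    print(var, round(std_bias(mt, st, mc, sc), 1), round(st**2 / sc**2, 2))

# %reduct for age from the rounded biases in the table (U = 10.7, M = 5.6)
print(round(pct_reduction(10.7, 5.6), 1))
```

The same formulas explain the odd -1166.6 for re74. Its unmatched bias is only -0.2, so even a small matched bias of 2.7 counts as a large percentage increase.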