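Synthetic Control Method (SCM): Placebo Tests and Visualization with synth2
Source: SYNTH2: Stata module to implement synthetic control method (SCM) with placebo tests, robustness test and visualization
Introduction to the synthetic control method:
The synthetic control method can be viewed as an extension of difference-in-differences (DID) to settings where a policy shock hits a single unit, a case in which conventional DID is hard to apply. In economic research, the policy or event being evaluated is often implemented in just one country, region, or city. A simple approach is to examine the time series before and after the policy and see how the outcome of interest changes. But the outcome may also be driven by its own time trend and by other concurrent events. For this reason, researchers commonly use Rubin's counterfactual framework and ask what would have happened to the region had it not been exposed to the policy. The difficulty is that this counterfactual cannot be observed.
Since the counterfactual cannot be observed, the usual solution is to find a suitable control group: other regions that resemble the treated region in every relevant respect but did not receive the intervention, serving as a counterfactual stand-in for the treated unit. In practice, such a control group is also hard to find. To address the problem of choosing controls, Abadie and Gardeazabal (2003) proposed the synthetic control method. The basic idea is as follows. Suppose we want to evaluate a policy implemented only in Beijing. We might pick Shanghai, Guangzhou, or Shenzhen as controls, but none of them is quite the same as Beijing. Although no single best control region exists, we can usually form a suitable linear combination of several large Chinese cities to construct a "synthetic control region", and then compare the "real Beijing" with the "synthetic Beijing".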
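The weighted-combination idea can be sketched numerically. The Python snippet below is a minimal illustration and not the algorithm used by synth or synth2: it takes made-up pre-policy outcomes for Beijing and three donor cities, then runs a coarse grid search (a stand-in for a real optimizer) over non-negative weights that sum to one, keeping the weights with the smallest pre-period mean squared prediction error. All numbers and function names are hypothetical.

```python
from itertools import product

# Hypothetical pre-policy outcomes (4 periods); all values are made up.
donors = {
    "Shanghai":  [10.0, 12.0, 14.0, 15.0],
    "Guangzhou": [8.0, 9.0, 9.0, 11.0],
    "Shenzhen":  [20.0, 19.0, 22.0, 24.0],
}
beijing = [11.4, 12.5, 14.1, 15.6]


def synthetic(weights, series):
    """Weighted average of the donor series, period by period."""
    periods = len(next(iter(series.values())))
    return [
        sum(w * series[name][t] for name, w in weights.items())
        for t in range(periods)
    ]


def mspe(actual, predicted):
    """Mean squared prediction error between two equal-length series."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def grid_search(treated, series, steps=20):
    """Try every weight vector on a grid with w >= 0 and sum(w) == 1."""
    names = list(series)
    best = None
    for combo in product(range(steps + 1), repeat=len(names) - 1):
        last = steps - sum(combo)
        if last < 0:
            continue  # the first weights already add up to more than one
        weights = {n: c / steps for n, c in zip(names, combo + (last,))}
        error = mspe(treated, synthetic(weights, series))
        if best is None or error < best[0]:
            best = (error, weights)
    return best


error, weights = grid_search(beijing, donors)
print(weights, error)
```

With these made-up numbers the search recovers weights of 0.5, 0.3, and 0.2, because the Beijing series was constructed as exactly that combination. Real data rarely fit this cleanly, and the actual estimator also weights the predictors, which this sketch leaves out.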
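For details, see:
Stata & Understanding the Synthetic Control Method in One Article: Operation and Application (with complete placebo test code)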
synth2
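As a wrapper for synth, which implements the synthetic control method (SCM), synth2 provides convenient utilities to automate in-space placebo tests using fake treatment units, in-time placebo tests using a fake treatment time, and leave-one-out robustness tests that exclude one control unit with nonzero weight at a time. synth2 also produces a series of graphs for visualization. The synth command (available from SSC, the Statistical Software Components archive) is required.
The syntax is: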
synth2 depvar indepvars, trunit(#) trperiod(#) [options]
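The options are:
ctrlunit(numlist) control units to be used as the donor pool
preperiod(numlist) pre-treatment periods before the intervention occurred
postperiod(numlist) post-treatment periods when and after the intervention occurred
xperiod(numlist) periods over which the predictors specified in indepvars are averaged
mspeperiod(numlist) periods over which the mean squared prediction error (MSPE) is minimized
customV(numlist) custom V-weights that determine the predictive power of the variables for the outcome over the pre-treatment periods
nested nested optimization procedure that searches among all (diagonal) positive semidefinite V-matrices and sets of W-weights
allopt if nested is specified, requests fully robust results, at the cost of extra computing time
placebo([{unit|unit(numlist)} period(numlist) cutoff(#_c)]) in-space placebo test using fake treatment units and/or in-time placebo test using a fake treatment time
Worked example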
. use "C:\Users\Metrics\Desktop\smoking.dta"
(Tobacco Sales in 39 US States)
. xtset state year
Panel variable: state (strongly balanced)
Time variable: year, 1970 to 2000
Delta: 1 unit
In-time placebo test using the fake treatment time 1985, with the covariate cigsale(1988) dropped
. synth2 cigsale lnincome age15to24 retprice beer cigsale(1980) cigsale(1975), trunit(3) trperiod(1989) xperiod(1980(1)1984) nested placebo(period(1985))
Fitting results in the pre-treatment periods:
--------------------------------------------------------------------------------
Treated Unit : California Treatment Time : 1989
--------------------------------------------------------------------------------
Mean Absolute Error = 1.51510 Number of Control Units = 38
Mean Squared Error = 4.86334 Number of Covariates = 6
Root Mean Squared Error = 2.20530 R-squared = 0.95253
--------------------------------------------------------------------------------
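As a quick consistency check on the fit statistics above, the root mean squared error should be the square root of the mean squared error. A short Python check using only the two reported values:

```python
import math

mse = 4.86334    # "Mean Squared Error" reported above
rmse = 2.20530   # "Root Mean Squared Error" reported above

# The RMSE should equal sqrt(MSE) up to display rounding.
computed = math.sqrt(mse)
print(f"sqrt(MSE) = {computed:.5f}")
assert abs(computed - rmse) < 1e-5
```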
Predictor balance in the pre-treatment periods:
--------------------------------------------------------------------------------
Covariate | V.weight Treated Synthetic Control Average Control
| Value Bias Value Bias
---------------+----------------------------------------------------------------
lnincome | 0.0000 10.0372 9.8639 -1.73% 9.7892 -2.47%
age15to24 | 0.0000 0.1815 0.1825 0.55% 0.1814 -0.06%
retprice | 0.0384 76.2200 76.1523 -0.09% 71.8353 -5.75%
beer | 0.0000 25.0000 23.0089 -7.96% 23.6947 -5.22%
cigsale(1980) | 0.9597 120.2000 120.0846 -0.10% 138.0895 14.88%
cigsale(1975) | 0.0019 127.1000 126.8324 -0.21% 136.9316 7.74%
--------------------------------------------------------------------------------
Note: "V.weight" is the optimal covariate weight in the diagonal of V matrix.
"Synthetic Control" is the weighted average of control units in the donor pool with
optimal weights.
"Average Control" is the simple average of control units in the donor pool with equal
weights.
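The "Bias" columns appear to be the relative gap between the control value and the treated value, (control - treated) / treated, in percent. That reading is inferred from the numbers in the table and is not stated in the output itself. A short Python check reproduces a few entries:

```python
def relative_bias(treated, control):
    """Percentage gap of a control value relative to the treated value."""
    return (control - treated) / treated * 100


# (treated, synthetic control, average control) copied from the table above
rows = {
    "lnincome":      (10.0372, 9.8639, 9.7892),
    "beer":          (25.0000, 23.0089, 23.6947),
    "cigsale(1980)": (120.2000, 120.0846, 138.0895),
}

for name, (treated, synth, avg) in rows.items():
    print(f"{name:14s} synthetic {relative_bias(treated, synth):6.2f}%"
          f"  average {relative_bias(treated, avg):6.2f}%")
```

For cigsale(1980), the predictor with by far the largest V-weight, the synthetic control is within about 0.1% of California, while the simple average of all donors is off by almost 15%.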
Optimal Unit Weights:
--------------------------
Unit | U.weight
-------------+------------
Utah | 0.3600
Nevada | 0.2880
Connecticut | 0.1990
Colorado | 0.1020
NewMexico | 0.0500
--------------------------
Note: The unit Alabama Arkansas Delaware Georgia Idaho Illinois Indiana Iowa Kansas Kentucky
Louisiana Maine Minnesota Mississippi Missouri Montana Nebraska NewHampshire
NorthCarolina NorthDakota Ohio Oklahoma Pennsylvania RhodeIsland SouthCarolina
SouthDakota Tennessee Texas Vermont Virginia WestVirginia Wisconsin Wyoming in the
donor pool get a weight of 0.
Prediction results in the post-treatment periods:
-----------------------------------------------------------
Time | Actual Outcome Predicted Outcome Treatment Effect
------+----------------------------------------------------
1989 | 82.4000 93.0322 -10.6322
1990 | 77.8000 89.4297 -11.6297
1991 | 68.7000 82.4727 -13.7727
1992 | 67.5000 80.6731 -13.1731
1993 | 63.4000 79.5929 -16.1929
1994 | 58.6000 78.1272 -19.5272
1995 | 56.4000 75.6207 -19.2207
1996 | 54.5000 74.8372 -20.3372
1997 | 53.8000 74.5395 -20.7395
1998 | 52.3000 71.1561 -18.8561
1999 | 47.2000 71.4380 -24.2380
2000 | 41.6000 65.8382 -24.2382
------+----------------------------------------------------
Mean | 60.3500 78.0631 -17.7131
-----------------------------------------------------------
Note: The average treatment effect over the post-treatment periods is -17.7131.
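Each treatment effect in the table is the actual outcome minus the predicted (synthetic) outcome, and the reported average is the mean of those gaps over 1989 to 2000. A small Python sketch that redoes this arithmetic from the printed values:

```python
# (year, actual outcome, predicted outcome) copied from the table above
rows = [
    (1989, 82.4, 93.0322),
    (1990, 77.8, 89.4297),
    (1991, 68.7, 82.4727),
    (1992, 67.5, 80.6731),
    (1993, 63.4, 79.5929),
    (1994, 58.6, 78.1272),
    (1995, 56.4, 75.6207),
    (1996, 54.5, 74.8372),
    (1997, 53.8, 74.5395),
    (1998, 52.3, 71.1561),
    (1999, 47.2, 71.4380),
    (2000, 41.6, 65.8382),
]

# Treatment effect per year: actual minus predicted outcome.
effects = {year: actual - predicted for year, actual, predicted in rows}

# Average treatment effect over the post-treatment periods.
average_effect = sum(effects.values()) / len(effects)
print(f"average treatment effect: {average_effect:.4f}")
```

The result agrees with the reported -17.7131 up to rounding in the last digit.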
Implementing placebo test using fake treatment time 1985...
Placebo test results using fake treatment time 1985:
-----------------------------------------------------------
Time | Actual Outcome Predicted Outcome Treatment Effect
------+----------------------------------------------------
1985 | 102.8000 106.1262 -3.3262
1986 | 99.7000 103.2850 -3.5850
1987 | 97.5000 106.1524 -8.6524
1988 | 90.1000 98.4873 -8.3873
1989 | 82.4000 96.5237 -14.1237
1990 | 77.8000 91.9127 -14.1127
1991 | 68.7000 83.7156 -15.0156
1992 | 67.5000 81.4730 -13.9730
1993 | 63.4000 79.7911 -16.3911
1994 | 58.6000 77.9078 -19.3078
1995 | 56.4000 76.2193 -19.8193
1996 | 54.5000 75.2010 -20.7010
1997 | 53.8000 75.1958 -21.3958
1998 | 52.3000 71.9437 -19.6437
1999 | 47.2000 72.2260 -25.0260
2000 | 41.6000 67.1861 -25.5861
------+----------------------------------------------------
Mean | 69.6437 85.2092 -15.5654
-----------------------------------------------------------
Note: The average treatment effect over the post-treatment periods is -15.5654.
Finished.
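One way to read the in-time placebo table is to split the estimated gaps into the years before the actual treatment (1985 to 1988) and the years from 1989 onward. The Python snippet below does only that arithmetic on the reported numbers. The split and the helper names are illustrative choices, not part of synth2.

```python
# Treatment effects from the placebo table above (fake treatment time 1985)
placebo_effects = {
    1985: -3.3262, 1986: -3.5850, 1987: -8.6524, 1988: -8.3873,
    1989: -14.1237, 1990: -14.1127, 1991: -15.0156, 1992: -13.9730,
    1993: -16.3911, 1994: -19.3078, 1995: -19.8193, 1996: -20.7010,
    1997: -21.3958, 1998: -19.6437, 1999: -25.0260, 2000: -25.5861,
}
ACTUAL_TREATMENT_YEAR = 1989


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Gaps in the placebo window, before the real treatment started.
before = mean(v for y, v in placebo_effects.items() if y < ACTUAL_TREATMENT_YEAR)
# Gaps once the real treatment is in force.
after = mean(v for y, v in placebo_effects.items() if y >= ACTUAL_TREATMENT_YEAR)
# Mean over all years from the fake treatment time onward.
overall = mean(placebo_effects.values())

print(f"1985-1988: {before:.4f}  1989-2000: {after:.4f}  overall: {overall:.4f}")
```

The average gap is about -6.0 in 1985 to 1988 and about -18.8 from 1989 on, and the overall mean matches the reported -15.5654. Whether the pre-1989 gap is small enough to support the main estimate is a judgment that this arithmetic alone does not settle.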
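The resulting graphs: (figures not included)
* Implement the leave-one-out robustness test and create a Stata frame "california" to store the generated variables
To be continued!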