其他
“众里寻他千百度”——工具变量法(IV)的Stata操作
数据说明
Replication Data for: Ting Chen, James Kai-sing Kung, Chicheng Ma. Long Live Keju ! The Persistent Effects of China’s Civil Examination System[J]. The Economic Journal, 2020, 130 (631) : 2030–2064.
识别策略
lneduyear
lnjinshipop
bprvdist
lnnightlight
、到海岸线的距离lncoastdist
、地形崎岖程度tri
、农业产量suitability
、人口密度lnpopdensity
、城市化率urbanrates
和省级固定效应Stata操作
ivregress
命令和ivreg2
命令。ivregress
命令
ivregress
命令是Stata自带的命令,支持两阶段最小二乘(2SLS)、广义矩估计(GMM)和有限信息最大似然估计(LIML)三种工具变量估计方法,我们最常使用的是两阶段最小二乘法(2SLS),因为2SLS最能体现工具变量的实质,并且在球形扰动项的情况下,2SLS是最有效率的工具变量法。ivregress
后面加上2sls
即可,然后将内生解释变量lnjinshipop
和工具变量bprvdist
放在一个小括号中,用=
号连接。选项first
表示报告第一阶段回归结果,选项cluster()
表示使用聚类稳健的标准误。ivregress 2sls lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
第一阶段回归结果
First-stage regressions
-----------------------
Number of obs = 274
No. of clusters = 28
F( 7, 239) = 85.27
Prob > F = 0.0000
R-squared = 0.6487
Adj R-squared = 0.5988
Root MSE = 0.4442
------------------------------------------------------------------------------
| Robust
lnjinshipop | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnnightlight | .183385 .0682506 2.69 0.008 .0489354 .3178346
lncoastdist | .0350333 .077158 0.45 0.650 -.1169634 .1870299
tri | 1.06676 .5637082 1.89 0.060 -.0437105 2.177231
suitability | -.0769726 .0549697 -1.40 0.163 -.1852596 .0313144
lnpopdensity | .196144 .0843727 2.32 0.021 .0299349 .3623532
urbanrates | 3.352916 1.687109 1.99 0.048 .029414 6.676419
|
provid |
12 | .2051006 .0551604 3.72 0.000 .096438 .3137632
13 | -1.890425 .0951146 -19.88 0.000 -2.077795 -1.703055
......
64 | -1.301895 .1581021 -8.23 0.000 -1.613346 -.9904433
|
bprvdist | -.0846917 .0107859 -7.85 0.000 -.1059393 -.0634441
_cons | 2.126233 .9791046 2.17 0.031 .1974567 4.05501
------------------------------------------------------------------------------
bprvdist
的系数为-0.085,标准误为0.011,在1%的水平上显著。第二阶段回归结果
Instrumental variables (2SLS) regression Number of obs = 274
Wald chi2(34) = 13.62
Prob > chi2 = 0.9993
R-squared = 0.7874
Root MSE = .05434
(Std. Err. adjusted for 28 clusters in provid)
------------------------------------------------------------------------------
| Robust
lneduyear | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnjinshipop | .0834221 .0116773 7.14 0.000 .060535 .1063092
lnnightlight | .0592777 .0102138 5.80 0.000 .0392589 .0792964
lncoastdist | .0093399 .0118791 0.79 0.432 -.0139426 .0326224
tri | -.1044346 .063917 -1.63 0.102 -.2297097 .0208404
suitability | -.0008944 .0125824 -0.07 0.943 -.0255555 .0237666
lnpopdensity | -.0472648 .0115102 -4.11 0.000 -.0698245 -.0247051
urbanrates | -.0353689 .1784687 -0.20 0.843 -.3851612 .3144233
|
provid |
12 | -.10732 .0088755 -12.09 0.000 -.1247157 -.0899243
13 | .0244675 .0276005 0.89 0.375 -.0296284 .0785634
......
64 | -.0740796 .0350354 -2.11 0.034 -.1427477 -.0054116
|
_cons | 2.014146 .1490819 13.51 0.000 1.721951 2.306341
------------------------------------------------------------------------------
lnjinshipop
的系数为0.083,标准误为0.012,在1%的水平上显著。弱工具变量检验
estat firststage
命令对第一阶段的结果进行分析,以判断是否存在弱工具变量问题。. estat firststage,forcenonrobust
First-stage regression summary statistics
--------------------------------------------------------------------------
| Adjusted Partial Robust
Variable | R-sq. R-sq. R-sq. F(1,27) Prob > F
-------------+------------------------------------------------------------
lnjinshipop | 0.6487 0.5988 0.3276 61.914 0.0000
--------------------------------------------------------------------------
(F statistic adjusted for 28 clusters in provid)
Minimum eigenvalue statistic = 116.419
Critical Values # of endogenous regressors: 1
Ho: Instruments are weak # of excluded instruments: 1
---------------------------------------------------------------------
| 5% 10% 20% 30%
2SLS relative bias | (not available)
-----------------------------------+---------------------------------
| 10% 15% 20% 25%
2SLS Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
LIML Size of nominal 5% Wald test | 16.38 8.96 6.66 5.53
---------------------------------------------------------------------
bprvdist
对内生变量lnjinshipop
有很强的解释力度。F统计量=61.914>10,根据经验准则可以判断,我们的工具变量不是一个弱工具变量。除此之外,estat firststage
命令还会报告一个最小特征值统计量(Minimum eigenvalue statistic
),一般大于2SLS Size of nominal 5% Wald test
中10%对应的临界值就表明不存在弱工具变量问题。ivreg2
命令
ivreg2
命令是对ivregress
命令的改进和优化,功能更加强大,支持的估计方法更多(默认使用2SLS),并且会直接报告工具变量的几个统计检验结果,陈婷、龚启圣和马驰骋三位老师使用的就是ivreg2
命令。ivreg2
命令是一个外部命令,所以使用之前需要安装(ssc install ivreg2
)。它的语法格式和ivregress
基本一致:ivreg2 lneduyear (lnjinshipop=bprvdist) lnnightlight lncoastdist tri suitability lnpopdensity urbanrates i.provid , first cluster(provid)
第一阶段回归结果
First-stage regression of lnjinshipop:
Statistics robust to heteroskedasticity and clustering on provid
Number of obs = 274
Number of clusters (provid) = 28
------------------------------------------------------------------------------
| Robust
lnjinshipop | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
bprvdist | -.0846917 .0107633 -7.87 0.000 -.1058948 -.0634886
lnnightlight | .183385 .0681077 2.69 0.008 .0492169 .3175531
lncoastdist | .0350333 .0769964 0.45 0.650 -.116645 .1867116
tri | 1.06676 .5625276 1.90 0.059 -.0413849 2.174906
suitability | -.0769726 .0548546 -1.40 0.162 -.1850328 .0310876
lnpopdensity | .196144 .084196 2.33 0.021 .030283 .3620051
urbanrates | 3.352916 1.683576 1.99 0.048 .0363742 6.669459
|
provid |
12 | .2051006 .0550449 3.73 0.000 .0966656 .3135357
13 | -1.890425 .0949154 -19.92 0.000 -2.077402 -1.703447
......
64 | -1.301895 .157771 -8.25 0.000 -1.612694 -.9910956
|
_cons | 2.126233 .9770541 2.18 0.031 .201496 4.050971
------------------------------------------------------------------------------
第二阶段回归结果
IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on provid
Number of clusters (provid) = 28 Number of obs = 274
F( 34, 27) = 0.34
Prob > F = 0.9984
Total (centered) SS = 3.805186838 Centered R2 = 0.7874
Total (uncentered) SS = 1282.736705 Uncentered R2 = 0.9994
Residual SS = .8089841945 Root MSE = .05434
------------------------------------------------------------------------------
| Robust
lneduyear | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lnjinshipop | .0834221 .0116773 7.14 0.000 .060535 .1063092
lnnightlight | .0592777 .0102138 5.80 0.000 .0392589 .0792964
lncoastdist | .0093399 .0118791 0.79 0.432 -.0139426 .0326224
tri | -.1044346 .063917 -1.63 0.102 -.2297097 .0208404
suitability | -.0008944 .0125824 -0.07 0.943 -.0255555 .0237666
lnpopdensity | -.0472648 .0115102 -4.11 0.000 -.0698245 -.0247051
urbanrates | -.0353689 .1784687 -0.20 0.843 -.3851612 .3144233
|
provid |
12 | -.10732 .0088755 -12.09 0.000 -.1247157 -.0899243
13 | .0244675 .0276005 0.89 0.375 -.0296284 .0785634
......
64 | -.0740796 .0350354 -2.11 0.034 -.1427477 -.0054116
|
_cons | 2.014146 .1490819 13.51 0.000 1.721951 2.306341
------------------------------------------------------------------------------
弱工具变量检验结果
ivreg2
命令会直接报告Cragg-Donald Wald F
统计量和Kleibergen-Paap Wald rk F
统计量两个用于弱工具变量检验的统计量,其中Cragg-Donald Wald F
统计量假设扰动项iid
(独立同分布),而Kleibergen-Paap Wald rk F
统计量没有做扰动项iid
的假设。需要注意的是,这里的原假设是所选工具变量是弱工具变量,当两个F统计量大于Stock-Yogo weak ID test critical values
中10%偏误的临界值时,可以拒绝原假设,认为不存在弱工具变量问题。Weak identification test
Ho: equation is weakly identified
Cragg-Donald Wald F statistic 116.42
Kleibergen-Paap Wald rk F statistic 61.91
Stock-Yogo weak ID test critical values for K1=1 and L1=1:
10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
需要本篇推文使用的数据和代码的朋友,请在后台对话框内回复关键词“IV”。
xtivreg
和xtivreg2
两个命令,语法格式与ivreg2
基本类似,大家有兴趣的话可以去看一下这两个命令的help文件。