
Must All Exogenous Variables Be Used as Instruments?

Econometrics Circle (计量经济圈), 2021-10-23



Someone posed the following question:

I am estimating an equation:

Y = a + bX + cZ + dW

I then want to instrument W with Q. I know the first-stage regression is supposed to be

W = e + fX + gZ + hQ

(i.e., use all the exogenous variables in the first stage). Actually this is automatically done if I use the ivregress command. However, I only want to use Q to instrument W without using X and Z in the first stage. Is there a way I can do it in Stata? I can regress W on Q and get the predicted W, and then use it in the second-stage regression. The standard errors will, however, be incorrect.


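The ivregress command automatically includes all exogenous variables in the first stage and corrects the standard errors as part of the 2SLS procedure. This is why, in instrumental-variables regression, every exogenous variable in the equation is routinely placed in the instrument set.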

ivregress will not let you do this and, moreover, if you believe W to be endogenous because it is part of a system, then you must include X and Z as instruments, or you will get biased estimates for b, c, and d.
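The behaviour described here can be inspected directly. The sketch below maps the question's variables onto Stata's bundled auto data purely for illustration (the mapping Y = price, X = turn, Z = weight, W = mpg, Q = displacement is an assumption of this example, and mpg is not claimed to be truly endogenous). The first option asks ivregress to print the first-stage regression, which lists X and Z alongside Q.

```stata
* Hypothetical mapping onto the auto data, for illustration only:
*   Y = price, X = turn, Z = weight, W = mpg, Q = displacement
sysuse auto, clear

* 2SLS with W instrumented by Q; the first option prints the
* first-stage regression so its regressors can be inspected
ivregress 2sls price turn weight (mpg = displacement), first
```

The first-stage output should show mpg regressed on turn, weight, and displacement, not on displacement alone.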


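Here is a simple example: a system of simultaneous equations in which X1, X2, X3, and X4 are exogenous but Y2 is endogenous in the equation for Y1. When estimating that equation, do X1 and X2 need to be included in the first-stage (instrument) equation for Y2? Yes. The system can be rewritten in reduced form, and X1, X2, X3, and X4 all appear in the reduced-form equation (2r), so Y2 is correlated with X1 and X2. Omitting X1 and X2 from the first-stage regression for Y2 biases the estimates.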


Consider the system

Y1 = a0 + a1*Y2 + a2*X1 + a3*X2 + e1 (1)

Y2 = b0 + b1*Y1 + b2*X3 + b3*X4 + e2 (2)

Warning: Assume we are estimating structural equation (1); if X1 and X2 are exogenous, then they must be kept as instruments or your estimates will be biased. In a general system, such exogenous variables must be used as instruments for any endogenous variables when the instrumented value for the endogenous variables appears in an equation in which the exogenous variable also appears.


Consider the reduced forms of your two equations:

Y1 = e0 + e1*X1 + e2*X2 + e3*X3 + e4*X4 + u1 (1r)

Y2 = f0 + f1*X1 + f2*X2 + f3*X3 + f4*X4 + u2 (2r)

where e# and f# are combinations of the a# and b# coefficients from (1) and (2) and u1 and u2 are linear combinations of e1 and e2.
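To make the statement about e# and f# concrete, the reduced forms can be obtained by substituting each structural equation into the other. The sketch below assumes a1*b1 is not equal to 1. To avoid the clash between the reduced-form coefficients e1, e2 and the structural disturbances of the same name, it writes the disturbances as ε1 and ε2; that change of notation is introduced here, not in the source.

```latex
\begin{aligned}
D   &= 1 - a_1 b_1 \neq 0, \\
Y_1 &= \frac{(a_0 + a_1 b_0) + a_2 X_1 + a_3 X_2 + a_1 b_2 X_3 + a_1 b_3 X_4
        + (\varepsilon_1 + a_1 \varepsilon_2)}{D}, \\
Y_2 &= \frac{(b_0 + b_1 a_0) + b_1 a_2 X_1 + b_1 a_3 X_2 + b_2 X_3 + b_3 X_4
        + (b_1 \varepsilon_1 + \varepsilon_2)}{D}.
\end{aligned}
```

Reading off (2r), f1 = b1*a2/D and f2 = b1*a3/D. Both are zero when b1 = 0, that is, when Y2 does not depend on Y1.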


All exogenous variables appear in each equation for an endogenous variable. This is the nature of simultaneous systems, so efficiency argues that all exogenous variables be included as instruments for each endogenous variable.


Here is the real problem. Take (1): the reduced-form equation for Y2, (2r), clearly shows that Y2 is correlated with X2 (by the coefficient f2). If we do not include X2 among the instruments for Y2, then we will have failed to account for the correlation of Y2 with X2 in its instrumented values. Since we did not account for this correlation, when we estimate (1) with the instrumented values for Y2, the coefficient a3 will be forced to account for this correlation. This approach will lead to biased estimates of both a1 and a3.
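The argument can be sketched algebraically. The notation below is introduced for this illustration and is not the source's: Ŷ2 is the fitted value from whichever first stage is run, v̂ is its residual, and ε1 is the disturbance of equation (1).

```latex
\begin{aligned}
Y_1 &= a_0 + a_1 Y_2 + a_2 X_1 + a_3 X_2 + \varepsilon_1 \\
    &= a_0 + a_1 \hat{Y}_2 + a_2 X_1 + a_3 X_2
       + \underbrace{(a_1 \hat{v} + \varepsilon_1)}_{\text{second-stage error}},
\qquad \hat{v} = Y_2 - \hat{Y}_2 .
\end{aligned}
```

OLS residuals are orthogonal to the regressors of the regression that produced them. If X1 and X2 are in the first stage, v̂ is orthogonal to X1, X2, and Ŷ2, so the a1*v̂ term does not disturb the second-stage estimates. If X2 is left out, nothing forces v̂ to be orthogonal to X2, and by (2r) with f2 nonzero it generally is not. The second-stage error is then correlated with the regressor X2, which is the source of the bias in a1 and a3.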


For a brief reference, see Baltagi (2011). See the whole discussion of 2SLS, particularly the paragraph after equation 11.40, on page 265. (I have no idea why this issue is not emphasized in more books.)


Failing to include X4 affects only efficiency and not bias.


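In the following case, however, the exogenous variables X1 and X2 do not all need to be added: when the system is triangular (that is, Y1 does not affect Y2). See the article 联立方程模型是什么, 又如何识别和估计? ("What is a simultaneous-equations model, and how is it identified and estimated?") for a more detailed introduction. In this form, every coefficient in the system can be obtained directly by indirect least squares.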

However, there is one case where it is not necessary to include X1 and X2 as instruments for Y2. That is when the system is triangular such that Y2 does not depend on Y1, but you believe it is weakly endogenous because the disturbances are correlated between the equations. It is still consistent here to do what ivregress does and retain X1 and X2 as instruments; they are, however, no longer required. You could then do what you suggested: run the first stage on the instrument alone and use the predicted values in the second-stage regression.


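In this case a two-step procedure can be used: (1) regress the endogenous variable on the exogenous instrument(s) and obtain its fitted values; (2) regress Y on those fitted values and the original exogenous variables. The variance-covariance matrix must then be corrected.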


If you do use this method of indirect least squares, you will have to perform the adjustment to the covariance matrix yourself. Consider the structural equation

y1 = y2 + x1 + e

where you have an instrument z1 and you do not think that y2 is a function of y1.
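In symbols, the adjustment amounts to rescaling the second-stage OLS covariance matrix by a ratio of two mean squared errors. The sketch below only restates the Stata steps that follow and is not an independent derivation. The coefficients β, the estimates b, the sample size n, and the number of second-stage coefficients k are notation introduced here; the source writes the structural equation without coefficients or a constant.

```latex
\begin{aligned}
\text{model:}\quad & y_1 = \beta_0 + \beta_1 y_2 + \beta_2 x_1 + e \\
\text{second stage:}\quad & \text{OLS of } y_1 \text{ on } (1,\ \hat{y}_2,\ x_1)
   \text{ gives } b = (b_0, b_1, b_2) \text{ and } \widehat{V}_{\text{OLS}} \\
s_{\hat{y}}^{2} &= \frac{1}{n-k}\sum_{i=1}^{n}
   \bigl(y_{1i} - b_0 - b_1 \hat{y}_{2i} - b_2 x_{1i}\bigr)^{2} \\
s^{2} &= \frac{1}{n-k}\sum_{i=1}^{n}
   \bigl(y_{1i} - b_0 - b_1 y_{2i} - b_2 x_{1i}\bigr)^{2} \\
\widehat{V}_{\text{adj}} &= \widehat{V}_{\text{OLS}} \cdot \frac{s^{2}}{s_{\hat{y}}^{2}}
\end{aligned}
```

The only difference between the two mean squared errors is that s² uses the actual y2 in the residual, while the second-stage OLS output uses the fitted ŷ2.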


The following example uses only z1 as an instrument for y2. Let’s begin by creating a dataset (containing made-up data) on y1, y2, x1, and z1:

. sysuse auto
(1978 Automobile Data)

. rename price y1

. rename mpg y2

. rename displacement z1

. rename turn x1


Now we perform the first-stage regression and get predictions for the instrumented variable, which we must do for each endogenous right-hand-side variable.

. regress y2 z1


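------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          z1 |  -.0444536   .0052606    -8.45   0.000    -.0549405   -.0339668
       _cons |   30.06788   1.143462    26.30   0.000     27.78843    32.34733
------------------------------------------------------------------------------

. predict double y2hat
(option xb assumed; fitted values)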

* perform IV regression 

. regress y1 y2hat x1


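------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y2hat |  -463.4688    117.187    -3.95   0.000    -697.1329   -229.8046
          x1 |  -126.4979   108.7468    -1.16   0.249    -343.3328    90.33697
       _cons |   21051.36   6451.837     3.26   0.002     8186.762    33915.96
------------------------------------------------------------------------------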

Now we correct the variance–covariance matrix by applying the correct mean squared error:

. rename y2hat y2hold

. rename y2 y2hat

. predict double res, residual         /* residuals use the real y2, temporarily named y2hat */

. rename y2hat y2                      /* put back real y2 */

. rename y2hold y2hat

. replace res = res^2
(74 real changes made)

. summarize res

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         res |        74     7553657    1.43e+07   117.4375   1.06e+08

. scalar realmse = r(mean)*r(N)/e(df_r)    /* much ado about small sample */

. matrix bmatrix = e(b)

. matrix Vmatrix = e(V) * realmse / e(rmse)^2

. ereturn post bmatrix Vmatrix, noclear

. ereturn display




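------------------------------------------------------------------------------
             |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       y2hat |  -463.4688   127.7267    -3.63   0.001    -718.1485    -208.789
          x1 |  -126.4979   118.5274    -1.07   0.289    -362.8348    109.8389
       _cons |   21051.36   7032.111     2.99   0.004      7029.73    35072.99
------------------------------------------------------------------------------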
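For comparison, the conventional estimate can be obtained with ivregress on the same renamed variables. This is a sketch, not part of the original example. Because ivregress adds x1 to the first stage automatically, its point estimates are not expected to match the manual results above, which used z1 alone. The small option requests the small-sample degrees-of-freedom adjustment.

```stata
* Rebuild the renamed dataset so the block runs on its own
sysuse auto, clear
rename price y1
rename mpg y2
rename displacement z1
rename turn x1

* Built-in 2SLS: x1 enters the first stage automatically
ivregress 2sls y1 x1 (y2 = z1), small

* The first stage that ivregress uses, run by hand for inspection
regress y2 z1 x1
```

Comparing this first stage with the earlier regression of y2 on z1 alone shows how much x1 contributes to predicting y2 in this sample.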

Reference

  • Baltagi, B. H. 2011. Econometrics. New York: Springer.
