Stata: The Hausman test
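The hausman command performs Hausman's (1978) specification test.
1. Hausman test for the stored models consistent and efficient
hausman consistent efficient
2. As above, but compare fixed-effects and random-effects linear regression models
hausman fixed random, sigmamore
3. Endogeneity test after ivprobit and probit, with estimates stored in iv and noiv
hausman iv noiv, equations(1:1)
4. Test of independence of irrelevant alternatives (IIA) for a model with all alternatives, stored in all, and a model with one alternative omitted, stored in omitted
hausman omitted all, alleqs constant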
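Example 1
hausman is a general implementation of Hausman's (1978) specification test, which compares an estimator θ̂1 that is known to be consistent with an estimator θ̂2 that is efficient under the assumption being tested. The null hypothesis is that the estimator θ̂2 is indeed an efficient (and consistent) estimator of the true parameters. If that is the case, there should be no systematic difference between the two estimators. If there is a systematic difference in the estimates, you have reason to doubt the assumptions on which the efficient estimator is based.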
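In the notation of Stata's output, write b for the consistent estimator (θ̂1) and B for the efficient one (θ̂2), with estimated covariance matrices V_b and V_B. The statistic that hausman reports can then be written as below. The degrees of freedom k is taken here to be the rank of V_b − V_B, which is our reading of hausman's default rather than something stated in the text above.

```latex
H = (b - B)'\,(V_b - V_B)^{-1}\,(b - B) \;\sim\; \chi^2(k) \quad \text{under } H_0
```

A large value of H means that the two sets of coefficients differ by more than sampling noise would explain, which counts against the assumptions behind the efficient estimator.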
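We are studying the factors that affected the wages of young women in the United States between 1968 and 1988, and we have a panel-data sample of individual women over that period.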
use https://www.stata-press.com/data/r17/nlswork4
. describe
The output is:
use "C:\Users\Metrics\Desktop\nlswork4.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)
. desc
Contains data from C:\Users\Metrics\Desktop\nlswork4.dta
Observations: 28,534 National Longitudinal Survey of Young Women, 14-24 years old in 1968
Variables: 6 29 Jan 2020 16:35
(_dta has notes)
-----------------------------------------------------------------------------------------------------------------------------
Variable Storage Display Value
name type format label Variable label
-----------------------------------------------------------------------------------------------------------------------------
idcode int %8.0g NLS ID
year byte %8.0g Interview year
age byte %8.0g Age in current year
msp byte %8.0g 1 if married, spouse present
ttl_exp float %9.0g Total work experience
ln_wage float %9.0g ln(wage/GNP deflator)
-----------------------------------------------------------------------------------------------------------------------------
Sorted by: idcode year
.
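We believe that a random-effects specification is appropriate for the individual-level effects in our model.
We first fit a fixed-effects model, which captures all individual-level effects that are constant over time.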
xtreg ln_wage age msp ttl_exp, fe
The output is:
xtreg ln_wage age msp ttl_exp, fe
Fixed-effects (within) regression Number of obs = 28,494
Group variable: idcode Number of groups = 4,710
R-squared: Obs per group:
Within = 0.1373 min = 1
Between = 0.2571 avg = 6.0
Overall = 0.1800 max = 15
F(3,23781) = 1262.01
corr(u_i, Xb) = 0.1476 Prob > F = 0.0000
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. t P>|t| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.005485 .000837 -6.55 0.000 -.0071256 -.0038443
msp | .0033427 .0054868 0.61 0.542 -.0074118 .0140971
ttl_exp | .0383604 .0012416 30.90 0.000 .0359268 .0407941
_cons | 1.593953 .0177538 89.78 0.000 1.559154 1.628752
-------------+----------------------------------------------------------------
sigma_u | .37674223
sigma_e | .29751014
rho | .61591044 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23781) = 7.76 Prob > F = 0.0000
.
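We assume that this model is consistent for the true parameters and store the results under the name fixed by using estimates store:
estimates store fixed
Now we fit a random-effects model as a fully efficient specification of the individual effects, under the assumption that they are random and follow a normal distribution. We then use the hausman command to compare these estimates with the previously stored results.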
xtreg ln_wage age msp ttl_exp, re
hausman fixed ., sigmamore
The output is:
. xtreg ln_wage age msp ttl_exp, re
Random-effects GLS regression Number of obs = 28,494
Group variable: idcode Number of groups = 4,710
R-squared: Obs per group:
Within = 0.1373 min = 1
Between = 0.2552 avg = 6.0
Overall = 0.1797 max = 15
Wald chi2(3) = 5100.33
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
ln_wage | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
age | -.0069749 .0006882 -10.13 0.000 -.0083238 -.0056259
msp | .0046594 .0051012 0.91 0.361 -.0053387 .0146575
ttl_exp | .0429635 .0010169 42.25 0.000 .0409704 .0449567
_cons | 1.609916 .0159176 101.14 0.000 1.578718 1.641114
-------------+----------------------------------------------------------------
sigma_u | .32648519
sigma_e | .29751014
rho | .54633481 (fraction of variance due to u_i)
------------------------------------------------------------------------------
. hausman fixed ., sigmamore
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| fixed . Difference Std. err.
-------------+----------------------------------------------------------------
age | -.005485 -.0069749 .0014899 .0004803
msp | .0033427 .0046594 -.0013167 .0020596
ttl_exp | .0383604 .0429635 -.0046031 .0007181
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from xtreg.
B = Inconsistent under Ha, efficient under H0; obtained from xtreg.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 260.40
Prob > chi2 = 0.0000
.
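Under the current specification, our initial hypothesis that the individual-level effects are adequately modeled by a random-effects model is rejected. This result is conditional on the rest of our model specification; random effects might still be appropriate for some alternative model of wages.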
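The test results can also be picked up programmatically, for example to report them in a table or to branch on the p-value. The following is a minimal sketch that reruns example 1 quietly and then reads the results that hausman leaves in r(). The display formats are arbitrary choices, and chi2tail() is used only to show how the p-value follows from the statistic and its degrees of freedom.

```stata
* Sketch: rerun example 1 and read the stored test results
use https://www.stata-press.com/data/r17/nlswork4, clear

quietly xtreg ln_wage age msp ttl_exp, fe
estimates store fixed

quietly xtreg ln_wage age msp ttl_exp, re
hausman fixed ., sigmamore

* hausman is r-class: the statistic, its degrees of freedom, and the p-value
return list
display "chi2 = " %8.2f r(chi2)
display "df   = " r(df)
display "p    = " %6.4f chi2tail(r(df), r(chi2))
```

With the data used above, these values should match the chi2(3) and Prob > chi2 lines in the hausman output.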
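Example 2
A stringent assumption of multinomial and conditional logit models is that the outcome categories have the property of independence of irrelevant alternatives (IIA). Stated simply, this assumption requires that including or excluding a category does not affect the relative risks associated with the regressors in the remaining categories.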
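To see what IIA means in the multinomial logit model, write β_j for the coefficient vector of outcome j, with the base outcome's coefficients normalized to zero. This is standard textbook notation and is not taken from the text above. The relative risk of outcome j versus outcome k then depends only on the coefficients of those two outcomes:

```latex
\frac{\Pr(y = j \mid x)}{\Pr(y = k \mid x)}
  = \frac{\exp(x\beta_j)}{\exp(x\beta_k)}
  = \exp\{x(\beta_j - \beta_k)\}
```

No other alternative enters the ratio, so adding or removing a third category should leave it unchanged. That is the property the Hausman test checks here.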
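A classic example of a violation of this assumption involves the choice of transportation mode; see McFadden (1974). For simplicity, postulate a transportation model with four possible outcomes: rides a train to work, takes a bus to work, drives the Ford to work, and drives the Chevrolet to work. Clearly, "drives the Ford" is a closer substitute for "drives the Chevrolet" than it is for "rides a train" (at least for most people).
This means that excluding "drives the Ford" from the model could be expected to affect the relative risks of the remaining options, so the model would not obey the IIA assumption. Using the data presented in the mlogit documentation, we will test for IIA with a simplified model.
The choice of insurance type among indemnity, prepaid, and uninsured is modeled as a function of age and gender. The indemnity category is the base category, and the model including all three outcomes is fit. The results are then stored under the name allcats.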
use https://www.stata-press.com/data/r17/sysdsn3
(Health insurance data)
. mlogit insure age male
estimates store allcats
The output is:
. use "C:\Users\Metrics\Desktop\sysdsn3.dta"
(Health insurance data)
. mlogit insure age male
Iteration 0: log likelihood = -555.85446
Iteration 1: log likelihood = -551.32973
Iteration 2: log likelihood = -551.32802
Iteration 3: log likelihood = -551.32802
Multinomial logistic regression Number of obs = 615
LR chi2(4) = 9.05
Prob > chi2 = 0.0598
Log likelihood = -551.32802 Pseudo R2 = 0.0081
------------------------------------------------------------------------------
insure | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.0100251 .0060181 -1.67 0.096 -.0218204 .0017702
male | .5095747 .1977893 2.58 0.010 .1219147 .8972346
_cons | .2633838 .2787575 0.94 0.345 -.2829708 .8097383
-------------+----------------------------------------------------------------
Uninsure |
age | -.0051925 .0113821 -0.46 0.648 -.0275011 .0171161
male | .4748547 .3618462 1.31 0.189 -.2343508 1.18406
_cons | -1.756843 .5309602 -3.31 0.001 -2.797506 -.7161803
------------------------------------------------------------------------------
.
. estimates store allcats
.
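Under the IIA assumption, we would expect no systematic change in the coefficients if we excluded one of the outcomes from the model. (For an extensive discussion, see Hausman and McFadden [1984].) We reestimate the parameters, excluding the uninsured outcome, and perform a Hausman test against the fully efficient full model.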
. mlogit insure age male if insure != "Uninsure":insure
hausman . allcats, alleqs constant
The output is:
. mlogit insure age male if insure != "Uninsure":insure
Iteration 0: log likelihood = -394.8693
Iteration 1: log likelihood = -390.4871
Iteration 2: log likelihood = -390.48643
Iteration 3: log likelihood = -390.48643
Multinomial logistic regression Number of obs = 570
LR chi2(2) = 8.77
Prob > chi2 = 0.0125
Log likelihood = -390.48643 Pseudo R2 = 0.0111
------------------------------------------------------------------------------
insure | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Prepaid |
age | -.0101521 .0060049 -1.69 0.091 -.0219214 .0016173
male | .5144003 .1981735 2.60 0.009 .1259874 .9028133
_cons | .2678043 .2775563 0.96 0.335 -.276196 .8118046
------------------------------------------------------------------------------
.
.
.
. hausman . allcats, alleqs constant
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| . allcats Difference Std. err.
-------------+----------------------------------------------------------------
age | -.0101521 -.0100251 -.0001269 .
male | .5144003 .5095747 .0048256 .0123338
_cons | .2678043 .2633838 .0044205 .
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from mlogit.
B = Inconsistent under Ha, efficient under H0; obtained from mlogit.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 0.08
Prob > chi2 = 0.9944
(V_b-V_B is not positive definite)
.
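The if condition on the mlogit command simply uses the insure value label to identify the "Uninsure" category.
On examining the output from hausman, we see no evidence that the IIA assumption has been violated.
Because the Hausman test is a standardized comparison of model coefficients, using it with mlogit requires that the base outcome be the same in both competing models. In particular, if the most frequent category (the default base outcome) is the one being removed to test for IIA, you must use the baseoutcome() option of mlogit to set the base outcome manually to something else. Alternatively, you can use the equations() option of the hausman command to align the equations of the two models.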
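The baseoutcome() route can be sketched as follows for the case in which the default base outcome, Indemnity, is the category being dropped. This is an illustration under stated assumptions, not part of the original example: it assumes that Prepaid is coded 2 in insure (label list shows the actual coding), and the stored name allcats_b2 is hypothetical.

```stata
* Sketch: IIA test that drops the default base outcome (Indemnity).
* Assumption: Prepaid is coded 2 in insure; verify with -label list-.
use https://www.stata-press.com/data/r17/sysdsn3, clear
label list insure

* Full model with Prepaid forced to be the base outcome
quietly mlogit insure age male, baseoutcome(2)
estimates store allcats_b2          // hypothetical name

* Restricted model without Indemnity, using the same base outcome
quietly mlogit insure age male if insure != "Indemnity":insure, baseoutcome(2)

* Both models now share the Uninsure equation, which hausman compares
hausman . allcats_b2, alleqs constant
```

Setting the same base outcome in both fits keeps the coefficients of the shared equation on the same footing, which is what the comparison needs.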
We can also perform the Hausman IIA test against the remaining alternative in the model:
mlogit insure age male if insure != "Prepaid":insure
Iteration 0: log likelihood = -132.59913
Iteration 1: log likelihood = -131.78009
Iteration 2: log likelihood = -131.76808
Iteration 3: log likelihood = -131.76807
Multinomial logistic regression Number of obs = 338
LR chi2(2) = 1.66
Prob > chi2 = 0.4356
Log likelihood = -131.76807 Pseudo R2 = 0.0063
------------------------------------------------------------------------------
insure | Coefficient Std. err. z P>|z| [95% conf. interval]
-------------+----------------------------------------------------------------
Indemnity | (base outcome)
-------------+----------------------------------------------------------------
Uninsure |
age | -.0041055 .0115807 -0.35 0.723 -.0268033 .0185923
male | .4591074 .3595663 1.28 0.202 -.2456296 1.163844
_cons | -1.801774 .5474476 -3.29 0.001 -2.874752 -.7287968
------------------------------------------------------------------------------
. hausman . allcats, alleqs constant
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| . allcats Difference Std. err.
-------------+----------------------------------------------------------------
age | -.0041055 -.0051925 .001087 .0021355
male | .4591074 .4748547 -.0157473 .
_cons | -1.801774 -1.756843 -.0449311 .1333421
------------------------------------------------------------------------------
b = Consistent under H0 and Ha; obtained from mlogit.
B = Inconsistent under Ha, efficient under H0; obtained from mlogit.
Test of H0: Difference in coefficients not systematic
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= -0.18
Warning: chi2 < 0 ==> model fitted on these data
fails to meet the asymptotic assumptions
of the Hausman test; see suest for a
generalized test.
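Here the χ2 statistic is actually negative. We might interpret this result as strong evidence that we cannot reject the null hypothesis. Such a result is not unusual for the Hausman test, particularly when the sample is relatively small; there are only 45 uninsured individuals in this dataset.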
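A short derivation shows why a negative value is possible. It uses the b and B notation of the output together with Hausman's (1978) result that, under the null hypothesis, an efficient estimator is asymptotically uncorrelated with its difference from any other consistent estimator:

```latex
\operatorname{Cov}(B,\, b - B) = 0
\;\Longrightarrow\; \operatorname{Cov}(b, B) = \operatorname{Var}(B)
\;\Longrightarrow\; \operatorname{Var}(b - B) = \operatorname{Var}(b) - \operatorname{Var}(B)
```

The last equality holds only asymptotically and only under the null hypothesis. In a finite sample the estimated difference V_b − V_B need not be positive definite, as the note "(V_b-V_B is not positive definite)" in the earlier output shows, and the quadratic form can then come out negative.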
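Are we surprised by the results of the Hausman test in this example? Not really. Judging from the z statistics of the original multinomial logit model, we were struggling to identify any structure in the data with the current specification. Even when we were willing to assume IIA and computed the efficient estimator under this assumption, few of the effects could be identified as statistically different from those of the base category. Basing a Hausman test on a contrast (difference) between two poor estimates simply asks too much of the available data.
In example 2, we encountered a case in which the Hausman test was not well defined. Unfortunately, in our experience this happens fairly often. Stata provides an alternative to the Hausman test that overcomes this problem by using a different estimator of the variance of the difference between the two estimators. This other estimator is guaranteed to be positive semidefinite. It also widens the range of problems to which Hausman-type tests can be applied, because it relaxes the assumption that one of the estimators is efficient. For instance, you can perform Hausman-type tests with clustered observations and with survey estimators. See the suest documentation for details.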
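The suest-based alternative can be sketched as follows for the models of example 2. This is a hedged illustration rather than the author's own procedure. The stored names m1 and m2 are illustrative, and the equation names m1_Prepaid and m2_Prepaid assume that suest prefixes each equation with the stored name.

```stata
* Sketch: Hausman-type IIA test through suest, using example 2's models.
* The names m1 and m2 are illustrative.
use https://www.stata-press.com/data/r17/sysdsn3, clear

quietly mlogit insure age male
estimates store m1                  // all three insurance types

quietly mlogit insure age male if insure != "Uninsure":insure
estimates store m2                  // Uninsure excluded

* Combine both sets of estimates under one joint covariance matrix
suest m1 m2

* Wald test that the Prepaid equation, constant included, is the same
test [m1_Prepaid = m2_Prepaid], constant
```

Because the joint covariance matrix is estimated directly rather than as a difference of two matrices, the resulting test statistic cannot be negative.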