Doubly robust estimators for causal effects: making your estimates more accurate and less error-prone

Causal Inference Research Group, 计量经济圈, 2021-10-23

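Stata now provides teffects, a general framework for estimating treatment effects. It supports many estimators, including propensity-score matching (PSM), nearest-neighbor matching (NNM), regression adjustment (RA), and inverse-probability weighting (IPW).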

teffects allows you to write a model for the treatment and a model for the outcome. We will show that, even if you misspecify one of the models, you can still get correct estimates using doubly robust estimators.

In experimental data, the treatment is randomized so that a difference between the average treated outcomes and the average nontreated outcomes estimates the average treatment effect (ATE). Suppose you want to estimate the ATE of a mother’s smoking on her baby’s birthweight. The ethical impossibility of asking a random selection of pregnant women to smoke mandates that these data be observational. Which women choose to smoke while pregnant almost certainly depends on observable covariates, such as the mother’s age.

We use a conditional model to make the treatment as good as random. More formally, we assume that conditioning on observable covariates makes the outcome conditionally independent of the treatment. Conditional independence allows us to use differences in model-adjusted averages to estimate the ATE.

The regression-adjustment (RA) estimator uses a model for the outcome. The RA estimator uses a difference in the average predictions for the treated and the average predictions for the nontreated to estimate the ATE. Below we use teffects ra to estimate the ATE when conditioning on the mother’s marital status, her education level, whether she had a prenatal visit in the first trimester, and whether it was her first baby.

Mothers’ smoking lowers the average birthweight by 231 grams.
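To make the mechanics of RA concrete, here is a minimal Python sketch. It is not Stata's implementation and it does not use the birthweight data: the `simulate` function, its single binary covariate `x`, and all the numbers in it are assumptions made up for illustration, with a true effect of -230 built in. The outcome model is simply one mean per treatment-by-covariate cell.

```python
import random
from statistics import mean

def simulate(n=20000, seed=1):
    """Hypothetical data: x = married (0/1), t = smoked (0/1), y = birthweight in grams."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.6 else 0
        # Treatment depends on x, so treated and nontreated groups are not comparable.
        t = 1 if rng.random() < (0.15 if x else 0.45) else 0
        y = 3200 + 200 * x - 230 * t + rng.gauss(0, 500)
        rows.append((x, t, y))
    return rows

def ra_ate(rows):
    """Regression adjustment with a saturated outcome model (one mean per cell)."""
    mu = {
        (t, x): mean(y for xi, ti, y in rows if ti == t and xi == x)
        for t in (0, 1)
        for x in (0, 1)
    }
    # Predict both potential outcomes for every subject, then average the difference.
    return mean(mu[1, x] - mu[0, x] for x, _, _ in rows)

rows = simulate()
naive = mean(y for _, t, y in rows if t == 1) - mean(y for _, t, y in rows if t == 0)
ate = ra_ate(rows)
print(f"naive difference: {naive:.1f}   RA estimate: {ate:.1f}")
```

In this simulated setting the raw difference in means overstates the harm, because the treated group also differs in `x`; averaging model predictions over the whole sample removes that imbalance.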

The inverse-probability-weighted (IPW) estimator uses a model for the treatment instead of a model for the outcome; it uses the predicted treatment probabilities to weight the observed outcomes. The difference between the weighted treated outcomes and the weighted nontreated outcomes estimates the ATE. Conditioning on the same variables as above, we now use teffects ipw to estimate the ATE:

Mothers’ smoking again lowers the average birthweight by 231 grams.
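The same idea can be sketched for IPW. Again this is only an illustration on made-up data, not what teffects ipw does internally: the `simulate` function and its parameters are hypothetical, the treatment model is just the treated share within each covariate cell, and the weights are normalized within each arm (one common choice).

```python
import random
from statistics import mean

def simulate(n=20000, seed=1):
    """Hypothetical data: x = married (0/1), t = smoked (0/1), y = birthweight in grams."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.6 else 0
        t = 1 if rng.random() < (0.15 if x else 0.45) else 0
        y = 3200 + 200 * x - 230 * t + rng.gauss(0, 500)
        rows.append((x, t, y))
    return rows

def weighted_mean(pairs):
    """pairs are (weight, value)."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)

def ipw_ate(rows):
    # Treatment model: P(t = 1 | x), estimated by the treated share in each x cell.
    ps = {x: mean(t for xi, t, _ in rows if xi == x) for x in (0, 1)}
    # Each subject is weighted by the inverse probability of the treatment received.
    treated = [(1 / ps[x], y) for x, t, y in rows if t == 1]
    control = [(1 / (1 - ps[x]), y) for x, t, y in rows if t == 0]
    return weighted_mean(treated) - weighted_mean(control)

rows = simulate()
ate = ipw_ate(rows)
print(f"IPW estimate: {ate:.1f}")
```

Note that no outcome model appears anywhere in `ipw_ate`; only the observed outcomes are reweighted.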

We could use both models instead of one. The surprising fact is that only one of the two models needs to be correct to estimate the ATE consistently, whether we use the augmented-IPW (AIPW) combination proposed by Robins and Rotnitzky (1995) or the IPW-regression-adjustment (IPWRA) combination proposed by Wooldridge (2010).

The AIPW estimator augments the IPW estimator with a correction term. The term removes the bias if the treatment model is wrong and the outcome model is correct, and the term goes to 0 if the treatment model is correct and the outcome model is wrong.
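The double-robustness property can be checked numerically. The sketch below is an assumption-laden illustration rather than the teffects aipw algorithm: the data are simulated with a true effect of -230, a "wrong" model is one that simply ignores the covariate `x`, and the helper names (`simulate`, `cell_means`, `aipw_ate`) are hypothetical. Each subject contributes the outcome-model difference plus an inverse-probability-weighted residual correction.

```python
import random
from statistics import mean

def simulate(n=20000, seed=1):
    """Hypothetical data: x = married (0/1), t = smoked (0/1), y = birthweight in grams."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.6 else 0
        t = 1 if rng.random() < (0.15 if x else 0.45) else 0
        y = 3200 + 200 * x - 230 * t + rng.gauss(0, 500)
        rows.append((x, t, y))
    return rows

def cell_means(pairs):
    """Mean of the values within each key; pairs are (key, value)."""
    groups = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    return {key: mean(values) for key, values in groups.items()}

def aipw_ate(rows, outcome_uses_x=True, treatment_uses_x=True):
    # A misspecified model here is one that ignores x (everyone falls in cell 0).
    ko = (lambda x: x) if outcome_uses_x else (lambda x: 0)
    kt = (lambda x: x) if treatment_uses_x else (lambda x: 0)
    mu = cell_means(((t, ko(x)), y) for x, t, y in rows)   # outcome model
    ps = cell_means((kt(x), t) for x, t, _ in rows)        # treatment model
    terms = []
    for x, t, y in rows:
        m1, m0, p = mu[1, ko(x)], mu[0, ko(x)], ps[kt(x)]
        # Outcome-model difference plus the weighted residual correction.
        terms.append(m1 - m0 + t * (y - m1) / p - (1 - t) * (y - m0) / (1 - p))
    return mean(terms)

rows = simulate()
both_right = aipw_ate(rows)
wrong_outcome = aipw_ate(rows, outcome_uses_x=False)
wrong_treatment = aipw_ate(rows, treatment_uses_x=False)
both_wrong = aipw_ate(rows, outcome_uses_x=False, treatment_uses_x=False)
for name, est in [("both right", both_right), ("wrong outcome", wrong_outcome),
                  ("wrong treatment", wrong_treatment), ("both wrong", both_wrong)]:
    print(f"{name:16s} {est:8.1f}")
```

In this setup the estimate should stay close to the built-in effect whenever at least one model uses `x`, and drift away only when both ignore it.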

The IPWRA estimator uses IPW probability weights when performing RA. The weights do not affect the accuracy of the RA estimator if the treatment model is wrong and the outcome model is correct. The weights correct the RA estimator if the treatment model is correct and the outcome model is wrong.
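A small simulation can illustrate the second case, where the weights rescue a misspecified outcome model. Everything here is an assumption for illustration, not the teffects ipwra implementation: the covariate `x` takes three hypothetical levels, the true outcome is nonlinear in `x`, the outcome model is deliberately a straight line fitted separately in each arm, and the treatment model is the treated share within each level of `x`.

```python
import random
from statistics import mean

def simulate(n=40000, seed=2):
    """Hypothetical data: x = education level (0, 1, 2); the outcome is nonlinear in x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.choices((0, 1, 2), weights=(0.3, 0.4, 0.3))[0]
        t = 1 if rng.random() < (0.6, 0.3, 0.1)[x] else 0
        y = 3000 + (0, 0, 1000)[x] - 230 * t + rng.gauss(0, 200)
        rows.append((x, t, y))
    return rows

def wls_line(points):
    """Weighted least-squares fit of y = a + b*x; points are (weight, x, y)."""
    sw = sum(w for w, _, _ in points)
    mx = sum(w * x for w, x, _ in points) / sw
    my = sum(w * y for w, _, y in points) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in points)
    sxx = sum(w * (x - mx) ** 2 for w, x, _ in points)
    b = sxy / sxx
    return my - b * mx, b

def ra_linear(rows, ps=None):
    """RA with a linear outcome model per arm; uses IPW weights when ps is given."""
    po_means = {}
    for arm in (0, 1):
        points = []
        for x, t, y in rows:
            if t != arm:
                continue
            if ps is None:
                w = 1.0
            else:
                w = 1 / ps[x] if arm == 1 else 1 / (1 - ps[x])
            points.append((w, x, y))
        a, b = wls_line(points)
        # Average the fitted potential outcome over the whole sample.
        po_means[arm] = mean(a + b * x for x, _, _ in rows)
    return po_means[1] - po_means[0]

rows = simulate()
ps = {x: mean(t for xi, t, _ in rows if xi == x) for x in (0, 1, 2)}
ra = ra_linear(rows)          # misspecified outcome model, no weights
ipwra = ra_linear(rows, ps)   # same outcome model, IPW-weighted fit
print(f"unweighted RA: {ra:.1f}   IPWRA: {ipwra:.1f}   (built-in effect: -230)")
```

The only difference between the two calls is the weighting of the regression fit, which is the sense in which IPWRA "uses IPW probability weights when performing RA".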

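The two doubly robust estimators are aipw and ipwra. They are worth using routinely. Their main advantage is that covariates enter both the model for the treatment (policy) variable and the model for the outcome variable, so as long as one of the two equations is correctly specified, the overall estimate remains sound. It works like a double insurance policy, which makes an error much less likely.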

We now use teffects aipw to estimate the ATE:

Mothers’ smoking lowers the average birthweight by 230 grams.

Finally, we use teffects ipwra to estimate the ATE:

Mothers’ smoking lowers the average birthweight by 227 grams.

All of these results tell a similar story, so we assume that both the outcome and the treatment models are correct. When both models are correct, the AIPW estimator is more efficient than either the RA or the IPW estimator. We started off in search of robustness and ended up with extra efficiency.

References

Robins, J. M., and A. Rotnitzky. 1995. Semiparametric efficiency in multivariate regression models with missing data. Journal of the American Statistical Association 90: 122–129.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

The following comes from an answer posted on a Q&A site:

Doubly robust estimation is not particularly hard to implement in the language of your choice. All you are doing is controlling for variables in two ways rather than one, the idea being that as long as one of the two models used for control is correct, you have successfully controlled for confounding.

The easiest way to do it, in my mind, is to use inverse-probability-of-treatment weights (IPTW) to weight the data set, then also include the variables in a normal regression model. This is how the authors approach the problem in the paper linked above. There are other options as well, usually built off propensity scores used either for matching or as a covariate in the model.

There are lots of introductions to IPTW in whatever statistical language you prefer. I'd provide code snippets, but all of mine are in SAS, and would likely read very much like the authors'.

Briefly, you model the probability of exposure given your covariates using something like logistic regression, then compute each subject's predicted probability of exposure from that model. This gives you a propensity score. The inverse probability of treatment weight is, as the name suggests, the inverse of the probability of the exposure the subject actually received: 1/(propensity score) for exposed subjects and 1/(1 - propensity score) for unexposed subjects. This sometimes produces extreme values, so some people stabilize the weights by replacing the 1 in the numerator with the marginal probability of the exposure actually received (obtained from a logistic regression model of the exposure with no covariates).
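As a rough sketch of that weight calculation in Python: the function name `iptw_weights` and the toy inputs are made up for illustration, and the propensity scores are taken as given rather than fitted. The usual convention is assumed, where each subject is weighted by the inverse probability of the exposure actually received, and the stabilized version puts the marginal probability of that exposure in the numerator.

```python
def iptw_weights(exposed, propensity, stabilized=False):
    """Return one weight per subject.

    exposed:    list of 0/1 exposure indicators
    propensity: list of predicted P(exposed = 1 | covariates), one per subject
    """
    # Marginal probability of exposure (what a model with no covariates would give).
    p_marginal = sum(exposed) / len(exposed)
    weights = []
    for e, ps in zip(exposed, propensity):
        if e == 1:
            numerator = p_marginal if stabilized else 1.0
            weights.append(numerator / ps)
        else:
            numerator = (1 - p_marginal) if stabilized else 1.0
            weights.append(numerator / (1 - ps))
    return weights

# Toy inputs, invented for illustration only.
exposed = [1, 0, 1, 0]
propensity = [0.8, 0.8, 0.1, 0.1]
plain = iptw_weights(exposed, propensity)
stable = iptw_weights(exposed, propensity, stabilized=True)
print([round(w, 3) for w in plain])
print([round(w, 3) for w in stable])
```

The exposed subject with a propensity of 0.1 gets the largest weight in both versions; stabilizing shrinks all weights by the marginal probabilities without changing their ranking within an exposure group.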

Instead of treating each subject in your analysis as 1 subject, you now treat them as n copies of a subject, where n is their weight. If you run your regression model using those weights and including covariates, out comes a doubly robust estimate.
