Stata: Efficient Implementation of the Panel Regression Control Method with rcm

Original | 连享会 | 2022-12-31


Author: 彭甲超 (China University of Geosciences)
Email: pengjiachao@cug.edu.cn

Editor's note: This post is largely an abridged translation of the source below, with thanks to its authors.
Source: Yan G, Chen Q. RCM: Stata module to implement regression control method/panel data approach to program evaluation[J]. 2021.

1. Introduction
2. Theoretical Background
3. The rcm Command
4. Worked Examples in Stata

4.1 OLS Estimation
4.2 Post-Lasso OLS Estimation
4.3 Placebo Test

5. References
6. Related Posts


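1. Introduction

The regression control method (RCM), proposed by Hsiao et al. (2012), exploits cross-sectional correlation to construct the counterfactual outcome of a treated unit by linear regression (OLS), Lasso, or post-Lasso OLS. It has become a popular tool for causal inference in recent years and is especially suited to panel data with only one or a few treated individuals or regions.

Specifically, RCM assumes that unobserved "common factors" in the economy affect all individuals, which induces cross-sectional correlation among them. The outcomes of untreated units therefore carry information about what the treated unit's outcome would have been without the intervention. rcm is the first Stata command to implement the method (Yan and Chen, 2021); it predicts the treated unit's counterfactual outcome by OLS, Lasso, or post-Lasso OLS and supports causal inference based on that prediction.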

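2. Theoretical Background

This section follows Fang and Chen (方诚和陈强, 2021) in presenting the model behind RCM.

Suppose we observe panel data $\{y_{it}\}$, $i = 1, \dots, N$, $t = 1, \dots, T$, where $y_{it}$ is the outcome of individual $i$ in period $t$. Suppose individual 1 is exposed to a policy shock from period $T_0 + 1$ onward, and the time dimension is $T = T_0 + T_1$ ($T_0$ is the number of periods before the shock and $T_1$ the number after it). No other individual in the sample is exposed to the shock, so together they form the control group.

Let $y_{it}^1$ denote the outcome of individual $i$ in period $t$ under the policy intervention and $y_{it}^0$ the outcome without it. The treatment effect of the intervention on individual $i$ in period $t$ is then $\Delta_{it} = y_{it}^1 - y_{it}^0$. The difficulty of causal inference is that $y_{it}^1$ and $y_{it}^0$ can never be observed at the same time, so this is a missing-data problem.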

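The observed outcome can be written as:

$$y_{it} = d_{it}\, y_{it}^1 + (1 - d_{it})\, y_{it}^0 \tag{1}$$

where $d_{it}$ is a dummy variable: $d_{it} = 1$ means individual $i$ is under the policy intervention in period $t$, and $d_{it} = 0$ means it is not. Further, assume $y_{it}^0$ is generated by a "factor model":

$$y_{it}^0 = \alpha_i + b_i' f_t + \varepsilon_{it} \tag{2}$$

where $\alpha_i$ is the individual fixed effect, $f_t$ is a $K \times 1$ vector of "common factors", $b_i$ is the corresponding $K \times 1$ vector of "factor loadings" (so the common factors may act on each individual with different strength), and $\varepsilon_{it}$ is the idiosyncratic error of individual $i$. For a given period $t$, stacking the equations of all individuals gives a more compact matrix form:

$$y_t^0 = \alpha + B f_t + \varepsilon_t \tag{3}$$

where $y_t^0 = (y_{1t}^0, \dots, y_{Nt}^0)'$, $\alpha = (\alpha_1, \dots, \alpha_N)'$, $\varepsilon_t = (\varepsilon_{1t}, \dots, \varepsilon_{Nt})'$, and $B = (b_1, \dots, b_N)'$ is the $N \times K$ "factor loading matrix". Hsiao et al. (2012) and Li and Bell (2017) show that, under certain regularity conditions, equation (3) can be transformed (by premultiplying both sides by a suitable row vector that eliminates the unobservable $f_t$) into the following time-series regression:

$$y_{1t}^0 = \delta_1 + \delta' \tilde{y}_t + u_{1t} \tag{4}$$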

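where $\tilde{y}_t = (y_{2t}, \dots, y_{Nt})'$ collects the outcomes of all control-group individuals. After estimating equation (4) by OLS on the pre-shock data ($t = 1, \dots, T_0$), the fitted equation can be used to predict the counterfactual outcome of individual 1 after the shock:

$$\hat{y}_{1t}^0 = \hat{\delta}_1 + \hat{\delta}' \tilde{y}_t, \quad t = T_0 + 1, \dots, T \tag{5}$$

The better the OLS fit of equation (4) over the pre-policy window, the more credible the model's prediction of individual 1's counterfactual outcome after the policy takes effect. A good pre-policy fit is therefore an important prerequisite for applying RCM; if the fit is poor, the estimated policy effect will be biased. Based on the counterfactual prediction, the estimated treatment effect of the intervention is:

$$\hat{\Delta}_{1t} = y_{1t} - \hat{y}_{1t}^0, \quad t = T_0 + 1, \dots, T \tag{6}$$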
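Equations (4) to (6) amount to one regression on the pre-policy window followed by an out-of-sample prediction. The following is a minimal sketch on simulated data, not rcm's own implementation; the variable names (y1 to y4, f), the single common factor, the 20 pre-policy periods, and the true effect of 2 are all assumptions made for illustration.

```stata
clear
set seed 12345
set obs 30
gen t = _n

* One common factor drives every unit with a unit-specific loading (assumed DGP)
gen f  = rnormal()
gen y2 =  1.0 + 1.0*f + rnormal(0, 0.1)
gen y3 =  2.0 + 0.5*f + rnormal(0, 0.1)
gen y4 = -1.0 + 1.5*f + rnormal(0, 0.1)

* Unit 1: untreated potential outcome, plus a true effect of 2 from t = 21 on
gen y1_0   = 0.5 + 0.8*f + rnormal(0, 0.1)
gen byte d = t > 20
gen y1     = y1_0 + 2*d

* Equation (4): OLS of the treated unit on the controls, pre-policy periods only
regress y1 y2 y3 y4 if t <= 20

* Equation (5): predicted counterfactual (predict fills in all periods)
predict double y1_hat, xb

* Equation (6): treatment effect = actual minus counterfactual, post-policy only
gen double effect = y1 - y1_hat if t > 20
summarize effect
```

The mean reported by summarize should land close to the simulated effect of 2, because the controls share the same factor as unit 1 and therefore track its untreated path.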

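When estimating equation (4) on pre-policy data, one must also choose how many control-group individuals enter the regression. Including more of them adds regressors to equation (4); this raises $R^2$ but may lead to "overfitting".

An information criterion is therefore used to penalize overly complex models (too many regressors) and to select the "best subset" of regressors, so as to protect out-of-sample (post-policy) predictive performance. Hsiao et al. (2012) recommend AIC and AICC for choosing the best subset, whereas Li and Bell (2017) recommend selecting variables with the Lasso estimator and then running OLS on the selected variables, the so-called post-Lasso OLS.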

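Suppose $p$ control-group units are chosen to predict the path of the dependent variable, so that regression (4) has $(p + 1)$ coefficients to estimate (including the constant). The three information criteria (AIC, AICC, and BIC) are:

$$\mathrm{AIC}(p) = T_0 \ln\!\left(\frac{e'e}{T_0}\right) + 2(p + 2)$$

$$\mathrm{AICC}(p) = \mathrm{AIC}(p) + \frac{2(p + 2)(p + 3)}{T_0 - (p + 1) - 2}$$

$$\mathrm{BIC}(p) = T_0 \ln\!\left(\frac{e'e}{T_0}\right) + (p + 2)\ln T_0$$

where $T_0$ is the sample size (the number of pre-policy periods) and $e'e$ is the residual sum of squares of regression (4). The penalty counts $p + 2$ parameters: the $p + 1$ regression coefficients plus the error variance.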
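As a rough illustration of how these criteria are computed, the sketch below evaluates them by hand after an ordinary regress. It assumes the penalty counts p + 2 parameters (slopes, constant, and error variance); the simulated data and the scalar names prefixed with s_ are hypothetical, and this is not rcm's internal code.

```stata
clear
set seed 2021
set obs 18                          // T0 = 18 pre-policy periods (assumed)
gen f  = rnormal()
gen y2 = f + rnormal(0, 0.2)
gen y3 = 0.5*f + rnormal(0, 0.2)
gen y1 = 1 + 0.8*f + rnormal(0, 0.2)

quietly regress y1 y2 y3            // p = 2 predictors

scalar s_T0   = e(N)                // sample size T0
scalar s_k    = e(df_m) + 2         // p slopes + constant + error variance
scalar s_aic  = s_T0*ln(e(rss)/s_T0) + 2*s_k
scalar s_aicc = s_aic + 2*s_k*(s_k + 1)/(s_T0 - s_k - 1)
scalar s_bic  = s_T0*ln(e(rss)/s_T0) + s_k*ln(s_T0)

display "AIC = " %9.4f s_aic "  AICc = " %9.4f s_aicc "  BIC = " %9.4f s_bic
```

With 18 observations and p = 2, the small-sample correction is 2(4)(5)/(18 - 5) = 40/13, about 3.0769, which equals the gap between the AICc and AIC columns in the K = 2 row of the rcm output in Section 4.1.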

3. The rcm Command

* Install the command
ssc install rcm, all replace

* Syntax
rcm depvar [indepvars] [if] [in], trunit(#) trperiod(#) [options]

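where

depvar and indepvars must be numeric variables, and abbreviations are not allowed. The panel must be declared beforehand with xtset panelvar timevar;
trunit(#) specifies the unit number of the treated unit (the unit affected by the intervention). Only a single treated unit can be specified;
trperiod(#) specifies the time period in which the intervention occurs, which must be an integer. Only a single period can be specified.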

The options are as follows:

(1) Model

ctrlunit(numlist): the unit numbers of the control units;
preperiod(numlist): the pre-treatment periods;
postperiod(numlist): the post-treatment periods.

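(2) Optimization: model selection consists of two steps that rcm carries out automatically:

Step 1: select the suboptimal models. rcm selects a sequence of suboptimal models, each containing a distinct subset of predictors. How they are chosen depends on the selection method given in method(sel_method). The available methods are best subset, Lasso, forward stepwise, and backward stepwise;
Step 2: select the optimal model from the suboptimal models. rcm picks the optimal model by the information criterion or cross-validation specified in criterion(sel_criterion); the allowed values of sel_criterion cover information criteria and K-fold cross-validation. By default the number of predictors in the optimal model is unrestricted, but it can be bounded with scope(p_min p_max).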
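To see why the choice of method matters, it helps to count candidate models. The count below is standard combinatorics rather than anything taken from the rcm documentation, and $N_c$ (the number of control units) is a symbol introduced here for illustration. An exhaustive best-subset search would in principle compare

```latex
\#\{\text{models with } K \text{ predictors}\} = \binom{N_c}{K},
\qquad
\sum_{K=1}^{N_c} \binom{N_c}{K} = 2^{N_c} - 1 .
```

With the 10 control units used in Section 4.1 this is $2^{10} - 1 = 1023$ candidate regressions, whereas with all 24 control units used in Section 4.2 it would be $2^{24} - 1 = 16{,}777{,}215$, which is one practical reason to turn to Lasso or a stepwise method when the donor pool is large.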

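(3) Placebo Test

placebo([unit unit(numlist) period(numlist) cutoff(#_c)]): specifies the type(s) of placebo test to perform; if the option is omitted, no placebo test is run.

(4) Reporting

frame(framename): creates a Stata frame that stores the dataset together with the generated variables in wide form;
nofigure: suppresses the graphs; by default all graphs are displayed.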

4. Worked Examples in Stata

4.1 OLS Estimation

. ssc install rcm, all replace // the all option also downloads the example data
. use growth, clear
. xtset region time

* Show the unit number of Hong Kong and treatment periods
. label list
. display tq(1997q3)
. display tq(2003q4)
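The two display commands explain where the numbers in trperiod(150) and postperiod(150/175) used in the examples come from. Stata stores a quarterly date as the number of quarters elapsed since 1960q1, so the conversion can be checked by hand, as in this short sketch:

```stata
* Quarterly dates are integers counted from 1960q1 = 0
display tq(1997q3)                       // 150
display (1997 - 1960)*4 + (3 - 1)        // 150, the same number by hand
display tq(2003q4)                       // 175
display %tq 150                          // 1997q3, back to a readable date
display tq(2003q4) - tq(1997q3) + 1      // 26 post-treatment quarters
```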

With OLS, rcm reports the results of Step 1 and Step 2 in turn and ends with a comparison of the actual and counterfactual outcomes.

. /* 
> Replicate results in Hsiao et al.(2012) with specified control units 
> and designated post-treatment periods
> */
. rcm gdp, trunit(9) trperiod(150) ctrlunit(4 10 12 13 14 19 20 22 23 25) ///
>     postperiod(150/175)

Step 1: Select the suboptimal models
Step 2: Select the optimal model from the suboptimal models
Comparing the suboptimal models containing different set of predictors:
-----------------------------------------------------------------
  K |    AICc         AIC         BIC         MBIC     R-squared 
----+------------------------------------------------------------
  1 |  -144.7514   -146.4657   -143.7946   -155.6437      0.4034 
  2 |  -160.5063   -163.5832   -160.0217   -170.4959      0.7937 
  3 |  -170.6492   -175.6492   -171.1973   -180.9287      0.9056 
  4 |  -171.7725   -179.4088   -174.0666   -183.1559      0.9314 
       (omitted)
 10 |  -111.3603   -173.7603   -163.0758   -167.4256      0.9518 
-----------------------------------------------------------------
Among models with 1-10 predictors, the optimal model contains 4 predictors 
with AICc = -171.7725.

Fitting results in the pre-treatment periods using OLS:
----------------------------------------------------------------------------------
    gdp·HongKong | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
       gdp·Korea |    -0.4323     0.0634    -6.82   0.000      -0.5692     -0.2954
       gdp·Japan |    -0.6760     0.1117    -6.05   0.000      -0.9172     -0.4347
      gdp·Taiwan |     0.7926     0.3099     2.56   0.024       0.1231      1.4621
gdp·UnitedStates |     0.4860     0.2195     2.21   0.045       0.0118      0.9603
           _cons |     0.0263     0.0170     1.54   0.147      -0.0105      0.0631
----------------------------------------------------------------------------------

Prediction results in the post-treatment periods using OLS:
-------------------------------------------------------------
  Time  | Actual Outcome  Predicted Outcome  Treatment Effect
--------+----------------------------------------------------
 1997q3 |        0.0610             0.0798           -0.0188 
 1997q4 |        0.0140             0.0810           -0.0670 
              (omitted)
 2003q3 |        0.0380             0.0628           -0.0248 
 2003q4 |        0.0470             0.0761           -0.0291 
--------+----------------------------------------------------
  Mean  |        0.0180             0.0576           -0.0396 
-------------------------------------------------------------
Note: The average treatment effect over the post-treatment periods is -0.0396.

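rcm also has full plotting support. After OLS estimation it draws two graphs: the actual outcome against the counterfactual prediction, and the estimated treatment effects.

The regression fits well. Before the policy (to the left of the dashed line in the graph), the counterfactual prediction is very close to the observed value of the treated unit, which indicates that the control group tracks the GDP series well, turning points included. More importantly, once the policy shock begins (to the right of the dashed line), the counterfactual prediction and the observed values diverge and the gap fluctuates widely over time, which suggests that the policy had a marked effect on GDP.

Subtracting the counterfactual prediction from the observed value gives the path of the policy effect; in the output above it averages -0.0396 over the post-treatment periods. The treatment-effect graph shows that, relative to the control group, the effect becomes more pronounced after the policy is implemented. In addition, compared with the large swings of the counterfactual prediction, actual GDP is visibly more stable, with smaller variance, in line with the policy objective.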

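4.2 Post-Lasso OLS Estimation

The output follows the same layout as OLS estimation: post-Lasso OLS also reports Step 1 and Step 2. The difference is that it guards against overfitting through penalized regression, minimizing the objective function:

$$\min_{\delta_1,\,\delta}\ \mathrm{SSR}(\delta_1, \delta) + \lambda \lVert \delta \rVert_1$$

where $\mathrm{SSR}(\delta_1, \delta) = \sum_{t=1}^{T_0} (y_{1t} - \delta_1 - \delta'\tilde{y}_t)^2$ is the residual sum of squares of equation (4), $\lVert \delta \rVert_1 = \sum_j |\delta_j|$ is the "1-norm", and $\lambda \ge 0$ is the "tuning parameter" that controls the strength of the penalty (Fang and Chen, 2021). In practice Lasso is commonly used only to select variables: the Lasso coefficients are discarded and OLS is run on the selected variables.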
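The "select with Lasso, then refit with OLS" idea can be sketched with Stata's built-in lasso command (Stata 16 or later) as a stand-in; rcm performs its selection internally, and this is not its code. The simulated data, the variable names x1 to x10, and the seed are assumptions made for illustration.

```stata
clear
set seed 2021
set obs 100
forvalues j = 1/10 {
    gen x`j' = rnormal()
}
* Only x1 and x2 truly matter in this simulated example
gen y = 1 + 2*x1 - 1.5*x2 + rnormal()

* Step 1: Lasso, with the penalty level chosen by cross-validation
lasso linear y x1-x10, selection(cv) rseed(12345)
local selected `e(allvars_sel)'
display "Selected by Lasso: `selected'"

* Step 2: discard the shrunken Lasso coefficients and refit by OLS
regress y `selected'
```

The OLS refit removes the shrinkage that the penalty imposes on the retained coefficients, which is why the second step is run on the selected variables only.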

. /* 
> Use post-lasso OLS with LOOCV and all control units, 
> and create a Stata frame "growth_wide" storing dataset 
> with generated variables in wide form
> */
. rcm gdp, trunit(9) trperiod(150) postperiod(150/175) method(lasso) ///
>     criterion(cv) frame(growth_wide)

Step 1: Select the suboptimal models
Step 2: Select the optimal model from the suboptimal models
Comparing the suboptimal models containing different set of predictors:
-------------------------------------------------------------------------
  K |    lambda      CVMSE     R-squared |           Operation           
----+------------------------------------+-------------------------------
  1 |     0.0136      0.0004      0.0513 | add gdp·Mexico       
       (omitted) 
 12 |     0.0002      0.0001      0.9782 | drop gdp·Canada      
 13 |     0.0002      0.0001      0.9795 | add gdp·UnitedKingdom 
 12 |     0.0002      0.0001      0.9822 | drop gdp·UnitedStates 
 12 |     0.0001      0.0001      0.9876 | .                              
-------------------------------------------------------------------------
Among models with 1-24 predictors, the optimal model contains 12 predictors 
with CVMSE = 0.0001.

Fitting results in the pre-treatment periods using post-lasso OLS:
-----------------------------------------------------------------------------------
     gdp·HongKong | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
    gdp·Australia |     0.0293     0.0988     0.30   0.779      -0.2247      0.2833
        gdp·China |     0.3318     0.1115     2.98   0.031       0.0452      0.6184
       gdp·France |     0.4306     0.1858     2.32   0.068      -0.0469      0.9081
      gdp·Germany |     0.5107     0.1917     2.66   0.045       0.0180      1.0033
        gdp·Japan |    -0.8833     0.1007    -8.77   0.000      -1.1421     -0.6244
        gdp·Korea |    -0.6836     0.0753    -9.07   0.000      -0.8773     -0.4900
     gdp·Malaysia |     0.0400     0.0481     0.83   0.443      -0.0836      0.1636
       gdp·Mexico |     0.0667     0.0489     1.36   0.231      -0.0591      0.1925
  gdp·Philippines |    -0.6231     0.1339    -4.65   0.006      -0.9674     -0.2789
  gdp·Switzerland |     0.1001     0.1098     0.91   0.404      -0.1822      0.3824
       gdp·Taiwan |    -0.4112     0.4313    -0.95   0.384      -1.5198      0.6974
gdp·UnitedKingdom |     0.8364     0.2854     2.93   0.033       0.1027      1.5701
            _cons |     0.0881     0.0220     4.01   0.010       0.0317      0.1446
-----------------------------------------------------------------------------------

Prediction results in the post-treatment periods using post-lasso OLS:
-------------------------------------------------------------
  Time  | Actual Outcome  Predicted Outcome  Treatment Effect
--------+----------------------------------------------------
 1997q3 |        0.0610             0.0896           -0.0286 
 1997q4 |        0.0140             0.0929           -0.0789 
              (omitted)
 2003q3 |        0.0380             0.0829           -0.0449 
 2003q4 |        0.0470             0.0950           -0.0480 
--------+----------------------------------------------------
  Mean  |        0.0180             0.1055           -0.0875 
-------------------------------------------------------------
Note: The average treatment effect over the post-treatment periods is -0.0875.

4.3 Placebo Test

. * Implement a placebo test using all fake treatment units in the donor pool
. rcm gdp, trunit(9) trperiod(150) postperiod(150/175) method(lasso) ///
>     criterion(cv) placebo(unit)

Step 1: Select the suboptimal models
Step 2: Select the optimal model from the suboptimal models
Comparing the suboptimal models containing different set of predictors:
-------------------------------------------------------------------------
  K |    lambda      CVMSE     R-squared |           Operation           
----+------------------------------------+-------------------------------
  1 |     0.0136      0.0004      0.0513 | add gdp·Mexico       
       (omitted)
 12 |     0.0002      0.0001      0.9782 | drop gdp·Canada      
 13 |     0.0002      0.0001      0.9795 | add gdp·UnitedKingdom 
 12 |     0.0002      0.0001      0.9822 | drop gdp·UnitedStates 
 12 |     0.0001      0.0001      0.9876 | .                              
-------------------------------------------------------------------------
Among models with 1-24 predictors, the optimal model contains 12 predictors 
with CVMSE = 0.0001.

Fitting results in the pre-treatment periods using post-lasso OLS:
-----------------------------------------------------------------------------------
     gdp·HongKong | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
------------------+----------------------------------------------------------------
    gdp·Australia |     0.0293     0.0988     0.30   0.779      -0.2247      0.2833
                    (omitted)
       gdp·Taiwan |    -0.4112     0.4313    -0.95   0.384      -1.5198      0.6974
gdp·UnitedKingdom |     0.8364     0.2854     2.93   0.033       0.1027      1.5701
            _cons |     0.0881     0.0220     4.01   0.010       0.0317      0.1446
-----------------------------------------------------------------------------------

Prediction results in the post-treatment periods using post-lasso OLS:
-------------------------------------------------------------
  Time  | Actual Outcome  Predicted Outcome  Treatment Effect
--------+----------------------------------------------------
 1997q3 |        0.0610             0.0896           -0.0286 
 1997q4 |        0.0140             0.0929           -0.0789 
              (omitted)
 2003q3 |        0.0380             0.0829           -0.0449 
 2003q4 |        0.0470             0.0950           -0.0480 
--------+----------------------------------------------------
  Mean  |        0.0180             0.1055           -0.0875 
-------------------------------------------------------------
Note: The average treatment effect over the post-treatment periods is -0.0875.

Placebo test results using fake treatment units:
-------------------------------------------------------------------------------
      Unit     |  Pre MSPE  Post MSPE   Post/Pre MSPE    Pre MSPE of Fake Unit/
               |                                       Pre MSPE of Treated Unit
---------------+---------------------------------------------------------------
      HongKong |    0.0000     0.0198      3009.7824                    1.0000 
     Australia |    0.0000     0.0008        18.5015                    6.4202 
                  (omitted)
 UnitedKingdom |    0.0000     0.0015       161.0820                    1.4022 
  UnitedStates |    0.0000     0.0001       285.1185                    0.0517 
-------------------------------------------------------------------------------
Note: The probability of obtaining a post/pre-treatment MSPE ratio as large 
as HongKong's is 0.1200.

Placebo test results using fake treatment units (continued):
------------------------------------------------------------------
  Time  |  Treatment Effect      p-value of Treatment Effect      
        |                     Two-sided   Right-sided   Left-sided
--------+---------------------------------------------------------
 1997q3 |          -0.0286       0.2000       0.8800       0.1600 
 1997q4 |          -0.0789       0.0400       1.0000       0.0400 
                  (omitted)
 2003q3 |          -0.0449       0.1600       0.8800       0.1600 
 2003q4 |          -0.0480       0.1200       0.9200       0.1200 
------------------------------------------------------------------

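rcm offers placebo tests based on "fake treatment units" or "fake treatment times" and plots the results; the example above uses fake treatment units.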
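The p-value in the note under the first placebo table can be read as a rank statistic: the share of units whose post/pre MSPE ratio is at least as large as the treated unit's. The sketch below illustrates that calculation on five made-up ratios. The unit names and numbers are hypothetical, and rcm's exact rule (for instance, how cutoff() screens out poorly fitted units) is not reproduced here.

```stata
clear
input str12 unit ratio
"Treated" 50
"A"        3
"B"       80
"C"       12
"D"        1.5
end

* Post/pre MSPE ratio of the treated unit
quietly summarize ratio if unit == "Treated"
scalar s_treated = r(mean)

* Share of all units (treated included) with a ratio at least as large
quietly count if ratio >= s_treated
scalar s_pval = r(N)/_N
display "p-value = " s_pval              // 2 of 5 units, so 0.4
```

By the same logic, the reported 0.1200 is consistent with 3 of the 25 units (Hong Kong plus the 24 donors) having a ratio at least as large as Hong Kong's.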

5. References

Hsiao C, Ching H S, Wan S K. A panel data approach for program evaluation [J]. Journal of Applied Econometrics, 2012, 27(5): 705-740.
Yan G, Chen Q. RCM: Stata module to implement regression control method/panel data approach to program evaluation[J]. 2021.
Fang C, Chen Q (方诚, 陈强). The third way of shantytown-redevelopment resettlement: the case of the housing voucher policy in Anqing [J]. China Economic Quarterly (经济学(季刊)), 2021, 21(02): 733-754. (in Chinese)
Li K T, Bell D R. Estimation of average treatment effects with panel data: Asymptotic theory and implementation[J]. Journal of Econometrics, 2017, 197(1): 65-75.


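6. Related Posts

Note: the list of posts below was generated with the Stata command:
lianxh aic lasso, m
To install the latest version of lianxh:
ssc install lianxh, replace

Topic: Stata Tutorials

Stata tests: computing AIC, BIC, MSE, MAE, and other information criteria

Topic: Stata Commands

New Stata command pdslasso: how to choose among many control and instrumental variables?

Topic: Regression Analysis

Stata: an introduction to Lasso and ridge regression (Ridge, Lasso)
Stata Blogs - An introduction to the lasso in Stata

Topic: IV-GMM

Lasso it: no fear of however many control and instrumental variables (T217)

Topic: Machine Learning

Lasso illustrated, part A: the variable-selection ability of Lasso
Lasso: how to do statistical inference with Lasso
Stata: Lasso made easy, a beginner's guide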
