Stata:coefplot 如何选择交互项系数估计值进行绘图?
为了让大家更好的理解本文的内容,欢迎各位培训班会员参加明晚 8 点的直播课 「Stata:coefplot 如何选择交互项系数估计值进行绘图」
最近有小伙伴问了这样的一个问题:
也就是使用 coefplot 展示系数估计值与置信区间时如何选择交互项变量。
方法一:查看所有可用的变量名称
这里我们使用 auto 数据集为例:
sysuse auto, clear
*> (1978 automobile data)
在回归中使用因子交乘算子:
reg price weight rep78##foreign
*> note: 1b.rep78#1.foreign identifies no observations in the sample.
*> note: 2.rep78#1.foreign identifies no observations in the sample.
*> note: 5.rep78#1.foreign omitted because of collinearity.
*>
*> Source | SS df MS Number of obs = 69
*> -------------+---------------------------------- F(8, 60) = 9.05
*> Model | 315359465 8 39419933.1 Prob > F = 0.0000
*> Residual | 261437494 60 4357291.57 R-squared = 0.5467
*> -------------+---------------------------------- Adj R-squared = 0.4863
*> Total | 576796959 68 8482308.22 Root MSE = 2087.4
*>
*> -------------------------------------------------------------------------------
*> price | Coefficient Std. err. t P>|t| [95% conf. interval]
*> --------------+----------------------------------------------------------------
*> weight | 3.811384 .4666457 8.17 0.000 2.877954 4.744815
*> |
*> rep78 |
*> 2 | 435.9862 1654.487 0.26 0.793 -2873.481 3745.454
*> 3 | 738.2337 1538.028 0.48 0.633 -2338.28 3814.748
*> 4 | -330.3094 1644.223 -0.20 0.841 -3619.246 2958.627
*> 5 | 3984.978 2154.133 1.85 0.069 -323.93 8293.886
*> |
*> foreign |
*> Foreign | 398.453 1644.867 0.24 0.809 -2891.772 3688.677
*> |
*> rep78#foreign |
*> 1#Foreign | 0 (empty)
*> 2#Foreign | 0 (empty)
*> 3#Foreign | 3281.889 2245.576 1.46 0.149 -1209.932 7773.709
*> 4#Foreign | 5029.403 2076.434 2.42 0.018 875.9157 9182.89
*> 5#Foreign | 0 (omitted)
*> |
*> _cons | -7250.791 2066.713 -3.51 0.001 -11384.83 -3116.75
*> -------------------------------------------------------------------------------
直接使用 coefplot 就可以绘制所有变量的:
coefplot
那么该如何仅仅选择交互项的系数绘制呢?
回归的结果也会被存储在返回值中,查看 e-class 返回值:
eret list
*> scalars:
*> e(N) = 69
*> e(df_m) = 8
*> e(df_r) = 60
*> e(F) = 9.046888971642481
*> e(r2) = .5467425923475249
*> e(rmse) = 2087.412650144713
*> e(mss) = 315359464.5505149
*> e(rss) = 261437494.3190505
*> e(r2_a) = .4863082713271949
*> e(ll) = -620.4989339800943
*> e(ll_0) = -647.7986144493904
*> e(rank) = 9
*>
*> macros:
*> e(cmdline) : "regress price weight rep78##foreign"
*> e(title) : "Linear regression"
*> e(marginsok) : "XB default"
*> e(vce) : "ols"
*> e(depvar) : "price"
*> e(cmd) : "regress"
*> e(properties) : "b V"
*> e(predict) : "regres_p"
*> e(model) : "ols"
*> e(estat_cmd) : "regress_estat"
*>
*> matrices:
*> e(b) : 1 x 19
*> e(V) : 19 x 19
*> e(beta) : 1 x 18
*>
*> functions:
*> e(sample)
回归系数都会被存储在 e(b) 中:
*> e(b)[1,19]
*> 1b. 2. 3. 4. 5. 0b. 1. 1b.rep78#
*> weight rep78 rep78 rep78 rep78 rep78 foreign foreign 0b.foreign
*> y1 3.8113843 0 435.98624 738.23368 -330.30942 3984.9781 0 398.45297 0
*>
*> 1b.rep78# 2o.rep78# 2o.rep78# 3o.rep78# 3.rep78# 4o.rep78# 4.rep78# 5o.rep78# 5o.rep78#
*> 1o.foreign 0b.foreign 1o.foreign 0b.foreign 1.foreign 0b.foreign 1.foreign 0b.foreign 1o.foreign
*> y1 0 0 0 0 3281.8889 0 5029.4026 0 0
*>
*>
*> _cons
*> y1 -7250.7912
提取这个矩阵的列名:
matrix A = e(b)
local namecol: colnames A
*- 打印出来:
foreach i in `namecol' {
di "`i'"
}
*> weight
*> 1b.rep78
*> 2.rep78
*> 3.rep78
*> 4.rep78
*> 5.rep78
*> 0b.foreign
*> 1.foreign
*> 1b.rep78#0b.foreign
*> 1b.rep78#1o.foreign
*> 2o.rep78#0b.foreign
*> 2o.rep78#1o.foreign
*> 3o.rep78#0b.foreign
*> 3.rep78#1.foreign
*> 4o.rep78#0b.foreign
*> 4.rep78#1.foreign
*> 5o.rep78#0b.foreign
*> 5o.rep78#1o.foreign
*> _cons
这些列名都是可以在 coefplot 的 keep() 选项中使用的:
coefplot, keep(3.rep78#1.foreign 4.rep78#1.foreign)
需要注意的是,e(b) 中值为 0 的不能用,这些是被 omit 的变量:
coefplot, keep(5o.rep78#1o.foreign)
*> (.: no coefficients found, all dropped, or none kept)
*> (nothing to plot)
也可以使用 foreach 循环提取 e(b) 中值不为 0 、含有 # 的进行绘图:
local namecol "`: colnames A'"
di "`namecol'"
*> weight 1b.rep78 2.rep78 3.rep78 4.rep78 5.rep78 0b.foreign 1.foreign 1b.rep78#0b.foreign 1b.rep78#1o.foreign 2o.re
*> > p78#0b.foreign 2o.rep78#1o.foreign 3o.rep78#0b.foreign 3.rep78#1.foreign 4o.rep78#0b.foreign 4.rep78#1.foreign 5
*> > o.rep78#0b.foreign 5o.rep78#1o.foreign _cons
local varlist = ""
foreach i in `namecol' {
local v = A[1, "`i'"]
if `v' != 0 & index("`i'" , "#") {
local varlist = "`varlist' `i'"
}
}
di "`varlist'"
*> 3.rep78#1.foreign 4.rep78#1.foreign
coefplot, keep(`varlist')
方法二:使用通配符
还可以使用通配符:
reg price weight rep78##foreign
coefplot, keep(*rep78*foreign*)
另外提取回归系数的时候也可以遵循这个方法流程:
sysuse auto, clear
reg price i.rep78
*> Source | SS df MS Number of obs = 69
*> -------------+---------------------------------- F(4, 64) = 0.24
*> Model | 8360542.63 4 2090135.66 Prob > F = 0.9174
*> Residual | 568436416 64 8881819 R-squared = 0.0145
*> -------------+---------------------------------- Adj R-squared = -0.0471
*> Total | 576796959 68 8482308.22 Root MSE = 2980.2
*>
*> ------------------------------------------------------------------------------
*> price | Coefficient Std. err. t P>|t| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> rep78 |
*> 2 | 1403.125 2356.085 0.60 0.554 -3303.696 6109.946
*> 3 | 1864.733 2176.458 0.86 0.395 -2483.242 6212.708
*> 4 | 1507 2221.338 0.68 0.500 -2930.633 5944.633
*> 5 | 1348.5 2290.927 0.59 0.558 -3228.153 5925.153
*> |
*> _cons | 4564.5 2107.347 2.17 0.034 354.5913 8774.409
*> ------------------------------------------------------------------------------
*- 查看返回值
eret list
*- 回归系数都会存储在 e(b) 中
mat list e(b)
*- 提取 e(b) 的列名
mat A = e(b)
mat list A
local namecol: colnames A
foreach i in `namecol' {
di "`i'"
}
*> 1b.rep78
*> 2.rep78
*> 3.rep78
*> 4.rep78
*> 5.rep78
*> _cons
所以虚拟变量的系数可以这样提取:
di _b[1b.rep78]
0
di _b[2.rep78]
1403.125
方法三:从返回值中提取数据使用 twoway 直接绘图
对该方法感兴趣的小伙伴也可以学习下这个课程:
「Stata:如何展示回归系数和置信区间」(https://rstata.duanshu.com/#/brief/course/2d3950fe01ed46119619b615dc8a90e4)
coefplot 虽然方便,但是有时候可能不能完全满足需要,所以也可以直接使用 twoway 绘制:
sysuse auto, clear
reg price weight rep78##foreign
由于 eret list 中的 e(b) 中只存储了回归系数,不足以直接绘图,所以下面我们还是使用 ret list 中的 r(table) 数据绘图:
ret list
mat m = r(table)
mat list m
*> m[9,19]
*> 1b. 2. 3. 4. 5. 0b.
*> weight rep78 rep78 rep78 rep78 rep78 foreign
*> b 3.8113843 0 435.98624 738.23368 -330.30942 3984.9781 0
*> se .46664575 . 1654.4874 1538.0279 1644.2232 2154.1333 .
*> t 8.1676181 . .26351742 .47998717 -.20089086 1.8499218 .
*> pvalue 2.538e-11 . .79305471 .63298176 .84146336 .06925106 .
*> ll 2.8779538 . -2873.4813 -2338.2803 -3619.2456 -323.93005 .
*> ul 4.7448147 . 3745.4537 3814.7476 2958.6267 8293.8862 .
*> df 60 60 60 60 60 60 60
*> crit 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978
*> eform 0 0 0 0 0 0 0
*>
*> 1. 1b.rep78# 1b.rep78# 2o.rep78# 2o.rep78# 3o.rep78# 3.rep78#
*> foreign 0b.foreign 1o.foreign 0b.foreign 1o.foreign 0b.foreign 1.foreign
*> b 398.45297 0 0 0 0 0 3281.8889
*> se 1644.8673 . . . . . 2245.5759
*> t .24224019 . . . . . 1.461491
*> pvalue .80942015 . . . . . .1490985
*> ll -2891.7715 . . . . . -1209.9317
*> ul 3688.6775 . . . . . 7773.7094
*> df 60 60 60 60 60 60 60
*> crit 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978
*> eform 0 0 0 0 0 0 0
*>
*> 4o.rep78# 4.rep78# 5o.rep78# 5o.rep78#
*> 0b.foreign 1.foreign 0b.foreign 1o.foreign _cons
*> b 0 5029.4026 0 0 -7250.7912
*> se . 2076.4342 . . 2066.713
*> t . 2.4221343 . . -3.5083687
*> pvalue . .01846747 . . .00086119
*> ll . 875.91573 . . -11384.833
*> ul . 9182.8895 . . -3116.7497
*> df 60 60 60 60 60
*> crit 2.0002978 2.0002978 2.0002978 2.0002978 2.0002978
*> eform 0 0 0 0 0
*>
r(table) 的列名实际上也是和 e(b) 一样的。
然后我们再把这个数据转换为 dta 数据:
local namecol: colnames m
local n: word count `namecol'
di "`n'"
*> 19
clear
set obs `n'
gen v = ""
forval i = 1/`n' {
local vi: word `i' of `namecol'
replace v = "`vi'" in `i'
}
gen b = .
gen ll = .
gen ul = .
forval i = 1/`n'{
replace b = m[1, `i'] in `i'
replace ll = m[5, `i'] in `i'
replace ul = m[6, `i'] in `i'
}
*- 删除被 omit 的变量
drop if b == 0
*- 有序因子化
gen id = _n
sencode v, gen(varn) gsort(-id)
list
*> +-------------------------------------------------------------------------------+
*> | v b ll ul id varn |
*> |-------------------------------------------------------------------------------|
*> 1. | weight 3.811384 2.877954 4.744815 1 weight |
*> 2. | 2.rep78 435.9862 -2873.481 3745.454 2 2.rep78 |
*> 3. | 3.rep78 738.2337 -2338.28 3814.748 3 3.rep78 |
*> 4. | 4.rep78 -330.3094 -3619.246 2958.627 4 4.rep78 |
*> 5. | 5.rep78 3984.978 -323.9301 8293.886 5 5.rep78 |
*> |-------------------------------------------------------------------------------|
*> 6. | 1.foreign 398.453 -2891.771 3688.677 6 1.foreign |
*> 7. | 3.rep78#1.foreign 3281.889 -1209.932 7773.709 7 3.rep78#1.foreign |
*> 8. | 4.rep78#1.foreign 5029.403 875.9157 9182.89 8 4.rep78#1.foreign |
*> 9. | _cons -7250.791 -11384.83 -3116.75 9 _cons |
*> +-------------------------------------------------------------------------------+
然后就可以使用 twoway 绘图了:
tw rspike ll ul varn, horizontal || ///
sc varn b, leg(off) m(o) ///
yti("变量") xti("系数估计值 + 95% 置信区间") ///
xline(0) yla(1(1)9, value)
如果想给 y 轴添加标签的话,可以这样:
tw rspike ll ul varn, horizontal || ///
sc varn b, leg(off) m(o) ///
yti("变量") xti("系数估计值 + 95% 置信区间") ///
xline(0) yla(9 "Weight (lbs.)" 8 "Repair record 1978 = 2" ///
7 "Repair record 1978 = 3" ///
6 "Repair record 1978 = 4" ///
5 "Repair record 1978 = 5" ///
4 "Foreign" ///
3 "Repair record 1978 = 3 # Foreign" ///
2 "Repair record 1978 = 4 # Foreign" ///
1 "_cons", value)
矩阵转换成 dta 也可以直接使用 svmat:
clear
svmat m
xpose, clear
keep v1 v5 v6
ren v1 b
ren v5 ll
ren v6 ul
*- 编写一个程序可以把列表转换成变量的(也可以保存成 list2dta.ado 文件方便以后使用)
cap prog drop list2dta
prog def list2dta
syntax anything, name(string)
local n: word count `anything'
if `n' > `=_N' set obs `n'
gen `name' = ""
forval i = 1/`n' {
local vi: word `i' of `anything'
replace v = "`vi'" in `i'
}
end
*- 然后就可以使用了
local namecol: colnames m
list2dta `namecol', name(v)
drop if b == 0
list
*> +------------------------------------------------------+
*> | b ll ul v |
*> |------------------------------------------------------|
*> 1. | 3.811384 2.877954 4.744815 weight |
*> 2. | 435.9862 -2873.481 3745.454 2.rep78 |
*> 3. | 738.2337 -2338.28 3814.748 3.rep78 |
*> 4. | -330.3094 -3619.246 2958.627 4.rep78 |
*> 5. | 3984.978 -323.9301 8293.886 5.rep78 |
*> |------------------------------------------------------|
*> 6. | 398.453 -2891.771 3688.677 1.foreign |
*> 7. | 3281.889 -1209.932 7773.709 3.rep78#1.foreign |
*> 8. | 5029.403 875.9157 9182.89 4.rep78#1.foreign |
*> 9. | -7250.791 -11384.83 -3116.75 _cons |
*> +------------------------------------------------------+
然后后面的操作就一样了。
方法四:使用 serset 提取图表数据修改后再重新绘制
对于这部分内容感兴趣的小伙伴还可以前往系列课程「Stata 编程导论」的第 15 课时:Do file 编程实战(六)学习,里面提供了更多的案例。
系列课程「Stata 编程导论」:https://rstata.duanshu.com/#/course/500094e7213744bc904e4e2cb270d135 Do file 编程实战(六):https://rstata.duanshu.com/#/course/27a3b7a55a2545a0a6281ae8fafaf15d
使用 coefplot 绘图之后我们可以使用 graph save 把图表保存为 gph 文件:
sysuse auto, clear
reg price weight rep78##foreign
coefplot
gr save "coefplot.gph", replace
重新调用:
clear
gr use coefplot.gph
为了下面使用方便,编写一个 setnames 命令:
*- 编写一个 setnames 命令(可以保存为 setnames.ado 文件)
capture program drop setnames
program define setnames
syntax anything
local varname
foreach i of varlist _all {
local varname = "`varname' `i'"
}
ren (`varname') (`anything')
end
gph 格式的图文件会保存绘制其的代码和数据,数据可以使用 serset 调用出来:
*- 查看绘图使用的数据(注意这个代码需要重 command 窗口运行,否则不会起效果)
serset dir
*> 0. 9 observations on 4 variables
*> __00000Q __00000R __00000F __00000I
serset 0
serset use, clear
查看绘图代码:
gr desc coefplot
*> name: coefplot
*> format: live
*> created: 1 Aug 2023 18:45:50
*> scheme: lightrstata
*> size: 4 x 6
*> dta file: /Applications/Stata/ado/base/a/auto.dta dated 3 Jun 2023 12:59
*> command: twoway (rspike __00000Q __00000R __00000F if __00000E==1, pstyle(p1) lwidth(*1)
*> horizontal) (scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ylabel(1
*> `"Weight (lbs.)"' 2 `"Repair record 1978=2"' 3 `"Repair record 1978=3"' 4 `"Repair
*> record 1978=4"' 5 `"Repair record 1978=5"' 6 `"Foreign"' 7 `"Repair record 1978=3
*> # Foreign"' 8 `"Repair record 1978=4 # Foreign"' 9 `"_cons"', nogrid
*> angle(horizontal) ) ytick(1 2 3 4 5 6 7 8 9, notick tlstyle(none) grid )
*> yscale(range(.5 9.5)) yscale(reverse) yti("") xti("") legend(label(2 `"."') all
*> order(2) off) plotregion(margin(t=0 b=0))
稍微改动好看点:
gen __00000E = 1
twoway (rspike __00000Q __00000R __00000F if __00000E==1, ///
pstyle(p1) lwidth(*1) horizontal) ///
(scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ///
ylabel(1 `"Weight (lbs.)"' 2 `"Repair record 1978=2"' ///
3 `"Repair record 1978=3"' 4 `"Repair record 1978=4"' ///
5 `"Repair record 1978=5"' 6 `"Foreign"' ///
7 `"Repair record 1978=3 # Foreign"' ///
8 `"Repair record 1978=4 # Foreign"' ///
9 `"_cons"', nogrid angle(horizontal) ) ///
ytick(1 2 3 4 5 6 7 8 9, notick tlstyle(none) grid ) ///
yscale(range(.5 9.5)) yscale(reverse) yti("") xti("") ///
legend(label(2 `"."') all order(2) off) ///
plotregion(margin(t=0 b=0))
可以看到交乘项是 __00000F = 2 或者 __00000F = 3
keep if inlist(__00000F, 2, 3)
然后就可以重新绘制了:
twoway (rspike __00000Q __00000R __00000F if __00000E==1, ///
pstyle(p1) lwidth(*1) horizontal) ///
(scatter __00000F __00000I if __00000E==1, pstyle(p1) ), ///
ylabel(2 `"Repair record 1978=2"' ///
3 `"Repair record 1978=3"', nogrid angle(horizontal) ) ///
ytick(2 3, notick tlstyle(none) grid ) ///
yscale(range(1.5 3.5)) yscale(reverse) yti("") xti("") ///
legend(label(2 `"."') all order(2) off) ///
plotregion(margin(t=0 b=0))
这样我们就非常彻底的解决了这个问题!
直播信息
为了让大家更好的理解上面的内容,欢迎各位培训班会员参加明晚 8 点的直播课 「Stata:coefplot 如何选择交互项系数估计值进行绘图」
直播地址:腾讯会议(需要报名 RStata 培训班参加) 讲义材料:需要报名 RStata 培训班,详情可阅读:一起来学习 R 语言和 Stata 啦!学习过程中遇到的问题也可以随时提问!
更多关于 RStata 会员的更多信息可添加微信号 r_stata 咨询:
附件下载(点击文末的阅读原文即可跳转):
https://rstata.duanshu.com/#/brief/course/e877d6b6020447e89530f36639d10c6f