Stata:ado 编程案例|从 rolling 命令结果中提取数据 & 使用 ml 命令估计均值和方差
为了让大家更好的理解本文的内容,欢迎各位培训班会员参加明天晚上 8 点的直播课:「Stata:ado 命令编程案例:从 rolling 命令结果中提取数据 & 使用 ml 命令估计均值和方差」
该课程是「Stata 编程导论的最后一个课时」,课程主页(点击文末的阅读原文即可跳转):
https://rstata.duanshu.com/#/brief/course/500094e7213744bc904e4e2cb270d135
在前面四次课中我们学习了一些 Stata 中编写命令的技巧,今天我们将通过两个案例来讲解如何通过 ado 命令编程来解决问题。
从 rolling 命令结果中提取数据; 使用 ml 命令估计均值和方差以及进行有约束的最大似然估计。
从 rolling 命令结果中提取数据
之前的课程中也讲解过 rolling 命令:
use ibm.dta, clear
reg ibm spx
rolling _b _se, window(90) saving(capm, replace): reg ibm spx
use capm, clear
label variable _b_spx "IBM beta"
label variable end "日期"
list in 1/5
tsset end
format end %tdMonYY
tsline _b_spx, yline(1) ylabel(, angle(0) format(%6.1f))
上面的代码是研究 CAPM 模型中的动态贝塔,在 CAPM 模型模型中,beta = 1 意味着股票的波动率等于指数的波动率,beta < 1 意味着股票的波动率小于指数的波动率,beta > 1 意味着股票的波动率高于指数的波动率。
rolling _b _se, window(90) saving(capm, replace): reg ibm spx
命令中我们设定窗宽为 90 天,然后从 reg 中提取 _b
和 _se
保存到 capm.dta 数据中。这里的 _b
和 _se
都是 reg 的返回结果。
再考虑这样的一个代码:
reg y L(1/4).x
我们想对 y 和 x 的1~4阶滞后进行滚动窗口回归,然后提取所有滞后项的回归系数的和,这一统计量被称为稳态效应(steady-state effect of x on y)。不过 reg 并不会直接返回这一结果,因此我们需要自行编写一个返回这一统计量的回归命令:
cap prog drop myregress
prog myregress, rclass
version 14
syntax varlist(ts) [if] [in], LAGVar(string) NLAGs(integer)
reg `varlist' `if' `in'
local nl1 = `nlags' - 1
forval i = 1/`nl1' {
local lv "`lv' L`i'.`lagvar' + "
}
local lv "`lv' L`nlags'.`lagvar'"
lincom `lv'
ret scalar sum = `r(estimate)'
ret scalar se = `r(se)'
end
use wpi1.dta, clear
myregress wpi L(1/4).wpi t, lagvar(wpi) nlags(4)
*> Source | SS df MS Number of obs = 120
*> -------------+---------------------------------- F(5, 114) = 41601.52
*> Model | 108199.912 5 21639.9823 Prob > F = 0.0000
*> Residual | 59.2997117 114 .520172909 R-squared = 0.9995
*> -------------+---------------------------------- Adj R-squared = 0.9994
*> Total | 108259.211 119 909.741272 Root MSE = .72123
*>
*> ------------------------------------------------------------------------------
*> wpi | Coefficient Std. err. t P>|t| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> wpi |
*> L1. | 1.43324 .0947574 15.13 0.000 1.245526 1.620954
*> L2. | -.3915563 .1648926 -2.37 0.019 -.7182073 -.0649053
*> L3. | .1669584 .1693717 0.99 0.326 -.1685657 .5024825
*> L4. | -.2276451 .0960385 -2.37 0.019 -.4178967 -.0373936
*> |
*> t | .0184368 .0071336 2.58 0.011 .0043052 .0325684
*> _cons | .2392324 .1641889 1.46 0.148 -.0860246 .5644894
*> ------------------------------------------------------------------------------
*>
*> ( 1) L.wpi + L2.wpi + L3.wpi + L4.wpi = 0
*>
*> ------------------------------------------------------------------------------
*> wpi | Coefficient Std. err. t P>|t| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> (1) | .9809968 .0082232 119.30 0.000 .9647067 .9972869
*> ------------------------------------------------------------------------------
然后我们就可以在 rolling 里面使用这个命令了:
rolling sum = r(sum) se = r(se), window(30): ///
myregress wpi L(1/4).wpi t, lagvar(wpi) nlags(4)
list in 1/10
*> +---------------------------------------+
*> | start end sum se |
*> |---------------------------------------|
*> 1. | 1960q1 1967q2 .7892814 .1078253 |
*> 2. | 1960q2 1967q3 .8390291 .0912525 |
*> 3. | 1960q3 1967q4 .8450157 .0861015 |
*> 4. | 1960q4 1968q1 .8706701 .0881043 |
*> 5. | 1961q1 1968q2 .8765603 .0838703 |
*> |---------------------------------------|
*> 6. | 1961q2 1968q3 .8385788 .0816582 |
*> 7. | 1961q3 1968q4 .8797001 .0848513 |
*> 8. | 1961q4 1969q1 .9109453 .0941255 |
*> 9. | 1962q1 1969q2 .9166081 .0940764 |
*> 10. | 1962q2 1969q3 .878272 .0935893 |
*> +---------------------------------------+
然后再绘图展示下:
tsset end, quarterly
label var end "结束时间"
gen lo = sum - 1.96 * se
gen hi = sum + 1.96 * se
tw rarea hi lo end, color(gs12) yla(, ang(0) format(%6.1f)) ///
ti("滞后项移动窗口回归系数和的估计值及 95% 置信区间") || ///
tsline sum, leg(off) xti("")
使用 ml 命令估计均值和方差
在 statalist 上有这么一个问题:
总体服从正态分布, 和 是两个子样本的标准差, 和 是两个子样本的均值。如何计算下面的非线性组合:
以及:
还有就是如何计算同方差约束下的 。
这个问题可以使用 ml 命令解决。我们考虑 auto.dta 数据的案例,两个子样本分别是 foreign == 0 和 foreign == 1 的,为此我们需要先编写一个 ml 命令:
cap prog drop meanvar
prog def meanvar
version 14
args lnf mu1 mu2 sigma1 sigma2
qui replace `lnf' = ln(normalden($ML_y1, `mu1', `sigma1')) if $subsample == 0
qui replace `lnf' = ln(normalden($ML_y1, `mu2', `sigma2')) if $subsample == 1
end
然后就可以使用了:
sysuse auto, clear
global subsample foreign
ml model lf meanvar (mu1: price = ) (mu2: price =) /sigma1 /sigma2
ml maximize, nolog
*> initial: log likelihood = -<inf> (could not be evaluated)
*> feasible: log likelihood = -799.413
*> rescale: log likelihood = -720.80249
*> rescale eq: log likelihood = -703.54877
*>
*> Number of obs = 74
*> Wald chi2(0) = .
*> Log likelihood = -695.14898 Prob > chi2 = .
*>
*> ------------------------------------------------------------------------------
*> | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> mu1 |
*> _cons | 6072.423 425.3414 14.28 0.000 5238.769 6906.077
*> -------------+----------------------------------------------------------------
*> mu2 |
*> _cons | 6384.682 546.1422 11.69 0.000 5314.263 7455.101
*> -------------+----------------------------------------------------------------
*> /sigma1 | 3067.18 300.7616 10.20 0.000 2477.698 3656.662
*> /sigma2 | 2561.634 386.1807 6.63 0.000 1804.734 3318.534
*> ------------------------------------------------------------------------------
nlcom (/sigma1 - /sigma2) / (/sigma1 + /sigma2)
*> _nl_1: (/sigma1 - /sigma2) / (/sigma1 + /sigma2)
*>
*> ------------------------------------------------------------------------------
*> | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> _nl_1 | .089814 .0891949 1.01 0.314 -.0850049 .2646329
*> ------------------------------------------------------------------------------
nlcom 2 * _pi * sqrt(3) * (([mu1]_cons - [mu2]_cons) / (/sigma1 + /sigma2))
*> _nl_1: 2 * _pi * sqrt(3) * (([mu1]_cons - [mu2]_cons) / (/sigma1 + /sigma2))
*>
*> ------------------------------------------------------------------------------
*> | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> _nl_1 | -.6037233 1.339398 -0.45 0.652 -3.228896 2.021449
*> ------------------------------------------------------------------------------
添加约束可以使用 constraint 命令:
constraint 1 /sigma1 = /sigma2
ml model lf meanvar (mu1: price = ) (mu2: price = ) /sigma1 /sigma2, constraints(1)
ml maximize, nolog
*> initial: log likelihood = -<inf> (could not be evaluated)
*> feasible: log likelihood = -955.69077
*> rescale: log likelihood = -712.06522
*> rescale eq: log likelihood = -701.49041
*>
*> Number of obs = 74
*> Wald chi2(0) = .
*> Log likelihood = -695.62494 Prob > chi2 = .
*>
*> ( 1) [/]sigma1 - [/]sigma2 = 0
*> ------------------------------------------------------------------------------
*> | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> mu1 |
*> _cons | 6072.423 405.766 14.97 0.000 5277.136 6867.71
*> -------------+----------------------------------------------------------------
*> mu2 |
*> _cons | 6384.682 623.8296 10.23 0.000 5161.998 7607.365
*> -------------+----------------------------------------------------------------
*> /sigma1 | 2926.02 240.5173 12.17 0.000 2454.615 3397.426
*> /sigma2 | 2926.02 240.5173 12.17 0.000 2454.615 3397.426
*> ------------------------------------------------------------------------------
nlcom 2 * _pi * sqrt(3) * (([mu1]_cons - [mu2]_cons) / (/sigma1 + /sigma2))
*> _nl_1: 2 * _pi * sqrt(3) * (([mu1]_cons - [mu2]_cons) / (/sigma1 + /sigma2))
*>
*> ------------------------------------------------------------------------------
*> | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> _nl_1 | -.5806946 1.38475 -0.42 0.675 -3.294756 2.133366
*> ------------------------------------------------------------------------------
下面我们可以对这个约束条件使用似然比检验:
ml model lf meanvar (mu1: price = ) (mu2: price = ) /sigma1 /sigma2
qui ml maximize, nolog
est store unconstr
ml model lf meanvar (mu1: price = ) (mu2: price = ) /sigma1 /sigma2, constraints(1)
qui ml maximize, nolog
est store constr
lrtest unconstr constr
*> Likelihood-ratio test
*> Assumption: constr nested within unconstr
*>
*> LR chi2(1) = 0.95
*> Prob > chi2 = 0.3292
结果表明无法拒绝同方差约束。
上面的约束是个等式约束,那么我们该如何添加不等式约束呢?例如标准差应该始终是个正数。为此,我们可以使用下面的方法:
cap prog drop mynormal_lf
prog def mynormal_lf
version 14
args lnf mu lnsigma
qui replace `lnf' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end
通过把 sigma 替换成 lnsigma,然后在程序中使用 exp(lnsigma)
就可以确保标准差是个正数了。
在前面的课程中我们还根据 mynormal_lf 程序编写了 mynormal 命令,不过如果我们仅仅像上面的代码一样修改了 mynormal_lf,mynormal 命令的结果中只会汇报 lnsigma,可以使用 diparm() 设置计算汇报 sigma:
clear all
prog def mynormal
version 14
if replay() {
if ("`e(cmd)'" != "mynormal") error 301
Replay `0'
}
else Estimate `0'
end
prog def Replay
syntax [, Level(cilevel)]
ml display, level(`level')
end
prog def Estimate, eclass sortpreserve
syntax varlist [if] [in] [, vce(passthru) Level(cilevel) *]
mlopts mlopts, `options'
gettoken lhs rhs: varlist
marksample touse
ml model lf mynormal_lf (mu: `lhs' = `rhs') /lnsigma if ///
`touse', `vce' `mlopts' maximize diparm(lnsigma, exp label("sigma"))
ereturn local cmd "mynormal"
Replay, level(`level')
end
prog def mynormal_lf
version 14
args lnf mu lnsigma
qui replace `lnf' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end
sysuse auto, clear
mynormal price mpg weight turn
*> initial: log likelihood = -<inf> (could not be evaluated)
*> feasible: log likelihood = -811.54531
*> rescale: log likelihood = -811.54531
*> rescale eq: log likelihood = -808.73926
*> Iteration 0: log likelihood = -808.73926
*> Iteration 1: log likelihood = -729.21876 (not concave)
*> Iteration 2: log likelihood = -708.4913 (not concave)
*> Iteration 3: log likelihood = -702.40594
*> Iteration 4: log likelihood = -678.52387
*> Iteration 5: log likelihood = -677.74671
*> Iteration 6: log likelihood = -677.74638
*> Iteration 7: log likelihood = -677.74638
*>
*> Number of obs = 74
*> Wald chi2(3) = 46.26
*> Log likelihood = -677.74638 Prob > chi2 = 0.0000
*>
*> ------------------------------------------------------------------------------
*> price | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> mu |
*> mpg | -72.86501 79.06769 -0.92 0.357 -227.8348 82.10481
*> weight | 3.524339 .7947479 4.43 0.000 1.966661 5.082016
*> turn | -395.1902 119.2837 -3.31 0.001 -628.9819 -161.3985
*> _cons | 12744.24 4629.664 2.75 0.006 3670.27 21818.22
*> -------------+----------------------------------------------------------------
*> lnsigma |
*> _cons | 7.739796 .0821995 94.16 0.000 7.578688 7.900904
*> -------------+----------------------------------------------------------------
*> sigma | 2298.004 188.8948 1956.062 2699.723
*> ------------------------------------------------------------------------------
使用类似的技巧我们也可以为回归参数添加不等式约束,假如我们想设置 mpg 的系数为负数的约束,可以考虑使用 -exp(x) 函数的性质。把 price 对 mpg、weight 和 turn 的回归分成两部分:
-exp(a) * mpg weight + turn
因此只需要在 2 的回归中把均值加上 -exp(a) * mpg 即可:
cap prog drop mynormal_lf_c1
prog def mynormal_lf_c1
version 14
args lnfj a xb lnsigma
tempvar mu
qui gen double `mu' = `xb' + (- exp(`a')* $x1)
qui replace `lnfj' = ln(normalden($ML_y1, `mu', exp(`lnsigma')))
end
global x1 = "mpg"
ml model lf mynormal_lf_c1 (a:) (mu: price = weight turn) /lnsigma, ///
maximize nolog ///
diparm(lnsigma, exp label("sigma")) ///
diparm(a, function(-exp(@)) d(-exp(@)) label("mpg"))
ml display
*> Number of obs = 74
*> Wald chi2(0) = .
*> Log likelihood = -677.74638 Prob > chi2 = .
*>
*> ------------------------------------------------------------------------------
*> price | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> a |
*> _cons | 4.288602 1.085117 3.95 0.000 2.161811 6.415393
*> -------------+----------------------------------------------------------------
*> mu |
*> weight | 3.524342 .7947446 4.43 0.000 1.966671 5.082012
*> turn | -395.1901 119.2837 -3.31 0.001 -628.9818 -161.3984
*> _cons | 12744.22 4629.629 2.75 0.006 3670.316 21818.13
*> -------------+----------------------------------------------------------------
*> /lnsigma | 7.739796 .0821995 94.16 0.000 7.578688 7.900904
*> -------------+----------------------------------------------------------------
*> sigma | 2298.004 188.8948 1956.062 2699.723
*> mpg | -72.86453 79.06655 -611.1806 -8.686858
*> ------------------------------------------------------------------------------
也可以使用无约束的回归结果作为初始值来节约拟合时间:
qui reg price mpg weight turn
mat b0 = e(b), ln(e(rmse))
ml model lf mynormal_lf_c1 (a:) (mu: price = weight turn) /lnsigma, ///
maximize nolog init(b0, copy) ///
diparm(lnsigma, exp label("sigma")) ///
diparm(a, function(-exp(@)) d(-exp(@)) label("mpg"))
ml display
nlcom -exp([a]_cons)
*> _nl_1: -exp([a]_cons)
*>
*> ------------------------------------------------------------------------------
*> price | Coefficient Std. err. z P>|z| [95% conf. interval]
*> -------------+----------------------------------------------------------------
*> _nl_1 | -72.85979 79.06695 -0.92 0.357 -227.8282 82.10859
*> ------------------------------------------------------------------------------
直播信息
为了让大家更好的理解上面的内容,欢迎各位培训班会员参加明天晚上 8 点的直播课:「Stata:ado 命令编程案例:从 rolling 命令结果中提取数据 & 使用 ml 命令估计均值和方差」
直播地址:腾讯会议(需要报名 RStata 培训班参加) 讲义材料:需要报名 RStata 培训班,详情可阅读:一起来学习 R 语言和 Stata 啦!学习过程中遇到的问题也可以随时提问!
更多关于 RStata 会员的更多信息可添加微信号 r_stata 咨询: