What's new in Stata 16? A full load of practical material, free for the taking

Compiled by 计量经济圈 (Econometrics Circle), 2021-09-19


Email: econometrics666@sina.cn

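About two years ago we introduced the new features of Stata 15 ("Stata 15 features you never expected: a first look"), and a year ago we introduced "30 econometric methods that are changing the world: stop learning and you will be left behind". To keep up with a fast-changing academic environment, StataCorp has been working toward version 16; otherwise it would lose its ability to compete with open-source software such as R and Python. For each new feature below we attach related materials, and young scholars can consult each of those articles.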


Stata 16 is a big release, which our releases usually are. This one is broader than usual. It ranges from lasso to Python and from multiple datasets in memory to multiple chains in Bayesian analysis.

The new features in Stata 16 are as follows:

The maximum number of variables has been expanded. Oh, and in Stata/MP, Stata matrices can now be up to 65,534 x 65,534, meaning you can fit models with over 65,000 right-hand-side variables. Meanwhile, Mata matrices remain limited only by memory.

Here are my comments on the highlights.

1. Lasso, both for prediction and for inference

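On lasso regression, see:

a. An in-depth look at regression methods (OLS, RIDGE, ENET, LASSO, SCAD, MCP, QR)

b. High-dimensional regression methods: have you used Ridge, Lasso, and Elastic Net?

c. Measurement error and new regression rules: an introduction

d. Solutions for collinearity and over-/under-identification problems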

There are two parts to our implementation of lasso: prediction and inference. I suspect inference will be of more interest to our users, but we needed prediction to implement inference. By the way, when I say lasso, I mean lasso, elastic net, and square-root lasso, but if you want a features list, click the title.

Let’s start with lasso for prediction. If you type,

. lasso linear y x1 x2 x3 ... x999

lasso will select the covariates from the x’s specified and fit the model on them. lasso will be unlikely to choose the covariates that belong in the true model, but it will choose covariates that are collinear with them, and that works a treat for prediction. If English is not your first language, by “works a treat”, I mean great. Anyway, the lasso command is for prediction, and standard errors for the covariates it selects are not reported because they would be misleading.
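Why lasso ends up "selecting" covariates can be seen in a stripped-down case. The following is a minimal Python sketch, not Stata's implementation. It assumes an orthonormal design and the objective (1/2)·RSS + penalty·sum|b|, in which case each lasso coefficient is the OLS coefficient shrunk toward zero by the penalty and set exactly to zero when it is small. The function name, the penalty value, and the coefficients are made up for illustration.

```python
def soft_threshold(b_ols, penalty):
    """Lasso coefficient for one covariate under an orthonormal design.

    Assumption for this sketch: the objective is (1/2)*RSS + penalty*sum|b|.
    """
    if b_ols > penalty:
        return b_ols - penalty
    if b_ols < -penalty:
        return b_ols + penalty
    return 0.0  # small coefficients are dropped from the model


# Hypothetical OLS coefficients for four covariates.
ols_coefs = {"x1": 3.0, "x2": -0.4, "x3": 0.9, "x4": -2.5}
penalty = 1.0

lasso_coefs = {name: soft_threshold(b, penalty) for name, b in ols_coefs.items()}
selected = [name for name, b in lasso_coefs.items() if b != 0.0]

print(lasso_coefs)  # {'x1': 2.0, 'x2': 0.0, 'x3': 0.0, 'x4': -1.5}
print(selected)     # ['x1', 'x4']
```

With correlated covariates there is no such closed form, which is one reason the selected set need not coincide with the true model.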

Concerning inference, we provide four lasso-based methods: double selection, cross-fit partialing out, and two more. If you type,

. dsregress y x1, controls(x2-x999)

then, conceptually but not actually, y will be fit on x1 and the variables lasso selects from x2-x999. That’s not how the calculation is made because the variables lasso selects are not identical to the true variables that belong in the model. I said earlier that they are correlated with the true variables, and they are. Another way to think about selection is that lasso estimates the variables to be selected and, as with all estimation, that is subject to error. Anyway, the inference calculations are robust to those errors. Reported will be the coefficient and its standard error for x1. I specified one variable of special interest in the example, but you can specify however many you wish.

2. Reproducible and automatically updating reports

The inelegant title above is trying to say (1) reports that reproduce themselves just as they were originally and (2) reports that, when run again, update themselves by running the analysis on the latest data. Stata has always been strong on both, and we have added more features. I don’t want to downplay the additions, but neither do I want to discuss them. Click the title to learn about them.

I think what’s important is another aspect of what we did. The real problem was that we never told you how to use the reporting features. Now we do in an all-new manual. We tell you and we show you, with examples and workflows. Here’s a link to the manual so you can judge for yourself.

3. New meta-analysis suite

Meta-analysis: re-analyzing the conclusions of the existing literature

Stata is known for its community-contributed meta-analysis. Now there is an official StataCorp suite as well. It’s complete and easy to use. And yes, it has funnel plots and forest plots, and bubble plots and L’Abbé plots.

4. Revamped and expanded choice modeling (margins works everywhere)

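a. Conditional logit is every bit as good as multinomial logit, and mixed models are the most powerful

b. The random-coefficients logit model and its Stata implementation

c. A complete empirical program, using logit or ologit as the example

d. How the mixed logit model overcomes the three big obstacles of the standard logit model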

Choice modeling is jargon for conditional logit, mixed logit, multinomial probit, and other procedures that model the probability of individuals making a particular choice from the alternatives available to each of them.
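To make "the probability of making a particular choice from the alternatives available" concrete, here is a minimal Python sketch of the conditional-logit probability formula, exp(V_j) / sum_k exp(V_k). It is not Stata code, and the alternatives and utility values are hypothetical.

```python
import math


def choice_probabilities(utilities):
    """Conditional-logit probabilities: exp(V_j) / sum_k exp(V_k)."""
    top = max(utilities.values())  # subtract the max for numerical stability
    weights = {alt: math.exp(v - top) for alt, v in utilities.items()}
    total = sum(weights.values())
    return {alt: w / total for alt, w in weights.items()}


# Hypothetical systematic utilities of three commuting alternatives
# for one individual.
utilities = {"car": 1.2, "bus": 0.3, "train": 0.3}
probs = choice_probabilities(utilities)

for alt, p in probs.items():
    print(f"{alt}: {p:.3f}")
```

A counterfactual question of the kind margins answers amounts to changing a covariate, recomputing the utilities, and comparing the resulting probabilities.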

We added a new command to fit mixed logit models, and we rewrote all the rest. The new commands are easier to use and have new features. Old commands continue to work under version control.

margins can now be used after fitting any choice model. margins answers questions about counterfactuals and can even answer them for any one of the alternatives. You can finally obtain answers to questions like, “How would a $10,000 increase in income affect the probability people take public transportation to work?”

The new commands are easier to use because you must first cmset your data. That may not sound like a simplification, but it simplifies the syntax of the remaining commands because it gets details out of the way. And it has another advantage. It tells Stata what your data should look like so Stata can run consistency checks and flag potential problems.

Finally, we created a new [CM] Choice Modeling Manual. Everything you need to know about choice modeling can now be found in one place.

5. Integration of Python with Stata

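Python packages can now be used inside Stata, which is much needed

a. Econometric regression modules in Python and an overview of all modules

b. Regression, classification, and clustering: the strengths and weaknesses of machine learning algorithms in three directions (with Python and R implementations)

c. A collection of spatial econometrics code resources (Matlab/R/Python/SAS/Stata)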

If you don’t know what Python is, put down your quill pen, dig out your acoustic modem and plug it in, push your telephone handset firmly into the coupler, and visit Wikipedia. Python has become an exceedingly popular programming language with extensive libraries for writing numerical, machine learning, and web scraping routines.

Stata’s new relationship with Python is the same as its relationship with Mata. You can use it interactively from the Stata prompt, in do-files, and in ado-files. You can even put Python subroutines at the bottom of ado-files, just as you do Mata subroutines. Or put both. Stata’s flexible.

Python can access Stata results and post results back to Stata using the Stata Function Interface (sfi), the Python module that we provide.

6. Bayesian predictions, multiple chains, and more

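New Bayesian features; this is an important direction for the future

a. Bayesian regression models and choosing prior distributions, with Bayesian logistic regression as the example

b. What on earth are Bayesian estimation and Monte Carlo simulation?

c. Markov chain Monte Carlo (the MH algorithm) for Bayesian regression estimation

d. Bayes factors and their implementation in JASP: what is this fabled Bayesian statistics?

e. Bayesian estimation revisited, starting from MCMC and the MH algorithm

f. Bayesian linear regression: explanation and advantages

g. Maximum likelihood estimation, maximum a posteriori estimation, and Bayes' formula in detail

h. A Bayesian model that predicts victims and culprits in Detective Conan: a prelude to Bayesian estimation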

We have lots of new Bayesian features.

We now have multiple chains. Has the MCMC converged? Estimate models using multiple chains, and reported will be the maximum Gelman-Rubin convergence diagnostic. If it has not yet converged, do more simulations. Still hasn’t converged? Now you can obtain the Gelman-Rubin convergence diagnostic for each parameter. If the same parameter turns up again and again as the culprit, you know where the problem lies.
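For readers who have not met the Gelman-Rubin diagnostic, here is a minimal Python sketch of the textbook version for a single parameter. It compares the variance between chain means with the average variance within chains. This is an illustration under that textbook definition; Stata's exact formula may differ, and the function name and the toy chains are made up.

```python
from statistics import mean, variance


def gelman_rubin(chains):
    """Textbook potential scale reduction factor for one parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    within = mean([variance(c) for c in chains])           # W
    between_over_n = variance([mean(c) for c in chains])   # B / n
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


# Two toy chains that explore the same region ...
agree = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
# ... and two that are stuck in different regions.
disagree = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]

r_agree = gelman_rubin(agree)
r_disagree = gelman_rubin(disagree)
print(round(r_agree, 3), round(r_disagree, 3))
```

Values near 1 suggest the chains agree; a commonly used rule of thumb treats values above about 1.1 as a sign that more simulation is needed.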

We now provide Bayesian predictions for outcomes and functions of them. Bayesian predictions are calculated from the simulations that were run to fit your model, so there are a lot of them. The predictions will be saved in a separate dataset. Once you have the predictions, we provide commands so that you can graph summaries of them and perform hypothesis testing. And you can use them to obtain posterior predictive p-values to check the fit of your model.

7. Extended regression models (ERMs) for panel data

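Extended regression models for panel data: the most powerful framework for handling endogeneity, self-selection, and similar problems in panel data

a. Your way of handling endogeneity is out: ERM now rules them all

b. How panel data handle endogeneity: an article that makes it all clear

c. Solutions for endogeneity in nonlinear panel models, with Stata commands

d. A powerful tool for multiple high-dimensional fixed effects in panel data, which can also handle endogeneity with instrumental variables

e. A practical guide to endogeneity problems: 22 widely circulated articles

f. Criteria for choosing treatment-effects models, NNM and PSM, plus a book giveaway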

ERMs fit models with problems. These problems can be any combination of (1) endogenous and exogenous sample selection, (2) endogenous covariates, also known as unobserved confounders, and (3) nonrandom treatment assignment.

What’s new is that ERMs can now be used to fit models with panel (2-level) data. Random effects are added to each equation. Correlations between the random effects are reported. You can test them, jointly or singly. And you can suppress them, jointly or singly.

Ermistatas got a fourth antenna.

8. Importing of SAS and SPSS datasets

a. Master every aspect of Stata in 6 figures

New command import sas imports .sas7bdat data files and .sas7bcat value-label files.

New command import spss imports IBM SPSS version 16 or higher .sav and .zsav files.

I recommend using them from their dialog boxes. You can preview the data and select the variables and observations you want to import.

9. Flexible nonparametric series regression

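Flexible nonparametric estimation; nonparametric estimation is a major direction for the future

a. The foundation of nonparametric estimation: kernel density estimation explained at length

b. Quantile regression, Oaxaca decomposition, the Quaids model, and programs for nonparametric estimation

c. The nonparametric bootstrap, a workhorse for statistics on small datasets

d. The idea of semiparametric estimation, with Stata examples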

New command npregress series fits models like

y = g(x1, x2, x3) + ε

No functional-form restrictions are placed on g(), but you can impose separability restrictions. The new command can fit

y = g1(x1) + g2(x2, x3) + ε

y = g1(x1, x2) + g3(x3) + ε

y = g1(x1, x3) + g2(x2) + ε

and even fit

y = b1*x1 + g2(x2, x3) + ε

y = b1*x1 + b2*x2 + g3(x3) + ε

I mentioned that lasso can perform inference in models like

. dsregress y x1, controls(x2-x999)

If you know that variables x12, x19, and x122 appear in the model, but do not know the functional form, you could use npregress series to obtain inference. The command

. npregress series y x12 x19 x122, asis(x1)

fits

y = b1*x1 + g2(x12, x19, x122) + ε

and, among other statistics, reports the coefficient and standard error of b1.

10. Multiple datasets in memory, meaning frames

This makes it easier to work with multiple windows, multiple tasks, and multiple datasets

I’m a sucker for data management commands. Even so, I do not think I’m exaggerating when I say that frames will change the way you work. If you are not interested, bear with me. I think I can change your mind.

You can have multiple datasets in memory. Each is stored in a named frame. At any instant, one of the frames is the current frame. Most Stata commands operate on the data in the current frame. It’s the commands that work across frames that will change the way you work, but before you can use them, you have to learn how to use frames. So here’s a bit of me using frames:

. use persons

. frame create counties

. frame counties: use counties

. tabulate cntyid

. frame counties: tabulate cntyid

Well, I’m thinking at this point, it appears I could merge persons.dta with counties.dta, except I’m not thinking about merging them. I’m thinking about linking them.

. frlink m:1 cntyid, frame(counties)

Linking is frame’s equivalent of merge. It does not change either dataset except to add one variable to the data in the current frame. New variable counties is created in this case. If I were to drop the variable, I would eliminate the link, but I’m not going to do that. I’m curious whether the counties in which people reside in persons.dta were all found in counties.dta. I can find out by typing

. count if counties==.

If 1,000 were reported, I would now drop counties, and it would be as if I had never linked the two frames.

Let’s assume count reported 0. Or 4, which is a small enough number that I don’t care for this demonstration. Now watch this:

. generate relinc = income / frget(counties, medinc)

I just calculated each person’s income relative to the median income in the county in which he or she resides, and median income was in the counties dataset, not the persons dataset!
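If the linking idea is new to you, the following Python sketch mimics what the steps above do conceptually. It is not Stata code and says nothing about how Stata stores frames: the link is modeled as a row index into the counties data (or None when no match exists), and the toy records are hypothetical.

```python
# Hypothetical stand-ins for persons.dta and counties.dta.
persons = [
    {"name": "A", "cntyid": 1, "income": 50000.0},
    {"name": "B", "cntyid": 1, "income": 30000.0},
    {"name": "C", "cntyid": 2, "income": 45000.0},
    {"name": "D", "cntyid": 9, "income": 20000.0},  # county not in counties
]
counties = [
    {"cntyid": 1, "medinc": 40000.0},
    {"cntyid": 2, "medinc": 60000.0},
]

# Analogue of frlink m:1: add one link variable to the current data;
# many persons may point at the same county row.
row_of_county = {row["cntyid"]: i for i, row in enumerate(counties)}
for person in persons:
    person["counties"] = row_of_county.get(person["cntyid"])

# Analogue of: count if counties==.
unmatched = sum(1 for person in persons if person["counties"] is None)

# Analogue of: generate relinc = income / frget(counties, medinc)
for person in persons:
    link = person["counties"]
    if link is None:
        person["relinc"] = None  # missing stays missing
    else:
        person["relinc"] = person["income"] / counties[link]["medinc"]

print(unmatched)                       # 1
print([p["relinc"] for p in persons])  # [1.25, 0.75, 0.75, None]
```

Neither dataset is restructured; the only change is the extra link variable, which is why dropping it undoes the link.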

Next, I will copy to the current frame all the variables in counties that start with pop. The command that does this, frget, will use the link and copy the appropriate observations.

. frget pop*, from(counties)

. describe pop*

. generate ln_pop18plus = ln(pop18plus)

. generate ln_income = ln(income)

. correlate ln_income ln_pop18plus

I hope I have convinced you that frames are of interest. If not, this is only one of the five ways frames will change how you work with Stata. Maybe one of the other four ways will convince you. Visit the overview of frames page at stata.com.

11. Sample-size analysis for confidence intervals

The goal is to optimally allocate study resources when CIs are to be used for inference or, said differently, to estimate the sample size required to achieve the desired precision of a CI in a planned study. One mean, two independent means, or two paired means. Or one variance.
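As a rough illustration of the idea for one mean, here is a minimal Python sketch that finds the sample size needed for a CI of a given half-width. It assumes a known standard deviation and the normal approximation, which is simpler than what a dedicated command would do (no t distribution and no probability of achieving the width). The function name and the numbers are made up.

```python
import math
from statistics import NormalDist


def n_for_ci_halfwidth(sd, halfwidth, level=0.95):
    """Smallest n with z * sd / sqrt(n) <= halfwidth (known-sd approximation)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return math.ceil((z * sd / halfwidth) ** 2)


# Hypothetical planning values: sd of 10, desired half-width of 2.
n_wide = n_for_ci_halfwidth(sd=10.0, halfwidth=2.0)
# Halving the half-width roughly quadruples the required sample size.
n_narrow = n_for_ci_halfwidth(sd=10.0, halfwidth=1.0)

print(n_wide, n_narrow)  # 97 385
```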

12. Nonlinear DSGE models

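Nonlinear DSGE models. DSGE models are usually implemented in Dynare; Stata hopes to accommodate them as well

a. Parameter estimation for DSGE models, the nerve center of macroeconomics

b. The evolution of macroeconometrics

c. Macroeconomics terminology: how much do you know?

d. The evolution and development of VAR macroeconometric models: inference without a presumed direction works better

e. 15 points to watch when applying VAR models, a solid summary

f. A practical guide to vector autoregression (VAR) models, laying the groundwork for micro panel VAR

g. The 2018 Nobel Prize in Economics: Nordhaus and Romer; spring has truly arrived for macroeconomics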

DSGE stands for Dynamic Stochastic General Equilibrium. Stata previously fit linear DSGEs. Now it can fit nonlinear ones too.

I know this either interests you or does not, and if it does not, there will be no changing your mind. It interests me, and what makes the new feature spectacular is how easy models are to specify and how readable the code is afterwards. You could almost teach from it. If this interests you, click through.

13. Multiple-group IRT

Widely used in psychology and management; to be covered later together with SEM, latent growth models, and so on

IRT (Item Response Theory) is about the relationship between latent traits and the instruments designed to measure them. An IRT analysis might be about scholastic ability (the latent trait) and a college admission test (the instrument).

Stata 16’s new IRT features produce results for data containing different groups of people. Do instruments measure latent traits in the same way for different populations?

Here is an example. Do students in urban and rural schools perform differently on a test intended to measure mathematical ability? Using Stata 16, you can fit a 2-parameter logistic model comparing the groups by typing

. irt 2pl item1-item10, group(urbanrural)

What’s new is the group() option.

Does an instrument measuring depression perform the same today as it did five years ago? You can fit a graded-response model that compares the groups by typing

. irt grm item1-item10, group(timecategory)

And IRT’s postestimation graphs have been updated to reveal the differences among groups when a group() model has been fit.

The examples I mentioned both concerned two groups, but IRT can handle any number of them.
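For readers new to IRT, here is a minimal Python sketch of the 2-parameter logistic item response function, evaluated with different item parameters for two groups. It only illustrates what a group difference means; it does not estimate anything, and the parameter values for the urban and rural groups are hypothetical.

```python
import math


def prob_correct_2pl(theta, discrimination, difficulty):
    """2PL model: probability of a correct answer given ability theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical parameters of the same item in two groups.
item_by_group = {
    "urban": {"discrimination": 1.5, "difficulty": 0.0},
    "rural": {"discrimination": 1.5, "difficulty": 0.5},
}

theta = 0.0  # a student of average ability
p_urban = prob_correct_2pl(theta, **item_by_group["urban"])
p_rural = prob_correct_2pl(theta, **item_by_group["rural"])

print(round(p_urban, 3), round(p_rural, 3))
```

If the item measured ability in the same way in both groups, the two probabilities would coincide for students of equal ability.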

14. Panel-data Heckman-selection models

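Heckman selection-bias correction for panel data, which matters for analysts in the "panel data research group"

a. The Heckman method and programs for panel data: dynamic panels, binary (0-1) panels, and endogenous selection are all covered

b. Endogeneity in the Heckman two-step method

c. The Heckman model is out; endogenous switching models now take charge

d. Programs for PSM, RDD, Heckman, and panel models: the best of the selective-articles series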

Heckman selection adjusts for bias when some outcomes are missing not at random.

The classic example is economists’ modeling of wages. Wages are observed only for those who work, and whether you work is unlikely to be random. Think about it. Should I work or go to school? Should I work or live off my meager savings? Should I work or retire? Few people would be willing to make those decisions by flipping a coin.

If you worry about such problems and are using panel data, the new xtheckman command is the solution.

15-22. Eight more new features

More features are as follows:

I will summarize the last eight features briefly. My briefness makes them no less important, especially if they interest you.

15. NLMEs with lags: multiple-dose pharmacokinetic models and more can now be fit by Stata’s menl command for fitting nonlinear mixed-effects regression. This includes fitting multiple-dose models.

16. Heteroskedastic ordered probit joins the ordered probit models that Stata already could fit.

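17. Graph sizes in inches, centimeters, and printer points can now be specified. Specify 1in, 1.4cm, or 12pt.

a. Stata's statistical features, graphing, learning resources, and more: everything you wonder about in one article

b. Showcase examples of econometric graphics on China's regional economic development, highly recommended

c. The most complete collection of Stata graphing tips ever, a favorite among female readers

d. In pictures: the past and present of econometrics, growing stronger through its stumbles

e. These graphing tools for papers: you can't use a single one of them

f. Master every aspect of Stata in 6 figures

g. Where are the South China Sea islands on a map of China? A drawing guide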

18. Programmers: Mata’s new Quadrature class numerically integrates y = f(x) over the interval a to b, where a may be -∞ or finite and b may be finite or +∞.

a. The Mata matrix language: all the learning materials are here

b. A summary of Mata functions for advanced programming operations
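To show how an integral over an infinite interval, as in item 18, can be computed at all, here is a minimal Python sketch. It is not the algorithm Mata uses: it assumes a smooth integrand that decays in the tails, maps the infinite interval onto a finite one by a change of variable, and applies the plain midpoint rule. The function name and the number of subintervals are made up.

```python
import math


def integrate(f, a, b, n=2000):
    """Midpoint-rule integral of f over (a, b); a may be -inf, b may be +inf."""
    if a == -math.inf and b == math.inf:
        # x = t / (1 - t^2) maps (-1, 1) onto the whole real line.
        def g(t):
            return f(t / (1 - t * t)) * (1 + t * t) / (1 - t * t) ** 2
        lo, hi = -1.0, 1.0
    elif b == math.inf:
        # x = a + t / (1 - t) maps (0, 1) onto (a, +inf).
        def g(t):
            return f(a + t / (1 - t)) / (1 - t) ** 2
        lo, hi = 0.0, 1.0
    elif a == -math.inf:
        # x = b - t / (1 - t) maps (0, 1) onto (-inf, b).
        def g(t):
            return f(b - t / (1 - t)) / (1 - t) ** 2
        lo, hi = 0.0, 1.0
    else:
        g, lo, hi = f, a, b
    h = (hi - lo) / n
    # Midpoints never touch the endpoints, so no division by zero occurs.
    return h * sum(g(lo + (i + 0.5) * h) for i in range(n))


gauss = integrate(lambda x: math.exp(-x * x), -math.inf, math.inf)
tail = integrate(lambda x: math.exp(-x), 0.0, math.inf)
finite = integrate(lambda x: x * x, 0.0, 3.0)

print(gauss, math.sqrt(math.pi))  # the two values should agree closely
print(tail, finite)               # close to 1 and 9
```

An adaptive rule would refine the grid only where the integrand is hard to approximate; the fixed grid here keeps the sketch short.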

19. Programmers: Mata’s new Linear programming class solves linear programs using an interior-point method. It minimizes or maximizes a linear objective function subject to linear constraints (equality and inequality) and boundary conditions.

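a. What on earth are functions in a programming language? All Stata functions gathered here

b. Programming in Stata: 10 articles you won't regret reading, with code and annotations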

20. Do-file Editor: Autocompletion and more. The editor now provides syntax highlighting for Python and Markdown. And it autocompletes Stata commands, quotes, parentheses, braces, and brackets. Last but not least, spaces as well as tabs can be used for indentation.

The built-in editor has improved: it autocompletes Stata commands, parentheses, and so on, and highlights Python and Markdown syntax

21. Stata for Mac: Dark Mode and tabbed windows. Dark mode is a color scheme that darkens background windows and controls so that they do not cause eye strain or distract from what you are working on. Stata now supports it. Meanwhile, tabbed windows conserve screen real estate. Stata has lots of windows. With the exception of the Results window, they come and go as they are needed. Now you can combine all or some into one. Click the tab, change the window.

22. Panel data mixed logit

a. A practical guide to panel-data models: 16 must-read articles

That’s it

The highlights are 58% of what’s new in Stata 16, measured by the number of text lines required to describe them. Here is a sampling of what else is new.

Some new additions to existing commands are as follows:

ranksum has new option exact to specify that exact p-values be computed for the Wilcoxon rank-sum test.


New setting set iterlog controls whether estimation commands display iteration logs.


menl has new option lrtest that reports a likelihood-ratio test comparing the nonlinear mixed-effects model with the model fit by ordinary nonlinear regression.


The bayes: prefix command now supports the new hetoprobit command so that you can fit Bayesian heteroskedastic ordered probits.


The svy: prefix works with more estimation commands, namely, existing command hetoprobit and new commands cmmixlogit and cmxtmixlogit.


New command export sasxport8 exports datasets to SAS XPORT Version 8 Transport format.


New command splitsample splits data into random samples. It can create simple random samples, clustered samples, and balanced random samples. Balance splitting can be used for matched-treatment assignment.


I could go on. Type help whatsnew15to16 when you get your copy of Stata 16 to find out all that’s new.
