
Statistics & Econometrics | 10 Commandments for Tables in Empirical Papers

Qiyan Xueshe (启研学社) · Data Seminar (数据Seminar) · 2022-12-31



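Qiyan Xueshe (启研学社) is a research organization jointly founded by university faculty and students together with the Qiyan Data (企研数据) data science team, with well-known scholars serving as academic advisors. Its mission is to support Chinese academic and think-tank research with big-data resources and related technologies. The team's current goals are to uncover the value of socioeconomic big data for academic and think-tank work, to study the governance of academic big data, and to explore practical ways of bringing big-data analytics into research on China's economy and society.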


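English title: The 10 Commandments for Regression Tables

Chinese title: 回归表格的10条戒律

Author: Keith Head

Source: Keith Head's personal blog

URL: http://blogs.ubc.ca/khead/research/research-advice/regression-tables

Translation note: (1) This translation (part of a series) was prepared by an intern at Qiyan Xueshe to help undergraduates and junior graduate students share experience with empirical research methods; (2) it is for study and discussion only and may not be used for commercial purposes.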




The 10 Commandments for Regression Tables

Keith Head

Translated by Chen Ze (陈泽), Qiyan Xueshe

I have ordered these commandments by how controversial I think they might be. If you are my student, or I am your referee, they are not optional.

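1. Report the number of observations, the R-squared, and the root mean squared error (RMSE) for each regression. (Translator's note: the RMSE is also known as the standard error of the regression; it is not the same thing as the standard errors of the coefficients.)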

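2. Report the dependent variable and the estimation method. If they are common to all specifications, report them in the table's caption; if they vary across specifications, report them as column headings.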

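3. Use self-explanatory labels for your explanatory variables. Cryptic abbreviations or symbols from the model section force the reader to page back and forth to understand your results. With five or six columns of regression results, there is enough room to describe each regressor in words; put the symbol used in the model in parentheses below the label.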

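4. Choose sensible units for variables. Coefficients should be neither very small (e.g., 0.000032) nor very large (e.g., 75432.8). As a rule of thumb, a coefficient should need only the first two or three places to the left or right of the decimal point. One exception is when variables are unit-free because you are estimating a log-log model; in that case the size of the coefficient is inherently meaningful.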

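5. The presentation version of a table should be in large type. Don't show a table full of tiny numbers and say “I know you can't read this, but…”. If necessary, move some of your control variables to an auxiliary table so that attention stays on the variables of interest.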

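6. Put standard errors in the same column as the coefficients. Regression packages (translator's note: the software used to run regressions) report standard errors as a separate column next to the coefficients, but in your results table each regression should occupy a single column, with the 4 to 8 columns devoted to alternative specifications and samples. The standard error should therefore appear in parentheses directly below its coefficient. Use the Stata estout package to export tables; install it by typing the following on the Stata command line: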

ssc install estout, replace

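7. Insert key tables in the body of the paper. Journals insist that tables go at the end of the final submitted version, but that does not mean you should do this in working papers or first submissions. There is a reason the printed article puts the tables back into the text: a paper is easier to read without constantly flipping to the end to find results and then back to the text for their interpretation. Placing tables in the text also makes you more aware of whether your paper has the right mix of text and tabular information.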

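8. Display standard errors, not t-statistics or p-values. Unless the test that the coefficient is not equal to zero is the only conceivable test of interest, display standard errors. They give readers a direct view of how precisely each coefficient is estimated, they are useful for a variety of possible tests, and they remain valuable even for readers who prefer not to do classical hypothesis testing at all. If my arguments have not persuaded you, let me appeal to a higher authority: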

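We're not only (or even primarily) interested in formal hypothesis testing: we like to see the standard errors in parentheses under our regression coefficients. These provide a summary measure of precision that can be used to construct confidence intervals, compare estimators, and test any hypothesis that strikes us, now or later.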

Joshua Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics, p. 302

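9. If you show statistical significance at all, use the superscripts “a” (1%), “b” (5%), and “c” (10%). Multiple asterisks (***) waste scarce horizontal space in a table, and in my view a table stuffed with asterisks looks like you are showing off. If you really like asterisks (and there is something to be said for following common practice), pick one significance level that suits your study (5%, or possibly 1%) and use a single asterisk for that level. You could use a squiggle (~) for coefficients that are only marginally significant. Another approach that is gaining favor is to put significant coefficients in bold.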

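10. Report significance for two-tailed tests only. You may think a one-tailed test is acceptable if your theory tells you the sign of the coefficient, but this potential justification is overwhelmed by the common practice of using two-tailed criteria, and many readers will view one-tailed tests as a cynical ploy to exaggerate the significance of your results. Thus, with infinite degrees of freedom (that is, in large samples), a variable is significant at the 5% level when its t-statistic exceeds 1.96 in absolute value, NOT 1.645. Example: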

Source: Head, Keith, Ran Jing, and Deborah Swenson, “From Beijing to Bentonville: Do Multinational Retailers Link Markets?” Journal of Development Economics, 110, 79–92, September 2014.
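Several of the commandments above (report N, R-squared and RMSE; keep each regression in one column with the standard error in parentheses under its coefficient; prefer standard errors, from which confidence intervals follow directly; mark significance with a/b/c superscripts based on two-tailed tests) can be illustrated with a small Python sketch. It is only a sketch under stated assumptions: one regressor plus an intercept, classical (homoskedastic) standard errors, and a large-sample normal approximation for two-tailed p-values. The helper names (`ols_simple`, `two_tailed_p`, `conf_interval`, `superscript`, `format_column`) and the toy data are hypothetical, made up for illustration; in practice you would use a regression package and an exporter such as Stata's estout.

```python
import math
from statistics import NormalDist, fmean

Z = NormalDist()  # standard normal; assumption: large-sample approximation


def ols_simple(x, y):
    """OLS of y on an intercept and one regressor x.

    Returns (slope, classical SE of slope, N, R-squared, RMSE), where
    RMSE = sqrt(SSR / (N - k)) with k = 2 estimated parameters.
    """
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    rmse = math.sqrt(ssr / (n - 2))
    se = rmse / math.sqrt(sxx)  # homoskedastic standard error of the slope
    return slope, se, n, 1 - ssr / sst, rmse


def two_tailed_p(t):
    """Two-tailed p-value for a t-statistic (normal approximation)."""
    return 2 * (1 - Z.cdf(abs(t)))


def conf_interval(b, se, level=0.95):
    """Interval b +/- z * se; z is about 1.96 for a 95% interval."""
    z = Z.inv_cdf(0.5 + level / 2)
    return b - z * se, b + z * se


def superscript(p):
    """Superscript convention from commandment 9: a = 1%, b = 5%, c = 10%."""
    if p < 0.01:
        return "a"
    if p < 0.05:
        return "b"
    if p < 0.10:
        return "c"
    return ""


def format_column(b, se, n, r2, rmse):
    """One regression = one column: coefficient, (SE) below it, then fit stats."""
    p = two_tailed_p(b / se)
    return [
        f"{b:.3f}{superscript(p)}",
        f"({se:.3f})",
        f"N = {n}",
        f"R2 = {r2:.3f}",
        f"RMSE = {rmse:.3f}",
    ]


if __name__ == "__main__":
    # Made-up toy data, for illustration only.
    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
    b, se, n, r2, rmse = ols_simple(x, y)
    print("\n".join(format_column(b, se, n, r2, rmse)))
    print("95% CI:", conf_interval(b, se))
```

With only six observations the normal approximation overstates significance; a real analysis would use the t distribution with N − k degrees of freedom, as regression packages do. The point of the sketch is the layout: everything a reader needs about one specification sits in one column, and the standard error, not a t-statistic or p-value, is what is printed under the coefficient.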





About the Author


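Keith Head is a professor in the Strategy and Business Economics Division of the Sauder School of Business at the University of British Columbia and HSBC Professor of Asian Commerce. He received his PhD in economics from MIT, where he studied under the economist Paul Krugman. His research covers international trade, multinational firms, and economic geography, and his work has appeared in leading international journals including American Economic Review, The Review of Economic Studies, The Review of Economics and Statistics, Journal of International Economics, Journal of Development Economics, and Canadian Journal of Economics. He has served as editor and associate editor of several journals, including the Journal of International Economics.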


Selected Publications


Head, Keith and Thierry Mayer, “Brands in Motion: How frictions shape multinational production,” American Economic Review, 109(9), 3073–3124, September 2019.

Head, Keith, Yao Li and Asier Minondo, “Geography, Ties, and Knowledge Flows: Evidence from Citations in Mathematics,” Review of Economics and Statistics, 101(4), 713–727, October 2019.

Head, Keith and Thierry Mayer, “Misfits in the Car Industry: Offshore assembly decisions at the variety level,” Journal of the Japanese and International Economies, 52, 90–105, June 2019.

Head, Keith and Barbara J. Spencer, “Oligopoly in international trade: Rise, fall, and resurgence,” Canadian Journal of Economics, 50(5), 1414–1444, 2017.

Head, Keith, Ran Jing and John Ries, “Import Sourcing of Chinese Cities: Order versus Randomness,” Journal of International Economics, 105, 119–129, 2017.


Original Text

The 10 Commandments for Regression Tables
These commands are organized according to how controversial I think they might be. If you are my student or I am your referee these commands are not optional.
1. Report the number of observations, the r-squared, and the root mean squared error for each regression.
2. Report the dependent variable and the estimation method, in the table’s caption if it is common to all specifications or as a column heading if it varies across specifications.
3. Use self-explanatory labels for your explanatory variables. Cryptic abbreviations or symbols from the model section force the reader to page back and forth to understand your results. With five or six columns of regression results there should be enough room to use words to describe each regressor. Put the symbol used in the model in parentheses below this.
4. Choose sensible units for variables. The coefficients should not be very small (e.g. 0.000032) or very large (e.g. 75432.8). As a rule of thumb, coefficients should only use the first two or three places to the left or right of the decimal point. One exception is the case where variables are unit-free because you are estimating a log-log model. In that case coefficient size is inherently meaningful.
5. The presentation version of the table should be in large type. Don’t show a table full of tiny numbers and say “I know you can’t read this but…” If necessary, place some of your control variables in an auxiliary table so you can focus attention on the variables of interest.
6. Put standard errors in the same column as the coefficients. Regression packages put standard errors alongside coefficients as separate columns but you should put each regression as a single column in your results table. Columns should be used for 4-8 alternative specifications and samples. Thus the standard error should appear below the related coefficient in parentheses. Use the Stata estout package. You have to install this by typing the following code on the Stata command line:
ssc install estout, replace
7. Insert key tables inside the body of the paper. Journals insist upon tables at the end for the final submitted version of the paper. This does not mean you should do it for working papers or first submissions. There is a reason why the printed version of your article puts the tables back into the text: it is easier to read a paper that way without having to constantly flip to the end to find results and then flip back to the text for interpretation. By putting the tables in the text you will also be more aware of whether your paper has the right mix of text and tabular information.
8. Display standard errors, not t-statistics or p-values. Unless the test that the coefficient is not equal to zero is the only conceivable test of interest, display standard errors. These give readers a direct view of the precision with which you are estimating the coefficient. They are useful information for a variety of possible tests and are still valuable even if the reader prefers not to engage in classical hypothesis testing at all. If my arguments have not persuaded you, let me appeal to a higher authority:
We’re not only (or even primarily) interested in formal hypothesis testing: we like to see the standard errors in parentheses under our regression coefficients. These provide a summary measure of precision that can be used to construct confidence intervals, compare estimators, and test any hypothesis that strikes us, now or later.
Angrist and Pischke, Mostly Harmless Econometrics, p. 302
9. Use “a” (1%), “b” (5%), and “c” (10%) superscripts to show statistical significance, if you show it at all. Using multiple asterisks (***) to display statistical significance wastes scarce horizontal space in a table. In my perception, a table stuffed with asterisks looks like you are showing off. If you really like asterisks, and there is something to be said for following common practice, then just pick a level of significance (five percent or possibly one percent) that seems appropriate for your study and then use a single asterisk for that level. You could use a squiggle for coefficients that are only marginally significant. Another approach that is gaining favor is to put the significant coefficients in bold font.
10. Report significance for two-tailed tests only. You may think it is OK to use one-tailed tests if your theory tells you the sign of the coefficient. However, this potential justification is overwhelmed by the common practice of using two-tailed test criteria. Many readers will view the use of one-tailed tests as a cynical ploy to exaggerate the significance of your results. Thus, with infinite degrees of freedom, variables are significant at the 5% level for t-stats over 1.96, NOT 1.645.
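Commandment 4’s point about units can be checked numerically. The sketch below uses made-up data and a hypothetical helper `slope_and_se` (classical OLS standard errors, one regressor plus an intercept). It regresses spending on income measured first in dollars and then in thousands of dollars: the slope and its standard error both scale by 1,000, so the t-statistic, and hence significance, is unchanged, but the dollar version yields an awkward coefficient with several leading zeros. In a log-log specification the slope is an elasticity and does not change when income is rescaled, which is why the commandment exempts that case.

```python
import math
from statistics import fmean


def slope_and_se(x, y):
    """OLS slope of y on an intercept and x, with its classical standard error."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(ssr / (n - 2)) / math.sqrt(sxx)
    return slope, se


# Made-up toy data: income in dollars, spending in thousands of dollars.
income_dollars = [21000, 34000, 45000, 52000, 68000, 90000]
spending = [18.0, 25.0, 31.0, 33.0, 41.0, 50.0]
income_thousands = [v / 1000 for v in income_dollars]

b_d, se_d = slope_and_se(income_dollars, spending)    # about 0.00046: hard to read
b_k, se_k = slope_and_se(income_thousands, spending)  # about 0.46: easy to read

# Dividing x by 1000 multiplies slope and SE by 1000; the t-statistic is unchanged.
print(f"dollars:   {b_d:.6f} ({se_d:.6f})  t = {b_d / se_d:.2f}")
print(f"thousands: {b_k:.3f} ({se_k:.3f})  t = {b_k / se_k:.2f}")

# Log-log: the slope is an elasticity, identical whichever unit income is in,
# because log(x / 1000) = log(x) - log(1000) only shifts the intercept.
log_y = [math.log(v) for v in spending]
e_d, _ = slope_and_se([math.log(v) for v in income_dollars], log_y)
e_k, _ = slope_and_se([math.log(v) for v in income_thousands], log_y)
print(f"elasticity: {e_d:.3f} vs {e_k:.3f}")
```

Rescaling is therefore free in terms of inference: it changes only how readable the table is, which is exactly the concern of commandment 4.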





Translation | Chen Ze (陈泽)

Proofreading | Yang Qiming (杨奇明)

Layout and editing | Qingjiang (青酱)


