Textual Analysis in Top Finance, Economics, and Accounting Journals: A 20,000-Character Survey
LOUGHRAN, T. and MCDONALD, B. (2016), Textual Analysis in Accounting and Finance: A Survey. Journal of Accounting Research, 54: 1187-1230.
Relative to quantitative methods traditionally used in accounting and finance, textual analysis is substantially less precise. Thus, understanding the art is of equal importance to understanding the science. In this survey, we describe the nuances of the method and, as users of textual analysis, some of the tripwires in implementation. We also review the contemporary textual analysis literature and highlight areas of future research.
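Textual Analysis in Accounting and Finance: A Survey
Contents
1 Introduction
2 Information Content, Document Structure, and Readability
2.1 Information Content
2.2 Readability
2.2.1 Prior Research on Readability
2.2.2 Defining and Measuring Readability
3 Bag-of-Words Methods and the Term-Document Matrix
3.1 Targeted Phrases
3.2 Word Lists
3.2.1 The Henry (2008) Word List
3.2.2 The Harvard GI Word Lists
3.2.3 The Diction Positive and Negative Sentiment Word Lists
3.2.4 Limitations of the Harvard and Diction Word Lists
3.2.5 The LM Word Lists
3.2.6 Zipf's Law
3.2.7 Term Weighting
3.3 Naïve Bayes Methods
3.4 Thematic Analysis of Text
4 Document Narrative
5 Measuring Document Similarity
6 Implementation Details in Textual Analysis
6.1 What Is a "Word"?
6.2 What Is a "Sentence"?
6.3 Why "Positive Tone" and "Net Tone" Analyses Are Problematic
6.4 Targeting Sections in Mandated Disclosures
6.5 Levels and Differences
6.6 An Example
6.6.1 Programming Languages
6.6.2 A Simple Example
7 Areas for Future Research
7.1 Measuring Firm-Level Complexity
7.2 Structuring Term Weighting
7.3 Innovations in Analytical Methods
7.4 Word List Issues
7.5 Language of the Text
8 Conclusion
References: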
[1] AHERN, K., AND D. SOSYURA. “Who Writes the News? Corporate Press Releases During Merger Negotiations.” Journal of Finance 69 (2014): 241–91.
[2] ALLEE, K., AND M. DEANGELIS. “The Structure of Voluntary Disclosure Narratives: Evidence from Tone Dispersion.” Journal of Accounting Research 53 (2015): 241–74.
[3] ANTWEILER, W., AND M. FRANK. “Is All That Talk Just Noise? The Information Content of Internet Stock Message Boards.” Journal of Finance 59 (2004): 1259–94.
[4] BERELSON, B. Content Analysis in Communication Research. Glencoe, IL: The Free Press, 1952.
[5] BIDDLE, G.; G. HILARY; AND R. VERDI. “How Does Financial Reporting Quality Relate to Investment Efficiency?” Journal of Accounting and Economics 48 (2009): 112–31.
[6] BLEI, D.; A. NG; AND M. JORDAN. “Latent Dirichlet Allocation.” Journal of Machine Learning Research 3 (2003): 993–1022.
[7] BLOOMFIELD, R. “Discussion of Annual Report Readability, Current Earnings, and Earnings Persistence.” Journal of Accounting and Economics 45 (2008): 248–52.
[8] BODNARUK, A.; T. LOUGHRAN; AND B. MCDONALD. “Using 10-K Text to Gauge Financial Constraints.” Journal of Financial and Quantitative Analysis 50 (2015): 623–46.
[9] BONSALL, S. B.; A. J. LEONE; AND B. P. MILLER. “A Plain English Measure of Financial Reporting Readability.” Working paper, Ohio State University, 2015.
[10] BONSALL, S. B., AND B. P. MILLER. “The Impact of Narrative Disclosure Readability on Bond Ratings and Rating Agency Disagreement.” Working paper, Ohio State University, 2014.
[11] BOUKUS, E., AND J. ROSENBERG. “The Information Content of FOMC Minutes.” Working paper, Federal Reserve Bank of New York, 2006.
[12] BRATTEN, B.; C. A. GLEASON; S. LAROCQUE; AND L. F. MILLS. “Forecasting Tax Expense: New Evidence from Analysts.” Working paper, University of Notre Dame, 2014.
[13] BROWN, S., AND J. W. TUCKER. “Large-Sample Evidence on Firms’ Year-over-Year MD&A Modifications.” Journal of Accounting Research 49 (2011): 309–46.
[14] BUEHLMAIER, M., AND T. WHITED. “Looking for Risk in Words: A Narrative Approach to Measuring the Pricing Implications of Finance Constraints.” Working paper, University of Rochester, 2014.
[15] BUEHLMAIER, M., AND J. ZECHNER. “Slow-Moving Real Information in Merger Arbitrage.” Working paper, University of Hong Kong, 2013.
[16] BURKE, K. “The Rhetoric of Hitler’s ‘Battle’.” The Southern Review 5 (1939): 1–21.
[17] BUSHEE, B. J.; I. D. GOW; AND D. J. TAYLOR. “Linguistic Complexity in Firm Disclosures: Obfuscation or Information?” Working paper, University of Pennsylvania, 2015.
[18] THE CATHOLIC ENCYCLOPEDIA, Vol. 4, 1908. New York: Robert Appleton Company.
[19] CHEN, H.; P. DE; Y. HU; AND B. H. HWANG. “Wisdom of Crowds: The Value of Stock Opinions Transmitted Through Social Media.” Review of Financial Studies 27 (2014): 1367–403.
[20] CHEN, J. V., AND F. LI. “Estimating the Amount of Estimation in Accruals.” Working paper, University of Michigan, 2013.
[21] COVAL, J., AND T. SHUMWAY. “Is Sound Just Noise?” Journal of Finance 61 (2001): 1887–910.
[22] CROSSNO, P.; A. WILSON; T. SHEAD; AND D. DUNLAVY. “Topicview: Visually Comparing Topic Models of Text Collections.” Tools with Artificial Intelligence (ICTAI), 2011 23rd IEEE International Conference, Boca Raton, Florida, November 7–9, 2011: 936–43.
[23] DAS, S. R. “Text and Context: Language Analytics in Finance.” Foundations and Trends in Finance 8 (2014): 145–261.
[24] DAS, S., AND S. BANERJEE. “Pattern Recognition Approaches to Japanese Character Recognition.” Advances in Computer Science, Engineering and Applications 166 (2012): 83–92.
[25] DAS, S. R., AND M. Y. CHEN. “Yahoo! for Amazon: Sentiment Extraction from Small Talk on the Web.” Management Science 53 (2007): 1375–88.
[26] DAVIS, A. K.; W. GE; D. MATSUMOTO; AND J. L. ZHANG. “The Effect of Manager-Specific Optimism on the Tone of Earnings Conference Calls.” Review of Accounting Studies 20 (2015): 639–73.
[27] DAVIS, A. K.; J. M. PIGER; AND L. M. SEDOR. “Beyond the Numbers: Measuring the Information Content of Earnings Press Release Language.” Contemporary Accounting Research 29 (2012): 845–68.
[28] DAVIS, A. K., AND I. TAMA-SWEET. “Managers’ Use of Language Across Alternative Disclosure Outlets: Earnings Press Releases Versus MD&A.” Contemporary Accounting Research 29 (2012): 804–37.
[29] DE FRANCO, G.; O. HOPE; D. VYAS; AND Y. ZHOU. “Analyst Report Readability.” Contemporary Accounting Research 32 (2015): 76–104.
[30] DORAN, J. S.; D. R. PETERSON; AND S. M. PRICE. “Earnings Conference Call Content and Stock Price: The Case of REITs.” The Journal of Real Estate Finance and Economics 45 (2012): 402–34.
[31] DOUGAL, C.; J. ENGELBERG; D. GARCIA; AND C. A. PARSONS. “Journalists and the Stock Market.” Review of Financial Studies 25 (2012): 639–79.
[32] EGOZI, O.; S. MARKOVITCH; AND E. GABRILOVICH. “Concept-Based Information Retrieval Using Explicit Semantic Analysis.” ACM Transactions of Information Systems 29 (2011): 8–32.
[33] ERTUGRUL, M.; J. LEI; J. QIU; AND C. WAN. “Annual Report Readability, Tone Ambiguity, and the Cost of Borrowing.” Journal of Financial and Quantitative Analysis (2015): Forthcoming.
[34] FELDMAN, R.; S. GOVINDARAJ; J. LIVNAT; AND B. SEGAL. “Management’s Tone Change, Post Earnings Announcement Drift and Accruals.” Review of Accounting Studies 15 (2010): 915–53.
[35] FERRIS, S. P.; G. HAO; AND M. LIAO. “The Effect of Issuer Conservatism on IPO Pricing and Performance.” Review of Finance 17 (2013): 993–1027.
[36] FRAZIER, K. B.; R. W. INGRAM; AND B. M. TENNYSON. “A Methodology for the Analysis of Narrative Accounting Disclosures.” Journal of Accounting Research 22 (1984): 318–31.
[37] GARCIA, D. “Sentiment During Recessions.” Journal of Finance 68 (2013): 1267–300.
[38] GENTZKOW, M., AND J. M. SHAPIRO. “What Drives Media Slant? Evidence from U.S. Daily Newspapers.” Econometrica 78 (2010): 35–71.
[39] GUAY, W.; D. SAMUELS; AND D. TAYLOR. “Guiding Through the Fog: Financial Statement Complexity and Voluntary Disclosure.” Working paper, University of Pennsylvania, 2015.
[40] GURUN, U. G., AND A. W. BUTLER. “Don’t Believe the Hype: Local Media Slant, Local Advertising, and Firm Value.” Journal of Finance 67 (2012): 561–98.
[41] HANLEY, K. W., AND G. HOBERG. “The Information Content of IPO Prospectuses.” Review of Financial Studies 23 (2010): 2821–64.
[42] HENRY, E. “Are Investors Influenced by How Earnings Press Releases Are Written?” Journal of Business Communication 45 (2008): 363–407.
[43] HESTON, S. L., AND N. SINHA. “News Versus Sentiment: Predicting Stock Returns from News Stories.” Working paper, University of Maryland, 2015.
[44] HILLERT, A.; A. NIESSEN-RUENZI; AND S. RUENZI. “Mutual Fund Shareholder Letter Tone—Do Investors Listen?” Working paper, University of Mannheim, 2014.
[45] HOBERG, G., AND G. PHILLIPS. “Text-Based Network Industries and Endogenous Product Differentiation.” Journal of Political Economy (2015): Forthcoming.
[46] HOFMANN, T. “Unsupervised Learning by Probabilistic Latent Semantic Analysis.” Machine Learning 42 (2001): 177–96.
[47] HUANG, A.; R. LEHAVY; A. ZANG; AND R. ZHENG. “Analyst Information Discovery and Interpretation Roles: A Topic Modeling Approach.” Working paper, University of Michigan, 2015.
[48] HUANG, A.; A. ZANG; AND R. ZHENG. “Evidence on the Information Content of Text in Analyst Reports.” The Accounting Review 89 (2014): 2151–80.
[49] HUANG, X.; S. H. TEOH; AND Y. ZHANG. “Tone Management.” The Accounting Review 89 (2014): 1083–113.
[50] JEGADEESH, N., AND D. WU. “Word Power: A New Approach for Content Analysis.” Journal of Financial Economics 110 (2013): 712–29.
[51] JONES, M. J., AND P. A. SHOEMAKER. “Accounting Narratives: A Review of Empirical Studies of Content and Readability.” Journal of Accounting Literature 13 (1994): 142–84.
[52] KEARNEY, C., AND S. LIU. “Textual Sentiment in Finance: A Survey of Methods and Models.” International Review of Financial Analysis 33 (2014): 171–85.
[53] KIM, Y. H. “Self Attribution Bias of the CEO: Evidence from CEO Interviews on CNBC.” Journal of Banking & Finance 27 (2013): 2472–89.
[54] KLARE, G. The Measurement of Readability. Ames, IA: Iowa University Press, 1963.
[55] KOTHARI, S. P.; X. LI; AND J. E. SHORT. “The Effect of Disclosures by Management, Analysts, and Business Press on Cost of Capital, Return Volatility, and Analyst Forecasts: A Study Using Content Analysis.” The Accounting Review 84 (2009): 1639–70.
[56] LANG, M., AND L. STICE-LAWRENCE. “Textual Analysis and International Financial Reporting: Large Sample Evidence.” Journal of Accounting and Economics 60 (2015): 110–35.
[57] LARCKER, D. F., AND A. A. ZAKOLYUKINA. “Detecting Deceptive Discussions in Conference Calls.” Journal of Accounting Research 50 (2012): 495–540.
[58] LAWRENCE, A. “Individual Investors and Financial Disclosure.” Journal of Accounting & Economics 56 (2013): 130–47.
[59] LEHAVY, R.; F. LI; AND K. MERKLEY. “The Effect of Annual Report Readability on Analyst Following and the Properties of Their Earnings Forecasts.” The Accounting Review 86 (2011): 1087–115.
[60] LEUZ, C., AND C. SCHRAND. “Disclosure and the Cost of Capital: Evidence from Firms’ Responses to the Enron Shock.” Working paper, University of Chicago, 2009.
[61] LEUZ, C., AND P. WYSOCKI. “The Economics of Disclosure and Financial Reporting Regulation: Evidence and Suggestions for Future Research.” Journal of Accounting Research 54 (2016): 525–622.
[62] LEWIS, N. R.; L. D. PARKER; G. D. POUND; AND P. SUTCLIFFE. “Accounting Report Readability: The Use of Readability Techniques.” Accounting and Business Research 16 (1986): 199–213.
[63] LI, F. “Annual Report Readability, Current Earnings, and Earnings Persistence.” Journal of Accounting and Economics 45 (2008): 221–47.
[64] LI, F. “Textual Analysis of Corporate Disclosures: A Survey of the Literature.” Journal of Accounting Literature 29 (2010a): 143–65.
[65] LI, F. “The Information Content of Forward-Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach.” Journal of Accounting Research 48 (2010b): 1049–102.
[66] LI, J., AND X. ZHAO. “Complexity and Information Content of Financial Disclosures: Evidence from Evolution of Uncertainty Following 10-K Filings.” Working paper, University of Texas at Dallas, 2014.
[67] LIU, B., AND J. J. MCCONNELL. “The Role of the Media in Corporate Governance: Do the Media Influence Managers’ Capital Allocation Decisions?” Journal of Financial Economics 110 (2013): 1–17.
[68] LOUGHRAN, T., AND B. MCDONALD. “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” Journal of Finance 66 (2011): 35–65.
[69] LOUGHRAN, T., AND B. MCDONALD. “IPO First-Day Returns, Offer Price Revisions, Volatility, and Form S-1 Language.” Journal of Financial Economics 109 (2013): 307–26.
[70] LOUGHRAN, T., AND B. MCDONALD. “Measuring Readability in Financial Disclosures.” Journal of Finance 69 (2014): 1643–71.
[71] LOUGHRAN, T., AND B. MCDONALD. “The Use of Word Lists in Textual Analysis.” Journal of Behavioral Finance 16 (2015): 1–11.
[72] LOUGHRAN, T.; B. MCDONALD; AND H. YUN. “A Wolf in Sheep’s Clothing: The Use of Ethics-Related Terms in 10-K Reports.” Journal of Business Ethics 89 (2009): 39–49.
[73] LUNDHOLM, R. J.; R. ROGO; AND J. ZHANG. “Restoring the Tower of Babel: How Foreign Firms Communicate with US Investors.” The Accounting Review 89 (2014): 1453–85.
[74] MANNING, C. D., AND H. SCHÜTZE. Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press, 2003.
[75] MARCUS, M.; B. SANTORINI; AND M. A. MARCINKIEWICZ. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19 (1993): 313–30.
[76] MARNEFFE, M.; B. MACCARTNEY; AND C. MANNING. “Generating Typed Dependency Parses from Phrase Structure Parses.” Proceedings of LREC 6 (2006): 449–54.
[77] MATSUMOTO, D.; M. PRONK; AND E. ROELOFSEN. “What Makes Conference Calls Useful? The Information Content of Managers’ Presentations and Analysts’ Discussion Sessions.” The Accounting Review 86 (2011): 1383–414.
[78] MAYEW, W. J., AND M. VENKATACHALAM. “The Power of Voice: Managerial Affective States and Future Firm Performance.” Journal of Finance 67 (2012): 1–43.
[79] MCLAUGHLIN, G. “SMOG Grading: A New Readability Formula.” Journal of Reading 12 (1969): 639–46.
[80] MIKHEEV, A. “Periods, Capitalized Words, etc.” Computational Linguistics 28 (2002): 289–316.
[81] MILLER, B. P. “The Effects of Reporting Complexity on Small and Large Investor Trading.” The Accounting Review 85 (2010): 2107–43.
[82] MOSTELLER, F., AND D. WALLACE. Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley, 1964.
[83] PALMER, D., AND M. A. HEARST. “Adaptive Sentence Boundary Disambiguation.” Proceedings of the Fourth Annual ACL Conference on Applied Natural Language Processing, Stuttgart, Germany, October 13–15, 1994: 78–83.
[84] PRATT, G. “Is a Cambrian Explosion Coming for Robotics?” Journal of Economic Perspectives 29 (2015): 51–60.
[85] PRICE, S. M.; J. S. DORAN; D. R. PETERSON; AND B. A. BLISS. “Earnings Conference Calls and Stock Returns: The Incremental Informativeness of Textual Tone.” Journal of Banking & Finance 36 (2012): 992–1011.
[86] PURDA, L., AND D. SKILLICORN. “Accounting Variables, Deception, and a Bag of Words: Assessing the Tools of Fraud Detection.” Contemporary Accounting Research 32 (2015): 1193–1223.
[87] RENNEKAMP, K. “Processing Fluency and Investors’ Reactions to Disclosure Readability.” Journal of Accounting Research 50 (2012): 1319–54.
[88] ROGERS, J. L., AND W. NICEWANDER. “Thirteen Ways to Look at the Correlation Coefficient.” The American Statistician 42 (1988): 59–66.
[89] ROGERS, J. L.; A. VAN BUSKIRK; AND S. ZECHMAN. “Disclosure Tone and Shareholder Litigation.” The Accounting Review 86 (2011): 2155–83.
[90] SALTON, G., AND C. BUCKLEY. “Term-Weighting Approaches in Automatic Text Retrieval.” Information Processing & Management 24 (1988): 513–23.
[91] SOLOMON, D. H. “Selective Publicity and Stock Prices.” Journal of Finance 67 (2012): 599–638.
[92] SOLOMON, D. H.; E. SOLTES; AND D. SOSYURA. “Winners in the Spotlight: Media Coverage of Fund Holdings as a Driver of Flows.” Journal of Financial Economics 113 (2014): 53–72.
[93] TADDY, M. “Document Classification by Inversion of Distributed Language Representations.” Proceedings of the Annual 53rd Meeting of the Association for Computational Linguistics, Beijing, China, July 27–31, 2015: 45–49.
[94] TENNYSON, B. M.; R. W. INGRAM; AND M. T. DUGAN. “Assessing the Information Content of Narrative Disclosures in Explaining Bankruptcy.” Journal of Business Finance & Accounting 17 (1990): 391–410.
[95] TETLOCK, P. C. “Giving Content to Investor Sentiment: The Role of Media in the Stock Market.” Journal of Finance 62 (2007): 1139–68.
[96] TETLOCK, P. C.; M. SAAR-TSECHANSKY; AND S. MACSKASSY. “More than Words: Quantifying Language to Measure Firms’ Fundamentals.” Journal of Finance 63 (2008): 1437–67.
[97] TSARFATY, R.; D. SEDDAH; S. KÜBLER; AND J. NIVRE. “Parsing Morphologically Rich Languages: Introduction to the Special Issue.” Association for Computational Linguistics 39 (2013): 15–22.
[98] TWEDT, B., AND L. REES. “Reading Between the Lines: An Empirical Examination of Qualitative Attributes of Financial Analysts’ Reports.” Journal of Accounting and Public Policy 31 (2012): 1–21.
[99] WILLIAMS, C. B. “Mendenhall’s Studies of Word-length Distribution in the Works of Shakespeare and Bacon.” Biometrika 62 (1975): 207–12.
[100] YOU, H., AND X. ZHANG. “Financial Reporting Complexity and Investor Underreaction to 10-K Information.” Review of Accounting Studies 14 (2009): 559–86.
[101] ZOBEL, J., AND A. MOFFAT. “Exploring the Similarity Space.” ACM SIGIR Forum 32 (1998): 18–34.