
Loughran & McDonald Financial Text Sentiment Analysis Library

大邓 (Da Deng), 大邓和他的Python (Da Deng and His Python), 2022-07-09

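Today I came across a stock price prediction project that uses the pysentiment library to compute sentiment scores for financial text. According to the library's documentation, it offers two kinds of sentiment analysis:

  • Harvard IV-4: general-purpose English sentiment analysis

  • Loughran & McDonald: English financial sentiment analysis

pysentiment on GitHub: https://github.com/hanzhichao2000/pysentiment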

Installing pysentiment

```
!pip3 install pysentiment
```

```
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting pysentiment
  Using cached https://pypi.tuna.tsinghua.edu.cn/packages/3d/32/b9822555aeafd949ba2e1e5f0ca9a7aea857802965c61a6290e711b11e6c/pysentiment-0.2.tar.gz
    ERROR: Complete output from command python setup.py egg_info:
    ERROR: Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/setup.py", line 8, in <module>
        install_req = [e.strip() for e in open(path_req).readlines()]
    FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/requirements.txt'
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/
```

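I have never managed to install pysentiment with pip. The traceback shows why: line 8 of setup.py reads requirements.txt, but that file is not included in the pysentiment-0.2 source archive, so the build step fails. As a workaround, I downloaded the code from GitHub and put it in the same folder as the .ipynb notebook, then imported it from there.

Readers can simply download my project folder (see the end of the article for how to get it).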
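Putting the package folder next to the notebook works because the notebook's directory is normally the kernel's working directory, and the working directory is on Python's module search path (sys.path). If you would rather keep the downloaded code elsewhere, here is a minimal sketch that adds that folder to sys.path at runtime. The helper name add_local_package_dir and the folder path in the usage comment are illustrative assumptions, not part of pysentiment; the folder you pass should be the one that contains the pysentiment package directory.

```python
import sys
from pathlib import Path


def add_local_package_dir(folder: str) -> Path:
    """Prepend `folder` to sys.path (only once) so packages inside it can be imported.

    `folder` must be the directory that *contains* the `pysentiment` package
    directory, not the package directory itself.
    """
    path = Path(folder).expanduser().resolve()
    if str(path) not in sys.path:
        sys.path.insert(0, str(path))
    return path


# Hypothetical usage: "~/Downloads/pysentiment-master" is an assumed location of the
# unpacked GitHub download; replace it with wherever you saved the code.
# add_local_package_dir("~/Downloads/pysentiment-master")
# from pysentiment import hiv4, lm
```

Inserting at position 0 makes the local copy take precedence over any other installed package with the same name; the membership check keeps repeated notebook runs from growing sys.path.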

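The pysentiment API

  • HIV4: general-purpose English sentiment analysis

  • LM: English sentiment analysis for the financial domain

Both classes are used the same way: tokenize() splits a text into words, and get_score() scores the resulting word list.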

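General-purpose English sentiment analysis

General sentiment analysis uses the Harvard IV-4 dictionary; details are available at http://www.wjh.harvard.edu/~inquirer/

How the scores are computed:

  • Positive: number of positive-word occurrences

  • Negative: number of negative-word occurrences

  • Polarity = (Pos - Neg) / (Pos + Neg)

  • Subjectivity = (Pos + Neg) / count(*), where count(*) is the total number of tokens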
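Written as formulas, with n₊ and n₋ the positive and negative counts and T the total number of tokens (the count(*) above): Polarity ranges from −1 (only negative words) to +1 (only positive words), and Subjectivity is the share of tokens that are sentiment words. The small constant ε is an assumption, inferred from the scores printed in this post (they are consistent with ε = 10⁻⁶); it keeps both ratios defined when a text contains no dictionary words at all.

$$
\text{Polarity} = \frac{n_{+} - n_{-}}{n_{+} + n_{-} + \varepsilon},
\qquad
\text{Subjectivity} = \frac{n_{+} + n_{-}}{T + \varepsilon}
$$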

```python
from pysentiment import hiv4

# Initialize the Harvard IV-4 dictionary (a separate name avoids shadowing the hiv4 module)
hiv4_dict = hiv4.HIV4()

# Text to analyze
test_text = """Lately, the Indonesian government has unleashed an array of policies that are keeping mining and oil executives awake at night across this vast and geologically rich archipelago. The unpopular new regulations, aimed at reforming the mining and oil industries, are promoted in the name of "national interest." Yet left uncorrected, they will inevitably lead to a dramatic decline of output in Indonesia's extractive industries, damaging foreign investment and economic growth.Particularly hard-hit will be some of Indonesia's less-developed regions such as Kalimantan and Papua, where oil and mining play major economic roles.Equating the government to the Emperor Nero and the local mining industry to ancient Rome," said Bill Sullivan, leading legal consultant for the mining industry in Indonesia, "It is as if Nero is choosing to complacently fiddle while Rome burns.Why exactly this fiddling persists—especially since large investors have alrea"""

# Tokenize the text into a list of words (tokens)
words = hiv4_dict.tokenize(test_text)

# Pass the word list to get_score to obtain the scores
score = hiv4_dict.get_score(words)

# Display the scores (a bare expression is shown automatically in Jupyter)
score
```

Output:

```
{'Positive': 14,
 'Negative': 10,
 'Polarity': 0.1666666597222225,
 'Subjectivity': 0.3287671187840121}
```
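To make the formulas concrete, here is a minimal pure-Python sketch of dictionary-based scoring. It is a simplified illustration, not pysentiment's actual source: tokenization is skipped (the function takes an already lowercased word list), the two word sets are hypothetical toy lists, and EPSILON = 1e-6 is an assumption inferred from the printed scores.

```python
from typing import Dict, Iterable, Set

# Assumed smoothing constant, inferred from the printed scores (not read from pysentiment)
EPSILON = 1e-6


def scores_from_counts(n_pos: int, n_neg: int, n_tokens: int) -> Dict[str, float]:
    """Apply the Polarity / Subjectivity formulas to raw counts."""
    return {
        "Positive": n_pos,
        "Negative": n_neg,
        "Polarity": (n_pos - n_neg) / (n_pos + n_neg + EPSILON),
        "Subjectivity": (n_pos + n_neg) / (n_tokens + EPSILON),
    }


def dictionary_score(tokens: Iterable[str], positive: Set[str], negative: Set[str]) -> Dict[str, float]:
    """Count how many tokens appear in each word list, then score those counts."""
    tokens = list(tokens)
    n_pos = sum(1 for t in tokens if t in positive)
    n_neg = sum(1 for t in tokens if t in negative)
    return scores_from_counts(n_pos, n_neg, len(tokens))


# Hypothetical toy word lists; the real Harvard IV-4 and LM lists are far larger
POSITIVE = {"growth", "strong", "beat"}
NEGATIVE = {"decline", "damage", "threat"}

print(dictionary_score(["strong", "growth", "despite", "threat"], POSITIVE, NEGATIVE))

# Re-derive the HIV4 result from its counts; 73 tokens is inferred from 24 / 0.3288 ≈ 73
print(scores_from_counts(14, 10, 73))
```

The last call gives Polarity ≈ 0.16666666 and Subjectivity ≈ 0.32876712, matching the HIV4 output above; the tiny ε is why the result is 0.1666666597… rather than exactly 1/6. Likewise, scores_from_counts(6, 2, 145) reproduces the LM figures.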

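English financial sentiment analysis

Financial sentiment analysis uses the Loughran and McDonald dictionary; details are available at https://www3.nd.edu/~mcdonald/Word_Lists.html

How the scores are computed:

  • Positive: number of positive-word occurrences

  • Negative: number of negative-word occurrences

  • Polarity = (Pos - Neg) / (Pos + Neg)

  • Subjectivity = (Pos + Neg) / count(*), where count(*) is the total number of tokens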

```python
from pysentiment import lm

# Initialize the Loughran & McDonald dictionary (a separate name avoids shadowing the lm module)
lm_dict = lm.LM()

# Text to analyze
test_text = "Cisco Posts Another Record Quarter With Growth Across All Segments; Raising FVE to $46Cisco's first-quarter results modestly beat our top line and net income expectations while the $0.77 earnings per share exceeded our expected result due to an increased quantity of shares repurchased. The narrow-moat firm posted 8% year-over-year revenue growth, with strength across all the business segments and provided strong guidance for the next quarter. After updating our Cisco forecast to consider stronger growth driven by expected cross selling of multi-cloud environment products, security solutions, and infrastructure hardware, we are raising our fair value estimate to $46 per share from $43. With shares trading around our fair value estimate after hours, we recommend for investors to sustain their Cisco positions.The company guided the second quarter to have a 5%-7% growth over the previous year with 30.5%-31.5% non-GAAP operating margins. Cisco is benefitting from a strong IT spending environment, and we believe that the company's product roadmap has made the it a one-stop-shop for networking environments. Two major recent announcements by Cisco were its integration of security into SD-WAN products and its offering of production grade Kubernetes to be run on premises and then offloaded to Amazon AWS. We like that Cisco is intertwining previously siloed offerings into combined solutions that contain unique selling features. Additionally, having support with all three major hyperscale public cloud providers allows Cisco to be a commonality for IT teams balancing on-premises, private, and public cloud environments. We like that Cisco has completely embraced the cloud as a path to growth instead of a business threat. In our view, Cisco's innovative product portfolio should keep it on the shortlist for enterprise customers debating networking infrastructure providers for hardware, software, and services in cloud environments or on premises."

# Tokenize the text into a list of words (tokens)
words = lm_dict.tokenize(test_text)

# Pass the word list to get_score to obtain the scores
score = lm_dict.get_score(words)

# Display the scores (a bare expression is shown automatically in Jupyter)
score
```

Output:

```
{'Positive': 6,
 'Negative': 2,
 'Polarity': 0.4999999375000079,
 'Subjectivity': 0.055172413412604045}
```
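A stock-prediction project like the one mentioned at the start typically needs scores for many documents, not just one. The following is a small sketch of a batch helper under stated assumptions: score_documents, the SentimentDictionary protocol, and the document ID "cisco_note" are hypothetical names, and the helper relies only on the tokenize() and get_score() methods shown above, so it should work with either the HIV4 or the LM object.

```python
from typing import Dict, List, Mapping, Protocol


class SentimentDictionary(Protocol):
    """Structural type: any object with pysentiment-style tokenize() and get_score()."""

    def tokenize(self, text: str) -> List[str]:
        ...

    def get_score(self, terms: List[str]) -> Dict[str, float]:
        ...


def score_documents(model: SentimentDictionary, docs: Mapping[str, str]) -> List[Dict[str, object]]:
    """Tokenize and score every document, keeping its ID in each result row."""
    rows: List[Dict[str, object]] = []
    for doc_id, text in docs.items():
        row: Dict[str, object] = {"doc_id": doc_id}
        row.update(model.get_score(model.tokenize(text)))
        rows.append(row)
    return rows


# Hypothetical usage, where `model` is the LM (or HIV4) object created above:
# rows = score_documents(model, {"cisco_note": test_text})
# import pandas as pd
# df = pd.DataFrame(rows)  # one row per document, one column per score
```

Returning a list of plain dicts keeps the helper free of extra dependencies; passing it to pandas.DataFrame turns it into a table that can be joined with price data by document ID.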

If you found this article useful, please like, comment, and share. Thanks for your support!



To get the course materials, reply with the keyword “LM情感分析” (LM sentiment analysis) to the official account.

