Loughran&McDonald金融文本情感分析库
今天看到一个预测股价的项目,其中用到pysentiment库对金融文本数据进行情感计算。查了下该库的官方文档,发现该库提供了两大情感分析
Harvard IV-4 英文通用情感分析
Loughran&MCdonald 英文金融情感分析
pysentiment github地址https://github.com/hanzhichao2000/pysentiment
pysentiment安装
!pip3 install pysentiment
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Collecting pysentiment
Using cached https://pypi.tuna.tsinghua.edu.cn/packages/3d/32/b9822555aeafd949ba2e1e5f0ca9a7aea857802965c61a6290e711b11e6c/pysentiment-0.2.tar.gz
[31m ERROR: Complete output from command python setup.py egg_info:[0m
[31m ERROR: Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/setup.py", line 8, in <module>
install_req = [e.strip() for e in open(path_req).readlines()]
FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/requirements.txt'
----------------------------------------[0m
[31mERROR: Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/rr/6m7nhd_d0296rq7gjqlf0gsr0000gn/T/pip-install-c0k8cqik/pysentiment/[0m
pysentiment我一直没有安装成功过,最后没办法只能从github上下载下来放到 与ipynb文件所属的文件夹
内调用使用。
大家课后下载我这个项目文件夹即可(文末附有下载地址)。
pysentiment接口
HIV4 英文通用情感分析
LM 英文金融领域情感分析
英文通用情感分析
通用情绪的情感分析使用的Harvard IV-4的词库,词库详情可见 http://www.wjh.harvard.edu/~inquirer/
计算说明:
Positive正面词词频数
Negative负面词词频数
Polarity=(Pos-Neg)/(Pos+Neg)
Subjectivity=(Pos+Neg)/count(*)
from pysentiment import hiv4, lm
#初始化hiv4
hiv4 = hiv4.HIV4()
#待分析文本
test_text = """Lately, the Indonesian government has unleashed an array of policies that are keeping mining and oil executives awake at night across this vast and geologically rich archipelago. The unpopular new regulations, aimed at reforming the mining and oil industries, are promoted in the name of "national interest." Yet left uncorrected, they will inevitably lead to a dramatic decline of output in Indonesia's extractive industries, damaging foreign investment and economic growth.Particularly hard-hit will be some of Indonesia's less-developed regions such as Kalimantan and Papua, where oil and mining play major economic roles.Equating the government to the Emperor Nero and the local mining industry to ancient Rome," said Bill Sullivan, leading legal consultant for the mining industry in Indonesia, "It is as if Nero is choosing to complacently fiddle while Rome burns.Why exactly this fiddling persists—especially since large investors have alrea"""
#分词得到词语列表tokens
words = hiv4.tokenize(test_text)
#将词语列表words传入hiv4.get_score,得到得分score
score = hiv4.get_score(words)
#查看score
score
{'Positive': 14,
'Negative': 10,
'Polarity': 0.1666666597222225,
'Subjectivity': 0.3287671187840121}
英文金融情感分析
英文金融情感分析使用的Loughran and McDonald的词库,词库详情可见 https://www3.nd.edu/~mcdonald/Word_Lists.html
计算说明:
Positive正面词词频数
Negative负面词词频数
Polarity=(Pos-Neg)/(Pos+Neg)
Subjectivity=(Pos+Neg)/count(*)
from pysentiment import hiv4, lm
#初始化lm
lm = lm.LM()
#待分析文本
test_text = "Cisco Posts Another Record Quarter With Growth Across All Segments; Raising FVE to $46Cisco's first-quarter results modestly beat our top line and net income expectations while the $0.77 earnings per share exceeded our expected result due to an increased quantity of shares repurchased. The narrow-moat firm posted 8% year-over-year revenue growth, with strength across all the business segments and provided strong guidance for the next quarter. After updating our Cisco forecast to consider stronger growth driven by expected cross selling of multi-cloud environment products, security solutions, and infrastructure hardware, we are raising our fair value estimate to $46 per share from $43. With shares trading around our fair value estimate after hours, we recommend for investors to sustain their Cisco positions.The company guided the second quarter to have a 5%-7% growth over the previous year with 30.5%-31.5% non-GAAP operating margins. Cisco is benefitting from a strong IT spending environment, and we believe that the company's product roadmap has made the it a one-stop-shop for networking environments. Two major recent announcements by Cisco were its integration of security into SD-WAN products and its offering of production grade Kubernetes to be run on premises and then offloaded to Amazon AWS. We like that Cisco is intertwining previously siloed offerings into combined solutions that contain unique selling features. Additionally, having support with all three major hyperscale public cloud providers allows Cisco to be a commonality for IT teams balancing on-premises, private, and public cloud environments. We like that Cisco has completely embraced the cloud as a path to growth instead of a business threat. In our view, Cisco's innovative product portfolio should keep it on the shortlist for enterprise customers debating networking infrastructure providers for hardware, software, and services in cloud environments or on premises."
#分词得到词语列表tokens
words = lm.tokenize(test_text)
#将词语列表words传入lm.get_score,得到得分score
score = lm.get_score(words)
#查看score
score
{'Positive': 6,
'Negative': 2,
'Polarity': 0.4999999375000079,
'Subjectivity': 0.055172413412604045}
觉得本文有用,请不吝点赞评论转发~谢谢支持~
近期文章
课件获取方式,请在公众号后台回复关键词“LM情感分析”
觉得本文有用,请不吝点赞评论转发~谢谢支持~