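Computing the Linguistic Concreteness of Text | A JCR 2021 Paper as an Example
Not long ago I shared a JCR 2018 review, "An Overview of Text Analysis Applications in Marketing Research (with Examples and Code)".
Recently I came across a JCR 2021 empirical paper on how linguistic concreteness affects consumer attitudes (Packard and Berger 2021). The researchers start from an observation: consumers can judge whether an employee is really listening to their needs from how concrete the employee's language is (for example, more nouns rather than pronouns). This is a bit like the department stores of thirty years ago, where service was poor and clerks tended to brush customers off. Below are six ways an employee might phrase the same message.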
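Linguistic concreteness
Linguistic concreteness describes the degree to which a word refers to an actual, tangible, or "real" entity, describing objects and actions in a way that is more specific, more familiar, and more easily perceived by the eye or the mind (i.e., imaginable or vivid; Brysbaert, Warriner, and Kuperman 2014; Semin and Fiedler 1988).
Concreteness lexicon
Brysbaert, Warriner, and Kuperman (2014) recruited about 4,000 people through online crowdsourcing to rate words, and built an English concreteness lexicon of about 40,000 words. The figure below shows the corresponding Excel file; the Conc.M column holds each word's concreteness score.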
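As a minimal sketch of how such a lexicon might be used once it is loaded, the snippet below turns rows with `Word` and `Conc.M` fields into a word-to-score lookup. The three rows and their scores are made-up placeholders rather than values from the real file, and `build_lookup` is a hypothetical helper name. With pandas, rows of this shape could be obtained from a DataFrame via `df.to_dict("records")`.

```python
# Rows shaped like the lexicon table (columns "Word" and "Conc.M").
# The scores are illustrative placeholders, NOT values from the real file.
toy_rows = [
    {"Word": "shirt", "Conc.M": 4.9},
    {"Word": "search", "Conc.M": 2.8},
    {"Word": "that", "Conc.M": 1.5},
]


def build_lookup(rows):
    """Map each word to its mean concreteness rating (hypothetical helper)."""
    return {row["Word"]: row["Conc.M"] for row in rows}


lookup = build_lookup(toy_rows)
print(lookup.get("shirt"))    # rating stored for "shirt"
print(lookup.get("unicorn"))  # None, because the word is not in the table
```

A plain dictionary makes each lookup a constant-time operation, which matters when scoring many texts against a lexicon of this size.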
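Psychological distance and linguistic concreteness
Snefjella and Kuperman (2015) explored the quantitative relationship between psychological distance and linguistic concreteness, for the first time measuring psychological distance as a continuous variable (earlier studies almost always treated it as a binary high/low variable). Their computation used the Brysbaert et al. (2014) concreteness lexicon.
The results match intuition: broadly, the greater the psychological distance, the lower the concreteness score, and vice versa. Below are the quantified, visualized results along the geographic, temporal, and social dimensions.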
Geographic dimension
Temporal dimension
Social dimension
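Code implementation: the JCR paper as an example
Abstract: Consumers are often frustrated by customer service. But could a simple shift in language help improve customer satisfaction? We suggest that linguistic concreteness (the tangibility, specificity, or imaginability of the words employees use when speaking to customers) can shape consumer attitudes and behaviors. Five studies, including text analysis of over 1,000 real consumer-employee interactions in two different field contexts, demonstrate that customers are more satisfied, more willing to purchase, and purchase more when employees speak to them concretely. This occurs because customers infer that employees who use more concrete language are listening (i.e., attending to and understanding their needs). These findings deepen understanding of how language shapes consumer behavior, reveal a psychological mechanism by which concreteness impacts person perception, and provide a straightforward way for managers to help enhance customer satisfaction.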
We computed a concreteness score for each conversational turn (averaging across all words in that turn) and for each conversational participant (averaging across all words over all their turns). Results were the same whether or not stop words commonly excluded from linguistics analyses (e.g., but, and) were included. We report results excluding stop words.
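The passage above describes two levels of averaging, which can be sketched as follows. This is only an illustration under stated assumptions: `TOY_LEXICON` and `TOY_STOP_WORDS` hold made-up placeholder values, the function names are hypothetical, tokenization is plain whitespace splitting, and words missing from the lexicon are simply skipped (the quoted passage does not say how such words were handled).

```python
# Toy lexicon and stop-word list; all values are illustrative placeholders.
TOY_LEXICON = {"search": 3.0, "shirt": 5.0, "grey": 4.0, "look": 3.0, "that": 1.0}
TOY_STOP_WORDS = {"i", "will", "for", "that", "in", "the"}


def content_scores(turn):
    """Return lexicon scores for the non-stop words of one turn."""
    words = turn.lower().split()
    return [TOY_LEXICON[w] for w in words
            if w not in TOY_STOP_WORDS and w in TOY_LEXICON]


def mean(values):
    """Average a list of numbers; None when there is nothing to average."""
    return sum(values) / len(values) if values else None


def turn_score(turn):
    """Score one conversational turn by averaging over its words."""
    return mean(content_scores(turn))


def participant_score(turns):
    """Pool the words of all turns, then average once."""
    pooled = [s for turn in turns for s in content_scores(turn)]
    return mean(pooled)


turns = ["I will look for that", "I will search for that shirt in grey"]
print([turn_score(t) for t in turns])
print(participant_score(turns))
```

Pooling all words before averaging gives longer turns more weight, so the participant score here is not the same as the mean of the two turn scores.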
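Based on my reading, I designed the following algorithm:
1. Tokenize the text (a conversational turn) with nltk to get a list of words.
2. Look up each word's concreteness score in the concreteness lexicon.
3. Compute the text's concreteness score: the sum of the concreteness scores of all words in the text divided by the number of words.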
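Written as a formula, the steps above amount to the following sketch, where $w_1, \dots, w_N$ are the tokens of a text $T$. Counting tokens that are missing from the lexicon as 0 is this post's own choice, not something taken from the paper.

```latex
\[
\mathrm{score}(T) = \frac{1}{N}\sum_{i=1}^{N} c(w_i),
\qquad
c(w) =
\begin{cases}
\text{Conc.M}(w) & \text{if } w \text{ is in the lexicon} \\
0 & \text{otherwise}
\end{cases}
\]
```

Because $N$ counts every token, function words and out-of-lexicon tokens pull the average down.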
The code is as follows:
import pandas as pd
from nltk.tokenize import word_tokenize

# The JCR paper uses the Paetzold and Specia (2016) lexicon.
# The download link given in that paper no longer works, so the
# Brysbaert et al. (2014) lexicon is used here instead.
df = pd.read_excel("Concreteness_ratings_Brysbaert_et_al_BRM.xlsx")
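def query_concreteness(word):
    """
    Look up the concreteness score of word; return 0 if it is not in the lexicon.
    """
    try:
        return df[df["Word"] == word]['Conc.M'].values[0]
    except IndexError:
        # No matching row: the word is not in the lexicon
        return 0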
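def concreteness_score(text):
    """
    Compute the concreteness score of text.
    """
    score = 0
    text = text.lower()
    try:
        words = word_tokenize(text)
    except LookupError:
        # nltk's tokenizer data is missing; fall back to splitting on spaces
        print('nltk is not set up correctly on your machine; see the video at https://www.bilibili.com/video/BV14A411i7DB')
        words = text.split(' ')
    for word in words:
        score += query_concreteness(word=word)
    # Guard against empty input to avoid dividing by zero
    return score / len(words) if words else 0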
# Example
employee_replys = ["I'll go look for that",
                   "I'll go search for that",
                   "I'll go search for that top",
                   "I'll go search for that t-shirt",
                   "I'll go look for that t-shirt in grey",
                   "I'll go search for that t-shirt in grey"]
for idx, reply in enumerate(employee_replys):
    score = concreteness_score(reply)
    template = "Concreteness Score: {score:.2f} | Example-{idx}: {example}"
    print(template.format(score=score,
                          idx=idx,
                          example=reply))
Run
Concreteness Score: 1.55 | Example-0: I'll go look for that
Concreteness Score: 1.55 | Example-1: I'll go search for that
Concreteness Score: 1.89 | Example-2: I'll go search for that top
Concreteness Score: 2.04 | Example-3: I'll go search for that t-shirt
Concreteness Score: 2.37 | Example-4: I'll go look for that t-shirt in grey
Concreteness Score: 2.37 | Example-5: I'll go search for that t-shirt in grey
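The more concrete the employee's wording, the higher the concreteness score.
The scores differ from those reported in the JCR paper, likely in part because a different lexicon is used here, but the trend across the examples is the same: broadly, the concreteness score rises from the first reply to the last.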
Getting the code
Code download: https://hidadeng.github.io/blog/jcr_concreteness_computation/JCR_Concreteness_Computation.zip
Blog: https://hidadeng.github.io
References
Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46, 904–911
Snefjella, Bryor, and Victor Kuperman. "Concreteness and psychological distance in natural language use." Psychological science 26, no. 9 (2015): 1449-1460.
Paetzold, G. H., and L. Specia (2016), “Inferring Psycholinguistic Properties of Words,” in Proceedings of the North American Association for Computational Linguistics-Human Language Technologies 2016, 435–40.
Packard, Grant, and Jonah Berger. "How concrete language shapes customer satisfaction." Journal of Consumer Research 47, no. 5 (2021): 787-806.