The whatlies package | Word-vector visualization made easy
whatlies
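whatlies works together with spaCy language models to visualize word vectors. Install zh_core_web_md, en_core_web_md, and whatlies; the commands below install the two models from locally downloaded wheel files. The full documentation is at https://github.com/RasaHQ/whatlies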
!pip3 install zh_core_web_md-3.0.0-py3-none-any.whl
!pip3 install en_core_web_md-3.0.0-py3-none-any.whl
!pip3 install whatlies
Quick start
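The word vectors in spaCy models have tens to hundreds of dimensions. Once they are mapped down to a two-dimensional space, with "man" as the horizontal axis and "woman" as the vertical axis, the gender leaning of each word becomes visible.
For example, "woman" sits closer to the vertical axis and "man" closer to the horizontal axis. "nurse" and "queen" are more often associated with women, so they fall closer to the y-axis. A king is usually male, so "king" falls closer to the x-axis.
Animal words can show the same kind of leaning: the stereotype is that women keep cats and men keep dogs, and the vectors may pick up such associations from the training text.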
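from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage
# Load the English spaCy model as a whatlies language backend
lang = SpacyLanguage("en_core_web_md")
words = ["cat", "dog", "fish", "kitten", "man", "woman",
         "king", "queen", "doctor", "nurse"]
# Look up one embedding per word and collect them into a set
emb = EmbeddingSet(*[lang[w] for w in words])
# Interactive scatter plot with "man" as the x-axis and "woman" as the y-axis
emb.plot_interactive(x_axis=emb["man"], y_axis=emb["woman"])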
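To make the axis idea concrete, here is a minimal sketch of how a high-dimensional vector can be reduced to an (x, y) pair by projecting it onto two axis words. It assumes the axes work as a scalar projection, which is my understanding of what whatlies does by default when an axis is given as an embedding; check the whatlies documentation before relying on that. The 3-dimensional vectors and the helper names (dot, scalar_projection, to_2d) are made up for illustration and are not real spaCy vectors or part of the whatlies API.

```python
# Toy illustration only: the vectors below are invented, not real spaCy vectors.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def scalar_projection(vec, axis):
    """Coordinate of `vec` along `axis`, scaled so that `axis` itself maps to 1.0."""
    return dot(vec, axis) / dot(axis, axis)

# Hypothetical 3-dimensional "embeddings"
toy = {
    "man":   [1.0, 0.2, 0.0],
    "woman": [0.2, 1.0, 0.0],
    "king":  [0.9, 0.3, 0.5],
    "queen": [0.3, 0.9, 0.5],
}

def to_2d(word, x_word="man", y_word="woman"):
    """Map a word to (x, y) by projecting its vector onto the two axis words."""
    vec = toy[word]
    return (scalar_projection(vec, toy[x_word]),
            scalar_projection(vec, toy[y_word]))

for w in toy:
    x, y = to_2d(w)
    print(f"{w:>6}: x={x:.2f}, y={y:.2f}")
```

With these toy numbers, "king" gets a larger x than y and "queen" the reverse, which is the same pattern the interactive plot is meant to reveal on real vectors.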
whatlies also works with Chinese.
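from whatlies import EmbeddingSet
from whatlies.language import SpacyLanguage
# Load the Chinese spaCy model as a whatlies language backend
zh_lang = SpacyLanguage("zh_core_web_md")
# cat, dog, fish, carp, man, woman, king, queen, doctor, nurse
zh_words = ["猫", "狗", "鱼", "鲤鱼", "男人", "女人",
            "国王", "王后", "医生", "护士"]
# Look up one embedding per word and collect them into a set
zh_emb = EmbeddingSet(*[zh_lang[w] for w in zh_words])
# Interactive scatter plot with "男人" (man) as the x-axis and "女人" (woman) as the y-axis
zh_emb.plot_interactive(x_axis=zh_emb["男人"], y_axis=zh_emb["女人"])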