Manual templates: The typical approach is to hand-write a template for each relation. For example, the "born in" relation maps to the template "[marie curie] was born in [warsaw]", and the "occupation" relation maps to "[obama] worked as a [president]".
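To make the probing step concrete, here is a minimal sketch of turning a manual template into a cloze query against a masked language model. It assumes the HuggingFace transformers library and the bert-base-uncased checkpoint; neither is prescribed by the papers cited here, and the query string is just the "born in" example above.

```python
# Minimal cloze-probing sketch (assumes HuggingFace transformers is installed).
from transformers import pipeline

# Fill-mask pipeline over a masked LM; bert-base-uncased is an arbitrary choice.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Manual template for the "born in" relation, with the object slot left as [MASK].
query = "Marie Curie was born in [MASK]."

# Top predictions for the masked slot; if "warsaw" ranks highly,
# the model is said to "know" this fact under this particular template.
for prediction in unmasker(query, top_k=5):
    print(prediction["token_str"], round(prediction["score"], 3))
```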
Automatic templates: The drawbacks of manual templates are obvious: they are time-consuming and labor-intensive, and they do not always work well. Many works have therefore studied how to generate templates automatically. Take Jiang et al. (2020) [2] as an example: for a relation instance (x, r, y), it first identifies Wikipedia sentences that contain both x and y, then removes x and y from each sentence to obtain a template. The templates collected for relation r are further paraphrased (e.g., via back-translation) to produce more candidates, and the best-performing templates are then selected from this candidate pool. Some example templates are shown below. Of course, there are different schools of automatic template generation, which we will not expand on here.
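The mining step described above can be illustrated with a small sketch. The toy corpus, entity pairs, and helper name below are my own, chosen only to show the idea of replacing the subject and object with placeholder slots; the actual pipeline in [2] mines Wikipedia at scale and additionally uses dependency-based templates, back-translated paraphrases, and held-out accuracy to rank candidates.

```python
# Toy sketch of mining-based template extraction (corpus and names are illustrative).
from typing import List, Tuple

def mine_templates(corpus: List[str], pairs: List[Tuple[str, str]]) -> List[str]:
    """For each (x, y) pair of a relation, turn sentences mentioning both
    entities into templates with [X]/[Y] placeholders."""
    templates = []
    for sentence in corpus:
        lowered = sentence.lower()
        for x, y in pairs:
            if x.lower() in lowered and y.lower() in lowered:
                templates.append(sentence.replace(x, "[X]").replace(y, "[Y]"))
    # Deduplicate while preserving order.
    return list(dict.fromkeys(templates))

# Tiny illustrative corpus for the "born in" relation.
corpus = [
    "Marie Curie was born in Warsaw in 1867.",
    "Warsaw is the city where Marie Curie grew up.",
    "Chopin spent his childhood near Warsaw.",
]
pairs = [("Marie Curie", "Warsaw")]

for template in mine_templates(corpus, pairs):
    print(template)
# e.g. "[X] was born in [Y] in 1867."; such candidates would then be
# filtered and ranked by how well they recover held-out facts.
```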
[1] Petroni, Fabio, et al. "Language models as knowledge bases?" arXiv preprint arXiv:1909.01066 (2019).
[2] Jiang, Zhengbao, et al. "How can we know what language models know?" Transactions of the Association for Computational Linguistics 8 (2020): 423-438.
[3] Guu, Kelvin, et al. "Retrieval augmented language model pre-training." ICML 2020.
[4] Roberts, Adam, et al. "How much knowledge can you pack into the parameters of a language model?" EMNLP 2020.
[5] Poerner, Nina, Ulli Waltinger, and Hinrich Schütze. "E-BERT: Efficient-yet-effective entity embeddings for BERT." EMNLP 2020.
[6] Xiong, Wenhan, et al. "Pretrained encyclopedia: Weakly supervised knowledge-pretrained language model." ICLR 2020.
[7] Clark, Kevin, et al. "ELECTRA: Pre-training text encoders as discriminators rather than generators." arXiv preprint arXiv:2003.10555 (2020).
[8] Ling, Jeffrey, et al. "Learning cross-context entity representations from text." ICLR 2020.
[9] Bordes, Antoine, et al. "Translating embeddings for modeling multi-relational data." Advances in Neural Information Processing Systems 26 (2013).
[10] Peters, Matthew E., et al. "Knowledge enhanced contextual word representations." EMNLP 2019.
[11] Zhang, Zhengyan, et al. "ERNIE: Enhanced language representation with informative entities." ACL 2019.
[12] Févry, Thibault, et al. "Entities as experts: Sparse memory access with entity supervision." EMNLP 2020.
[13] Soares, Livio Baldini, et al. "Matching the blanks: Distributional similarity for relation learning." ACL 2019.
[14] Qin, Yujia, et al. "ERICA: Improving entity and relation understanding for pre-trained language models via contrastive learning." ACL 2021.
[15] Wang, Xiaozhi, et al. "KEPLER: A unified model for knowledge embedding and pre-trained language representation." TACL 2021.