基于预训练语言模型的文本生成研究综述
© 作者|李军毅
机构|中国人民大学高瓴人工智能学院
研究方向|文本生成与预训练语言模型
本文介绍的是一篇有关基于预训练语言模型的文本生成的研究综述。文章也同步发布在AI Box知乎专栏(知乎搜索 AI Box专栏),欢迎大家在知乎专栏的文章下方评论留言,交流探讨!
导读:本文将参考上述综述论文,从预训练语言模型应用于文本生成任务的三个挑战出发:
1. 如何对输入数据进行编码并保持语义,使其与预训练语言模型进行融合;
2. 如何设计通用且合适的预训练语言模型架构,使其作为生成函数;
3. 如何优化生成函数,并保证生成文本满足特殊属性。
并详细列举目前每个挑战下的研究进展。
一、背景介绍
文本生成是目前自然语言处理领域一项非常重要但具有挑战性的任务,它的目标是生成可读的自然语言文本,代表性的应用包括对话系统、文本摘要和机器翻译等。
目前,深度神经模型在文本生成研究中已取得重大进展,其优势在于深度神经网络可以端到端地学习输入数据到输出文本的语义映射,而不需要人工参与进行特征工程。但是,深度神经模型往往具有大量的参数,而大部分文本生成任务的数据集都非常小,因此深度神经网络非常容易在这些数据集上过拟合,导致其难以泛化到实际应用中。
随着预训练语言模型(Pretrained Language Models, PLMs)范式的蓬勃发展,越来越多的研究将其运用到各种自然语言处理任务中以取得SOTA效果,例如使用BERT解决语言理解任务、使用GPT解决语言生成任务。通过在大规模语料上进行预训练,预训练语言模型可以准确地理解自然语言,并以自然语言的形式流畅地表达,这两项都是完成文本生成任务的重要能力。
二、任务定义
文本生成任务的目标是生成可读的自然语言文本,可以表示为一个单词序列 $Y = (y_1, y_2, \dots, y_n)$,每一个单词 $y_i$ 都来自于词典 $\mathcal{V}$。在大多数情况下,文本生成任务会基于给定的输入数据,例如文本、图像、表格等,可以表示为 $X$。特别地,我们希望生成的自然语言文本满足某些特性,例如流畅性、自然性、一致性等,可以表示为属性集合 $P$。因此,文本生成任务可以形式化地定义为:

$$Y = f_{\theta}(X, P)$$

其中 $f_{\theta}$ 表示生成函数,这里特指基于预训练语言模型 $\theta$ 的生成函数。
按照预训练任务的不同,用于文本生成任务的预训练语言模型可以分为四种:
1. 掩码语言模型,采用MLM训练目标,例如BERT、RoBERTa等。
2. 因果语言模型,采用自回归式语言建模目标,例如GPT-2/3、CTRL等。
3. 前缀语言模型,将前两种模型的优势结合起来,并使用特殊的mask机制,使得输入序列中的单词可以相互关注,而输出序列中的单词只能关注到左侧的单词,例如UniLM、GLM等。
4. 编码器-解码器模型,采用标准的Transformer编码器-解码器结构,例如MASS、T5、BART等。
对于公式 $Y = f_{\theta}(X, P)$,使用预训练语言模型 $\theta$ 解决文本生成任务主要包括三个方面的挑战:
1. 如何对输入数据 $X$ 进行编码并保持语义,使其与预训练语言模型进行融合;
2. 如何设计通用且合适的预训练语言模型架构 $\theta$,使其作为生成函数 $f_{\theta}$;
3. 如何优化生成函数 $f_{\theta}$,并保证生成文本满足属性 $P$。
本文综述的主要内容将围绕这三个方面进行整理。
三、输入表示编码
将PLMs应用到文本生成任务的第一个挑战则是如何编码输入表示,使其保持原有语义并与PLMs进行融合。不同的文本生成任务往往会涉及不同类型的输入数据,这里主要介绍三种类型的输入数据,包括非结构化输入、结构化输入和多媒体输入。
3.1 非结构化输入
在文本生成中,大部分的研究主要关注非结构化的文本输入,例如句子、段落和文档等,这需要模型对输入文本中的单词和短语有超越其表层含义的深层理解。
Text summarization with pretrained encoders, in EMNLP, 2019.
Sentence centrality revisited for unsupervised summarization, in ACL, 2019.
Pre-trained language model representations for language generation, in NAACL-HLT, 2019.
Multi-granularity interaction network for extractive and abstractive multi-document summarization, in ACL, 2020.
HIBERT: document level pretraining of hierarchical bidirectional transformers for document summarization, in ACL, 2019.
Unsupervised extractive summarization by pre-training hierarchical transformers, in EMNLP, 2020.
Discourse-aware neural extractive text summarization, in ACL, 2020.
Cross-lingual language model pretraining, in NeurIPS, 2019.
Multilingual denoising pretraining for neural machine translation, in TACL, 2020.
Unsupervised cross-lingual word embedding by multilingual neural language models, arXiv preprint arXiv:1809.02306, 2018.
3.2 结构化输入
结构化数据(例如表格、图和树)在文本生成任务中也是一种非常重要的输入类型。虽然预训练语言模型可以从无标注数据中获取语言知识,但预训练语言模型在解决结构化数据的文本生成任务时仍面临三个问题:
1. 结构化数据输入和用于预训练的自然语言之间存在语义鸿沟(见下文的线性化示意);
2. 缺乏对结构化数据中结构信息的建模;
3. 如何保证生成文本对输入数据的忠实度。
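针对上述第一个问题,一类常见的做法是先将结构化数据线性化为文本序列,再交由PLM编码。下面给出一个将表格记录线性化的极简Python示意(其中的字段名与分隔符格式均为假设,仅用于说明思路,并非某一具体工作的实现):

```python
def linearize_table(record: dict) -> str:
    """将一条表格记录(键值对)线性化为文本序列,
    以缓解结构化输入与预训练语料之间的语义鸿沟。"""
    # 假设采用 "属性 : 值 | 属性 : 值" 的线性化格式
    return " | ".join(f"{key} : {value}" for key, value in record.items())

record = {"name": "Jane Doe", "occupation": "engineer", "nationality": "Canada"}
print(linearize_table(record))
# 输出: name : Jane Doe | occupation : engineer | nationality : Canada
```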
Investigating pretrained language models for graph-to-text generation, arXiv preprint arXiv:2007.08426, 2020.
Gpt-too: A language-model-first approach for amr-to-text generation, in ACL, 2020.
Tablegpt: Few-shot table-to-text generation with table structure reconstruction and content matching, in COLING, 2020.
Few-shot knowledge graph-to-text generation with pretrained language models, in Findings of ACL, 2021.
Few-shot NLG with pre-trained language model, in ACL, 2020.
Get to the point: Summarization with pointer-generator networks, in ACL, 2017.
Incorporating copying mechanism in sequence-to-sequence learning, in ACL, 2016.
Structure-Aware Pre-Training for Table-to-Text Generation, in Findings of ACL, 2021.
XLPT-AMR: Cross-Lingual Pre-Training via Multi-Task Learning for Zero-Shot AMR Parsing and Text Generation, in ACL, 2021.
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs, in Findings of ACL, 2021.
Plan-then-Generate: Controlled Data-to-Text Generation via Planning, in Findings of EMNLP, 2021.
Structural Adapters in Pretrained Language Models for AMR-to-Text Generation, in EMNLP, 2021.
3.3 多媒体输入
除了上述文本数据,许多研究也将多媒体数据(例如图像、视频和声音)作为输入。如何为输入多媒体内容提供良好有效的表示是多模态文本生成任务的主要难点。
Videobert: A joint model for video and language representation learning, in ICCV, 2019.
Contrastive bidirectional transformer for temporal representation learning, arXiv preprint arXiv:1906.05743, 2019.
Unified vision-language pre-training for image captioning and VQA, in AAAI, 2020.
Unified language model pre-training for natural language understanding and generation, in NeurIPS, 2019.
XGPT: cross-modal generative pre-training for image captioning, arXiv preprint arXiv:2003.01473, 2020.
Unsupervised pre-training for sequence to sequence speech recognition, in CoRR, vol. abs/1910.12418, 2019.
四、设计预训练语言模型架构
将PLMs应用到文本生成任务的第二个挑战是如何设计通用且合适的预训练语言模型架构,使其作为生成函数。按照预训练任务的不同,目前针对文本生成任务的预训练语言模型主要有四种结构:
1. 掩码语言模型;
2. 因果语言模型;
3. 前缀语言模型;
4. 编码器-解码器模型。
在此基础上,有不少研究对PLMs架构进行扩展,以适应不同的任务或场景的需求。
4.1 标准生成架构
由于Transformer结构的优异性,几乎所有的PLMs都采用Transformer的骨架。对于文本生成任务,主要有四类结构。
4.1.1 掩码语言模型
采用MLM预训练目标,代表模型例如BERT、RoBERTa等。由于预训练任务与下游生成任务之间存在差异,很少有研究将此类模型用作文本生成任务的主要模型,更多的是将其作为编码器,借助其优异的双向表示能力。
BERT: pre-training of deep bidirectional transformers for language understanding, in NAACL-HLT, 2019.
Leveraging pre-trained checkpoints for sequence generation tasks, in TACL, 2020.
4.1.2 因果语言模型
采用自回归式语言建模预训练目标,代表模型例如GPT-2/3、CTRL等。因果语言模型对于文本生成任务是非常直接的,其目标是基于先前的单词预测下一个单词。
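下面用一个极简的贪心解码循环示意因果语言模型"基于先前的单词预测下一个单词"的生成方式(假设model与tokenizer来自Hugging Face transformers中的GPT-2,解码策略与提示文本仅为示例):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

@torch.no_grad()
def greedy_decode(prompt: str, max_new_tokens: int = 20) -> str:
    """自回归解码示意:每一步以已生成的前缀为条件,预测概率最大的下一个单词。"""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(input_ids).logits              # (1, seq_len, vocab_size)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        input_ids = torch.cat([input_ids, next_id], dim=-1)
    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

print(greedy_decode("Natural language generation is"))
```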
Language models are unsupervised multitask learners, in OpenAI blog, 2019.
Language models are few-shot learners, in NeurIPS, 2020.
CTRL: A conditional transformer language model for controllable generation, arXiv preprint arXiv:1909.05858, 2019.
CPM: A large-scale generative chinese pre-trained language model, in CoRR, vol. abs/2012.00413, 2020.
Defending against neural fake news, in NeurIPS, 2019.
Pangu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation, in CoRR, vol. abs/2104.12369, 2021.
4.1.3 前缀语言模型
结合前两种模型的优势,并使用特殊的掩码机制,使得输入序列中的单词可以相互关注而输出序列中的单词只能关注到左侧的单词,代表模型例如UniLM、GLM等。
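下面给出前缀语言模型注意力掩码的一个极简构造示意(纯PyTorch,与UniLM中seq-to-seq掩码的思想一致,具体实现细节为假设):

```python
import torch

def prefix_lm_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """构造前缀语言模型的注意力掩码:1表示允许关注,0表示屏蔽。
    输入(前缀)部分的单词可以相互关注,输出部分的单词只能关注前缀及其左侧的单词。"""
    total_len = src_len + tgt_len
    mask = torch.zeros(total_len, total_len, dtype=torch.long)
    mask[:, :src_len] = 1                                   # 所有位置都可关注前缀
    mask[src_len:, src_len:] = torch.tril(                  # 输出部分为因果(下三角)注意力
        torch.ones(tgt_len, tgt_len, dtype=torch.long))
    return mask

print(prefix_lm_attention_mask(src_len=3, tgt_len=4))
```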
Unified language model pre-training for natural language understanding and generation, in NeurIPS, 2019.
Unilmv2: Pseudo-masked language models for unified language model pre-training, in ICML, 2020.
Xlnet: Generalized autoregressive pretraining for language understanding, in NeurIPS, 2019.
All NLP tasks are generation tasks: A general pretraining framework, in CoRR, vol. abs/2103.10360, 2021.
ERNIE-M: enhanced multilingual representation by aligning cross-lingual semantics with monolingual corpora, in CoRR, vol. abs/2012.15674, 2020.
4.1.4 编码器-解码器模型
采用标准的Transformer编码器-解码器结构,编码器负责对输入序列进行编码,解码器负责生成文本,解码器通过交叉注意力(cross-attention)机制关注编码器的输出,代表模型包括MASS、T5、BART等。
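以BART为例,下面给出一个使用编码器-解码器PLM做摘要生成的极简示意(基于Hugging Face transformers;模型名、输入文档与解码参数仅为示例):

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")

document = "The city council approved the new transit plan on Monday ..."  # 待摘要文档,仅为示例
inputs = tokenizer(document, return_tensors="pt", truncation=True)
# 编码器编码输入文档,解码器通过交叉注意力关注编码结果并自回归地生成摘要
summary_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=60)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```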
MASS: masked sequence to sequence pre-training for language generation, in ICML, 2019.
Exploring the limits of transfer learning with a unified text-to-text transformer, in JMLR, 2020.
BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension, in ACL, 2020.
Prophetnet: Predicting future n-gram for sequence-to-sequence pre-training, in Findings of EMNLP, 2020.
PALM: pre-training an autoencoding & autoregressive language model for context-conditioned generation, in EMNLP, 2020.
CPM-2: large-scale cost-effective pre-trained language models, in CoRR, vol. abs/2106.10715, 2021.
Denoising based sequence-to-sequence pre-training for text generation, in EMNLP, 2019.
4.2 PLMs框架扩展
目前对传统PLMs框架的拓展研究主要包括增加额外的模块、额外的特殊embedding、以及重新设计注意力掩码机制。
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation, in EMNLP, 2019.
A Multilingual View of Unsupervised Machine Translation, in Findings of EMNLP, 2020.
COCON: A Self-supervised Approach for Controlled Text Generation, in ICLR, 2021.
Hooks in the Headline: Learning to Generate Headlines with Controlled Styles, in ACL, 2020.
A Simple and Efficient Multi-Task Learning Approach for Conditioned Dialogue Generation, in NAACL, 2021.
A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation, in EMNLP, 2021.
Fact-Enhanced Synthetic News Generation, in AAAI, 2021.
A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining, in Findings of EMNLP, 2020.
BASS: Boosting Abstractive Summarization with Unified Semantic Graph, in ACL, 2021.
Long-Span Summarization via Local Attention and Content Selection, in ACL, 2021.
五、优化预训练语言模型
将PLMs应用到文本生成任务的第三个挑战则是如何优化生成函数即预训练语言模型,并保证生成文本满足某些属性,例如流畅性和一致性等。在这里,我们主要考虑三种优化方法:
1. 普通微调,即通过最小化任务特定的损失使PLMs的权重适应下游任务;
2. 提示微调,即借助填空式或前缀式的提示(prompt)引导PLMs完成文本生成;
3. 属性微调,即优化PLMs参数,使其生成的文本满足某些特殊属性。
5.1 普通微调(Fine-Tuning)
普通微调通过使用任务特定的损失调整PLMs的权重,将任务特定的知识注入PLMs。此外,微调还会改善PLMs的表示:它倾向于将相同标签的样本表示聚集在一起,并使不同标签的表示相互远离,从而在不同聚类区域之间形成较大的间隔。
5.1.1 传统微调
在传统微调(Vanilla fine-tuning)中,PLMs通过任务特定的损失来适应下游任务。
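以编码器-解码器PLM为例,传统微调可以写成如下极简的训练循环(基于Hugging Face transformers;数据、学习率等均为示意):

```python
import torch
from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

# 下游任务的(输入, 目标文本)数据,此处仅为占位示例
train_pairs = [("source text ...", "target text ...")]

model.train()
for src, tgt in train_pairs:
    batch = tokenizer(src, return_tensors="pt", truncation=True)
    labels = tokenizer(tgt, return_tensors="pt", truncation=True).input_ids
    loss = model(**batch, labels=labels).loss   # 任务特定的交叉熵损失
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```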
DIALOGPT: Large-scale generative pretraining for conversational response generation, in ACL, 2020.
Investigating pretrained language models for graph-to-text generation, arXiv preprint arXiv:2007.08426, 2020.
Transfertransfo: A transfer learning approach for neural network based conversational agents, arXiv preprint arXiv:1901.08149, 2019.
5.1.2 中间微调
中间微调(Intermediate fine-tuning)是指在一个有大量数据的中间数据集上对PLMs进行微调。它可以帮助PLMs获得额外的领域或特定任务的知识,以避免过度拟合,并提高其在小型目标数据集上的性能。
Improving the Lexical Ability of Pretrained Language Models for Unsupervised Neural Machine Translation, in NAACL, 2021.
ZmBART: An Unsupervised Cross-lingual Transfer Framework for Language Generation, in Findings of ACL, 2021.
Continual Mixed-Language Pre-Training for Extremely Low-Resource Neural Machine Translation, in Findings of ACL, 2021.
Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation, in NAACL, 2021.
Simple Conversational Data Augmentation for Semi-supervised Abstractive Conversation Summarization, in EMNLP, 2021.
TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising, in Findings of EMNLP, 2020.
A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation, in EMNLP, 2021.
Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling, in EACL, 2021.
Structure-Aware Pre-Training for Table-to-Text Generation, in Findings of ACL, 2021.
Improving Neural Story Generation by Targeted Common Sense Grounding, in EMNLP, 2019.
5.1.3 多任务微调
多任务微调(Multi-task fine-tuning)允许PLMs学习跨任务的知识,它的主要重点可以是在辅助任务的帮助下提高目标任务的表现,或提高所有任务的表现。
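多任务微调的一种常见形式是让各任务共享同一PLM参数,并对目标任务损失与辅助任务损失做加权求和。下面是一个极简示意(权重等超参数为假设):

```python
def multitask_loss(main_loss, aux_losses, aux_weight=0.5):
    """目标任务损失与若干辅助任务损失的加权和:
    各任务共享同一PLM参数,从而让模型学习跨任务的知识。"""
    return main_loss + aux_weight * sum(aux_losses)

# 用法示意: loss = multitask_loss(summarization_loss, [lm_loss, alignment_loss])
```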
SemFace: Pre-training Encoder and Decoder with a Semantic Interface for Neural Machine Translation, in ACL, 2021.
Context-Interactive Pre-Training for Document Machine Translation, in NAACL, 2021.
Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training, in ACL, 2021.
Contrastive Aligned Joint Learning for Multilingual Summarization, in Findings of ACL, 2021.
Cross-Lingual Abstractive Summarization with Limited Parallel Resources, in ACL, 2021.
Exploring Multi-task Learning for Low-Resource Abstractive Summarization, in EMNLP, 2021.
Noisy Self-Knowledge Distillation for Text Summarization, in NAACL, 2021.
RepSum: Unsupervised Dialogue Summarization based on Replacement Strategy, in ACL, 2021.
Topic-Aware Contrastive Learning for Abstractive Dialogue Summarization, in EMNLP, 2021.
Towards Zero-Shot Conditional Summarization with Adaptive Multi-Task Fine-Tuning, in Findings of EMNLP, 2020.
PRAL: A Tailored Pre-Training Model for Task-Oriented Dialog Generation, in ACL, 2021.
XLPT-AMR: Cross-Lingual Pre-Training via Multi-Task Learning for Zero-Shot AMR Parsing and Text Generation, in ACL, 2021.
Hooks in the Headline: Learning to Generate Headlines with Controlled Styles, in ACL, 2020.
Jointly Learning to Align and Summarize for Neural Cross-Lingual Summarization, in ACL, 2020.
Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models, in Findings of ACL, 2021.
JointGT: Graph-Text Joint Representation Learning for Text Generation from Knowledge Graphs, in Findings of ACL, 2021.
KG-BART: Knowledge Graph-Augmented BART for Generative Commonsense Reasoning, in AAAI, 2021.
Language Generation with Multi-Hop Reasoning on Commonsense Knowledge Graph, in EMNLP, 2020.
TableGPT: Few-shot Table-to-Text Generation with Table Structure Reconstruction and Content Matching, in COLING, 2020.
Counterfactual Story Reasoning and Generation, in EMNLP, 2019.
A Knowledge-Enhanced Pretraining Model for Commonsense Story Generation, in TACL, 2020.
5.1.4 参数高效微调
由于传统微调需要更新PLMs的全部权重,并为每个下游任务保存一个单独的模型副本,参数开销很大。参数高效微调(Parameter-Efficient fine-tuning)旨在冻结PLM原有权重,只引入并更新少量参数(例如适配器、前缀向量等),即可使PLMs适应下游任务。
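以适配器(Adapter)为例,下面给出参数高效微调的一个极简PyTorch示意:微调时冻结PLM本身,只更新插入的少量适配器参数(瓶颈维度等均为假设):

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """插入到Transformer层中的瓶颈适配器模块。"""
    def __init__(self, hidden_size: int, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)
        self.up = nn.Linear(bottleneck_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # 残差连接使模块在初始化时近似恒等映射,不破坏PLM原有表示
        return hidden_states + self.up(torch.relu(self.down(hidden_states)))

# 微调时冻结PLM的全部参数,只训练各层适配器:
# for param in plm.parameters():
#     param.requires_grad = False
# optimizer = torch.optim.AdamW(adapter_params, lr=1e-4)
```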
Recipes for Adapting Pre-trained Monolingual and Multilingual Models to Machine Translation, in EACL, 2021.
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation, in EMNLP, 2021.
Efficient Attentions for Long Document Summarization, in NAACL, 2021.
Meta-Transfer Learning for Low-Resource Abstractive Summarization, in AAAI, 2021.
Structural Adapters in Pretrained Language Models for AMR-to-Text Generation, in EMNLP, 2021.
BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining, in ICML, 2021.
Lightweight Adapter Tuning for Multilingual Speech Translation, in ACL, 2021.
Exploring Versatile Generative Language Model Via Parameter-Efficient Transfer Learning, in Findings of EMNLP, 2020.
Prefix-Tuning: Optimizing Continuous Prompts for Generation, in ACL, 2021.
5.2 提示微调(Prompt-Tuning)
PLMs在下游任务的表现可以通过提示微调得到改善,特别是在少样本和零样本的设置中。这里的提示可以是填空式或前缀式的,也可以是手动设计或自动生成的。具体来说,填空式提示适合通过masked language modeling目标预训练的PLMs,而前缀式提示适合使用causal language modeling目标的PLMs。因此,前缀式提示更常用于文本生成,因为它与生成过程从左到右的特性相匹配。
5.2.1 离散提示
直观上说,构建提示的最自然方式是根据人类直觉手动创建离散的模板,通常与自然语言短语相对应。
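下面用一个手工设计的前缀式模板示意离散提示的用法(基于Hugging Face transformers的GPT-2;模板内容与输入文本均为假设):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# 手工设计的自然语言模板,将任务说明拼接在输入之前
template = "Summarize the following article in one sentence: {article} TL;DR:"
prompt = template.format(article="The city council approved the new transit plan on Monday ...")

input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output_ids = model.generate(input_ids, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```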
Language models are unsupervised multitask learners, in OpenAI blog, 2019.
Language models are few-shot learners, in NeurIPS, 2020.
Few-shot text generation with pattern-exploiting training, in CoRR, 2020.
Controllable generation from pre-trained language models via inverse prompting, in KDD, 2021.
How can we know what language models know, in TACL, 2020.
5.2.2 连续提示
提示微调的目的是找到一种能让PLM有效执行文本生成任务的提示,因此没有必要将提示限制在人类可理解的自然语言内。于是,有研究开始尝试连续提示(又称软提示)的方法,直接在模型的嵌入空间中构造和优化提示。
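下面给出连续提示(软提示)的一个极简PyTorch示意:在嵌入空间中学习若干提示向量,将其拼接在输入嵌入之前,训练时通常冻结PLM参数,只优化这些提示向量(提示长度、初始化方式等均为假设):

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """可学习的连续提示:prompt_length个向量,拼接在输入序列的嵌入之前。"""
    def __init__(self, prompt_length: int, hidden_size: int):
        super().__init__()
        self.prompt_embeddings = nn.Parameter(torch.randn(prompt_length, hidden_size) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # input_embeds: (batch, seq_len, hidden) -> (batch, prompt_length + seq_len, hidden)
        batch_size = input_embeds.size(0)
        prompt = self.prompt_embeddings.unsqueeze(0).expand(batch_size, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)
```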
Prefix-tuning: Optimizing continuous prompts for generation, in ACL, 2021.
5.3 属性微调(Property-Tuning)
在某些场景下,我们希望根据某些特殊属性优化PLMs的参数,使得生成文本可以满足某些文本生成任务的需求。
5.3.1 相关性
在文本生成任务中,相关性指的是输出文本的主题与输入文本高度相关。一个具有代表性的任务是对话系统,要求生成的回复与输入对话历史高度相关。
Knowledgebased review generation by coherence enhanced text planning, in SIGIR, 2021.
Transfertransfo: A transfer learning approach for neural network based conversational agents, arXiv preprint arXiv:1901.08149, 2019.
DIALOGPT: Large-scale generative pretraining for conversational response generation, in ACL, 2020.
Generalized conditioned dialogue generation based on pre-trained language model, arXiv preprint arXiv:2010.11140, 2020.
5.3.2 保真性
保真性指的是生成文本的内容不应该与输入文本的事实相违背。有时,它也意味着生成文本的内容与世界事实相一致。一个具有代表性的任务是文本摘要,要求生成高保真的摘要文本,能够反映原文最关键的信息。
Leveraging pre-trained checkpoints for sequence generation tasks, in TACL, 2020.
Improving abstraction in text summarization, in EMNLP, 2018.
TED: A pretrained unsupervised summarization model with theme modeling and denoising, in Findings of EMNLP, 2020.
5.3.3 保序性
保序性指的是输入和输出文本里的语义单元(例如单词、短语等)的顺序是一致的。一个具有代表性的任务是机器翻译。在从源语言翻译为目标语言时,保持源文本和目标文本语义单元顺序的一致可以保证翻译结果的准确性。
CSP: code-switching pre-training for neural machine translation, in EMNLP, 2020.
Cross-lingual language model pretraining, in NeurIPS, 2019.
Unsupervised cross-lingual word embedding by multilingual neural language models, arXiv preprint arXiv:1809.02306, 2018.
Pretraining multilingual neural machine translation by leveraging alignment information, in EMNLP, 2020.
六、应用
预训练语言模型在文本生成任务中有很广泛的应用场景,这一章主要列举了三种主要的文本生成应用中预训练语言模型的研究进展。
6.1 机器翻译
机器翻译任务是为了将源语言文本转化为目标语言的文本。因此,根据是否可以获得源语言相对应的目标语言数据,机器翻译任务可以分为有监督和无监督两种。
6.1.1 有监督翻译
Multilingual Denoising Pre-training for Neural Machine Translation, in TACL, 2020.
Pre-training via Paraphrasing, in NeurIPS, 2020.
Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information, in EMNLP, 2020.
A Simple and Effective Approach to Automatic Post-Editing, in ACL, 2019.
CSP: Code-Switching Pre-training for Neural Machine Translation, in EMNLP, 2020.
6.1.2 无监督翻译
Cross-lingual Language Model Pretraining, in NeurIPS, 2019.
MASS: Masked Sequence to Sequence Pre-training for Language Generation, in ICML, 2019.
Multilingual Denoising Pre-training for Neural Machine Translation, in TACL, 2020.
Explicit Cross-lingual Pre-training for Unsupervised Machine Translation, in EMNLP, 2019.
A Multilingual View of Unsupervised Machine Translation, in Findings of EMNLP, 2020.
Data-Dependent Gaussian Prior Objective for Language Generation, in ICLR, 2020.
Cross-lingual Supervision Improves Unsupervised Neural Machine Translation, in NAACL, 2021.
6.2 文本摘要
文本摘要任务是为了生成可以反映文档主要内容的摘要文本。根据输入文档的多少,文本摘要任务可以分为单文档摘要和多文档摘要。
6.2.1 单文档摘要
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization, in ICML, 2020.
On Faithfulness and Factuality in Abstractive Summarization, in ACL, 2020.
TLDR: Extreme Summarization of Scientific Documents, in Findings of EMNLP, 2020.
Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization, in EMNLP, 2020.
A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining, in EMNLP, 2020.
GSum: A General Framework for Guided Neural Abstractive Summarization, in NAACL, 2021.
TED: A Pretrained Unsupervised Summarization Model with Theme Modeling and Denoising, in EMNLP, 2020.
Self-Attention Guided Copy Mechanism for Abstractive Summarization, in ACL, 2020.
Multi-Fact Correction in Abstractive Text Summarization, in EMNLP, 2020.
6.2.2 多文档摘要
A Spectral Method for Unsupervised Multi-Document Summarization, in EMNLP, 2020.
Better Highlighting: Creating Sub-Sentence Summary Highlights, in EMNLP, 2020.
ConvoSumm: Conversation Summarization Benchmark and Improved Abstractive Summarization with Argument Mining, in ACL, 2021.
Data Augmentation for Abstractive Query-Focused Multi-Document Summarization, in AAAI, 2021.
6.3 对话系统
对话系统任务是根据对话历史生成对应的回复文本。根据对话系统应用的场景不同,对话系统任务可以分为开放域对话和任务型对话。
6.3.1 开放域对话
DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation, in ACL, 2020.
TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents, arXiv preprint arXiv:1901.08149, 2019.
PLATO: Pre-trained Dialogue Generation Model with Discrete Latent Variable, in ACL, 2020.
Large-Scale Transfer Learning for Natural Language Generation, in ACL, 2019.
Like hiking? You probably enjoy nature: Persona-grounded Dialog with Commonsense Expansions, in EMNLP, 2020.
Controlling Dialogue Generation with Semantic Exemplars, in NAACL, 2021.
StyleDGPT: Stylized Response Generation with Pre-trained Language Models, in Findings of EMNLP, 2020.
6.3.2 任务型对话
Few-shot Natural Language Generation for Task-Oriented Dialog, in Findings of EMNLP, 2020.
MinTL: Minimalist Transfer Learning for Task-Oriented Dialogue Systems, in EMNLP, 2020.
Alternating Recurrent Dialog Model with Large-scale Pre-trained Language Models, in EACL, 2021.
Explicit Memory Tracker with Coarse-to-Fine Reasoning for Conversational Machine Reading, in ACL, 2020.
七、总结
本文概述了最近基于预训练语言模型的文本生成方面取得的进展,主要从预训练语言模型应用于文本生成任务的三个挑战出发。
参考文献
Junyi Li, Tianyi Tang, Wayne Xin Zhao and Ji-Rong Wen. Pretrained Language Models for Text Generation: A Survey. IJCAI Survey 2021.