Impressive! The Champion Solution to the Machine Reading Comprehension Task of an AI Competition
The following article is from 飞桨PaddlePaddle. Author: 王肖 (Wang Xiao)
Background
Competition Overview
A Yes/No Opinion Polarity Analysis Method Based on Multi-Model Fusion with Robustness Optimization
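Yes: an affirmative opinion, meaning the answer takes a fairly clear affirmative stance. For questions about objective facts, judge from the facts; for questions about subjective attitudes, judge from the overall attitude of the answer.
No: a negative opinion, meaning the answer fairly clearly takes the stance opposite to the one the question asks about.
Depends: cannot be determined / it depends. Either the matter itself covers several cases and the opinion differs from case to case, or the answer itself is uncertain about the question, so a judgment can only be made case by case.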
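Taken together, the three labels above make the task a three-way classification over a question/answer pair. A minimal sketch of how one example might be prepared for a BERT-style sentence-pair classifier (a classification head over [CLS]) follows. It assumes each example is a question string plus an answer string. The label ids, the character-level tokenization, the default `max_len` of 512 and the helper names `build_pair` and `segment_ids` are illustrative assumptions, not the authors' preprocessing.

```python
from typing import Dict, List

# The three polarity labels defined above; the integer ids are an
# illustrative assumption, not taken from the competition data.
LABELS: List[str] = ["Yes", "No", "Depends"]
LABEL2ID: Dict[str, int] = {name: i for i, name in enumerate(LABELS)}


def build_pair(question: str, answer: str, max_len: int = 512) -> List[str]:
    """Build a BERT-style sentence-pair input: [CLS] question [SEP] answer [SEP].

    Characters are used as tokens, a common simplification for Chinese text.
    The question is kept in full where possible; the answer is truncated so
    the whole sequence (including the 3 special tokens) fits in max_len.
    """
    q_tokens = list(question)[: max_len - 3]
    a_tokens = list(answer)[: max_len - 3 - len(q_tokens)]
    return ["[CLS]"] + q_tokens + ["[SEP]"] + a_tokens + ["[SEP]"]


def segment_ids(tokens: List[str]) -> List[int]:
    """Segment ids: 0 for [CLS] + question + first [SEP], 1 for the rest."""
    first_sep = tokens.index("[SEP]")
    return [0] * (first_sep + 1) + [1] * (len(tokens) - first_sep - 1)


if __name__ == "__main__":
    tokens = build_pair("能吃吗", "可以")  # hypothetical example pair
    print(tokens, segment_ids(tokens), LABEL2ID["Yes"])
```

Truncating the answer before the question is only one possible choice. The article does not say how the authors handled inputs longer than the model's limit.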
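BERT [7]: uses the Transformer [10] as its core architecture, which lets every token attend to both its left and right context and so captures semantic relationships more thoroughly. It is trained with a multi-task objective combining a Masked Language Model (MLM) [11] and Next Sentence Prediction (NSP). Compared with earlier pre-trained models, BERT was trained on larger-scale data with considerably more compute.
RoBERTa [1]: compared with BERT, RoBERTa drops the Next Sentence Prediction (NSP) task and trains on more, and more diverse, data, with each input built from contiguous sentences taken from a document. For masking, it uses dynamic masking: a new mask pattern is generated every time a sequence is fed to the model. As large amounts of data keep flowing in, the model is exposed to many different masking patterns and learns more varied language representations.
ERNIE [6]: builds on BERT. Its main improvement is injecting external knowledge during pre-training through three levels of masking: basic-level masking (word pieces), phrase-level masking (in the style of whole word masking, WWM), and entity-level masking. It also introduces a Dialogue Language Model (DLM) task, and the Chinese ERNIE is additionally trained on a variety of heterogeneous data sources.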
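The masking objectives compared above can be sketched in a few lines of Python. This is an illustrative simplification, not the authors' code or any library's implementation. `bert_mask` follows the 15% selection with the 80/10/10 replacement rule from the BERT paper. `static_epochs` and `dynamic_epochs` contrast reusing one precomputed mask with drawing a fresh mask each time a sequence is fed, which is RoBERTa's dynamic masking. `span_mask` approximates ERNIE's phrase- and entity-level masking by masking whole spans. The span boundaries are assumed to come from an external segmenter or entity recognizer, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"
Masked = Tuple[List[str], List[Optional[str]]]  # (corrupted tokens, MLM labels)


def bert_mask(tokens: Sequence[str], vocab: Sequence[str], rng: random.Random,
              mask_prob: float = 0.15) -> Masked:
    """BERT-style MLM corruption. Each position is selected with probability
    mask_prob; a selected token becomes [MASK] 80% of the time, a random
    vocabulary token 10% of the time, and stays unchanged 10% of the time.
    labels[i] holds the original token at selected positions, None elsewhere."""
    out = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            r = rng.random()
            if r < 0.8:
                out[i] = MASK
            elif r < 0.9:
                out[i] = rng.choice(vocab)
            # else: keep the original token as-is
    return out, labels


def static_epochs(tokens: Sequence[str], vocab: Sequence[str],
                  epochs: int, seed: int = 0) -> List[Masked]:
    """Static masking: the mask is drawn once (as in preprocessing) and the
    same corrupted sequence is reused in every epoch."""
    fixed = bert_mask(tokens, vocab, random.Random(seed))
    return [fixed for _ in range(epochs)]


def dynamic_epochs(tokens: Sequence[str], vocab: Sequence[str],
                   epochs: int, seed: int = 0) -> List[Masked]:
    """Dynamic masking (RoBERTa): a fresh mask is drawn every time the
    sequence is fed to the model, here once per epoch."""
    rng = random.Random(seed)
    return [bert_mask(tokens, vocab, rng) for _ in range(epochs)]


def span_mask(tokens: Sequence[str], spans: Sequence[Tuple[int, int]],
              rng: random.Random, mask_prob: float = 0.15) -> Masked:
    """ERNIE-style phrase/entity-level masking (simplified): each half-open
    span [start, end) is either masked as a whole or left untouched. Span
    boundaries are assumed to come from an external segmenter or NER tool."""
    out = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for start, end in spans:
        if rng.random() < mask_prob:
            for i in range(start, end):
                labels[i] = tokens[i]
                out[i] = MASK
    return out, labels


if __name__ == "__main__":
    sentence = list("哈尔滨是黑龙江的省会")  # "Harbin is the capital of Heilongjiang"
    vocab = sorted(set(sentence))
    print(dynamic_epochs(sentence, vocab, epochs=3)[0][0])
    # Mask the entities 哈尔滨 (chars 0-2) and 黑龙江 (chars 4-6) as whole units.
    print(span_mask(sentence, [(0, 3), (4, 7)], random.Random(0), mask_prob=1.0)[0])
```

With the spans (0, 3) and (4, 7), `span_mask` hides each place name as a whole. The model therefore cannot fill in 滨 from the neighbouring 哈尔 alone and must draw on the wider sentence, which is the intuition behind ERNIE's entity-level masking.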
Experimental Analysis
Conclusion
References
[1] Liu Y, Ott M, Goyal N, et al. RoBERTa: A robustly optimized BERT pretraining approach[J]. arXiv preprint arXiv:1907.11692, 2019.
[2] https://github.com/PaddlePaddle/PALM
[3] He W, Liu K, Liu J, et al. DuReader: A Chinese machine reading comprehension dataset from real-world applications[J]. arXiv preprint arXiv:1711.05073, 2017.
[4] Rajpurkar P, Zhang J, Lopyrev K, et al. SQuAD: 100,000+ questions for machine comprehension of text[J]. arXiv preprint arXiv:1606.05250, 2016.
[5] https://github.com/PaddlePaddle/Paddle
[6] Sun Y, Wang S, Li Y, et al. ERNIE: Enhanced representation through knowledge integration[J]. arXiv preprint arXiv:1904.09223, 2019.
[7] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[8] Miyato T, Dai A M, Goodfellow I. Adversarial training methods for semi-supervised text classification[J]. arXiv preprint arXiv:1605.07725, 2016.
[9] Dietterich T G. Ensemble methods in machine learning[C]//International Workshop on Multiple Classifier Systems. Springer, Berlin, Heidelberg, 2000: 1-15.
[10] Vaswani A, Shazeer N, Parmar N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems. 2017: 5998-6008.
[11] Taylor W L. "Cloze procedure": A new tool for measuring readability[J]. Journalism Quarterly, 1953, 30(4): 415-433.
[12] Ross A S, Doshi-Velez F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients[C]//Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
[13] Sun Y, Wang S, Li Y, et al. ERNIE 2.0: A continual pre-training framework for language understanding[J]. arXiv preprint arXiv:1907.12412, 2019.
[14] Wei J, Ren X, Li X, et al. NEZHA: Neural contextualized representation for Chinese language understanding[J]. arXiv preprint arXiv:1909.00204, 2019.
[15] Yang Z, Dai Z, Yang Y, et al. XLNet: Generalized autoregressive pretraining for language understanding[C]//Advances in Neural Information Processing Systems. 2019: 5754-5764.
[16] Lan Z, Chen M, Goodman S, et al. ALBERT: A lite BERT for self-supervised learning of language representations[J]. arXiv preprint arXiv:1909.11942, 2019.
[17] Chen T, Guestrin C. XGBoost: A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.