爱可可 AI Frontier Paper Picks (12.30)
LG - Machine Learning; CV - Computer Vision; CL - Computation and Language; RO - Robotics
1、[IR] Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
2、[CV] Multi-Realism Image Compression with a Conditional Generator
3、[LG] On Implicit Bias in Overparameterized Bilevel Optimization
4、[CL] Cramming: Training a Language Model on a Single GPU in One Day
5、[LG] LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
[CL] Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
[RO] A System-Level View on Out-of-Distribution Data in Robotics
[CV] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
[CV] A Generalization of ViT/MLP-Mixer to Graphs
Summary: dense feature memory augmented Transformers for COVID-19 vaccination search classification; multi-realism image compression with a conditional generator; implicit bias in overparameterized bilevel optimization; training a language model on a single GPU in one day; backward chaining for automated reasoning in natural language; composing retrieval and language models for knowledge-intensive NLP; a system-level view on out-of-distribution data in robotics; noise-aware learning from web-crawled image-text data for image captioning; a generalization of ViT/MLP-Mixer to graphs.
1、[IR] Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification
J Gupta, Y Tay, C Kamath, V Q. Tran, D Metzler, S Bavadekar, M Sun, E Gabrilovich
[Google Research]
Highlights:
Proposes a new search-query intent classification model and framework as an insight tool for COVID-19 vaccine related searches; combines modern state-of-the-art natural language understanding models with traditional dense features through a novel fusion method that lets queries retrieve from a dense memory store as from a contextual key-value store; experiments show the method significantly improves over a strong gradient-boosting baseline and surpasses state-of-the-art Transformers in F1 score.
Abstract:
With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic. Given the protection they provide, vaccines are becoming mandatory in certain social and professional settings. This paper presents a classification model for detecting COVID-19 vaccination related search queries, a machine learning model that is used to generate search insights for COVID-19 vaccinations. The proposed method combines and leverages advancements from modern state-of-the-art (SOTA) natural language understanding (NLU) techniques such as pretrained Transformers with traditional dense features. We propose a novel approach of considering dense features as memory tokens that the model can attend to. We show that this new modeling approach enables a significant improvement to the Vaccine Search Insights (VSI) task, improving a strong well-established gradient-boosting baseline by relative +15% improvement in F1 score and +14% in precision.
https://arxiv.org/abs/2212.13898
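The "dense features as memory tokens" idea can be sketched at the shape level: project the traditional dense feature vector into the model's embedding space and append the result as extra tokens that self-attention can then retrieve from, like a contextual key-value store. This is a minimal NumPy sketch; the projection `w_proj` and all sizes are hypothetical stand-ins for learned parameters, not the paper's architecture.

```python
import numpy as np

def append_dense_memory(token_embs, dense_feats, w_proj):
    """Project traditional dense features into the Transformer's embedding
    space and append them as extra memory tokens, so self-attention over
    the extended sequence can retrieve them like a key-value store.
    `w_proj` stands in for a learned projection matrix."""
    d_model = token_embs.shape[1]
    mem_tokens = (dense_feats @ w_proj).reshape(-1, d_model)  # (n_mem, d_model)
    return np.concatenate([token_embs, mem_tokens], axis=0)

rng = np.random.default_rng(0)
token_embs = rng.normal(size=(6, 16))   # 6 query tokens, d_model = 16
dense_feats = rng.normal(size=(4,))     # 4 traditional dense features
w_proj = rng.normal(size=(4, 2 * 16))   # maps the features to 2 memory tokens

seq = append_dense_memory(token_embs, dense_feats, w_proj)  # shape (8, 16)
```

The original query tokens are unchanged; the Transformer simply sees a slightly longer sequence whose trailing tokens encode the dense features.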
2、[CV] Multi-Realism Image Compression with a Conditional Generator
E Agustsson, D Minnen, G Toderici, F Mentzer
[Google Research]
Highlights:
By optimizing the rate-distortion-realism trade-off, generative compression methods can produce detailed, realistic images even at low bit rates; to address the concern that users may worry about misleading reconstructions far from the input image, a decoder is trained that bridges the two regimes and navigates the distortion-realism trade-off; sets a new state of the art on high-resolution benchmarks, achieving better distortion at high realism (low FID) and better realism at low distortion (high PSNR).
Abstract:
By optimizing the rate-distortion-realism trade-off, generative compression approaches produce detailed, realistic images, even at low bit rates, instead of the blurry reconstructions produced by rate-distortion optimized models. However, previous methods do not explicitly control how much detail is synthesized, which results in a common criticism of these methods: users might be worried that a misleading reconstruction far from the input image is generated. In this work, we alleviate these concerns by training a decoder that can bridge the two regimes and navigate the distortion-realism trade-off. From a single compressed representation, the receiver can decide to either reconstruct a low mean squared error reconstruction that is close to the input, a realistic reconstruction with high perceptual quality, or anything in between. With our method, we set a new state-of-the-art in distortion-realism, pushing the frontier of achievable distortion-realism pairs, i.e., our method achieves better distortions at high realism and better realism at low distortion than ever before.
https://arxiv.org/abs/2212.13824
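The receiver-side control can be illustrated with a toy stand-in: a single conditioning scalar β selects where on the distortion-realism trade-off the reconstruction lands. The paper conditions one generator network on β; the convex blend of two fixed reconstructions below only sketches the user-facing behavior, not the actual decoder.

```python
import numpy as np

def decode(recon_mse, recon_realistic, beta):
    """Toy stand-in for the beta-conditioned decoder: beta = 0 yields the
    low-MSE reconstruction closest to the input, beta = 1 the realistic,
    detail-rich one, and intermediate values anything in between."""
    beta = float(np.clip(beta, 0.0, 1.0))
    return (1.0 - beta) * recon_mse + beta * recon_realistic

rng = np.random.default_rng(0)
recon_mse = rng.normal(size=(8, 8))        # pretend low-distortion decode
recon_realistic = rng.normal(size=(8, 8))  # pretend high-realism decode

mid = decode(recon_mse, recon_realistic, 0.5)  # halfway on the trade-off
```

The key point is that both endpoints come from one compressed representation; only the conditioning signal changes on the receiver side.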
3、[LG] On Implicit Bias in Overparameterized Bilevel Optimization
P Vicol, J Lorraine, F Pedregosa, D Duvenaud, R Grosse
[University of Toronto & Google Brain]
Highlights:
Many machine learning problems involve bilevel optimization (BLO), in which two sub-problems are nested; the converged solution or long-run behavior depends to a large degree on cold-start vs. warm-start and other algorithmic choices; the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional.
Abstract:
Many problems in machine learning involve bilevel optimization (BLO), including hyperparameter optimization, meta-learning, and dataset distillation. Bilevel problems consist of two nested sub-problems, called the outer and inner problems, respectively. In practice, often at least one of these sub-problems is overparameterized. In this case, there are many ways to choose among optima that achieve equivalent objective values. Inspired by recent studies of the implicit bias induced by optimization algorithms in single-level optimization, we investigate the implicit bias of gradient-based algorithms for bilevel optimization. We delineate two standard BLO methods -- cold-start and warm-start -- and show that the converged solution or long-run behavior depends to a large degree on these and other algorithmic choices, such as the hypergradient approximation. We also show that the inner solutions obtained by warm-start BLO can encode a surprising amount of information about the outer objective, even when the outer parameters are low-dimensional. We believe that implicit bias deserves as central a role in the study of bilevel optimization as it has attained in the study of single-level neural net optimization.
https://arxiv.org/abs/2212.14032
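The cold-start/warm-start distinction can be demonstrated on a toy overparameterized inner problem, where a whole line of inner parameters is optimal and gradient descent's implicit bias picks one depending on where it starts. Everything below (the quadratic inner loss, the outer-parameter schedule) is an illustrative construction, not from the paper.

```python
import numpy as np

def solve_inner(lmbda, w, steps=500, lr=0.05):
    """Gradient descent on the overparameterized inner loss
    (w[0] + lmbda * w[1] - 1)^2, which has a whole line of minimizers;
    GD converges to the minimizer reachable from `w` along the gradient
    direction, so the answer depends on the initialization."""
    for _ in range(steps):
        r = w[0] + lmbda * w[1] - 1.0
        w = w - lr * 2.0 * r * np.array([1.0, lmbda])
    return w

w_init = np.zeros(2)
outer_schedule = [1.0, 2.0, 3.0]  # toy sequence of outer-parameter values

# Cold-start: re-solve the inner problem from w_init for every outer value.
cold = [solve_inner(l, w_init.copy()) for l in outer_schedule]

# Warm-start: continue each inner solve from the previous inner solution.
w = w_init.copy()
warm = []
for l in outer_schedule:
    w = solve_inner(l, w)
    warm.append(w.copy())
```

Both runs end at exact inner optima for the final outer value, yet at different points on the solution line (cold-start at (0.1, 0.3), warm-start at (0.37, 0.21)): the warm-start trajectory has absorbed information about the earlier outer values, which is exactly the implicit-bias effect the paper studies.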
4、[CL] Cramming: Training a Language Model on a Single GPU in One Day
J Geiping, T Goldstein
[University of Maryland]
Highlights:
Analyzes the components of the pretraining pipeline and modifies them to reach performance close to BERT; provides evidence that performance follows scaling laws even in the constrained setting; discusses the merit and practical applicability of several recent modifications and improvements to the transformer architecture.
Abstract:
We investigate the downstream performance achievable with a transformer-based language model trained completely from scratch with masked language modeling for a single day on a single consumer GPU. Aside from re-analyzing nearly all components of the pretraining pipeline for this scenario and providing a modified pipeline with performance close to BERT, we investigate why scaling down is hard, and which modifications actually improve performance in this scenario. We provide evidence that even in this constrained setting, performance closely follows scaling laws observed in large-compute settings. Through the lens of scaling laws, we categorize a range of recent improvements to training and architecture and discuss their merit and practical applicability (or lack thereof) for the limited compute setting.
https://arxiv.org/abs/2212.14034
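The scaling-law claim can be made concrete with a log-log fit: if loss follows a power law loss = a · C^(−b) in compute C, the exponent falls out of a linear regression in log space. The compute/loss numbers below are synthetic, chosen only to illustrate the fitting procedure, not measurements from the paper.

```python
import numpy as np

# Synthetic (compute, loss) pairs generated from loss = a * C**(-b);
# real cramming experiments would supply measured values instead.
a_true, b_true = 50.0, 0.08
compute = np.array([1e15, 1e16, 1e17, 1e18])
loss = a_true * compute ** (-b_true)

# A power law is a straight line in log-log space:
#   log(loss) = log(a) - b * log(C)
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
b_fit, a_fit = -slope, np.exp(intercept)
```

Plotting measured losses from a constrained-compute run against such a fitted line is one way to check the paper's claim that small-budget training still tracks the large-compute scaling laws.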
5、[LG] LAMBADA: Backward Chaining for Automated Reasoning in Natural Language
S M Kazemi, N Kim, D Bhatia, X Xu, D Ramachandran
[Google Research]
Highlights:
Remarkable progress has been made in handling knowledge specified as unstructured natural text with large language models; the classical automated reasoning literature shows that backward reasoning, from the intended conclusion to the set of axioms that support it, is far more efficient at proof finding; develops a backward chaining algorithm, LAMBADA, whose sub-modules can each be implemented simply via few-shot prompted LM inference.
Abstract:
Remarkable progress has been made on automated reasoning with knowledge specified as unstructured, natural text, by using the power of large language models (LMs) coupled with methods such as Chain-of-Thought prompting and Selection-Inference. These techniques search for proofs in the forward direction from axioms to the conclusion, which suffers from a combinatorial explosion of the search space, and thus high failure rates for problems requiring longer chains of reasoning. The classical automated reasoning literature has shown that reasoning in the backward direction (i.e. from the intended conclusion to the set of axioms that support it) is significantly more efficient at proof-finding problems. We import this intuition into the LM setting and develop a Backward Chaining algorithm, which we call LAMBADA, that decomposes reasoning into four sub-modules, each of which can be simply implemented by few-shot prompted LM inference. We show that LAMBADA achieves massive accuracy boosts over state-of-the-art forward reasoning methods on two challenging logical reasoning datasets, particularly when deep and accurate proof chains are required.
https://arxiv.org/abs/2212.13894
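The backward-chaining control flow can be sketched in a few lines: a goal is proved if it is a known fact, or if some rule concludes it and all of that rule's premises can be proved recursively. In LAMBADA each of these steps (fact check, rule selection, goal decomposition) is a few-shot-prompted LM call over natural-language statements; the exact string matching below is only a stand-in for that.

```python
def backward_chain(goal, facts, rules, depth=5):
    """Minimal backward chaining from the goal toward the axioms.
    `facts` is a set of axiom strings; `rules` is a list of
    (premises, conclusion) pairs. `depth` bounds the recursion."""
    if goal in facts:          # fact check
        return True
    if depth == 0:
        return False
    for premises, conclusion in rules:   # rule selection
        if conclusion == goal:
            # goal decomposition: every premise becomes a sub-goal
            if all(backward_chain(p, facts, rules, depth - 1) for p in premises):
                return True
    return False

facts = {"Fiona is nice", "Fiona is green"}
rules = [
    (["Fiona is nice"], "Fiona is kind"),
    (["Fiona is kind", "Fiona is green"], "Fiona is smart"),
]
```

Because the search starts at the conclusion, only rules whose conclusion matches the current goal are ever expanded, which is what avoids the combinatorial explosion of forward search.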
Other papers worth noting:
[CL] Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP
O Khattab, K Santhanam, X L Li, D Hall, P Liang, C Potts, M Zaharia
[Stanford University]
Highlights:
Retrieval-augmented in-context learning is a powerful approach to knowledge-intensive tasks; proposes the Demonstrate-Search-Predict (DSP) framework, whose key idea is passing natural-language text through sophisticated pipelines between a language model (LM) and a retrieval model (RM); this composability yields powerful capabilities, such as automatically annotating demonstrations for complex pipelines from end-task labels.
https://arxiv.org/abs/2212.14024
[RO] A System-Level View on Out-of-Distribution Data in Robotics
R Sinha, A Sharma, S Banerjee, T Lew, R Luo, S M. Richards, Y Sun, E Schmerling, M Pavone
[Stanford University]
Highlights:
Coping with out-of-distribution (OOD) inputs is a key challenge for trustworthy, learning-enabled open-world autonomy; robot autonomy should be considered in terms of the holistic, system-level competence of the robot operating under OOD conditions; research should focus on how OOD data affects the reliability of the full autonomy stack, and on how the full stack can be leveraged to mitigate negative impacts.
https://arxiv.org/abs/2212.14020
[CV] Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
W Kang, J Mun, S Lee, B Roh
[Kakao Brain]
Highlights:
Proposes a noise-aware learning framework that learns rich knowledge from the entire web-crawled dataset while being less affected by noise; proposes a quality-controllable model that learns with the alignment level of image-text pairs as an additional control signal during training; demonstrates that the controllable captioning model handles noise effectively and produces high-quality captions in terms of descriptiveness and distinctiveness.
https://arxiv.org/abs/2212.13563
[CV] A Generalization of ViT/MLP-Mixer to Graphs
X He, B Hooi, T Laurent, A Perold, Y LeCun, X Bresson
[National University of Singapore & Loyola Marymount University & Element, Inc & New York University]
Highlights:
Introduces Graph MLP-Mixer, a new type of GNN that captures long-range dependencies and mitigates over-squashing; Graph MLP-Mixer has complexity linear in the number of nodes and edges, offering better speed and memory efficiency; Graph MLP-Mixer demonstrates high expressivity on graph isomorphism.
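The generalization of MLP-Mixer's image patches to graphs can be sketched as: partition the graph's nodes into patches (the paper uses METIS partitioning), pool each patch into a token, then alternate token mixing (across patches) and channel mixing (across features). The mean pooling, random weights, and single-layer structure below are simplified stand-ins, not the paper's model.

```python
import numpy as np

def graph_mlp_mixer_layer(node_feats, patches, w_tok, w_ch):
    """One simplified mixer layer over graph patches: mean-pool each patch
    of node features into a token (P tokens of d features), mix information
    across patches, then across feature channels, and read out a
    graph-level embedding. `w_tok` and `w_ch` stand in for learned MLPs."""
    tokens = np.stack([node_feats[p].mean(axis=0) for p in patches])  # (P, d)
    tokens = tokens + w_tok @ tokens          # token mixing: (P, P) @ (P, d)
    tokens = tokens + np.tanh(tokens @ w_ch)  # channel mixing: (P, d) @ (d, d)
    return tokens.mean(axis=0)                # graph-level readout, shape (d,)

rng = np.random.default_rng(0)
node_feats = rng.normal(size=(8, 4))       # 8 nodes, 4 features each
patches = [[0, 1, 2], [3, 4], [5, 6, 7]]   # toy partition into 3 patches
w_tok = 0.1 * rng.normal(size=(3, 3))      # stand-ins for learned weights
w_ch = 0.1 * rng.normal(size=(4, 4))

graph_emb = graph_mlp_mixer_layer(node_feats, patches, w_tok, w_ch)
```

Because mixing happens between patch tokens rather than along edges, distant nodes exchange information in one step, which is how the design sidesteps over-squashing while keeping cost linear in nodes and edges.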