爱可可 AI Frontier Picks (12.14)
LG - Machine Learning, CV - Computer Vision, CL - Computation and Language, AS - Audio and Speech, RO - Robotics, GR - Graphics
Summary: collaborating with language models for embodied reasoning; finding untrained GNN subnetworks with performance comparable to fully trained networks; learnable commutative monoids for graph neural networks; expander graph propagation; a query language for large language models; fast learning of neural implicit surfaces for multi-view reconstruction; a diffusion-based generative model for sculpting 3D digital avatars; steering semantics in diffusion latent space; masked generative video Transformers
1、[LG] Collaborating with language models for embodied reasoning
I Dasgupta, C Kaeser-Chen, K Marino, A Ahuja…
[DeepMind]
Key points:
1. Proposes a Planner-Actor-Reporter system that uses a pre-trained language model for complex, ambiguous embodied-reasoning tasks;
2. Demonstrates success in a grid world and shows that the Actor can be trained with reinforcement learning.
Abstract:
Reasoning in a complex and ambiguous embodied environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the ability to adapt to new tasks through in-context learning. However, LSLMs do not inherently have the ability to interrogate or intervene on the environment. In this work, we investigate how to combine these complementary abilities in a single system consisting of three parts: a Planner, an Actor, and a Reporter. The Planner is a pre-trained language model that can issue commands to a simple embodied agent (the Actor), while the Reporter communicates with the Planner to inform its next command. We present a set of tasks that require reasoning, test this system's ability to generalize zero-shot and investigate failure cases, and demonstrate how components of this system can be trained with reinforcement learning to improve performance.
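A runnable toy sketch of the Planner-Actor-Reporter loop described above. The planner, actor, and reporter functions here are illustrative stand-ins (a scripted "language model", a trivial policy, and a templated report), not the paper's code:

```python
def planner(history):
    """Stand-in for the pre-trained LM: maps the dialogue history to the next command."""
    if any("found" in line for line in history):
        return "done"
    return "search the next room"

def actor(state, command):
    """Stand-in for the embodied agent: executes the command and returns an observation."""
    state["room"] += 1
    return "found the key" if state["room"] == state["key_room"] else "room is empty"

def reporter(observation):
    """Stand-in for the Reporter: turns the raw observation into text for the Planner."""
    return f"Report: {observation}"

history = ["Task: find the key"]
state = {"room": 0, "key_room": 3}
for _ in range(10):
    command = planner(history)
    if command == "done":
        break
    history += [f"Command: {command}", reporter(actor(state, command))]
print("\n".join(history))
```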
https://openreview.net/forum?id=YoS-abmWjJc
2、[LG] You Can Have Better Graph Neural Networks by Not Training Weights at All: Finding Untrained GNNs Tickets
T Huang, T Chen, M Fang, V Menkovski…
[Eindhoven University of Technology & University of Texas at Austin]
Key points:
1. Carries out a first-of-its-kind exploration of finding matching untrained GNNs;
2. Shows that sparsity is a powerful tool for finding untrained subnetworks whose performance is comparable to that of fully trained networks;
3. The found subnetworks also improve out-of-distribution detection and robustness to input perturbations.
Abstract:
Recent works have impressively demonstrated that there exists a subnetwork in randomly initialized convolutional neural networks (CNNs) that can match the performance of the fully trained dense networks at initialization, without any optimization of the weights of the network (i.e., untrained networks). However, the presence of such untrained subnetworks in graph neural networks (GNNs) still remains mysterious. In this paper we carry out the first-of-its-kind exploration of discovering matching untrained GNNs. With sparsity as the core tool, we can find untrained sparse subnetworks at the initialization, that can match the performance of fully trained dense GNNs. Besides this already encouraging finding of comparable performance, we show that the found untrained subnetworks can substantially mitigate the GNN over-smoothing problem, hence becoming a powerful tool to enable deeper GNNs without bells and whistles. We also observe that such sparse untrained subnetworks have appealing performance in out-of-distribution detection and robustness to input perturbations. We evaluate our method across widely-used GNN architectures on various popular datasets including the Open Graph Benchmark (OGB).
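The core recipe, freezing the random initialization and learning only which weights to keep, can be illustrated with a small sketch. The edge-popup-style scoring below (learn a score per weight, keep the top fraction, pass gradients straight through) is one standard way of searching for such masks and is used here purely as an illustration, not as the authors' exact procedure:

```python
import torch
import torch.nn as nn

class MaskedGCNLayer(nn.Module):
    """Toy GCN layer whose random weights stay frozen; only per-weight scores are
    learned, and the forward pass keeps the highest-scoring weights (a sparse,
    untrained subnetwork)."""
    def __init__(self, in_dim, out_dim, sparsity=0.8):
        super().__init__()
        w = torch.randn(in_dim, out_dim) / in_dim ** 0.5
        self.weight = nn.Parameter(w, requires_grad=False)        # frozen random init
        self.scores = nn.Parameter(torch.randn_like(w) * 0.01)    # learnable mask scores
        self.sparsity = sparsity

    def forward(self, x, adj_norm):
        k = int(self.scores.numel() * (1 - self.sparsity))         # number of weights to keep
        threshold = self.scores.flatten().topk(k).values.min()
        mask = (self.scores >= threshold).float()
        mask = mask + self.scores - self.scores.detach()           # straight-through estimator
        return adj_norm @ (x @ (self.weight * mask))

# Toy example: 5 nodes, dense row-normalized adjacency, random features
adj = torch.full((5, 5), 0.2)
x = torch.randn(5, 8)
layer = MaskedGCNLayer(8, 4)
layer(x, adj).sum().backward()
print(layer.scores.grad is not None, layer.weight.grad is None)    # True True
```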
https://openreview.net/forum?id=dF6aEW3_62O
3、[LG] Learnable Commutative Monoids for Graph Neural Networks
E Ong, P Veličković
[University of Cambridge & DeepMind]
Key points:
1. Uses the view of aggregators as learnable commutative monoids to propose an expressive, flexible, and efficient aggregator for GNNs;
2. The O(log V)-depth learnable commutative monoid (LCM) aggregator yields exponential improvements in parallelism and dependency length;
3. The LCM aggregator represents a favourable trade-off between efficient and expressive aggregators.
Abstract:
Graph neural networks (GNNs) have been shown to be highly sensitive to the choice of aggregation function. While summing over a node's neighbours can approximate any permutation-invariant function over discrete inputs, Cohen-Karlik et al. [2020] proved there are set-aggregation problems for which summing cannot generalise to unbounded inputs, proposing recurrent neural networks regularised towards permutation-invariance as a more expressive aggregator. We show that these results carry over to the graph domain: GNNs equipped with recurrent aggregators are competitive with state-of-the-art permutation-invariant aggregators, on both synthetic benchmarks and real-world problems. However, despite the benefits of recurrent aggregators, their O(V) depth makes them both difficult to parallelise and harder to train on large graphs. Inspired by the observation that a well-behaved aggregator for a GNN is a commutative monoid over its latent space, we propose a framework for constructing learnable, commutative, associative binary operators. And with this, we construct an aggregator of O(log V) depth, yielding exponential improvements for both parallelism and dependency length while achieving performance competitive with recurrent aggregators. Based on our empirical observations, our proposed learnable commutative monoid (LCM) aggregator represents a favourable tradeoff between efficient and expressive aggregators.
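For intuition, the sketch below pairs the two ingredients the abstract points to: a learnable binary operator on the latent space and a balanced binary-tree reduction that applies it with O(log N) sequential depth instead of a recurrent aggregator's O(N). The toy operator is made commutative by summing its arguments before the MLP; the paper's framework additionally targets associativity, so treat this as an illustration of the shape of the computation only:

```python
import torch
import torch.nn as nn

class LearnableBinaryOp(nn.Module):
    """Toy learnable binary operator g(a, b); summing the arguments makes it commutative."""
    def __init__(self, dim):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, a, b):
        return self.mlp(a + b)

def tree_aggregate(op, messages):
    """Reduce a set of neighbour messages with a balanced binary tree:
    O(log N) sequential steps, each step combining pairs in parallel."""
    while messages.shape[0] > 1:
        carry = None
        if messages.shape[0] % 2 == 1:            # odd count: carry one element to the next round
            carry, messages = messages[:1], messages[1:]
        combined = op(messages[0::2], messages[1::2])
        messages = combined if carry is None else torch.cat([carry, combined], dim=0)
    return messages[0]

op = LearnableBinaryOp(dim=16)
neighbour_msgs = torch.randn(7, 16)               # messages from 7 neighbours of one node
print(tree_aggregate(op, neighbour_msgs).shape)   # torch.Size([16])
```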
https://openreview.net/forum?id=WtFobB28VDey
4、[LG] Expander Graph Propagation
A Deac, M Lackenby, P Veličković [Mila & University of Oxford & DeepMind]
Key points:
1. GNN architectures should avoid pathological behaviours while ideally having linear time and space complexity;
2. Proposes an approach based on propagating information over expander graphs, and builds on it to propose the EGP model, addressing the above concerns;
3. Scalable, bottleneck-free message passing is likely to require negatively curved edges.
Abstract:
Deploying graph neural networks (GNNs) on whole-graph classification or regression tasks is known to be challenging: it often requires computing node features that are mindful of both local interactions in their neighbourhood and the global context of the graph structure. GNN architectures that navigate this space need to avoid pathological behaviours, such as bottlenecks and oversquashing, while ideally having linear time and space complexity requirements. In this work, we propose an elegant approach based on propagating information over expander graphs. We provide an efficient method for constructing expander graphs of a given size, and use this insight to propose the EGP model. We show that EGP is able to address all of the above concerns, while requiring minimal effort to set up, and provide evidence of its empirical utility on relevant datasets and baselines in the Open Graph Benchmark. Importantly, using expander graphs as a template for message passing necessarily gives rise to negative curvature. While this appears to be counterintuitive in light of recent related work on oversquashing, we theoretically demonstrate that negatively curved edges are likely to be required to obtain scalable message passing without bottlenecks. To the best of our knowledge, this is a previously unstudied result in the context of graph representation learning, and we believe our analysis paves the way to a novel class of scalable methods to counter oversquashing in GNNs.
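A minimal sketch of the overall recipe: interleave message passing on the input graph with propagation over an expander graph defined on the same node set. The paper constructs its expanders from Cayley graphs; the sketch below substitutes a random regular graph (an expander with high probability) and plain mean aggregation, so it illustrates the template rather than the paper's construction:

```python
import networkx as nx
import numpy as np

def mean_propagate(adj, h):
    """One round of mean-aggregation message passing over the given adjacency matrix."""
    deg = adj.sum(1, keepdims=True).clip(min=1)
    return (adj @ h) / deg

n = 32
g_input = nx.cycle_graph(n)                              # a long, bottleneck-prone input graph
g_expander = nx.random_regular_graph(d=3, n=n, seed=0)   # stand-in expander on the same nodes

A_in = nx.to_numpy_array(g_input)
A_exp = nx.to_numpy_array(g_expander)

h = np.random.randn(n, 8)
h = mean_propagate(A_in, h)    # local step over the original edges
h = mean_propagate(A_exp, h)   # global mixing step over the expander edges
print(h.shape)                 # (32, 8)
```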
https://arxiv.org/abs/2210.02997
5、[CL] Prompting Is Programming: A Query Language For Large Language Models
L Beurer-Kellner, M Fischer, M Vechev
[ETH Zurich]
Key points:
1. Proposes Language Model Programming (LMP), a generalization of language model prompting;
2. Introduces a new language and runtime, the Language Model Query Language (LMQL), for efficient language model prompting.
Abstract:
Large language models have demonstrated outstanding performance on a wide range of tasks such as question answering and code generation. On a high level, given an input, a language model can be used to automatically complete the sequence in a statistically-likely way. Based on this, users prompt these models with language instructions or examples, to implement a variety of downstream tasks. Advanced prompting methods can even imply interaction between the language model, a user, and external tools such as calculators. However, to obtain state-of-the-art performance or adapt language models for specific tasks, complex task- and model-specific programs have to be implemented, which may still require ad-hoc interaction. Based on this, we present the novel idea of Language Model Programming (LMP). LMP generalizes language model prompting from pure text prompts to an intuitive combination of text prompting and scripting. Additionally, LMP allows constraints to be specified over the language model output. This enables easy adaptation to many tasks, while abstracting language model internals and providing high-level semantics. To enable LMP, we implement LMQL (short for Language Model Query Language), which leverages the constraints and control flow from an LMP prompt to generate an efficient inference procedure that minimizes the number of expensive calls to the underlying language model. We show that LMQL can capture a wide range of state-of-the-art prompting methods in an intuitive way, especially facilitating interactive flows that are challenging to implement with existing high-level APIs. Our evaluation shows that we retain or increase the accuracy on several downstream tasks, while also significantly reducing the required amount of computation or cost in the case of pay-to-use APIs (13-85% cost savings).
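LMQL's own syntax is not reproduced here; instead, the plain-Python sketch below illustrates the kind of work an LMP runtime does, filling a hole in a text template by decoding under user constraints rather than filtering finished outputs. toy_lm and decode_hole are hypothetical stand-ins, not LMQL APIs:

```python
VOCAB = ["Paris", "London", "the", "capital", "of", "France", ".", "<eos>"]

def toy_lm(prefix):
    """Stand-in language model: returns a score for each vocabulary token given the prefix."""
    scores = {tok: 1.0 for tok in VOCAB}
    scores["Paris"] = 5.0 if prefix.endswith("A: ") else 1.0
    scores["<eos>"] = 4.0 if prefix.endswith("Paris") else 0.5
    return scores

def decode_hole(prompt, max_tokens=5, allowed=None):
    """Greedily fill one template hole, enforcing a token budget and an optional
    whitelist of admissible tokens during decoding (not after it)."""
    out = []
    for _ in range(max_tokens):
        scores = toy_lm(prompt + "".join(out))
        candidates = {t: s for t, s in scores.items() if allowed is None or t in allowed}
        token = max(candidates, key=candidates.get)
        if token == "<eos>":
            break
        out.append(token)
    return "".join(out)

# "Prompt as program": a template with a hole ANSWER constrained to a single city name
answer = decode_hole("Q: What is the capital of France?\nA: ",
                     max_tokens=1, allowed={"Paris", "London", "<eos>"})
print(answer)   # Paris
```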
https://arxiv.org/abs/2212.06094
A few more papers worth noting:
[CV] NeuS2: Fast Learning of Neural Implicit Surfaces for Multi-view Reconstruction
Y Wang, Q Han, M Habermann...
[Peking University & Max Planck Institute for Informatics & University of Pennsylvania]
Key points:
1. Proposes NeuS2 for fast learning of neural surface representations from multi-view RGB inputs of static and dynamic scenes, with clear gains in speed and reconstruction quality over state-of-the-art methods;
2. Derives a simple expression for second-order derivatives tailored to ReLU-based MLPs, enabling efficient parallelization on GPUs;
3. Introduces a progressive training strategy that learns multi-resolution hash encodings from coarse to fine, yielding better and faster training convergence (a toy sketch of such an encoding follows this list).
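The multi-resolution hash encoding mentioned in point 3 (popularized by Instant-NGP, which NeuS2 builds on) can be sketched in a few lines. This toy version uses nearest-corner lookup instead of trilinear interpolation and fixed random tables, so it is purely illustrative:

```python
import numpy as np

def hash_encode(xyz, n_levels=4, base_res=16, growth=2.0, table_size=2 ** 14, feat_dim=2):
    """Toy multi-resolution hash encoding: at each level, snap the 3D point to its grid
    cell, hash the cell into a feature table, and concatenate the per-level features."""
    rng = np.random.default_rng(0)
    tables = [rng.normal(scale=1e-2, size=(table_size, feat_dim)) for _ in range(n_levels)]
    primes = np.array([1, 2654435761, 805459861], dtype=np.uint64)
    feats = []
    for level, table in enumerate(tables):
        res = int(base_res * growth ** level)
        cell = np.floor(xyz * res).astype(np.uint64)                        # integer grid coordinates
        idx = np.bitwise_xor.reduce(cell * primes, axis=-1) % table_size    # spatial hash
        feats.append(table[idx])
    return np.concatenate(feats, axis=-1)

pts = np.random.rand(5, 3)           # a batch of 3D points in [0, 1)^3
print(hash_encode(pts).shape)        # (5, n_levels * feat_dim) = (5, 8)
```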
https://arxiv.org/abs/2212.05231
[CV] Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
T Wang, B Zhang, T Zhang, S Gu, J Bao, T Baltrusaitis, J Shen, D Chen, F Wen, Q Chen, B Guo [Microsoft Research & HKUST]
Key points:
1. Proposes Rodin, a 3D generative model that efficiently generates 3D digital avatars represented as neural radiance fields;
2. Built on three key elements: 3D-aware convolution, latent conditioning, and hierarchical synthesis;
3. Experiments show the Rodin model is a powerful 3D avatar generator, supporting avatar customization from an image or text as well as text-guided semantic editing.
https://arxiv.org/abs/2212.06135
[CV] The Stable Artist: Steering Semantics in Diffusion Latent Space
M Brack, P Schramowski, F Friedrich, D Hintersdorf, K Kersting
[TU Darmstadt]
Key points:
1. Proposes Stable Artist, an image editing approach that enables fine-grained control over the image generation process;
2. Uses semantic guidance (SEGA) to steer the diffusion process along a variable number of semantic directions (see the sketch after this list);
3. Enables subtle edits to images, changes in composition and style, and refinement of the overall artistic conception.
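For intuition, the sketch below combines diffusion noise estimates in the spirit of semantic guidance: start from classifier-free guidance, then shift the estimate along per-concept directions to add or remove each concept. It is a simplified reading of the abstract; the actual method additionally thresholds and warms up the concept terms:

```python
import torch

def sega_style_noise(eps_uncond, eps_prompt, eps_concepts, guidance_scale=7.5,
                     concept_scales=None, directions=None):
    """Illustrative combination of noise estimates: classifier-free guidance plus
    per-concept steering terms (sign +1 adds a concept, -1 removes it)."""
    concept_scales = concept_scales or [5.0] * len(eps_concepts)
    directions = directions or [+1] * len(eps_concepts)
    eps = eps_uncond + guidance_scale * (eps_prompt - eps_uncond)
    for eps_c, scale, sign in zip(eps_concepts, concept_scales, directions):
        eps = eps + sign * scale * (eps_c - eps_uncond)   # steer along this concept direction
    return eps

# Example shapes as in a latent-diffusion U-Net output: (batch, channels, height, width)
shape = (1, 4, 64, 64)
eps_u, eps_p = torch.randn(shape), torch.randn(shape)
eps_style = torch.randn(shape)     # noise estimate conditioned on e.g. "oil painting"
print(sega_style_noise(eps_u, eps_p, [eps_style]).shape)   # torch.Size([1, 4, 64, 64])
```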
https://arxiv.org/abs/2212.06013
[CV] MAGVIT: Masked Generative Video Transformer
L Yu, Y Cheng, K Sohn, J Lezama, H Zhang, H Chang, A G. Hauptmann, M Yang, Y Hao, I Essa, L Jiang [Google Research & CMU]
Key points:
1. Proposes the masked generative video Transformer (MAGVIT), a single model for tackling multiple video synthesis tasks;
2. Introduces a 3D tokenizer to quantize videos and an embedding method for masked video token modeling that facilitates multi-task learning (a decoding sketch follows this list);
3. Experiments show MAGVIT outperforms existing methods in quality, efficiency, and flexibility, with inference time two orders of magnitude faster.
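The masked-token decoding loop such models rely on (in the style of MaskGIT, which MAGVIT builds on) can be sketched as follows. predict_logits / fake_transformer are stand-ins for the video transformer, so this shows the general scheme rather than the paper's implementation:

```python
import math
import torch

def iterative_masked_decode(predict_logits, seq_len, steps=8, mask_id=-1):
    """Toy non-autoregressive decoding: start from an all-masked token sequence,
    predict every position in parallel at each step, keep the most confident
    predictions, and re-mask the rest following a cosine schedule."""
    tokens = torch.full((seq_len,), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = predict_logits(tokens)                             # (seq_len, vocab_size)
        conf, pred = logits.softmax(-1).max(-1)
        conf = torch.where(tokens == mask_id, conf, torch.ones_like(conf))  # never re-mask kept tokens
        tokens = torch.where(tokens == mask_id, pred, tokens)       # fill masked positions
        n_mask = int(math.cos((step + 1) / steps * math.pi / 2) * seq_len)  # how many stay masked
        if n_mask > 0:
            tokens[conf.argsort()[:n_mask]] = mask_id               # re-mask least confident positions
    return tokens

# Example with a random stand-in "transformer": 16 video tokens, 1024 codebook entries
fake_transformer = lambda toks: torch.randn(toks.shape[0], 1024)
print(iterative_masked_decode(fake_transformer, seq_len=16))
```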
https://arxiv.org/abs/2212.05199