LG - Machine Learning  CV - Computer Vision  CL - Computation and Language
1. [LG] On a continuous time model of gradient descent dynamics and instability in deep learning
2. [LG] Languages are Rewards: Hindsight Finetuning using Human Feedback
3. [CV] CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
4. [CV] Zero-shot Image-to-Image Translation
5. [LG] Equivariant Architectures for Learning in Deep Weight Spaces
[LG] ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
[CV] Structure and Content-Guided Video Synthesis with Diffusion Models
[LG] Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
[LG] Double Permutation Equivariance for Knowledge Graph Completion
Summary: a continuous-time model of gradient descent dynamics and instability in deep learning; languages as rewards: hindsight finetuning from human feedback; zero-shot image classification with hierarchical label sets; zero-shot image-to-image translation; equivariant architectures for learning in deep weight spaces; reinforcement learning with optimistic ascent-descent for last-iterate convergence in constrained MDPs; structure- and content-guided video synthesis with diffusion models; generating novel, designable, and diverse protein structures by equivariantly diffusing oriented residue clouds; double permutation equivariance for knowledge graph completion.
M Rosca, Y Wu, C Qin, B Dherin
[DeepMind & Google]
A continuous-time model of gradient descent dynamics and instability in deep learning
Highlights:
One-sentence summary:
The principal flow (PF) is a new continuous-time flow that captures the behavior of gradient descent dynamics in deep learning, offers insight into its instability, and motivates a drift-adjusted learning rate (DAL) that controls the trade-off between stability and performance.
Abstract:
The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
https://arxiv.org/abs/2302.01952
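The instability regime the principal flow is built to capture can be seen in a toy setting (a minimal sketch, not the paper's construction): on a quadratic with curvature `lam`, the continuous gradient flow always converges, but discrete gradient descent contracts only while the step size stays below 2/`lam`.

```python
# Toy sketch (illustrative, not the paper's principal flow): gradient descent
# on the quadratic f(x) = 0.5 * lam * x**2. The continuous gradient flow
# x'(t) = -lam * x always converges, but the discrete update multiplies x by
# (1 - eta * lam) each step, so it oscillates and diverges once eta > 2 / lam,
# which is the instability regime the principal flow is designed to model.

def gradient_descent(lam, eta, x0=1.0, steps=50):
    x = x0
    for _ in range(steps):
        x = x - eta * lam * x  # x <- x * (1 - eta * lam)
    return x

lam = 10.0
stable = gradient_descent(lam, eta=0.15)    # eta < 2/lam = 0.2: converges
unstable = gradient_descent(lam, eta=0.25)  # eta > 2/lam: diverges
print(abs(stable), abs(unstable))
```

Beyond the 2/`lam` threshold the iterates alternate sign while growing in magnitude, which is exactly the oscillatory, divergent behavior that smooth gradient-flow models fail to describe.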
H Liu, C Sferrazza, P Abbeel
[UC Berkeley]
Languages are Rewards: Hindsight Finetuning using Human Feedback
Highlights:
One-sentence summary:
Proposes Chain of Hindsight Finetuning (CoHF), a new technique for learning from diverse human feedback, and shows that it outperforms supervised finetuning on summarization and dialogue tasks in automatic evaluations.
Abstract:
Learning from human preferences is important for language models to be helpful and useful for humans, and to align with human and social values. Existing works focus on supervised finetuning of pretrained models, based on curated model generations that are preferred by human labelers. Such works have achieved remarkable successes in understanding and following instructions (e.g., InstructGPT, ChatGPT, etc). However, to date, a key limitation of supervised finetuning is that it cannot learn from negative ratings; models are only trained on positive-rated data, which makes it data inefficient. Because collecting human feedback data is both time consuming and expensive, it is vital for the model to learn from all feedback, akin to the remarkable ability of humans to learn from diverse feedback. In this work, we propose a novel technique called Hindsight Finetuning for making language models learn from diverse human feedback. In fact, our idea is motivated by how humans learn from hindsight experience. We condition the model on a sequence of model generations paired with hindsight feedback, and finetune the model to predict the most preferred output. By doing so, models can learn to identify and correct negative attributes or errors. Applying the method to GPT-J, we observe that it significantly improves results on summarization and dialogue tasks using the same amount of human feedback.
https://arxiv.org/abs/2302.02676
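The data construction at the heart of hindsight finetuning can be sketched as follows (a hypothetical illustration; the template strings and function name are mine, not the paper's exact format):

```python
# Hypothetical sketch of the hindsight data construction (templates and
# function names are illustrative, not the paper's exact format): each training
# example pairs model generations with natural-language hindsight feedback, and
# the model is finetuned to predict the preferred output.

def build_hindsight_example(prompt, bad_output, good_output):
    # Conditioning on both generations plus feedback lets the model learn to
    # identify and correct negative attributes; the target is the good output.
    return (
        f"{prompt}\n"
        f"A bad response is: {bad_output}\n"
        f"A good response is: {good_output}"
    )

example = build_hindsight_example(
    "Summarize: The cat sat on the mat all day.",
    "A dog ran around.",                # negatively rated generation
    "A cat lounged on a mat all day.",  # positively rated generation
)
print(example)
```

Because both positive and negative generations appear in the conditioning sequence, negatively rated data contributes training signal instead of being discarded, which is the data-efficiency gain over standard supervised finetuning.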
Z Novack, S Garg, J McAuley, Z C. Lipton
[UC San Diego & CMU]
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
Highlights:
One-sentence summary:
CHiLS is a new method for zero-shot image classification with hierarchical label sets that improves the performance of existing models without any additional training or finetuning.
Abstract:
Open vocabulary models (e.g. CLIP) have shown strong performance on zero-shot classification through their ability to generate embeddings for each class based on their (natural language) names. Prior work has focused on improving the accuracy of these models through prompt engineering or by incorporating a small amount of labeled downstream data (via finetuning). However, there has been little focus on improving the richness of the class names themselves, which can pose issues when class labels are coarsely-defined and uninformative. We propose Classification with Hierarchical Label Sets (or CHiLS), an alternative strategy for zero-shot classification specifically designed for datasets with implicit semantic hierarchies. CHiLS proceeds in three steps: (i) for each class, produce a set of subclasses, using either existing label hierarchies or by querying GPT-3; (ii) perform the standard zero-shot CLIP procedure as though these subclasses were the labels of interest; (iii) map the predicted subclass back to its parent to produce the final prediction. Across numerous datasets with underlying hierarchical structure, CHiLS leads to improved accuracy in situations both with and without ground-truth hierarchical information. CHiLS is simple to implement within existing CLIP pipelines and requires no additional training cost.
https://arxiv.org/abs/2302.02551
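The three CHiLS steps can be sketched in a few lines (a minimal illustration; `clip_score` is a stand-in for real CLIP image-text similarity, and the hierarchy and scores are made up):

```python
# Minimal sketch of the three CHiLS steps with a stubbed-out scorer
# (`clip_score` is a placeholder for CLIP similarity, not a real API).

label_hierarchy = {                      # step (i): subclasses per class
    "dog": ["beagle", "poodle", "bulldog"],
    "cat": ["siamese", "tabby"],
}

def clip_score(image, label):
    # Placeholder similarity; a real pipeline would embed the image and the
    # prompt "a photo of a {label}" with CLIP and take cosine similarity.
    fake_scores = {"beagle": 0.9, "poodle": 0.2, "bulldog": 0.1,
                   "siamese": 0.3, "tabby": 0.4}
    return fake_scores[label]

def chils_predict(image, hierarchy):
    subclass_to_parent = {s: parent for parent, subs in hierarchy.items()
                          for s in subs}
    # step (ii): zero-shot classification over the subclasses
    best_sub = max(subclass_to_parent, key=lambda s: clip_score(image, s))
    # step (iii): map the predicted subclass back to its parent class
    return subclass_to_parent[best_sub]

print(chils_predict(None, label_hierarchy))  # → "dog" via "beagle"
```

Since only the label set fed to the zero-shot scorer changes, this slots into an existing CLIP pipeline with no retraining, matching the paper's claim of zero additional training cost.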
G Parmar, K K Singh, R Zhang, Y Li, J Lu, J Zhu
[CMU & Adobe Research]
Zero-shot Image-to-Image Translation
Highlights:
One-sentence summary:
The zero-shot image-to-image translation method pix2pix-zero preserves the content of the original image without manual prompting, via automatically discovered editing directions and cross-attention guidance.
Abstract:
Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.
https://arxiv.org/abs/2302.03027
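The edit-direction discovery step can be sketched as a mean-difference of text embeddings (a toy illustration; `embed` is a stand-in for a real text encoder such as CLIP's, and the sentence lists are hypothetical):

```python
import numpy as np

# Sketch of automatic edit-direction discovery in text-embedding space:
# generate many sentences containing the source concept and the target
# concept, embed them, and take the difference of the mean embeddings as the
# editing direction. `embed` is a deterministic placeholder, not a real model.

def embed(sentence):
    # Placeholder: pseudo-embedding keyed on the sentence text.
    seed = abs(hash(sentence)) % (2**32)
    return np.random.default_rng(seed).normal(size=8)

cat_sents = [f"a photo of a cat, variant {i}" for i in range(100)]
dog_sents = [f"a photo of a dog, variant {i}" for i in range(100)]

# Edit direction: mean(target embeddings) - mean(source embeddings). Adding it
# to the prompt embedding steers generation from "cat" toward "dog".
direction = (np.mean([embed(s) for s in dog_sents], axis=0)
             - np.mean([embed(s) for s in cat_sents], axis=0))
print(direction.shape)
```

Averaging over many sentences makes the direction robust to the phrasing of any single prompt, which is why no manual per-image prompt is needed; cross-attention guidance then keeps the layout fixed while the direction is applied.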
A Navon, A Shamsian, I Achituve, E Fetaya, G Chechik, H Maron
[Bar-Ilan University & NVIDIA]
Equivariant Architectures for Learning in Deep Weight Spaces
Highlights:
One-sentence summary:
Proposes a novel network architecture for learning in deep weight spaces that is equivariant to the natural permutation symmetries of MLP weights, characterizes the space of affine equivariant layers, analyzes their expressive power, and demonstrates advantages on a variety of learning tasks.
Abstract:
Designing machine learning architectures for processing neural networks in their raw weight matrix form is a newly introduced research direction. Unfortunately, the unique symmetry structure of deep weight spaces makes this design very challenging. If successful, such architectures would be capable of performing a wide range of intriguing tasks, from adapting a pre-trained network to a new domain to editing objects represented as functions (INRs or NeRFs). As a first step towards this goal, we present here a novel network architecture for learning in deep weight spaces. It takes as input a concatenation of weights and biases of a pre-trained MLP and processes it using a composition of layers that are equivariant to the natural permutation symmetry of the MLP's weights: Changing the order of neurons in intermediate layers of the MLP does not affect the function it represents. We provide a full characterization of all affine equivariant and invariant layers for these symmetries and show how these layers can be implemented using three basic operations: pooling, broadcasting, and fully connected layers applied to the input in an appropriate manner. We demonstrate the effectiveness of our architecture and its advantages over natural baselines in a variety of learning tasks.
https://arxiv.org/abs/2301.12780
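The three primitives the paper names (pooling, broadcasting, fully connected maps) compose into permutation-equivariant layers; a DeepSets-style toy, far simpler than the paper's weight-space layers, shows the pattern:

```python
import numpy as np

# Minimal sketch of a permutation-equivariant linear layer built from pooling,
# broadcasting, and fully connected maps (a DeepSets-style toy, not the
# paper's weight-space architecture): for n feature vectors X (shape n x d),
#     Y = X @ A + broadcast(mean_pool(X)) @ B
# commutes with any permutation of the n rows.

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))   # pointwise (fully connected) map
B = rng.normal(size=(4, 4))   # map applied to the pooled summary

def equivariant_layer(X):
    pooled = X.mean(axis=0, keepdims=True)  # pool over the symmetry axis
    return X @ A + pooled @ B               # broadcast pooled term to all rows

X = rng.normal(size=(5, 4))
perm = rng.permutation(5)

# Equivariance check: permuting the inputs permutes the outputs identically.
lhs = equivariant_layer(X[perm])
rhs = equivariant_layer(X)[perm]
print(np.allclose(lhs, rhs))  # True
```

The pooled term is identical for every row, so reordering rows only reorders the pointwise term, giving equivariance by construction; the paper characterizes all affine layers with this property for the richer symmetry group of MLP weights.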
Several more papers worth noting:
T Moskovitz, B O'Donoghue, V Veeriah, S Flennerhag, S Singh, T Zahavy
[DeepMind & UCL]
ReLOAD: Reinforcement Learning with Optimistic Ascent-Descent for Last-Iterate Convergence in Constrained MDPs
Highlights:
One-sentence summary:
Proposes Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled constrained RL (CRL) method with guaranteed last-iterate convergence in both tabular and function-approximation settings, outperforming existing algorithms on challenging CRL problems.
In recent years, Reinforcement Learning (RL) has been applied to real-world problems with increasing success. Such applications often require constraints to be placed on the agent's behavior. Existing algorithms for constrained RL (CRL) rely on gradient descent-ascent, but this approach comes with a caveat. While these algorithms are guaranteed to converge on average, they do not guarantee last-iterate convergence, i.e., the current policy of the agent may never converge to the optimal solution. In practice, it is often observed that the policy alternates between satisfying the constraints and maximizing the reward, rarely accomplishing both objectives simultaneously. Here, we address this problem by introducing Reinforcement Learning with Optimistic Ascent-Descent (ReLOAD), a principled CRL method with guaranteed last-iterate convergence. We demonstrate its empirical effectiveness on a wide variety of CRL problems including discrete MDPs and continuous control. In the process we establish a benchmark of challenging CRL problems.
https://arxiv.org/abs/2302.01275
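The last-iterate failure of plain descent-ascent, and the optimistic fix, show up already in the simplest saddle problem (a toy contrast, not the paper's full ReLOAD algorithm):

```python
import math

# Toy contrast motivating optimism (illustrative, not the paper's ReLOAD
# algorithm): on the bilinear saddle problem f(x, y) = x * y, plain gradient
# descent-ascent (GDA) spirals away from the equilibrium (0, 0), while the
# optimistic variant (OGDA) converges in the last iterate -- the failure mode
# and fix that ReLOAD carries over to constrained RL.

def gda(eta=0.2, steps=500):
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = x - eta * y, y + eta * x     # simultaneous descent/ascent
    return x, y

def ogda(eta=0.2, steps=500):
    x, y = 1.0, 1.0
    x_prev, y_prev = x, y
    for _ in range(steps):
        # Optimistic update: extrapolate with the previous gradient
        # (grad_x f = y and grad_y f = x for f(x, y) = x * y).
        x_new = x - eta * (2 * y - y_prev)
        y_new = y + eta * (2 * x - x_prev)
        x_prev, y_prev = x, y
        x, y = x_new, y_new
    return x, y

print(math.hypot(*gda()), math.hypot(*ogda()))  # diverges vs. converges
```

GDA's average iterate converges here even though its last iterate spirals outward, which mirrors the "alternating between constraint satisfaction and reward" behavior the abstract describes; the optimistic correction removes the oscillation.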
P Esser, J Chiu, P Atighehchian, J Granskog, A Germanidis
[Runway]
Structure and Content-Guided Video Synthesis with Diffusion Models
Highlights:
One-sentence summary:
Presents a structure- and content-guided video diffusion model for generating and editing videos from visual or textual descriptions, with control over temporal, content, and structural consistency, and support for customization to specific subjects via finetuning.
Text-guided generative diffusion models unlock powerful image creation and editing tools. While these have been extended to video generation, current approaches that edit the content of existing footage while retaining structure require expensive re-training for every input or rely on error-prone propagation of image edits across frames. In this work, we present a structure and content-guided video diffusion model that edits videos based on visual or textual descriptions of the desired output. Conflicts between user-provided content edits and structure representations occur due to insufficient disentanglement between the two aspects. As a solution, we show that training on monocular depth estimates with varying levels of detail provides control over structure and content fidelity. Our model is trained jointly on images and videos which also exposes explicit control of temporal consistency through a novel guidance method. Our experiments demonstrate a wide variety of successes; fine-grained control over output characteristics, customization based on a few reference images, and a strong user preference towards results by our model.
https://arxiv.org/abs/2302.03011
Y Lin, M AlQuraishi
[Columbia University]
Generating Novel, Designable, and Diverse Protein Structures by Equivariantly Diffusing Oriented Residue Clouds
Highlights:
One-sentence summary:
Proposes Genie, a new method for protein design using equivariant denoising diffusion probabilistic models, which surpasses existing methods through its oriented-frame representation of protein residues and equivariant inference mechanism.
Proteins power a vast array of functional processes in living cells. The capability to create new proteins with designed structures and functions would thus enable the engineering of cellular behavior and development of protein-based therapeutics and materials. Structure-based protein design aims to find structures that are designable (can be realized by a protein sequence), novel (have dissimilar geometry from natural proteins), and diverse (span a wide range of geometries). While advances in protein structure prediction have made it possible to predict structures of novel protein sequences, the combinatorially large space of sequences and structures limits the practicality of search-based methods. Generative models provide a compelling alternative, by implicitly learning the low-dimensional structure of complex data distributions. Here, we leverage recent advances in denoising diffusion probabilistic models and equivariant neural networks to develop Genie, a generative model of protein structures that performs discrete-time diffusion using a cloud of oriented reference frames in 3D space. Through in silico evaluations, we demonstrate that Genie generates protein backbones that are more designable, novel, and diverse than existing models. This indicates that Genie is capturing key aspects of the distribution of protein structure space and facilitates protein design with high success rates. Code for generating new proteins and training new versions of Genie is available at this https URL.
https://arxiv.org/abs/2301.12485
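The discrete-time diffusion Genie performs follows the standard DDPM forward process; a generic sketch (standard DDPM math, not Genie-specific code; Genie applies this to a cloud of oriented residue reference frames rather than raw coordinates as here):

```python
import numpy as np

# Standard DDPM forward (noising) process:
#     x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
# The reverse model is trained to denoise samples drawn this way.

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal-retention factor

def forward_diffuse(x0, t):
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

x0 = rng.normal(size=(64, 3))            # e.g. 64 residue positions in 3D
x_noisy = forward_diffuse(x0, T - 1)
# By the final step nearly all signal is destroyed (alpha_bar[-1] ~ 0), so
# generation starts from near-Gaussian noise and denoises step by step.
print(alpha_bar[-1])
```

Replacing the point cloud `x0` with oriented reference frames, and making the denoiser equivariant to rigid motions, is what distinguishes Genie's setup from this plain-coordinate sketch.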
J Gao, Y Zhou, B Ribeiro
[Purdue University]
Double Permutation Equivariance for Knowledge Graph Completion
Highlights:
One-sentence summary:
Proposes the notion of "double permutation equivariance" for knowledge graphs (KGs), provides a blueprint for doubly equivariant neural architectures that can perform complex logical reasoning in KGs, and achieves promising results on inductive KG completion tasks.
This work provides a formalization of Knowledge Graphs (KGs) as a new class of graphs that we denote doubly exchangeable attributed graphs, where node and pairwise (joint 2-node) representations must be equivariant to permutations of both node ids and edge (& node) attributes (relations & node features). Double-permutation equivariant KG representations open a new research direction in KGs. We show that this equivariance imposes a structural representation of relations that allows neural networks to perform complex logical reasoning tasks in KGs. Finally, we introduce a general blueprint for such equivariant representations and test a simple GNN-based double-permutation equivariant neural architecture that achieves 100% Hits@10 test accuracy in both the WN18RRv1 and NELL995v1 inductive KG completion tasks, and can accurately perform logical reasoning tasks that no existing methods can perform, to the best of our knowledge.
https://arxiv.org/abs/2302.01313