爱可可AI前沿推介(2.10)

爱可可爱生活 · 2023-02-21

LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech

1、[LG] PFGM++: Unlocking the Potential of Physics-Inspired Generative Models
2、[AS] Noise2Music: Text-conditioned Music Generation with Diffusion Models
3、[LG] Leveraging Demonstrations to Improve Online Learning: Quality Matters
4、[LG] Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions
5、[CL] GPTScore: Evaluate as You Desire
[CL] Concept Algebra for Text-Controlled Vision Models
[AS] Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
[LG] Recent advances in the Self-Referencing Embedding Strings (SELFIES) library
[LG] Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

Summary: unlocking the potential of physics-inspired generative models; text-conditioned music generation with diffusion models; leveraging demonstrations to improve online learning, where quality matters; memory-efficient adaptive regularization with Frequent Directions; evaluating generated text as you desire; concept algebra for text-controlled vision models; multi-source diffusion models for simultaneous music generation and separation; recent advances in the Self-Referencing Embedded Strings (SELFIES) library; faster optimization using a cheaper proxy.

1、[LG] PFGM++: Unlocking the Potential of Physics-Inspired Generative Models

Y Xu, Z Liu, Y Tian, S Tong, M Tegmark, T Jaakkola
[MIT]

Key points:

  1. Proposes PFGM++, a new family of physics-inspired generative models that unifies diffusion models and Poisson Flow Generative Models (PFGM);
  2. Introduces a perturbation-based objective that dispenses with the biased, large-batch electric-field targets used in PFGM and enables unbiased training;
  3. Varying D exposes a trade-off between robustness and rigidity; empirically, models with finite D can outperform diffusion models while being more robust.

One-sentence summary:
Proposes PFGM++, a new family of physics-inspired generative models that unifies diffusion models and Poisson Flow Generative Models, and shows it can surpass previous state-of-the-art diffusion models in both sample quality and robustness.

Abstract:
We introduce a new family of physics-inspired generative models termed PFGM++ that unifies diffusion models and Poisson Flow Generative Models (PFGM). These models realize generative trajectories for N dimensional data by embedding paths in N+D dimensional space while still controlling the progression with a simple scalar norm of the D additional variables. The new models reduce to PFGM when D=1 and to diffusion models when D→∞. The flexibility of choosing D allows us to trade off robustness against rigidity as increasing D results in more concentrated coupling between the data and the additional variable norms. We dispense with the biased large batch field targets used in PFGM and instead provide an unbiased perturbation-based objective similar to diffusion models. To explore different choices of D, we provide a direct alignment method for transferring well-tuned hyperparameters from diffusion models (D→∞) to any finite D values. Our experiments show that models with finite D can be superior to previous state-of-the-art diffusion models on CIFAR-10/FFHQ 64×64 datasets, with FID scores of 1.91/2.43 when D=2048/128. In addition, we demonstrate that models with smaller D exhibit improved robustness against modeling errors. Code is available at this https URL

https://arxiv.org/abs/2302.04265
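
To make the D→∞ intuition concrete, here is a minimal numerical illustration (not from the paper): as the number of augmented dimensions D grows, the norm of the D extra variables concentrates tightly around its mean, which is why the coupling between the data and the augmentation norm becomes nearly deterministic and the model approaches the diffusion limit.

```python
# Minimal numerical illustration (not the paper's code): the norm R = ||z|| of D augmented
# Gaussian variables concentrates around its mean as D grows, so the coupling between the
# data and the augmentation norm becomes nearly deterministic -- the diffusion-model limit.
import numpy as np

rng = np.random.default_rng(0)
for D in [1, 8, 128, 2048]:
    z = rng.standard_normal((10_000, D))   # augmented variables z ~ N(0, I_D)
    R = np.linalg.norm(z, axis=1)          # the scalar norm that controls the progression
    print(f"D={D:5d}  mean(R)={R.mean():7.2f}  relative spread={R.std() / R.mean():.4f}")
```

The relative spread shrinks roughly like 1/sqrt(2D), which is the concentration effect behind the trade-off discussed above.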



2、[AS] Noise2Music: Text-conditioned Music Generation with Diffusion Models

Q Huang, D S. Park, T Wang, T I. Denk, A Ly…
[Google Research]

Key points:

  1. Proposes Noise2Music, a family of diffusion models that generate 30-second music clips conditioned on text prompts;
  2. Trains two types of diffusion models, a generator model and a cascader model, and chains them to produce high-fidelity music;
  3. Explores two choices of intermediate representation, spectrograms and low-fidelity audio, and finds that the generated audio reflects elements of the text prompt;
  4. Uses pretrained large language models to generate paired text for the training audio and to extract embeddings of the text prompts, and compares the spectrogram and waveform approaches in terms of training and generation cost, scalability, and interpretability.

One-sentence summary:
Proposes Noise2Music, a system that uses a series of diffusion models together with large language models to generate high-quality 30-second music clips from text prompts; it explores two choices of intermediate representation, and the generated audio reflects key elements of the prompt and goes beyond them to ground its fine-grained semantics.

Abstract:
We introduce Noise2Music, where a series of diffusion models is trained to generate high-quality 30-second music clips from text prompts. Two types of diffusion models, a generator model, which generates an intermediate representation conditioned on text, and a cascader model, which generates high-fidelity audio conditioned on the intermediate representation and possibly the text, are trained and utilized in succession to generate high-fidelity music. We explore two options for the intermediate representation, one using a spectrogram and the other using audio with lower fidelity. We find that the generated audio is not only able to faithfully reflect key elements of the text prompt such as genre, tempo, instruments, mood, and era, but goes beyond to ground fine-grained semantics of the prompt. Pretrained large language models play a key role in this story -- they are used to generate paired text for the audio of the training set and to extract embeddings of the text prompts ingested by the diffusion models. Generated examples: this https URL

https://arxiv.org/abs/2302.03917
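
The generator-then-cascader pipeline can be sketched as below; all names, shapes, and the sample rate are illustrative placeholders rather than the Noise2Music models or API, and lambdas stand in for the trained networks so the example runs end to end.

```python
# Hedged structural sketch of the two-stage cascade (placeholder names and shapes only).
from typing import Callable
import numpy as np

def generate_music(prompt: str,
                   text_encoder: Callable[[str], np.ndarray],
                   generator: Callable[[np.ndarray], np.ndarray],
                   cascader: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> np.ndarray:
    """Text prompt -> intermediate representation -> high-fidelity waveform."""
    text_emb = text_encoder(prompt)          # LLM-derived embedding of the prompt
    intermediate = generator(text_emb)       # spectrogram or low-fidelity audio, conditioned on text
    return cascader(intermediate, text_emb)  # high-fidelity audio, conditioned on both

rng = np.random.default_rng(0)
waveform = generate_music(
    "upbeat 1990s house track with piano",
    text_encoder=lambda p: rng.standard_normal(512),
    generator=lambda emb: rng.standard_normal((128, 938)),        # e.g. a mel spectrogram
    cascader=lambda rep, emb: rng.standard_normal(24_000 * 30),   # e.g. 30 s of audio at 24 kHz
)
print(waveform.shape)
```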



3、[LG] Leveraging Demonstrations to Improve Online Learning: Quality Matters

B Hao, R Jain, T Lattimore, B V Roy, Z Wen
[DeepMind]

Key points:

  1. Investigates how offline demonstration data affects online learning;
  2. Introduces the notion of an expert "competence level" as a measure of demonstration-data quality;
  3. Proposes an informed Thompson sampling algorithm that incorporates the demonstration data coherently through Bayes' rule;
  4. Empirically shows that Thompson sampling informed by the demonstration data and the expert's competence level substantially reduces regret.

One-sentence summary:
Studies how offline demonstration data can improve online learning, finds that the quality of the demonstrations is crucial to the improvement, proposes an informed Thompson sampling algorithm that exploits the demonstrations through Bayes' rule, establishes a prior-dependent Bayesian regret bound, and develops a practical approximate algorithm via Bayesian bootstrapping.

Abstract:
We investigate the extent to which offline demonstration data can improve online learning. It is natural to expect some improvement, but the question is how, and by how much? We show that the degree of improvement must depend on the quality of the demonstration data. To generate portable insights, we focus on Thompson sampling (TS) applied to a multi-armed bandit as a prototypical online learning algorithm and model. The demonstration data is generated by an expert with a given competence level, a notion we introduce. We propose an informed TS algorithm that utilizes the demonstration data in a coherent way through Bayes' rule and derive a prior-dependent Bayesian regret bound. This offers insight into how pretraining can greatly improve online performance and how the degree of improvement increases with the expert's competence level. We also develop a practical, approximate informed TS algorithm through Bayesian bootstrapping and show substantial empirical regret reduction through experiments.

https://arxiv.org/abs/2302.03319
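
A minimal sketch of the general idea on a Beta-Bernoulli bandit: offline demonstrations are folded into the posterior via Bayes' rule before online Thompson sampling begins. The paper's informed TS additionally models the expert's competence level, which this toy version omits.

```python
# Demonstration-informed Thompson sampling on a Bernoulli bandit (toy sketch, not the
# paper's algorithm: the expert competence model is omitted for brevity).
import numpy as np

rng = np.random.default_rng(1)
true_means = np.array([0.3, 0.5, 0.7])
K = len(true_means)

# Offline demonstrations: (arm, reward) pairs, e.g. from a reasonably competent expert.
demos = [(2, 1), (2, 1), (2, 0), (1, 1), (2, 1)]

alpha, beta = np.ones(K), np.ones(K)      # uniform Beta(1, 1) priors
for arm, reward in demos:                 # Bayes' rule on the demonstration data
    alpha[arm] += reward
    beta[arm] += 1 - reward

regret = 0.0
for t in range(2000):                     # online phase: standard Thompson sampling
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.binomial(1, true_means[arm])
    alpha[arm] += reward
    beta[arm] += 1 - reward
    regret += true_means.max() - true_means[arm]

print(f"cumulative regret after 2000 steps: {regret:.1f}")
```

Re-running with the demonstration loop removed (or with demonstrations from a poor expert) illustrates the paper's point that the quality of the demonstrations drives how much the prior helps.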



4、[LG] Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions

V Feinberg, X Chen, Y. J Sun, R Anil, E Hazan
[Google Research & Princeton University]

Key points:

  1. Proposes a low-rank sketching approach that reduces the memory and compute requirements of adaptive regularization methods;
  2. Exploits the observation that the spectra of the Kronecker-factored gradient covariance matrices in deep learning training tasks concentrate on a small leading eigenspace;
  3. Shows that, under the memory constraint, applying dynamic diagonal regularization to the Frequent Directions (FD) sketch recovers full-matrix AdaGrad regret up to an additive spectral term.

One-sentence summary:
Proposes a memory-efficient adaptive regularization method that uses a Frequent Directions sketch to reduce the memory and compute requirements of deep learning training.

Abstract:
Adaptive regularization methods that exploit more than the diagonal entries exhibit state of the art performance for many tasks, but can be prohibitive in terms of memory and running time. We find the spectra of the Kronecker-factored gradient covariance matrix in deep learning (DL) training tasks are concentrated on a small leading eigenspace that changes throughout training, motivating a low-rank sketching approach. We describe a generic method for reducing memory and compute requirements of maintaining a matrix preconditioner using the Frequent Directions (FD) sketch. Our technique allows interpolation between resource requirements and the degradation in regret guarantees with rank k: in the online convex optimization (OCO) setting over dimension d, we match full-matrix d² memory regret using only dk memory up to additive error in the bottom d−k eigenvalues of the gradient covariance. Further, we show extensions of our work to Shampoo, placing the method on the memory-quality Pareto frontier of several large scale benchmarks.

https://arxiv.org/abs/2302.03764
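
For reference, the Frequent Directions sketch at the core of the method fits in a few lines. This is the textbook FD update (Liberty, 2013) applied to synthetic near-low-rank gradients, not Sketchy's full preconditioner, which additionally maintains a dynamic diagonal term and operates per Kronecker factor.

```python
# Frequent Directions: maintain a small l x d matrix B with B^T B approximating the
# gradient covariance sum_t g_t g_t^T, using only l * d memory instead of d^2.
import numpy as np

def fd_update(B, g):
    """Insert gradient g into sketch B (shape l x d), shrinking when the sketch is full."""
    zero_rows = np.where(~B.any(axis=1))[0]
    if len(zero_rows) == 0:
        _, s, Vt = np.linalg.svd(B, full_matrices=False)
        delta = s[-1] ** 2
        s = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
        B = s[:, None] * Vt                   # last row becomes zero after shrinkage
        zero_rows = np.where(~B.any(axis=1))[0]
    B[zero_rows[0]] = g
    return B

d, ell = 100, 8                                # ambient dimension vs. sketch size
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((5, d))         # gradients concentrated on a small eigenspace
B = np.zeros((ell, d))
for _ in range(1000):
    g = low_rank.T @ rng.standard_normal(5) + 0.01 * rng.standard_normal(d)
    B = fd_update(B, g)
print("sketch spectrum:", np.round(np.linalg.svd(B, compute_uv=False), 2))
```

Because the simulated gradient covariance has only a handful of dominant directions, the sketch captures essentially all of its energy with ell = 8 rows, which mirrors the spectral concentration the paper observes in real training runs.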



5、[CL] GPTScore: Evaluate as You Desire

J Fu, S Ng, Z Jiang, P Liu
[National University of Singapore & CMU]

Key points:

  1. Proposes GPTScore, a new evaluation framework that uses the abilities of generative pre-trained models to score generated text;
  2. Enables customized, multi-faceted, training-free evaluation without the need for annotated samples;
  3. Demonstrates effective text evaluation via natural-language instructions across 22 evaluation aspects and 37 datasets.

One-sentence summary:
Proposes GPTScore, a new framework that evaluates generated text with generative pre-trained models, offering customizable, multi-faceted, training-free evaluation and achieving competitive performance across 37 datasets and 22 evaluation aspects.

Abstract:
Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of the generation is an even more arduous task than the generation itself, and this issue has not been given adequate consideration recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities (e.g., zero-shot instruction) from generative pre-trained models to score generated texts. Experimental results on four text generation tasks, 22 evaluation aspects, and corresponding 37 datasets demonstrate that this approach can effectively allow us to achieve what one desires to evaluate for texts simply by natural language instructions. This nature helps us overcome several long-standing challenges in text evaluation--how to achieve customized, multi-faceted evaluation without the need for annotated samples. We make our code publicly available at this https URL.

https://arxiv.org/abs/2302.04166
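
The core scoring rule is conditional log-likelihood under a generative LM. Below is a hedged sketch with Hugging Face transformers; GPT-2 and the prompt template are illustrative choices, not the paper's exact models or templates.

```python
# Sketch of the GPTScore idea: score a generated text by the average token log-likelihood
# a pre-trained LM assigns to it when conditioned on an evaluation instruction.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def gpt_score(instruction: str, hypothesis: str) -> float:
    prompt_ids = tok(instruction, return_tensors="pt").input_ids
    hyp_ids = tok(hypothesis, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, hyp_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # log-probability of each token given everything before it
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    hyp_lp = token_lp[:, prompt_ids.shape[1] - 1:]        # only the hypothesis positions
    return hyp_lp.mean().item()

instr = "Generate a fluent and faithful summary of the article:\nThe cat sat on the mat.\nSummary:"
print(gpt_score(instr, " A cat sat on a mat."))
print(gpt_score(instr, " Stock markets rallied on Tuesday."))
```

Changing the instruction (e.g., asking for fluency versus factual consistency) is what makes the evaluation "as you desire": the same model produces a different aspect-specific score without any annotated training data.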




A few more papers worth noting:

[CL] Concept Algebra for Text-Controlled Vision Models

Z Wang, L Gui, J Negrea, V Veitch
[University of Chicago]

Key points:

  1. Formalizes "what the user intended" for text-controlled vision models in terms of latent concepts implicit in the data-generating process;
  2. Uses this formalization to identify fundamental limitations of prompting and develops concept algebra to overcome them;
  3. Concept algebra directly manipulates the concepts expressed in the output through algebraic operations on a representation of the input prompts.

One-sentence summary:
Proposes a latent-concept-based concept algebra to address the limitations of text-controlled generative models, demonstrates its utility by overcoming limitations of prompting, and opens new directions for future work on model control.

This paper concerns the control of text-guided generative models, where a user provides a natural language prompt and the model generates samples based on this input. Prompting is intuitive, general, and flexible. However, there are significant limitations: prompting can fail in surprising ways, and it is often unclear how to find a prompt that will elicit some desired target behavior. A core difficulty for developing methods to overcome these issues is that failures are know-it-when-you-see-it -- it's hard to fix bugs if you can't state precisely what the model should have done! In this paper, we introduce a formalization of "what the user intended" in terms of latent concepts implicit to the data generating process that the model was trained on. This formalization allows us to identify some fundamental limitations of prompting. We then use the formalism to develop concept algebra to overcome these limitations. Concept algebra is a way of directly manipulating the concepts expressed in the output through algebraic operations on a suitably defined representation of input prompts. We give examples using concept algebra to overcome limitations of prompting, including concept transfer through arithmetic, and concept nullification through projection. Code available at this https URL.

https://arxiv.org/abs/2302.03693
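
The two operations mentioned in the abstract, concept transfer through arithmetic and concept nullification through projection, reduce to simple linear algebra on representation vectors. The toy sketch below shows only that skeleton on random vectors; the paper applies it to a suitably defined internal representation of a text-to-image model.

```python
# Toy linear-algebra skeleton of concept transfer and concept nullification
# (generic vectors only; not the paper's model-specific representation).
import numpy as np

def transfer(rep, concept_src, concept_tgt):
    """Concept transfer through arithmetic: move the representation from one concept to another."""
    return rep - concept_src + concept_tgt

def nullify(rep, concept_basis):
    """Concept nullification through projection: remove the component in a concept subspace."""
    Q, _ = np.linalg.qr(concept_basis.T)       # orthonormal basis of the concept subspace
    return rep - Q @ (Q.T @ rep)

rng = np.random.default_rng(0)
rep = rng.standard_normal(16)                  # hypothetical prompt representation
style_a, style_b = rng.standard_normal(16), rng.standard_normal(16)
print(transfer(rep, style_a, style_b)[:4])
print(np.allclose(nullify(rep, np.stack([style_a, style_b])) @ style_a, 0.0, atol=1e-6))
```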



[AS] Multi-Source Diffusion Models for Simultaneous Music Generation and Separation

G Mariani, I Tallini, E Postolache, M Mancusi, L Cosmo, E Rodolà
[Sapienza University of Rome]

Key points:

  1. Proposes the Multi-Source Diffusion Model (MSDM), which learns the score of the joint probability density of sources sharing a context and can perform both music synthesis and source separation;
  2. Trains MSDM within the denoising score-matching framework and gives a new formulation of the generation tasks, including source imputation;
  3. Introduces a novel inference-time sampling procedure for source separation based on a Dirac delta function.

One-sentence summary:
Proposes a Multi-Source Diffusion Model (MSDM) for simultaneous music generation and separation based on denoising score matching; by conditioning the prior on the mixture and sampling from the resulting posterior it handles both generation and separation tasks, introduces a new inference-time sampling method for separation, and achieves competitive results on the separation task.

In this work, we define a diffusion-based generative model capable of both music synthesis and source separation by learning the score of the joint probability density of sources sharing a context. Alongside the classic total inference tasks (i.e. generating a mixture, separating the sources), we also introduce and experiment on the partial inference task of source imputation, where we generate a subset of the sources given the others (e.g., play a piano track that goes well with the drums). Additionally, we introduce a novel inference method for the separation task. We train our model on Slakh2100, a standard dataset for musical source separation, provide qualitative results in the generation settings, and showcase competitive quantitative results in the separation setting. Our method is the first example of a single model that can handle both generation and separation tasks, thus representing a step toward general audio models.

https://arxiv.org/abs/2302.02257


[LG] Recent advances in the Self-Referencing Embedding Strings (SELFIES) library

A Lo, R Pollice, A Nigam, A D. White, M Krenn, A Aspuru-Guzik
[University of Toronto & Stanford University & University of Rochester & MPL]

Key points:

  1. SELFIES is a molecular string representation that is robust by construction, addressing the syntactic and semantic fragility of traditional string-based representations;
  2. Since its first release in 2019, the SELFIES library has undergone major changes focused on extending functionality, improving user-friendliness, and making the implementation faster;
  3. Releases the latest version, selfies 2.1.1, and details its history, development, algorithms, design, and performance.

One-sentence summary:
Presents the latest version of the open-source Python library implementing the SELFIES molecular representation, discussing its history, development, design, performance, and future plans toward making it a standard representation in the field.

String-based molecular representations play a crucial role in cheminformatics applications, and with the growing success of deep learning in chemistry, have been readily adopted into machine learning pipelines. However, traditional string-based representations such as SMILES are often prone to syntactic and semantic errors when produced by generative models. To address these problems, a novel representation, SELF-referencIng Embedded Strings (SELFIES), was proposed that is inherently 100% robust, alongside an accompanying open-source implementation. Since then, we have generalized SELFIES to support a wider range of molecules and semantic constraints and streamlined its underlying grammar. We have implemented this updated representation in subsequent versions of the selfies library, where we have also made major advances with respect to design, efficiency, and supported features. Hence, we present the current status of the selfies library (version 2.1.1) in this manuscript.

https://arxiv.org/abs/2302.03620
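
For orientation, basic round-tripping with the selfies package looks like the usage sketch below (pip install selfies); version-specific behavior such as the semantic constraints should be checked against the 2.1.1 documentation.

```python
# Quick usage sketch of the selfies library discussed above.
import selfies as sf

benzene_smiles = "c1ccccc1"
benzene_selfies = sf.encoder(benzene_smiles)      # SMILES -> SELFIES
print(benzene_selfies)                            # e.g. [C][=C][C][=C][C][=C][Ring1][=Branch1]
print(sf.decoder(benzene_selfies))                # SELFIES -> SMILES (always a valid molecule)
print(list(sf.split_selfies(benzene_selfies)))    # tokenize into symbols for ML pipelines
```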


[LG] Two Losses Are Better Than One: Faster Optimization Using a Cheaper Proxy

B Woodworth, K Mishchenko, F Bach
[PSL Research University & Samsung AI Center]

Key points:

  1. Proposes a new algorithm that minimizes an objective with hard-to-compute gradients by using a related, easier-to-access proxy function in its place;
  2. Guarantees convergence at a rate matching stochastic gradient descent on a smooth objective, which can lead to substantially better sample efficiency;
  3. The approximate proximal-point subproblems can be solved to the required accuracy using a simple criterion that can be evaluated during execution and is efficiently satisfied thanks to the strong convexity of the subproblem objective.

One-sentence summary:
Proposes a new optimization algorithm that uses a cheaper, easier-to-access function as a proxy for an objective whose gradients are hard to compute, with convergence guarantees and practical applications in machine learning.

We present an algorithm for minimizing an objective with hard-to-compute gradients by using a related, easier-to-access function as a proxy. Our algorithm is based on approximate proximal point iterations on the proxy combined with relatively few stochastic gradients from the objective. When the difference between the objective and the proxy is δ-smooth, our algorithm guarantees convergence at a rate matching stochastic gradient descent on a δ-smooth objective, which can lead to substantially better sample efficiency. Our algorithm has many potential applications in machine learning, and provides a principled means of leveraging synthetic data, physics simulators, mixed public and private data, and more.

https://arxiv.org/abs/2302.03542
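
A hedged sketch of one way to instantiate the idea on least squares, assuming the proxy is the same loss on slightly perturbed (e.g., synthetic) data so that the objective-proxy difference is smooth: each outer iteration spends one gradient of the expensive objective and many cheap proxy gradients on a regularized subproblem. Step sizes, iteration counts, and the quadratic problem are illustrative choices, not the paper's algorithm.

```python
# Proxy-based optimization sketch: approximate proximal-point steps on the cheap proxy h,
# corrected by a single gradient of the expensive objective f per outer iteration.
import numpy as np

rng = np.random.default_rng(0)
d = 20
A = rng.standard_normal((50, d))
b = rng.standard_normal(50)

def grad_f(x):                 # "expensive" objective: least squares on the real data
    return A.T @ (A @ x - b) / len(b)

A_proxy = A + 0.05 * rng.standard_normal(A.shape)   # cheap related proxy (e.g. synthetic data)
def grad_h(x):
    return A_proxy.T @ (A_proxy @ x - b) / len(b)

x = np.zeros(d)
lam, inner_lr = 1.0, 0.05
for outer in range(30):
    g_corr = grad_f(x) - grad_h(x)                   # one objective gradient per outer step
    y = x.copy()
    for _ in range(100):                             # cheap inner loop: prox step on the proxy model
        y -= inner_lr * (grad_h(y) + g_corr + (y - x) / lam)
    x = y
    print(f"outer {outer:2d}  f(x) = {0.5 * np.mean((A @ x - b) ** 2):.4f}")
```

The inner loop minimizes h(y) + <grad_f(x) - grad_h(x), y> + ||y - x||²/(2λ), whose gradient at x equals grad_f(x); intuitively, the proxy supplies the curvature while the occasional objective gradients keep the iterates anchored to the true problem.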


