爱可可 AI Frontier Picks (12.11)

爱可可爱生活 2022-12-16


LG - Machine Learning   CV - Computer Vision   CL - Computation and Language   AS - Audio and Speech   RO - Robotics   GR - Graphics

Summary: discovering language-neutral sub-networks in multilingual language models; end-to-end learning of retrieval and reading within a single Transformer; successive prompting for decomposing complex questions; improving zero-shot generalization and robustness of multi-modal models; accelerating instant neural scene reconstruction with noisy geometry priors; panoramic 3D object tracking via cross-camera fusion; discovering class-specific GAN controls for semantic image synthesis; deep model assembling; a 3 TB dataset of permissively licensed source code.

1. [CL] Discovering Language-neutral Sub-networks in Multilingual Language Models

N Foroutan, M Banaei, R Lebret, A Bosselut, K Aberer
[EPFL]

Multilingual pre-trained language models transfer remarkably well across languages. Evaluating mBERT, the authors show that its language-encoding sub-networks are language-neutral and retain high performance under cross-lingual and cross-task transfer, indicating that the model's language-neutral components play a key role in its cross-lingual transfer.

Multilingual pre-trained language models transfer remarkably well on cross-lingual downstream tasks. However, the extent to which they learn language-neutral representations (i.e., shared representations that encode similar phenomena across languages), and the effect of such representations on cross-lingual transfer performance, remain open questions. In this work, we conceptualize language neutrality of multilingual models as a function of the overlap between language-encoding sub-networks of these models. We employ the lottery ticket hypothesis to discover sub-networks that are individually optimized for various languages and tasks. Our evaluation across three distinct tasks and eleven typologically-diverse languages demonstrates that sub-networks for different languages are topologically similar (i.e., language-neutral), making them effective initializations for cross-lingual transfer with limited performance degradation.

https://arxiv.org/abs/2205.12672
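
As a rough illustration of the sub-network analysis, here is a minimal sketch (my own stand-in, not necessarily the paper's exact procedure): a lottery-ticket-style binary mask is derived per language by magnitude pruning, and the topological similarity of two such masks is measured as Jaccard overlap:

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """One magnitude-pruning step: keep the largest-magnitude weights.
    Repeated per language/task, this yields lottery-ticket sub-network masks."""
    k = max(1, int(weight.numel() * sparsity))          # number of weights to prune
    thresh = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    return (weight.abs() > thresh).float()

def mask_overlap(mask_a: torch.Tensor, mask_b: torch.Tensor) -> float:
    """Jaccard overlap between two binary masks: a simple proxy for how
    'language-neutral' (topologically similar) two sub-networks are."""
    a, b = mask_a.bool().flatten(), mask_b.bool().flatten()
    return (a & b).sum().item() / max((a | b).sum().item(), 1)
```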

2. [CL] Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

Z Jiang, L Gao, J Araki, H Ding, Z Wang, J Callan, G Neubig
[CMU & Bosch Research]

Proposes Retrieval as Attention (ReAtt), a single Transformer learned end-to-end solely from end-task supervision; it achieves competitive retrieval and question-answering (QA) performance and adapts easily to other domains in both supervised and unsupervised settings.

Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at this https URL.

https://arxiv.org/abs/2212.02027
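
To make the "retrieval as attention" idea concrete, here is a toy sketch (my own schematic, not the paper's actual architecture or scoring function): document-level retrieval scores are read directly off query-to-document token attention, so the same differentiable weights can serve both retrieval and reading:

```python
import torch

def retrieval_scores(q_states: torch.Tensor, doc_states: list) -> torch.Tensor:
    """q_states: (Lq, d) question token states; doc_states: list of
    per-document (Ld, d) token states. Score each document by how strongly
    question tokens attend into it (max over document tokens, mean over
    question tokens) -- a schematic stand-in for attention-based retrieval."""
    d_model = q_states.shape[-1]
    scores = []
    for d_states in doc_states:
        attn_logits = q_states @ d_states.T / d_model ** 0.5  # (Lq, Ld)
        scores.append(attn_logits.max(dim=-1).values.mean())
    return torch.stack(scores)  # higher = more relevant; trainable end-to-end
```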


3. [CL] Successive Prompting for Decomposing Complex Questions

D Dua, S Gupta, S Singh, M Gardner
[University of California, Irvine & Microsoft]

Successive Prompting decomposes a complex question into a sequence of simple QA pairs, so that modular question-decomposition and question-answering components can be trained and queried independently; this modular approach solves complex tasks more effectively than a large language model alone.

Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce ``Successive Prompting'', where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset which can be used to bootstrap a model's ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement of ~5% absolute F1 on a few-shot version of the DROP dataset when compared with a state-of-the-art model with the same supervision.

https://arxiv.org/abs/2212.04092
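
A minimal sketch of the successive-prompting loop (`lm` is an assumed text-in/text-out callable; the prompt wording and the FINAL sentinel are my inventions, not the paper's templates). Because decomposition and answering are queried as separate steps, either one could be swapped for a fine-tuned component:

```python
def successive_prompting(lm, question: str, max_steps: int = 8) -> str:
    """Iteratively ask for the next simple sub-question, answer it, and
    append the QA pair to the context until the model signals it is done."""
    context = f"Complex question: {question}\n"
    for _ in range(max_steps):
        sub_q = lm(context + "Next simple question (or FINAL if done): ").strip()
        if sub_q.startswith("FINAL"):
            break
        sub_a = lm(context + f"Q: {sub_q}\nA: ").strip()  # could be a bespoke QA model
        context += f"Q: {sub_q}\nA: {sub_a}\n"
    return lm(context + "Final answer: ").strip()
```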

4. [CV] Improving Zero-shot Generalization and Robustness of Multi-modal Models

Y Ge, J Ren, Y Wang, A Gallagher, M Yang, L Itti, H Adam, B Lakshminarayanan, J Zhao
[Google Research]

Targets the zero-shot generalization and robustness of multi-modal models: a post-hoc method that flags likely top-1 errors via prediction-consistency checks and augments uncertain classes with the WordNet hierarchy improves top-1 accuracy; it is effective, hyperparameter-free, and generalizes across datasets and multi-modal architectures.

Multi-modal image-text models such as CLIP and LiT have demonstrated impressive performance on image classification benchmarks and their zero-shot generalization ability is particularly exciting. While the top-5 zero-shot accuracies of these models are very high, the top-1 accuracies are much lower (over 25% gap in some cases). We investigate the reasons for this performance gap and find that many of the failure cases are caused by ambiguity in the text prompts. First, we develop a simple and efficient zero-shot post-hoc method to identify images whose top-1 prediction is likely to be incorrect, by measuring consistency of the predictions w.r.t. multiple prompts and image transformations. We show that our procedure better predicts mistakes, outperforming the popular max logit baseline on selective prediction tasks. Next, we propose a simple and efficient way to improve accuracy on such uncertain images by making use of the WordNet hierarchy; specifically we augment the original class by incorporating its parent and children from the semantic label hierarchy, and plug the augmentation into text prompts. We conduct experiments on both CLIP and LiT models with five different ImageNet-based datasets. For CLIP, our method improves the top-1 accuracy by 17.13% on the uncertain subset and 3.6% on the entire ImageNet validation set. We also show that our method improves across ImageNet shifted datasets and other model architectures such as LiT. Our proposed method is hyperparameter-free, requires no additional model training and can be easily scaled to other large multi-modal architectures.

https://arxiv.org/abs/2212.01758
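
Both ingredients are easy to sketch. Below, a consistency check flags images whose top-1 prediction flips across prompts/transforms, and a WordNet expansion (via NLTK, which the paper does not necessarily use; the threshold and helper names are my assumptions) augments an uncertain class with its parents and children for the text prompts:

```python
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def is_uncertain(probs_per_view: np.ndarray, agree_thresh: float = 0.8) -> bool:
    """probs_per_view: (n_views, n_classes) zero-shot probabilities under
    different prompts/image transforms. Flag the image when too few views
    agree with the majority top-1 prediction."""
    preds = probs_per_view.argmax(axis=1)
    majority = np.bincount(preds).argmax()
    return (preds == majority).mean() < agree_thresh

def wordnet_augment(class_name: str) -> list:
    """Expand a class name with parent (hypernym) and child (hyponym)
    lemmas, to be plugged into the text prompts for uncertain images."""
    names = {class_name}
    for syn in wn.synsets(class_name.replace(" ", "_")):
        for rel in syn.hypernyms() + syn.hyponyms():
            names.update(n.replace("_", " ") for n in rel.lemma_names())
    return sorted(names)
```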

5. [CV] INGeo: Accelerating Instant Neural Scene Reconstruction with Noisy Geometry Priors

C Li, B Wu, A Pumarola, P Zhang, Y Lin, P Vajda
[Georgia Institute of Technology & Meta Reality Labs]

INGeo accelerates 3D scene reconstruction by adding geometry priors on top of the SotA Instant-NGP, with three strategies to mitigate noise in the priors; it reaches a test PSNR above 30 with half the training iterations, a roughly 2x speedup toward accurate scene reconstruction within seconds on edge devices.

We present a method that accelerates reconstruction of 3D scenes and objects, aiming to enable instant reconstruction on edge devices such as mobile phones and AR/VR headsets. While recent works have accelerated scene reconstruction training to minute/second-level on high-end GPUs, there is still a large gap to the goal of instant training on edge devices which is yet highly desired in many emerging applications such as immersive AR/VR. To this end, this work aims to further accelerate training by leveraging geometry priors of the target scene. Our method proposes strategies to alleviate the noise of the imperfect geometry priors to accelerate the training speed on top of the highly optimized Instant-NGP. On the NeRF Synthetic dataset, our work uses half of the training iterations to reach an average test PSNR of >30.

https://arxiv.org/abs/2212.01959
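
One plausible way to exploit a noisy geometry prior (illustrative only; the grid resolution, dilation radius, and the use of an SfM point cloud are my assumptions, not INGeo's actual strategies): rasterize the prior into an occupancy grid and dilate it so noise does not carve away true surface regions, then let training skip ray samples in empty cells:

```python
import numpy as np
from scipy.ndimage import binary_dilation

def occupancy_from_noisy_prior(points: np.ndarray, bbox_min: np.ndarray,
                               bbox_max: np.ndarray, res: int = 128,
                               dilate: int = 2) -> np.ndarray:
    """points: (N, 3) noisy prior geometry (e.g. a sparse SfM point cloud).
    Returns a (res, res, res) boolean occupancy grid; dilation adds slack
    around the noisy surface so ray samples near it are not skipped."""
    ijk = ((points - bbox_min) / (bbox_max - bbox_min) * res).astype(int)
    ijk = np.clip(ijk, 0, res - 1)
    occ = np.zeros((res, res, res), dtype=bool)
    occ[ijk[:, 0], ijk[:, 1], ijk[:, 2]] = True
    return binary_dilation(occ, iterations=dilate)  # tolerate prior noise
```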


A few more papers worth noting:

[CV] CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

T Fischer, Y Yang, S Kumar, M Sun, F Yu
[ETH Zurich]

CC-3DT is a panoramic 3D object tracking approach that fuses 3D detections from multiple cameras before association, reducing identity switches and improving motion-model quality for better object association and smoother trajectories.

https://arxiv.org/abs/2212.01247
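
A schematic of cross-camera fusion before association (a toy stand-in: the merge radius and greedy de-duplication are my assumptions, and the paper's fusion is learned rather than rule-based): lift each camera's 3D detections into a shared world frame, then merge duplicates seen by overlapping views:

```python
import numpy as np

def fuse_cross_camera(per_cam_dets, merge_radius: float = 0.5) -> np.ndarray:
    """per_cam_dets: list of (centers (N, 3), R (3, 3), t (3,)) with
    camera-to-world extrinsics. Returns fused world-frame centers, greedily
    merging detections closer than merge_radius (meters, assumed)."""
    world = np.concatenate([c @ R.T + t for c, R, t in per_cam_dets], axis=0)
    fused = []
    for p in world:
        if all(np.linalg.norm(p - q) > merge_radius for q in fused):
            fused.append(p)  # first detection of this object wins
    return np.stack(fused)
```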

[CV] Discovering Class-Specific GAN Controls for Semantic Image Synthesis

E Schönfeld, J Borges, V Sushko, B Schiele, A Khoreva
[Bosch Center for AI & MPI for Informatics]

Ctrl-SIS discovers interpretable class-specific latent directions through an optimization procedure, enabling local edits to a target semantic class without affecting other classes in the image and producing high-quality, diverse edits.

https://arxiv.org/abs/2212.01455
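
The local-edit mechanism can be sketched as follows (schematic; the per-pixel latent layout and scalar step size are my assumptions about a SPADE/OASIS-style generator, not Ctrl-SIS's exact parameterization): shift the latent along a discovered class-specific direction only where the semantic map equals the target class:

```python
import torch

def edit_class_latent(z: torch.Tensor, direction: torch.Tensor,
                      sem_map: torch.Tensor, target_class: int,
                      alpha: float = 2.0) -> torch.Tensor:
    """z: (C, H, W) per-pixel latent; direction: (C,) discovered
    class-specific latent direction; sem_map: (H, W) integer class map.
    The shift is gated by the target-class mask, so the other classes
    in the synthesized image are left untouched."""
    gate = (sem_map == target_class).float()             # (H, W) 0/1 mask
    return z + alpha * direction[:, None, None] * gate   # local edit only
```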

[CV] Deep Model Assembling

Z Ni, Y Wang, J Yu, H Jiang, Y Cao, G Huang
[Tsinghua University & Beijing Academy of Artificial Intelligence]

To cut training cost and reduce overfitting, a divide-and-conquer strategy splits a large model into small modules, trains them independently, and reassembles them into the target model. A globally shared meta-model implicitly links all modules together, keeping the independently trained modules compatible. This enables fully distributed training with high performance and efficiency, significantly outperforming end-to-end training in both compute and data efficiency.

https://arxiv.org/abs/2212.04129
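
The divide-and-conquer recipe might look like the sketch below (schematic; `train_fn` and the block-for-block substitution are my assumptions about how the shared meta-model scaffolds each module, not the authors' code): each target module is trained inside a frozen copy of the meta-model, then the trained modules are assembled:

```python
import torch.nn as nn

def train_module(meta_blocks: list, module: nn.Module, idx: int, train_fn):
    """Substitute the target module for the idx-th meta block, freeze the
    surrounding meta blocks (the shared scaffold that keeps independently
    trained modules compatible), and train only the module."""
    blocks = [module if i == idx else b for i, b in enumerate(meta_blocks)]
    for i, b in enumerate(blocks):
        if i != idx:
            for p in b.parameters():
                p.requires_grad = False  # freeze the meta scaffold
    train_fn(nn.Sequential(*blocks))     # assumed task-specific training loop
    return module

# Each call can run on a separate machine; afterwards, assemble the target:
# target = nn.Sequential(*trained_modules)
```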

[CL] The Stack: 3 TB of permissively licensed source code

D Kocetkov, R Li, L B Allal, J Li, C Mou...
[ServiceNow Research & Hugging Face]

The Stack is a 3.1 TB dataset of permissively licensed source code spanning 30 programming languages; experiments show that near-deduplication is an important preprocessing step for competitive results. Future plans include methods for removing PII and malicious code, and support for developers to have their data removed from the dataset.

https://arxiv.org/abs/2211.15533
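
Near-deduplication of this kind is commonly done with MinHash LSH; the sketch below (illustrative: the tokenization, threshold, and the `datasketch` library are my choices, not necessarily the paper's pipeline) keeps only the first file from each near-duplicate cluster:

```python
from datasketch import MinHash, MinHashLSH  # pip install datasketch

def near_dedup(files, threshold: float = 0.85, num_perm: int = 128) -> list:
    """files: iterable of (path, source_text). Returns paths surviving
    near-duplicate filtering: a file is dropped if the LSH index already
    holds a file whose estimated Jaccard similarity exceeds threshold."""
    lsh = MinHashLSH(threshold=threshold, num_perm=num_perm)
    kept = []
    for path, text in files:
        mh = MinHash(num_perm=num_perm)
        for token in set(text.split()):  # crude shingling, for the sketch
            mh.update(token.encode("utf8"))
        if not lsh.query(mh):            # no near-duplicate indexed yet
            lsh.insert(path, mh)
            kept.append(path)
    return kept
```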


