Robotics Academic Digest [12.24]

MrGreen (格林先生), arXiv Daily Academic Digest, 2022-04-26


cs.RO (Robotics): 8 papers in total

【1】 Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling
Link: https://arxiv.org/abs/2112.12740

Authors: Benjamin Freed, Aditya Kapoor, Ian Abraham, Jeff Schneider, Howie Choset
Comments: in IEEE Robotics and Automation Letters
Abstract: One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.
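
Illustrative note (not from the paper): the credit-assignment idea in the abstract above can be pictured with a small sketch that compares a shared team return with a return built only from a subset of agents' rewards. The function names, the "relevant set" of agent indices, and the discount value are assumptions made for this example; how PRD actually identifies the subsets is not stated in the abstract.

```python
from typing import List, Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def team_return(agent_rewards: List[List[float]], gamma: float) -> float:
    """Shared return: every agent is credited with all agents' rewards.

    agent_rewards[j][t] is the reward of agent j at time step t.
    """
    per_step = [sum(step) for step in zip(*agent_rewards)]
    return discounted_return(per_step, gamma)


def decoupled_return(
    agent_rewards: List[List[float]], relevant: Sequence[int], gamma: float
) -> float:
    """Return built only from the rewards of a hypothetical relevant set.

    `relevant` lists the agent indices assumed to be coupled with the agent
    being credited; rewards of all other agents are left out.
    """
    per_step = [sum(step[j] for j in relevant) for step in zip(*agent_rewards)]
    return discounted_return(per_step, gamma)


# Three agents, two time steps. Agent 2's rewards are large but are assumed
# here to be unrelated to agent 0's actions.
rewards = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
shared = team_return(rewards, gamma=0.5)
decoupled = decoupled_return(rewards, relevant=[0, 1], gamma=0.5)
```

In this toy case the shared return is dominated by agent 2's rewards, while the decoupled return only reflects agents 0 and 1, which is the intuition behind a lower-variance learning signal.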

【2】 Towards Disturbance-Free Visual Mobile Manipulation
Link: https://arxiv.org/abs/2112.12612

Authors: Tianwei Ni, Kiana Ehsani, Luca Weihs, Jordi Salvador
Affiliations: Université de Montréal & Mila, ∗ Work was primarily done during internship at AI, † Equal advising
Abstract: Embodied AI has shown promising results on an abundance of robotic tasks in simulation, including visual navigation and manipulation. The prior work generally pursues high success rates with shortest paths while largely ignoring the problems caused by collision during interaction. This lack of prioritization is understandable: in simulated environments there is no inherent cost to breaking virtual objects. As a result, well-trained agents frequently have catastrophic collision with objects despite final success. In the robotics community, where the cost of collision is large, collision avoidance is a long-standing and crucial topic to ensure that robots can be safely deployed in the real world. In this work, we take the first step towards collision/disturbance-free embodied AI agents for visual mobile manipulation, facilitating safe deployment in real robots. We develop a new disturbance-avoidance methodology at the heart of which is the auxiliary task of disturbance prediction. When combined with a disturbance penalty, our auxiliary task greatly enhances sample efficiency and final performance by knowledge distillation of disturbance into the agent. Our experiments on ManipulaTHOR show that, on testing scenes with novel objects, our method improves the success rate from 61.7% to 85.6% and the success rate without disturbance from 29.8% to 50.2% over the original baseline. Extensive ablation studies show the value of our pipelined approach. Project site is at https://sites.google.com/view/disturb-free
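
Illustrative note (not from the paper): the two ingredients named in the abstract above, a disturbance penalty and an auxiliary disturbance-prediction task, could be combined roughly as sketched below. The scalar disturbance measure, the coefficient names, and the use of binary cross-entropy for the auxiliary loss are all assumptions of this sketch, not details confirmed by the abstract.

```python
import math


def shaped_reward(task_reward: float, disturbance: float, penalty_coef: float) -> float:
    """Task reward minus a penalty proportional to a (hypothetical) scalar
    measure of how much the agent disturbed surrounding objects."""
    return task_reward - penalty_coef * disturbance


def disturbance_prediction_loss(pred_prob: float, disturbed: bool) -> float:
    """Binary cross-entropy for predicting whether a step causes disturbance.

    pred_prob is the predicted probability of disturbance; it is clipped to
    avoid log(0).
    """
    eps = 1e-7
    p = min(max(pred_prob, eps), 1.0 - eps)
    return -math.log(p) if disturbed else -math.log(1.0 - p)


def total_loss(rl_loss: float, aux_loss: float, aux_coef: float) -> float:
    """Policy-learning loss plus the weighted auxiliary prediction loss."""
    return rl_loss + aux_coef * aux_loss


# Example values; every number here is made up for illustration.
reward = shaped_reward(task_reward=1.0, disturbance=0.2, penalty_coef=5.0)
aux = disturbance_prediction_loss(pred_prob=0.5, disturbed=True)
loss = total_loss(rl_loss=0.3, aux_loss=aux, aux_coef=0.1)
```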

【3】 PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving
Link: https://arxiv.org/abs/2112.12610

Authors: Pengchuan Xiao, Zhenlei Shao, Steven Hao, Zishuo Zhang, Xiaolin Chai, Judy Jiao, Zesong Li, Jian Wu, Kai Sun, Kun Jiang, Yunlong Wang, Diange Yang
Affiliations: School of Vehicle and Mobility, Tsinghua University
Comments: This paper has been published on ITSC'2021, please check the website of the PandaSet for more information: this https URL
Abstract: The accelerating development of autonomous driving technology has placed greater demands on obtaining large amounts of high-quality data. Representative, labeled, real world data serves as the fuel for training deep learning networks, critical for improving self-driving perception algorithms. In this paper, we introduce PandaSet, the first dataset produced by a complete, high-precision autonomous vehicle sensor kit with a no-cost commercial license. The dataset was collected using one 360° mechanical spinning LiDAR, one forward-facing, long-range LiDAR, and 6 cameras. The dataset contains more than 100 scenes, each of which is 8 seconds long, and provides 28 types of labels for object classification and 37 types of labels for semantic segmentation. We provide baselines for LiDAR-only 3D object detection, LiDAR-camera fusion 3D object detection and LiDAR point cloud segmentation. For more details about PandaSet and the development kit, see https://scale.com/open-datasets/pandaset.

【4】 Curriculum Learning for Safe Mapless Navigation
Link: https://arxiv.org/abs/2112.12490

Authors: Luca Marzari, Davide Corsi, Enrico Marchesini, Alessandro Farinelli
Affiliations: Computer Science Department, University of Verona, Verona, Italy
Comments: 8 pages, 5 figures. The poster version of this paper has been accepted by The 37th ACM/SIGAPP Symposium on Applied Computing Proceedings (SAC IRMAS 2022)
Abstract: This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance. In particular, we focus on the safety aspect of robotic mapless navigation, comparing over a standard end-to-end (E2E) training strategy. To this end, we present a CL approach that leverages Transfer of Learning (ToL) and fine-tuning in a Unity-based simulation with the Robotnik Kairos as a robotic agent. For a fair comparison, our evaluation considers an equal computational demand for every learning approach (i.e., the same number of interactions and difficulty of the environments) and confirms that our CL-based method that uses ToL outperforms the E2E methodology. In particular, we improve the average success rate and the safety of the trained policy, resulting in 10% fewer collisions in unseen testing scenarios. To further confirm these results, we employ a formal verification tool to quantify the number of correct behaviors of Reinforcement Learning policies over desired specifications.

【5】 Globally convergent visual-feature range estimation with biased inertial measurements
Link: https://arxiv.org/abs/2112.12325

Authors: Bowen Yi, Chi Jin, Ian R. Manchester
Affiliations: Australian Centre for Field Robotics, The University of Sydney, Sydney, NSW, Australia; Sydney Institute for Robotics and Intelligent Systems, Sydney, NSW, Australia; DJI Innovation Inc., Shenzhen, China
Abstract: The design of a globally convergent position observer for feature points from visual information is a challenging problem, especially for the case with only inertial measurements and without assumptions of uniform observability, which remained open for a long time. We give a solution to the problem in this paper assuming that only the bearing of a feature point, and biased linear acceleration and rotational velocity of a robot -- all in the body-fixed frame -- are available. Further, in contrast to existing related results, we do not need the value of the gravitational constant either. The proposed approach builds upon the parameter estimation-based observer recently developed in (Ortega et al., Syst. Control. Lett., vol. 85, 2015) and its extension to matrix Lie groups in our previous work. Conditions on the robot trajectory under which the observer converges are given, and these are strictly weaker than the standard persistency of excitation and uniform complete observability conditions. Finally, we apply the proposed design to the visual inertial navigation problem. Simulation results are also presented to illustrate our observer design.

【6】 Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning
Link: https://arxiv.org/abs/2112.12288

Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac
Affiliations: ∗Department of Electrical and Computer Engineering, Princeton University, United States; †Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, United States
Comments: Accepted in Robotics: Science and Systems (RSS), 2021
Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material.
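
Illustrative note (not taken from the paper or its repository): a time-discounted reach-avoid backup of the kind the abstract above describes can be sketched in tabular form on a toy chain. The sign conventions (l(s) <= 0 inside the target, g(s) > 0 inside the failure set, and V(s) <= 0 meaning "can reach the target without failing"), the exact form of the backup, and the 7-state example are assumptions of this sketch and should be checked against the paper.

```python
from typing import Callable, List


def reach_avoid_value_iteration(
    n_states: int,
    successors: Callable[[int], List[int]],
    l: Callable[[int], float],
    g: Callable[[int], float],
    gamma: float = 0.99,
    tol: float = 1e-12,
    max_sweeps: int = 10_000,
) -> List[float]:
    """Tabular value iteration with an assumed discounted reach-avoid backup:

        V(s) = (1 - gamma) * max(l(s), g(s))
               + gamma * max(g(s), min(l(s), min over s' of V(s')))

    The objective is a min/max over time rather than a sum of rewards.
    """
    values = [0.0] * n_states
    for _ in range(max_sweeps):
        new_values = []
        for s in range(n_states):
            best_next = min(values[s2] for s2 in successors(s))
            undiscounted = max(g(s), min(l(s), best_next))
            new_values.append((1.0 - gamma) * max(l(s), g(s)) + gamma * undiscounted)
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            break
    return values


def successors(s: int) -> List[int]:
    """Toy deterministic chain 0..6: state 0 is failure, state 6 is the target.

    States 1 and 2 form a one-way slide toward the failure state.
    """
    if s in (1, 2):
        return [s - 1]          # no control authority: forced left
    if s in (0, 6):
        return [s]              # absorbing end states
    return [s - 1, s, s + 1]    # left, stay, right


def target_margin(s: int) -> float:
    return -1.0 if s == 6 else 1.0   # <= 0 only inside the target


def failure_margin(s: int) -> float:
    return 1.0 if s == 0 else -1.0   # > 0 only inside the failure set


values = reach_avoid_value_iteration(7, successors, target_margin, failure_margin)
reach_avoid_set = [s for s, v in enumerate(values) if v <= 0.0]
print(reach_avoid_set)  # -> [3, 4, 5, 6]
```

States 1 and 2 are excluded because every trajectory from them ends in the failure state, while state 3 is included because it can still choose to move right.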

【7】 Safety assurance of an industrial robotic control system using hardware/software co-verification
Link: https://arxiv.org/abs/2112.12248

Authors: Yvonne Murray, Martin Sirevåg, Pedro Ribeiro, David A. Anisi, Morten Mossige
Affiliations: Dept. of Mechatronics, University of Agder (UiA), Norway; Dept. of Computer Science, University of York, UK; Robotics Group, Norwegian University of Life Sciences (NMBU), Norway; ABB Robotics, Bryne, Norway
Abstract: As a general trend in industrial robotics, an increasing number of safety functions are being developed or re-engineered to be handled in software rather than by physical hardware such as safety relays or interlock circuits. This trend reinforces the importance of supplementing traditional, input-based testing and quality procedures which are widely used in industry today, with formal verification and model-checking methods. To this end, this paper focuses on a representative safety-critical system in an ABB industrial paint robot, namely the High-Voltage electrostatic Control system (HVC). The practical convergence of the high-voltage produced by the HVC, essential for safe operation, is formally verified using a novel and general co-verification framework where hardware and software models are related via platform mappings. This approach enables the pragmatic combination of highly diverse and specialised tools. The paper's main contribution includes details on how hardware abstraction and verification results can be transferred between tools in order to verify system-level safety properties. It is noteworthy that the HVC application considered in this paper has a rather generic form of a feedback controller. Hence, the co-verification framework and experiences reported here are also highly relevant for any cyber-physical system tracking a setpoint reference.

【8】 Real-Time Multi-Convex Model Predictive Control for Occlusion Free Target Tracking
Link: https://arxiv.org/abs/2112.12177

Authors: Houman Masnavi, Vivek Adajania, Karl Kruusamae, Arun Kumar Singh
Affiliations: Vivek Adajania is with the University of Toronto and the rest of the authors are with the University of Tartu
Abstract: This paper proposes a Model Predictive Control (MPC) algorithm for target tracking amongst static and dynamic obstacles. Our main contribution lies in improving the computational tractability and reliability of the underlying non-convex trajectory optimization. The result is an MPC algorithm that runs real-time on laptops and embedded hardware devices such as Jetson TX2. Our approach relies on novel reformulations for the tracking, collision, and occlusion constraints that induce a multi-convex structure in the resulting trajectory optimization. We exploit these mathematical structures using the split Bregman Iteration technique, eventually reducing our MPC to a series of convex Quadratic Programs solvable in a few milliseconds. The fast re-planning of our MPC allows for occlusion and collision-free tracking in complex environments even while considering a simple constant-velocity prediction for the target trajectory and dynamic obstacles. We perform extensive bench-marking in a realistic physics engine and show that our MPC outperforms the state-of-the-art algorithms in visibility, smoothness, and computation-time metrics.
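
Illustrative note (not from the paper): the "simple constant-velocity prediction" mentioned in the abstract above amounts to extrapolating the current position along the current velocity over the planning horizon. The 2D state, the function name, and the time-step and horizon values below are assumptions made for this sketch.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]


def predict_constant_velocity(
    position: Vec2, velocity: Vec2, dt: float, horizon: int
) -> List[Vec2]:
    """Predict future positions p_k = p_0 + v * (k * dt) for k = 1..horizon.

    The velocity is held fixed over the whole horizon, so the predicted
    path is a straight line sampled every dt seconds.
    """
    return [
        (position[0] + velocity[0] * k * dt, position[1] + velocity[1] * k * dt)
        for k in range(1, horizon + 1)
    ]


# A target at the origin moving at (1.0, -2.0) m/s, predicted 3 steps ahead
# with a 0.5 s time step.
prediction = predict_constant_velocity((0.0, 0.0), (1.0, -2.0), dt=0.5, horizon=3)
```

Such a prediction is cheap to recompute at every control cycle, which fits the abstract's point that fast re-planning compensates for the crude motion model.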

