AIRS in the AIR Preview | Multi-Agent Reinforcement Learning
Episode 4
Multi-Agent Reinforcement Learning
— Human-AI Coordination and Cognition
Over the past decade we have seen artificial intelligence defeat humans in one competitive arena after another, from Go to video games, drawing wave after wave of attention and debate.
Is artificial intelligence really only good for outperforming humans? Can we build AI that cooperates with people instead? And how can we endow AI systems with social cognition and social attributes, building a society in which AI systems and humans collaborate?
You probably have thoughts of your own on these questions. On the 22nd of this month, the fourth episode of AIRS in the AIR will bring together two experts in AI; let's hear what they have to say.
In recent years, DeepMind has become a star of the AI field, not only delivering a string of innovative results but also creating several milestone moments for AI. Our first speaker is Joel Z. Leibo, a research scientist at DeepMind. He will present DeepMind's work on multi-agent reinforcement learning and discuss how to build artificial general intelligence that can cooperate the way humans do. Joel Z. Leibo received his PhD from MIT in 2013, where he worked on brain and cognitive science under Tomaso Poggio.
Our second speaker is Jakob Foerster, Associate Professor in the Department of Engineering Science at the University of Oxford, who has also spent time at DeepMind. In his talk he will focus on multi-agent reinforcement learning and share how to build AI systems that can collaborate with humans. Jakob Foerster has worked on multi-agent reinforcement learning for many years, including as a research scientist at Facebook AI Research, where he did foundational work in the field.
Now, let's look forward to tomorrow's AIRS in the AIR session and explore the latest research progress in multi-agent reinforcement learning together!
01
Session Chair
Hongyuan Zha
Deputy Director of AIRS
Professor and Executive Dean of the School of Data Science, The Chinese University of Hong Kong, Shenzhen
02
Speakers
Joel Z. Leibo
Research Scientist, DeepMind
Joel Z. Leibo is a research scientist at DeepMind. He obtained his PhD in 2013 from MIT where he worked on the computational neuroscience of face recognition with Tomaso Poggio. Nowadays, Joel's research is aimed at the following questions:
- How can we get deep reinforcement learning agents to perform complex cognitive behaviors like cooperating with one another in groups?
- How should we evaluate the performance of deep reinforcement learning agents?
- How can we model processes like cumulative culture that gave rise to unique aspects of human intelligence?
Jakob Foerster
Associate Professor, Department of Engineering Science, University of Oxford
Jakob Foerster started as an Associate Professor at the Department of Engineering Science at the University of Oxford in the fall of 2021. During his PhD at Oxford he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind.
After his PhD he worked as a research scientist at Facebook AI Research in California, where he continued doing foundational work. He was the lead organizer of the first Emergent Communication workshop at NeurIPS in 2017, which he has helped organize ever since, and he was awarded a prestigious CIFAR AI chair in 2019.
His past work addresses how AI agents can learn to cooperate and communicate with other agents; most recently he has been developing and addressing the zero-shot coordination problem setting.
03
Talk Abstracts
Talk: Reverse engineering the social-cognitive capacities, representations, and motivations that underpin human cooperation to help build cooperative artificial general intelligence
Speaker: Joel Z. Leibo
As a route to building cooperative artificial general intelligence, I propose we try to reverse engineer human cooperation. As humans, we employ a set of social-cognitive capacities, representations, and motivations which underlie our critical ability to cooperate with one another.
Here I will argue that we need to figure out how human cooperation works so that we can build general artificial intelligence that cooperates like humans do. Specifically, in this talk I will describe how to use Melting Pot, an evaluation methodology and suite of test scenarios for multi-agent reinforcement learning, to further this goal of reverse engineering human cooperation in order to build cooperative artificial general intelligence.
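Melting Pot's evaluation protocol, in brief, scores a population of trained "focal" agents by dropping them into test scenarios alongside held-out background populations and measuring the focal players' return. The following is a minimal sketch of that evaluation loop under assumed interfaces; the environment and policy signatures here (`env.reset`, `env.step`, the `Policy` callable) are illustrative stand-ins, not the actual Melting Pot API.

```python
# Sketch of a Melting Pot-style evaluation loop (hypothetical interfaces,
# not the real Melting Pot API): the evaluated "focal" agents are mixed
# with fixed background agents in a test scenario, and the score is the
# focal players' per-capita return.
from typing import Callable, Dict, List

Policy = Callable[[object], int]  # maps an observation to an action


def evaluate_scenario(env,
                      focal: Dict[str, Policy],       # player_id -> evaluated policy
                      background: Dict[str, Policy],  # player_id -> fixed background policy
                      episodes: int = 10) -> float:
    """Mean per-capita return of the focal players over several episodes."""
    players = {**background, **focal}
    per_capita: List[float] = []
    for _ in range(episodes):
        observations = env.reset()  # assumed to return a dict: player_id -> observation
        focal_return, done = 0.0, False
        while not done:
            actions = {pid: players[pid](obs) for pid, obs in observations.items()}
            observations, rewards, done = env.step(actions)  # assumed step signature
            focal_return += sum(rewards[pid] for pid in focal)
        per_capita.append(focal_return / len(focal))
    return sum(per_capita) / episodes
```

Because the background agents stay fixed while the focal population varies, a score computed this way probes exactly the question the talk raises: whether the evaluated agents can cooperate with partners they were not trained alongside.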
Talk: Zero-shot coordination and off-belief learning
Speaker: Jakob Foerster
There has been a large body of work studying how agents can learn communication protocols in decentralized settings, using their actions to communicate information. Surprisingly little work has studied how this can be prevented, yet this is a crucial prerequisite from a human-AI coordination and AI-safety point of view.
The standard problem setting in Dec-POMDPs is self-play, where the goal is to find a set of policies that play optimally together. Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time. To address this, we present off-belief learning (OBL). At each timestep OBL agents follow a policy pi_1 that is optimized assuming past actions were taken by a given, fixed policy, pi_0, but assuming that future actions will be taken by pi_1. When pi_0 is uniform random, OBL converges to an optimal policy that does not rely on inferences based on other agents' behavior.
OBL can be iterated in a hierarchy, where the optimal policy from one level becomes the input to the next, thereby introducing multi-level cognitive reasoning in a controlled manner. Unlike existing approaches, which may converge to any equilibrium policy, OBL converges to a unique policy, making it suitable for zero-shot coordination (ZSC).
OBL can be scaled to high-dimensional settings with a fictitious transition mechanism and shows strong performance both in a toy setting and in Hanabi, the benchmark human-AI and ZSC problem.
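To make the "uniform pi_0" case concrete, consider a toy two-step guessing game: player 1 sees a hidden card and takes an action; player 2 observes only that action and must guess the card. Under self-play the action becomes an arbitrary signalling convention; under OBL with a uniform random pi_0, the belief induced by the action is uniform, so no such convention can be relied upon. The sketch below illustrates only this belief computation under assumptions of my own (the game, names, and payoffs are illustrative), not the scalable deep-RL version of OBL described in the abstract.

```python
# Toy illustration of the off-belief learning (OBL) idea: player 2 forms its
# belief about the hidden card *as if* player 1's past action had been taken
# by a fixed policy pi_0. With pi_0 uniform random, the observed action
# carries no information, so no signalling convention can be exploited.
CARDS = [0, 1]    # hidden card, seen only by player 1
ACTIONS = [0, 1]  # player 2 is rewarded 1 for guessing the card, else 0


def pi0_uniform(card: int, action: int) -> float:
    """Fixed past policy pi_0: player 1 acts uniformly at random."""
    return 0.5


def obl_belief(a1: int, pi0) -> dict:
    """Player 2's belief P(card | a1), computed by Bayes' rule under pi_0."""
    unnorm = {c: (1.0 / len(CARDS)) * pi0(c, a1) for c in CARDS}
    z = sum(unnorm.values())
    return {c: p / z for c, p in unnorm.items()}


def obl_best_response(a1: int, pi0):
    """Player 2's OBL action: maximise expected reward under the pi_0 belief."""
    belief = obl_belief(a1, pi0)
    values = {a2: belief[a2] for a2 in ACTIONS}  # guessing a2 pays P(card == a2)
    best = max(values, key=values.get)
    return best, values[best]


for a1 in ACTIONS:
    _, value = obl_best_response(a1, pi0_uniform)
    print(f"observed a1={a1}: expected value under the pi_0 belief is {value:.2f}")
```

With a uniform pi_0 the expected value is 0.5 regardless of the observed action, so the OBL best response cannot exploit any signalling convention; iterating the construction, with the resulting policy fed back in as the next level's pi_0, is how the talk describes reintroducing multi-level reasoning in a controlled manner.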
Time
March 22, 2022, 16:00 - 18:00
How to Participate
Register for free via the QR code below to watch the event online.
AIRS in the AIR is a flagship event series launched by AIRS. Join us online every Tuesday to explore cutting-edge technologies, industrial applications, and development trends in artificial intelligence and robotics.