
IJTCS 2021 | Track Schedule: Multi-Agent Reinforcement Learning and Algorithmic Game Theory


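Editor's Note


The 2nd International Joint Conference on Theoretical Computer Science (IJTCS) will be held from August 16 to 20, 2021, in a hybrid online and in-person format. It is jointly hosted by Peking University, the China Society for Industrial and Applied Mathematics (CSIAM), the China Computer Federation (CCF), and the ACM China Council, and organized by the Center on Frontiers of Computing Studies, Peking University. Professor John Hopcroft, Turing Award laureate, Foreign Member of the Chinese Academy of Sciences, and Visiting Chair Professor at Peking University, serves as General Chair.


This issue introduces the "Multi-Agent Reinforcement Learning" and "Algorithmic Game Theory" tracks.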

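Introduction to "Multi-Agent Reinforcement Learning"

Multi-agent reinforcement learning is an important new generation of theory and methods for studying "collective intelligence learning," and a new research direction formed at the intersection of deep reinforcement learning, which has risen in recent years, and the long-established field of game theory. Multi-agent deep reinforcement learning mainly studies algorithms for optimizing the cooperative and competitive strategies of multiple learning-capable agents in an environment. A series of advances in algorithmic theory have been made, and the field has broad application prospects in areas such as game-playing agents, intelligent transportation, and intelligent supply chains.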


"Multi-Agent Reinforcement Learning" Track Chairs

 

Haifeng Zhang

Institute of Automation,

Chinese Academy of Sciences

 

Wenxin Li

Peking University


"Multi-Agent Reinforcement Learning" Track Agenda

Host: Haifeng Zhang

Date: August 17, 2021

Time | Speaker | Talk Title
09:00-09:25 | Zhiqiang Pu | Integrating knowledge-based and data-driven paradigms for collective intelligence decision making: algorithms and experiments
09:30-09:55 | Ying Wen | MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning
10:00-10:25 | Chao Yu | Reinforcement Learning for Incomplete Information Games


Date: August 19, 2021

Time | Speaker | Talk Title
20:00-20:25 | Yaodong Yang | Dealing with Non-transitivity in Two-player Zero-sum Games


"Multi-Agent Reinforcement Learning" Track Talk Abstracts


 

Integrating knowledge-based and data-driven paradigms for collective intelligence decision making: algorithms and experiments

Zhiqiang Pu, Institute of Automation, Chinese Academy of Sciences

Abstract

Collective intelligence (CI) shows promising application prospects. Current research methodologies for intelligent decision making in CI systems can be categorized as knowledge-based or data-driven, each with inherent advantages and disadvantages. We therefore argue that integrating the knowledge-based and data-driven paradigms offers a new and promising research direction. In this talk, some possible methods for this integration are introduced and classified at two levels: the framework level and the algorithm level. Two representative algorithms are presented as examples, one for a multi-agent formation maintenance and collision avoidance task, the other for a multi-agent area coverage and connectivity maintenance problem. We also briefly survey other CI-related algorithms proposed by our team and showcase experimental demos based on unmanned aerial vehicles and unmanned ground robots developed by our team.


Biography

Zhiqiang Pu received the Ph.D. degree in control theory and control engineering from the Institute of Automation, Chinese Academy of Sciences (CASIA), Beijing, China, in 2014. His research interests include nonlinear control, collective intelligence, and unmanned autonomous systems. He is currently an associate professor at CASIA and also the vice president of the Taizhou Institute of Intelligent Manufacturing. He has been appointed a member of several national technical committees related to unmanned autonomous systems and collective intelligence. He has published 60+ papers, holds 20+ patents, and has won several conference paper awards, such as the WCICA Best Paper Award and IFAC-ICONS Best Paper Finalist. He has been supported by the Talent Program of the Youth Innovation Promotion Association CAS since 2017. He won the 2020 Youth Talent Award of the Chinese Institute of Command and Control, and the 1st Prize and 2nd Prize of the National Multi-agent Combat Challenge in 2020, in the heterogeneous group and homogeneous group, respectively.


 

MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning

Ying Wen, Shanghai Jiao Tong University

Abstract

Population-based multi-agent reinforcement learning (PB-MARL) refers to a family of methods nested with reinforcement learning (RL) algorithms, which produce a self-generated sequence of tasks arising from the coupled population dynamics. By leveraging auto-curricula to induce a population of distinct emergent strategies, PB-MARL has achieved impressive success in tackling multi-agent tasks. Despite remarkable prior work on distributed RL frameworks, PB-MARL poses new challenges for parallelizing training frameworks because of the additional complexity of multiple nested workloads across sampling, training, and evaluation, which involve heterogeneous policy interactions. To solve these problems, we present MALib, a scalable and efficient computing framework for PB-MARL. Our framework comprises three key components: (1) a centralized task dispatching model, which supports the self-generated tasks and scalable training with heterogeneous policy combinations; (2) a programming architecture named Actor-Evaluator-Learner, which achieves high parallelism for both training and sampling, and meets the evaluation requirement of auto-curriculum learning; (3) a higher-level abstraction of MARL training paradigms, which enables efficient code reuse and flexible deployments on different distributed computing paradigms. Experiments on a series of complex tasks such as multi-agent Atari games show that MALib achieves throughput higher than 40K FPS on a single machine with 32 CPU cores, a 5x speedup over RLlib, and at least a 3x speedup over OpenSpiel in multi-agent training tasks. MALib is publicly available at https://malib.io.


Biography

Ying Wen is a tenure-track Assistant Professor in the John Hopcroft Center for Computer Science at Shanghai Jiao Tong University. His research interests include reinforcement learning, multi-agent systems, and machine learning systems. He has published over 20 research papers on multi-agent reinforcement learning at top-tier international conferences (ICML, ICLR, IJCAI, and AAMAS). He has served as a PC member for ICML, NeurIPS, ICLR, AAAI, IJCAI, and ICAPS, and as a reviewer for TIFS, Operational Research, and other journals. He received the Best Paper Award in the AAMAS 2021 Blue Sky Track and the Best System Paper Award at CoRL 2020.


 

Reinforcement Learning for Incomplete Information Games

Chao Yu, Sun Yat-sen University

Abstract

Incomplete information games (IIGs) exist widely in real life, in settings such as chess and card games, network security, and military deployment. General solutions to IIGs are thus of great practical importance and are also considered one of the most fundamental topics in AI research. In this talk, I will introduce several RL solutions to IIGs from the perspectives of theoretical algorithms and training models. Evaluation in the Texas Hold'em environment demonstrates the efficiency of these RL algorithms in solving IIGs.


Biography

Chao Yu received his PhD degree from the University of Wollongong, Australia, in 2013. He is now an Associate Professor in the School of Computer Science and Engineering, Sun Yat-sen University (SYSU), Guangzhou, China. Dr. Yu is a recipient of the "Hong Kong Scholar" award of China and the "100 Talents Program" of SYSU. His main research interests include, but are not limited to, reinforcement learning, multi-agent systems, game theory, and their wide applications in transportation, finance, and healthcare. He has published more than 70 research papers in rigorously refereed journals and conferences such as IEEE TNNLS, IEEE CYB, IEEE ITS, and IEEE TVT. He holds more than 20 research grants from organizations such as NSFC, Tencent, and Huawei.


 

Dealing with Non-transitivity in Two-player Zero-sum Games

Yaodong Yang, King's College London

Abstract

The issue of non-transitivity (A beats B, B beats C, but A cannot beat C) in the strategy space poses many challenges for designing effective learning algorithms to solve two-player zero-sum games. Such an issue exists in many real-world games, such as StarCraft, Chinese chess, and poker. In this talk, I will introduce our recent work on designing an effective league training method that generates agents approximating the Nash equilibrium in two-player zero-sum games.
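
As a small illustration of the non-transitivity described in this abstract, the sketch below uses rock-paper-scissors, a standard textbook example that is not taken from the talk. The helper names `find_cycles` and `exploitability` are hypothetical and chosen only for this sketch; it does not implement any league training method.

```python
# Row player's payoff in rock-paper-scissors (zero-sum, antisymmetric).
# Illustrative assumption: this game stands in for the larger games named in the abstract.
PAYOFF = [
    [0, -1, 1],   # rock: ties rock, loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, ties paper, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper, ties scissors
]


def find_cycles(payoff):
    """Return triples (a, b, c) where a beats b, b beats c, but a does not beat c."""
    n = len(payoff)
    return [
        (a, b, c)
        for a in range(n)
        for b in range(n)
        for c in range(n)
        if payoff[a][b] > 0 and payoff[b][c] > 0 and payoff[a][c] <= 0
    ]


def exploitability(mixed, payoff):
    """Best payoff a pure-strategy opponent can obtain against the mixed strategy."""
    n = len(payoff)
    return max(
        sum(payoff[j][i] * mixed[i] for i in range(n))
        for j in range(n)
    )


if __name__ == "__main__":
    print("non-transitive triples:", find_cycles(PAYOFF))
    print("exploitability of always-rock:", exploitability([1.0, 0.0, 0.0], PAYOFF))
    print("exploitability of uniform mix:", exploitability([1 / 3, 1 / 3, 1 / 3], PAYOFF))
```

No pure strategy dominates here, so ranking strategies by pairwise wins does not lead to a best one; the uniform mixture, which is the Nash equilibrium of this particular game, is the only strategy in the example that an opponent cannot exploit.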


Biography

Yaodong Yang is an assistant professor at King's College London. His research focuses on reinforcement learning and multi-agent systems. He has a track record of more than 30 research papers at top venues, along with best paper awards at CoRL 2020 and AAMAS 2021 (Blue Sky Track). Before KCL, he was a principal research scientist at Huawei UK, where he headed the multi-agent system team in London, working on autonomous driving applications. Before Huawei, he was a senior research manager at AIG, working on AI applications in finance. He holds a PhD degree from UCL, an MSc degree from Imperial College London, and a bachelor's degree from USTC.

Introduction to "Algorithmic Game Theory"

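With the rise of the Internet and mobile networks, self-media such as blogs, social networks, Weibo, and WeChat provide large-scale communication platforms for many kinds of group activity. The rise of the Internet and the rapid development of information and communication technology have benefited daily life and promoted commercial and economic development. In large-scale network environments, the mutual restraint of individuals' self-interested behavior, together with the overall constraints imposed by system mechanisms, has become an important factor in keeping virtual societies, economic systems, and social networks stable as they grow. Against this background, algorithmic game theory has come to the fore, allowing people to study, in large-scale network environments, the patterns that govern interactions between people, between people and networked systems, and between people and market rules.


As an emerging field of theoretical computer science, algorithmic game theory aims to use the tools of computer science to design algorithms for, and theoretically analyze, problems in game theory. It differs from microeconomics and traditional game theory in three ways. First, its application areas differ, mainly comprising the Internet and non-traditional auctions. Second, it applies quantitative, engineering-style methods: it models applications as concrete optimization problems, seeks optimal solutions, identifies intractable problems, and studies upper and lower bounds for tractable optimization problems. Third, it is concerned with computational efficiency: the algorithms designed must run efficiently, and algorithmic game theory treats computational efficiency as a constraint that any implemented algorithm must respect.


Today, with information technology developing rapidly, emerging technologies give algorithmic game theory new research settings and fresh vitality. Driven by the theory and technology of computer science, algorithmic game theory is becoming one of the important interdisciplinary methodologies.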


"Algorithmic Game Theory" Track Chairs

   

Yukun Cheng

Suzhou University of Science and Technology

Zhihao Tang

Shanghai University of Finance and Economics

Zhengyang Liu

Beijing Institute of Technology


"Algorithmic Game Theory" Track Agenda

Host: Yukun Cheng

Date: August 16, 2021

Time | Speaker | Talk Title
13:00-13:55 | Hu Fu | Learning Utilities and Equilibria in Non-Truthful Auctions
14:00-14:25 | Zhigang Cao | Abstract Market Games with Gross Substitutes/Complements
14:30-14:55 | Ye Du | Strongly Robust Exotic Option Pricing Via No-Regret Learning
15:00-15:25 | Changjun Wang | Assortment games under Markov Chain Choice Model


"Algorithmic Game Theory" Track Talk Abstracts


 

Learning Utilities and Equilibria in Non-Truthful Auctions

Hu Fu, Shanghai University of Finance and Economics

Abstract

In non-truthful auctions, an agent's utility for a strategy depends on the strategies of the opponents and also on the prior distribution over their private types; the set of Bayes Nash equilibria generally has an intricate dependence on the prior. Using the First Price Auction as our main demonstrating example, we show that  samples from the prior with n agents suffice for an algorithm to learn the interim utilities for all monotone bidding strategies, up to  additive error. As a consequence, this number of samples suffices for learning all approximate equilibria. We give an almost matching (up to polylog factors) lower bound on the sample complexity for learning utilities.
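
To make "learning interim utilities from samples" concrete, here is a minimal sketch of the plain empirical estimator for one bidder in a first-price auction. It is not the algorithm or the bound from the talk. The function names, the uniform prior on [0, 1], the rival strategy of bidding half one's value, and the rule that ties lose are all assumptions made for illustration.

```python
import random


def draw_samples(num_opponents, num_samples, seed=0):
    """Draw opponent value profiles from an assumed i.i.d. uniform[0, 1] prior."""
    rng = random.Random(seed)
    return [
        [rng.random() for _ in range(num_opponents)]
        for _ in range(num_samples)
    ]


def estimate_interim_utility(value, bid, opponent_strategies, samples):
    """Average first-price utility of bidding `bid` with value `value`.

    Each sample is one profile of opponent values; each opponent maps its
    value to a bid through its own (monotone) strategy. The bidder wins only
    if its bid is strictly highest (assumption: ties count as a loss) and
    then pays its own bid.
    """
    total = 0.0
    for opponent_values in samples:
        highest_rival_bid = max(
            strategy(v) for strategy, v in zip(opponent_strategies, opponent_values)
        )
        if bid > highest_rival_bid:
            total += value - bid
    return total / len(samples)


if __name__ == "__main__":
    # One rival who bids half of its value (illustrative choice).
    samples = draw_samples(num_opponents=1, num_samples=20_000)
    estimate = estimate_interim_utility(0.8, 0.4, [lambda v: v / 2], samples)
    # Under these assumptions the exact value is (0.8 - 0.4) * P(v / 2 < 0.4) = 0.32.
    print("empirical interim utility:", estimate)
```

The same set of samples can be reused to evaluate many candidate bids and strategies, which is the setting in which the abstract's question about the number of samples needed arises.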


Biography

Hu Fu is an associate professor at the Institute for Theoretical Computer Science (ITCS) at Shanghai University of Finance and Economics (SHUFE). Before joining SHUFE, he was an assistant professor at the University of British Columbia from 2016 to 2020. He earned his PhD in computer science at Cornell University, supervised by Bobby Kleinberg. Subsequently, he was a postdoc at Microsoft Research, New England Lab and Caltech. Hu's research interest is mainly in computational questions in economics and online algorithms.


 

Abstract Market Games with Gross Substitutes/Complements

Zhigang Cao, Beijing Jiaotong University

Abstract

Shapley (1955) introduced the model of an abstract market game as a generalization of the assignment game. This is a class of cooperative games with certain complements/substitutes constraints. Shapley conjectured that, like assignment games, abstract market games possess non-empty cores. Unfortunately, the structure of an abstract market game is not strong enough to guarantee the non-emptiness of the core. We show that Shapley's conjecture can be reestablished if we replace the complements/substitutes constraints with analogous gross complements/substitutes constraints. (Joint work with Ning Sun, Xiaoguang Yang, and Ning Yu.)


Biography

Zhigang Cao is a professor of economics at the School of Economics and Management, Beijing Jiaotong University. His research interest is game theory, including algorithmic, network, and cooperative game theory. Several of his recent works appear in Operations Research, Games and Economic Behavior, Journal of Mathematical Economics, and International Journal of Game Theory, among others.


 

Strongly Robust Exotic Option Pricing Via No-Regret Learning

Ye Du, Southwestern University of Finance and Economics

Abstract

In this paper, we derive robust, closed-form upper bounds for four exotic options, namely Asian options, shout options, forward start options, and exchange options, based on the no-regret learning strategy. Unlike the classic finance literature, our bounds do not rely on any specific assumptions about the dynamics of the underlying assets, except the no-arbitrage principle. In particular, compared with the work of [16], our results dispense with the specific assumption on the maximal possible absolute return in each period. This makes our results the first strongly robust pricing models for exotic options. Numerical simulations demonstrate that our results are not only significantly tighter than those in the existing literature, but also more suitable for estimating the Greek letters of exotic options.


Biography

Ye Du is a professor in the Western Business School, Southwestern University of Finance and Economics. His research interests are theoretical computer science and game theory. He has published in top venues such as FOCS, Games and Economic Behavior, Journal of Mathematical Economics, and Journal of Futures Markets.


 

Assortment games under Markov Chain Choice Model

Changjun Wang, Academy of Mathematics and Systems Science, Chinese Academy of Sciences

Abstract

In this work, we study assortment planning games in which multiple retailers interact in the market. Each retailer owns some of the products, and its goal is to select a subset of them, i.e., an assortment, to offer to customers so as to maximize its expected revenue. The purchase behavior of the customer is assumed to follow the Markov chain choice model. We consider two types of assortment games under the Markov chain choice model: a competitive game and a cooperative game. In the assortment competition game, we show that there always exists a pure-strategy Nash equilibrium and that such an equilibrium can be found in polynomial time. We also identify an easy-to-check condition for the uniqueness of the Nash equilibrium. In the assortment cooperative games, we consider two settings, distinguished by how players outside a coalition are assumed to behave, and show whether the coalitions are stable.
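
For readers unfamiliar with the Markov chain choice model mentioned in this abstract, the sketch below computes purchase probabilities for a single seller under the usual description of that model: a customer first considers a product, buys it if it is offered, and otherwise moves to another product or to the no-purchase option according to fixed transition probabilities. All numbers and names (`ARRIVAL`, `TRANSITION`, `PRICES`, `choice_probabilities`, `expected_revenue`) are hypothetical, and the multi-retailer games from the talk are not modeled.

```python
# State 0 is the no-purchase option; states 1 and 2 are products.
ARRIVAL = [0.2, 0.5, 0.3]          # probability of considering each state first
TRANSITION = [
    [1.0, 0.0, 0.0],               # unused: no-purchase is always absorbing
    [0.5, 0.0, 0.5],               # from product 1 when it is not offered
    [0.4, 0.6, 0.0],               # from product 2 when it is not offered
]
PRICES = [0.0, 10.0, 8.0]          # revenue per sale; nothing for no-purchase


def choice_probabilities(arrival, transition, assortment, tol=1e-12, max_iter=10_000):
    """Probability that the customer ends at each state, given the offered set."""
    n = len(arrival)
    absorbing = set(assortment) | {0}
    absorbed = [0.0] * n
    mass = list(arrival)
    for _ in range(max_iter):
        next_mass = [0.0] * n
        for i, m in enumerate(mass):
            if m == 0.0:
                continue
            if i in absorbing:
                absorbed[i] += m            # customer stops here
            else:
                for j, p in enumerate(transition[i]):
                    next_mass[j] += m * p   # customer moves on to state j
        mass = next_mass
        if sum(mass) < tol:
            break
    return absorbed


def expected_revenue(prices, probabilities):
    """Expected revenue per customer for the given purchase probabilities."""
    return sum(r * p for r, p in zip(prices, probabilities))


if __name__ == "__main__":
    for assortment in ({1}, {2}, {1, 2}):
        probs = choice_probabilities(ARRIVAL, TRANSITION, assortment)
        print(sorted(assortment), probs, expected_revenue(PRICES, probs))
```

Removing a product redirects part of its demand to the remaining products rather than losing it entirely, which is why the choice of assortment is a genuine optimization problem even for a single seller.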


Biography

Changjun Wang is currently an associate researcher at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. Previously, he was an assistant professor at Beijing University of Technology. His research interests include algorithmic game theory and combinatorial optimization, with a current focus on dynamic routing games and assortment planning. He received his BSc degree in Mathematics from Shandong University in 2010 and his PhD degree in Operations Research from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, in 2015.

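About IJTCS

Review → A conversation with Xiaotie Deng: at the first IJTCS, I saw the growth of theoretical computer science in China

Schedule → Track: Blockchain Technology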

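IJTCS Registration Information

Registration is now officially open to the public


Watch talks online: Free

Join the conference by watching the live stream; questions can be asked through the streaming platform.


Online registration

(Regular) $100 / ¥700

(Student) $50 / ¥350*

Receive links to all Zoom sessions, attend all sessions online as a participant, ask questions and join discussions directly, and take part in specially arranged interactive sessions.


In-person registration

(Regular) $200 / ¥1400

(Student) $100 / ¥700*

Attend the conference in person (at Peking University) as a participant and talk face to face with renowned scholars, with all the benefits of online registration included.

*Due to epidemic-prevention requirements, only 10 in-person places are available to off-campus attendees.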



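*Student registration: after registering on the website, send a photo of the student ID page showing your personal and school information to IJTCS@pku.edu.cn, with the email subject in the format "Student Registration + name + online/in-person".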


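General Chair

John Hopcroft

Turing Award laureate

Foreign Member of the Chinese Academy of Sciences

Visiting Chair Professor, Peking University


General Co-Chair

Xiaotie Deng

Chair Professor, Peking University

Foreign Member of Academia Europaea

ACM/IEEE Fellow


Advisory Committee Chairs

Wen Gao

Member of the Chinese Academy of Engineering

Professor, Peking University

Hong Mei

Member of the Chinese Academy of Sciences

President of CCF

Pingwen Zhang

Member of the Chinese Academy of Sciences

President of CSIAM

Professor, Peking University


Program Committee Chairs

Xiaoming Sun

Institute of Computing Technology, Chinese Academy of Sciences

Researcher

Xiaotie Deng

Peking University

Chair Professor

Minming Li

City University of Hong Kong

Associate Professor

Pinyan Lu

Shanghai University of Finance and Economics

Professor

Jian Li

Tsinghua University

Associate Professor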


Contact

For conference sponsorship, partnerships, and related matters, please contact

IJTCS@pku.edu.cn

010-62761029

Conference Website

https://econcs.pku.edu.cn/ijtcs2021/index.htm





