IJTCS | Track Program: Machine Learning Theory
Editor's Note
The first International Joint Conference on Theoretical Computer Science (IJTCS) will be held online on August 17-21, 2020. The conference is jointly organized by Peking University, the China Society for Industrial and Applied Mathematics (CSIAM), the China Computer Federation (CCF), and the ACM China Council, and is hosted by the Center on Frontiers of Computing Studies (CFCS) at Peking University.
The theme of the conference is "recent advances and key problems in theoretical computer science." It features seven tracks, offering in-depth discussions of algorithmic game theory, blockchain technology, multi-agent reinforcement learning, machine learning theory, quantum computing, machine learning and formal methods, and algorithms and complexity. In addition, the conference hosts a Young PhD Forum, a Female Scholars Forum, and an Undergraduate Research Forum, bringing together renowned experts and scholars from home and abroad to focus on frontier problems in theoretical computer science. Information will be updated continuously; stay tuned!
This post presents the "Machine Learning Theory" track.
About the "Machine Learning Theory" Track
Machine learning, and deep learning in particular, has been widely deployed across many areas of society and has achieved remarkable results. Research on the theory of machine learning, however, has lagged behind: classical learning theory cannot satisfactorily explain, or provide guidance for, the striking representation, optimization, generalization, and transfer abilities of deep neural networks. In recent years the learning-theory community has proposed a series of novel theoretical models and paradigms that approach the mathematical essence of machine learning and deep learning from multiple angles. Within China, however, relatively few researchers currently work on learning theory, and there remains a gap to the international frontier.
This track invites well-known young scholars in machine learning theory from China and abroad; the talks cover many aspects of learning theory, including optimization algorithms, representation learning, generalization, and transfer. We hope the track will draw more domestic researchers' attention to theoretical machine learning and attract more students into research in this area.
"Machine Learning Theory" Track Chair
李建 (Jian Li)
Tsinghua University
"Machine Learning Theory" Track Program
"Machine Learning Theory" Talk Abstracts
李建 (Jian Li)
On Generalization and Implicit Bias of Gradient Methods in Deep Learning
Abstract
Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly non-convex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, can generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that changing the optimization algorithm, even without changing the model, changes the implicit bias, and also the generalization properties. In this talk, we present new generalization bounds and investigate the implicit bias of various gradient methods.
(1) We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. Using the new framework, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a).
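For readers less familiar with the algorithm at the center of (1), the display below recalls the standard SGLD update in our own notation (the exact variant analyzed in the talk may differ slightly); roughly speaking, it is the injected Gaussian noise that stability-based analyses of this kind typically exploit.

```latex
% Standard SGLD update (our notation): \eta_t step size, \beta inverse temperature,
% g_t a mini-batch gradient of the training loss at \theta_t.
\theta_{t+1} \;=\; \theta_t \;-\; \eta_t\, g_t \;+\; \sqrt{\frac{2\eta_t}{\beta}}\;\xi_t,
\qquad \xi_t \sim \mathcal{N}(0,\, I_d)
```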
(2) We show that gradient descent converges to the max-margin direction for homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations, generalizing previous work on logistic regression with one-layer or multi-layer linear networks. Finally, as the margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
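As a quick reminder of the terminology in (2), here is a sketch of the standard definitions, written in our notation rather than the talk's:

```latex
% L-homogeneity (our notation): f(c\theta; x) = c^{L} f(\theta; x) for all c > 0,
% e.g., bias-free ReLU networks with L layers.
% Normalized margin on a dataset \{(x_i, y_i)\}:
\gamma(\theta) \;=\; \min_i \; \frac{y_i\, f(\theta;\, x_i)}{\|\theta\|_2^{\,L}}
% "Converging to the max-margin direction" means that \theta_t / \|\theta_t\|_2
% approaches a direction maximizing \gamma (more precisely, a KKT point of the
% associated margin-maximization problem).
```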
Jason Lee
Provable Representation Learning in Deep Learning
Abstract
Deep representation learning seeks to learn a data representation that transfers to downstream tasks. In this talk, we study two forms of representation learning: supervised pre-training and self-supervised learning.
Supervised pre-training uses a large labeled source dataset to learn a representation, then trains a classifier on top of the representation. We prove that supervised pre-training can pool the data from all source tasks to learn a good representation which transfers to downstream tasks with few labeled examples.
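One common way to formalize this two-phase pipeline is sketched below; the notation is ours, and the talk's precise setting may differ:

```latex
% Phase 1: pool n examples from each of T labeled source tasks to fit a shared
% representation \phi together with per-task linear heads w_1, ..., w_T:
\hat{\phi},\ \{\hat{w}_t\} \;\in\; \arg\min_{\phi,\ \{w_t\}}\;
  \frac{1}{T}\sum_{t=1}^{T}\frac{1}{n}\sum_{i=1}^{n}
  \ell\!\left(w_t^{\top}\phi(x_{t,i}),\; y_{t,i}\right)
% Phase 2: on the target task, freeze \hat{\phi} and fit only a new head
% from m (much smaller than n) labeled examples:
\hat{w} \;\in\; \arg\min_{w}\;
  \frac{1}{m}\sum_{j=1}^{m}\ell\!\left(w^{\top}\hat{\phi}(x'_{j}),\; y'_{j}\right)
```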
Self-supervised learning creates auxiliary pretext tasks that do not require labeled data to learn representations. These pretext tasks are created solely from the input features, such as predicting a missing image patch, recovering the color channels of an image, or predicting missing words. Surprisingly, predicting this known information helps in learning a representation effective for downstream tasks. We prove that, under a conditional independence assumption, self-supervised learning provably learns useful representations.
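For intuition only, here is our rough paraphrase of the kind of conditional independence assumption involved; see the talk for the precise statement:

```latex
% Split each example into the downstream input X_1 and the pretext target X_2
% (e.g., a masked patch), with downstream label Y. The assumption is roughly
X_1 \;\perp\; X_2 \;\mid\; Y .
% The pretext task then learns \psi(x_1) \approx \mathbb{E}[X_2 \mid X_1 = x_1],
% and under such an assumption a simple (e.g., linear) predictor on top of \psi
% suffices for the downstream task.
```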
朱占星 (Zhanxing Zhu)
Understanding Deep "Alchemy"
Abstract
"Is deep learning alchemy or science?" has been a long-standing debate, since the success of deep learning mostly relies on various engineering designs and tricks and lacks a theoretical foundation. Unfortunately, the underlying mechanism of deep learning is still mysterious, which severely limits its further development from both the theoretical and the application perspective.
In this talk, I will introduce some of our attempts at theoretically understanding deep learning, mainly focusing on analyzing its training methods and tricks, including stochastic gradient descent, batch normalization, and knowledge distillation. (1) We analyze the implicit regularization property of stochastic gradient descent (SGD), i.e., we interpret why SGD can find well-generalizing minima compared with other alternatives. (2) We comprehensively reveal the learning dynamics of batch normalization with weight decay, and show their benefits for avoiding vanishing/exploding gradients, for not being trapped in sharp minima, and for the loss drop observed when the learning rate is decayed. (3) We also show the underlying mechanism of knowledge distillation, including its transfer risk bound, data efficiency, and imperfect-teacher distillation. These new findings shed some light on understanding deep learning, a step toward opening this black box, and also inspire new algorithmic designs.
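As a concrete reference point for item (3), the snippet below is a generic PyTorch-style sketch of the standard knowledge-distillation objective (the temperature T and weight alpha are illustrative defaults of ours, not the speaker's); analyses of distillation such as those mentioned above study objectives of roughly this form.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Generic knowledge-distillation objective: a weighted sum of
    (i) the KL divergence between temperature-softened teacher and student
    predictions and (ii) ordinary cross-entropy on the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable
    # to the hard-label term when the temperature is large.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```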
鬲融 (Rong Ge)
Guarantees for Tuning the Step Size Using a Learning-to-Learn Approach
Abstract
The learning-to-learn approach, which uses optimization algorithms to learn a new optimizer, has successfully trained efficient optimizers in practice. This approach relies on meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates. However, there are few theoretical guarantees on how to avoid meta-gradient explosion/vanishing, or on how to train an optimizer with good generalization performance. In this work we study the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues that look similar to gradient explosion/vanishing. We also characterize when it is necessary to compute the meta-objective on a separate validation set instead of the original training set. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks. Based on joint work with Xiang Wang, Shuai Yuan, and Chenwei Wu.
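To make the toy setting concrete, here is a small NumPy sketch, entirely ours and not taken from the paper, of the meta-objective for tuning a single step size on a one-dimensional quadratic. The exact meta-gradient grows exponentially in the unrolling horizon T once the step size leaves the stable region eta < 2/lam, which is the kind of explosion issue discussed above.

```python
import numpy as np

def meta_objective(eta, lam=2.0, x0=1.0, T=50):
    """Final loss after T steps of gradient descent with step size eta
    on the scalar quadratic f(x) = 0.5 * lam * x**2 (toy setting of our choice)."""
    x = x0
    for _ in range(T):
        x = x - eta * lam * x          # gradient step: x <- (1 - eta*lam) * x
    return 0.5 * lam * x ** 2

def meta_gradient(eta, lam=2.0, x0=1.0, T=50):
    """Exact derivative of the meta-objective w.r.t. eta, obtained by the chain
    rule through the unrolled trajectory: x_T = (1 - eta*lam)**T * x0."""
    r = 1.0 - eta * lam
    return -T * lam ** 2 * x0 ** 2 * r ** (2 * T - 1)

for eta in [0.3, 0.9, 1.05]:           # the last value exceeds 2/lam = 1.0: divergent regime
    print(eta, meta_objective(eta), meta_gradient(eta))
```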
宋乐 (Le Song)
Deep Learning for Algorithm Design
Abstract
Algorithms are step-by-step instructions designed by human experts to solve a problem. Effective algorithms play central roles in modern computing and have impacted many industrial applications, such as recommendation and advertising on the internet, resource allocation in cloud computing, robotics and route planning, disease understanding, and drug design.
However, designing effective algorithms is a time-consuming and difficult task. It often requires lots of intuition and expertise to tailor algorithmic choices in particular applications. Furthermore, when complex application data are involved, it becomes even more challenging for human experts to reason about algorithm behavior.
Can we use deep learning and AI to help algorithm design? A number of recent advances have allowed algorithms from specific algorithmic families to be designed automatically from data, often leading to either state-of-the-art empirical performance or provable performance guarantees on observed instance distributions. In this talk, I will provide an introduction to this area and explain a few pieces of work in this direction.
马腾宇 (Tengyu Ma)
Domain Adaptation with Theoretical Guarantees
Abstract
In unsupervised domain adaptation, the existing theory focuses on situations where the source and target domains are close. In this talk, I will discuss a few algorithms with theoretical guarantees for larger domain shifts. First, I will show that, for linear models, self-training can provably avoid using spurious features that correlate with the source labels but not the target labels. Second, I will discuss algorithms with provable guarantees that leverage the sequential structure of the domain shifts. Based on the papers https://arxiv.org/abs/2006.10032, https://arxiv.org/abs/2002.11361, and https://arxiv.org/abs/2006.14481.
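To fix ideas, the following is a generic self-training (pseudo-labeling) loop. It is a common baseline of the kind analyzed in this line of work, not necessarily the exact algorithm from the papers above, and the function names, classifier choice, and confidence threshold are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(Xs, ys, Xt, rounds=5, threshold=0.9):
    """Generic pseudo-labeling loop: fit on labeled source data, then repeatedly
    add confidently pseudo-labeled target points and refit on the augmented set."""
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    for _ in range(rounds):
        probs = clf.predict_proba(Xt)
        confident = probs.max(axis=1) >= threshold   # trust only confident predictions
        if not confident.any():
            break
        pseudo = clf.classes_[probs.argmax(axis=1)[confident]]
        X_aug = np.vstack([Xs, Xt[confident]])
        y_aug = np.concatenate([ys, pseudo])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return clf
```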
About IJTCS
Introduction → The International Joint Conference on Theoretical Computer Science Makes Its Debut
Recommended → Conference Invited Talks (I)
Recommended → Conference Invited Talks (II)
Program → Track: Algorithmic Game Theory
IJTCS Registration Information
Registration is now officially open to the public! Each participant may register for free to watch the online talks, or pay a fee to further interact with the speakers about their talks and take part in more of the conference activities.
Watching the online talks: free
Full registration:
(Regular) $100 / ¥700
(Student) $50 / ¥350*
Attend all sessions as a participant, ask questions and join discussions online, and take part in special interactive sessions.
Registration deadline: 23:59, August 15, 2020
*Student registration: after registering on the website, please photograph the page of your student ID that contains your personal and school information and send it to IJTCS@pku.edu.cn, with the email subject in the format "Student Registration + Name".
General Chairs
John Hopcroft
Foreign Member of the Chinese Academy of Sciences; Visiting Chair Professor, Peking University
林惠民 (Huimin Lin)
Member of the Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences
Conference Co-Chair
邓小铁 (Xiaotie Deng)
Professor, Peking University
Advisory Committee Chairs
高文 (Wen Gao)
Member of the Chinese Academy of Engineering; Professor, Peking University
梅宏 (Hong Mei)
Member of the Chinese Academy of Sciences; President of CCF
张平文 (Pingwen Zhang)
Member of the Chinese Academy of Sciences; President of CSIAM; Professor, Peking University
Organizers
Register Now
Conference website:
https://econcs.pku.edu.cn/ijtcs2020/IJTCS2020.html
Registration link:
https://econcs.pku.edu.cn/ijtcs2020/Registration.htm
Contact
For sponsorship, collaboration, and other inquiries, please contact: IJTCS@pku.edu.cn
— Copyright Notice —
All text, images, audio, and video on this WeChat official account that were created or collected by the WeChat account of the Center on Frontiers of Computing Studies, Peking University are copyrighted by the Center's WeChat account; text, images, audio, and video collected or compiled from public sources, or reposted with authorization, remain the copyright of their original authors. If an original author does not wish their content to appear on this account, please notify us promptly and it will be removed.