IJTCS | Track Program: Machine Learning Theory
Editor's Note
The first International Joint Conference on Theoretical Computer Science (IJTCS) will be held online on August 17-21, 2020. The conference is jointly organized by Peking University, the China Society for Industrial and Applied Mathematics (CSIAM), the China Computer Federation (CCF), and the ACM China Council, and is hosted by the Center on Frontiers of Computing Studies (CFCS) at Peking University.
The theme of the conference is "recent advances and key problems in theoretical computer science." It features seven tracks, offering in-depth discussions of algorithmic game theory, blockchain technology, multi-agent reinforcement learning, machine learning theory, quantum computing, machine learning and formal methods, and algorithms and complexity. In addition, the conference hosts a Young PhD Forum, a Female Scholars Forum, and an Undergraduate Research Forum, bringing together renowned experts and scholars from home and abroad to focus on frontier problems in theoretical computer science. Information will be updated continuously; stay tuned!
This post presents the "Machine Learning Theory" track.
About the "Machine Learning Theory" Track
Machine learning, and deep learning in particular, has been widely deployed across many areas of society and has achieved remarkable results. Research on the theory of machine learning, however, has lagged behind: classical learning theory cannot satisfactorily explain, or provide guidance for, the striking representation, optimization, generalization, and transfer abilities of deep neural networks. In recent years the learning-theory community has proposed a series of novel theoretical models and paradigms that approach the mathematical essence of machine learning and deep learning from multiple angles. Within China, however, relatively few researchers currently work on learning theory, and there remains a gap to the international frontier.
This track invites well-known young scholars in machine learning theory from China and abroad; the talks cover many aspects of learning theory, including optimization algorithms, representation learning, generalization, and transfer. We hope the track will draw more domestic researchers' attention to theoretical machine learning and attract more students into research in this area.
"Machine Learning Theory" Track Chair
李建 (Jian Li)
Tsinghua University
"Machine Learning Theory" Track Program
"Machine Learning Theory" Talk Abstracts
李建 (Jian Li)
On Generalization and Implicit Bias of Gradient Methods in Deep Learning
Abstract
Deep learning has enjoyed huge empirical success in recent years. Although training a deep neural network is a highly non-convex optimization problem, simple (stochastic) gradient methods are able to produce good solutions that minimize the training error and, more surprisingly, can generalize well to out-of-sample data, even when the number of parameters is significantly larger than the amount of training data. It is known that changing the optimization algorithm, even without changing the model, changes the implicit bias, and also the generalization properties. In this talk, we present new generalization bounds and investigate the implicit bias of various gradient methods.
(1) We develop a new framework, termed Bayes-Stability, for proving algorithm-dependent generalization error bounds. Using the new framework, we obtain new data-dependent generalization bounds for stochastic gradient Langevin dynamics (SGLD) and several other noisy gradient methods (e.g., with momentum, mini-batch and acceleration, Entropy-SGD). Our result recovers (and is typically tighter than) a recent result in Mou et al. (2018) and improves upon the results in Pensia et al. (2018). Our experiments demonstrate that our data-dependent bounds can distinguish randomly labelled data from normal data, which provides an explanation for the intriguing phenomena observed in Zhang et al. (2017a).
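For readers less familiar with the algorithm at the center of (1), the display below recalls the standard SGLD update in our own notation (the exact variant analyzed in the talk may differ slightly); roughly speaking, it is the injected Gaussian noise that stability-based analyses of this kind typically exploit.

```latex
% Standard SGLD update (our notation): \eta_t step size, \beta inverse temperature,
% g_t a mini-batch gradient of the training loss at \theta_t.
\theta_{t+1} \;=\; \theta_t \;-\; \eta_t\, g_t \;+\; \sqrt{\frac{2\eta_t}{\beta}}\;\xi_t,
\qquad \xi_t \sim \mathcal{N}(0,\, I_d)
```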
(2) We show that gradient descent converges to the max-margin direction for homogeneous neural networks, including fully-connected and convolutional neural networks with ReLU or LeakyReLU activations, generalizing previous work on logistic regression with one-layer or multi-layer linear networks. Finally, as the margin is closely related to robustness, we discuss potential benefits of training longer for improving the robustness of the model.
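As a quick reminder of the terminology in (2), here is a sketch of the standard definitions, written in our notation rather than the talk's:

```latex
% L-homogeneity (our notation): f(c\theta; x) = c^{L} f(\theta; x) for all c > 0,
% e.g., bias-free ReLU networks with L layers.
% Normalized margin on a dataset \{(x_i, y_i)\}:
\gamma(\theta) \;=\; \min_i \; \frac{y_i\, f(\theta;\, x_i)}{\|\theta\|_2^{\,L}}
% "Converging to the max-margin direction" means that \theta_t / \|\theta_t\|_2
% approaches a direction maximizing \gamma (more precisely, a KKT point of the
% associated margin-maximization problem).
```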
Jason Lee
Provable Representation Learning in Deep Learning
Abstract
Deep representation learning seeks to learn a data representation that transfers to downstream tasks. In this talk, we study two forms of representation learning: supervised pre-training and self-supervised learning.
Supervised pre-training uses a large labeled source dataset to learn a representation, then trains a classifier on top of the representation. We prove that supervised pre-training can pool the data from all source tasks to learn a good representation which transfers to downstream tasks with few labeled examples.
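One common way to formalize this two-phase pipeline is sketched below; the notation is ours, and the talk's precise setting may differ:

```latex
% Phase 1: pool n examples from each of T labeled source tasks to fit a shared
% representation \phi together with per-task linear heads w_1, ..., w_T:
\hat{\phi},\ \{\hat{w}_t\} \;\in\; \arg\min_{\phi,\ \{w_t\}}\;
  \frac{1}{T}\sum_{t=1}^{T}\frac{1}{n}\sum_{i=1}^{n}
  \ell\!\left(w_t^{\top}\phi(x_{t,i}),\; y_{t,i}\right)
% Phase 2: on the target task, freeze \hat{\phi} and fit only a new head
% from m (much smaller than n) labeled examples:
\hat{w} \;\in\; \arg\min_{w}\;
  \frac{1}{m}\sum_{j=1}^{m}\ell\!\left(w^{\top}\hat{\phi}(x'_{j}),\; y'_{j}\right)
```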
Self-supervised learning creates auxiliary pretext tasks that do not require labeled data to learn representations. These pretext tasks are created solely from the input features, such as predicting a missing image patch, recovering the color channels of an image, or predicting missing words. Surprisingly, predicting this known information helps in learning a representation effective for downstream tasks. We prove that, under a conditional independence assumption, self-supervised learning provably learns useful representations.
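For intuition only, here is our rough paraphrase of the kind of conditional independence assumption involved; see the talk for the precise statement:

```latex
% Split each example into the downstream input X_1 and the pretext target X_2
% (e.g., a masked patch), with downstream label Y. The assumption is roughly
X_1 \;\perp\; X_2 \;\mid\; Y .
% The pretext task then learns \psi(x_1) \approx \mathbb{E}[X_2 \mid X_1 = x_1],
% and under such an assumption a simple (e.g., linear) predictor on top of \psi
% suffices for the downstream task.
```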
朱占星 (Zhanxing Zhu)
Understanding Deep "Alchemy"
Abstract
"Is deep learning alchemy or science?" has been a long-standing debate, since the success of deep learning mostly relies on various engineering designs and tricks and lacks a theoretical foundation. Unfortunately, the underlying mechanism of deep learning is still mysterious, which severely limits its further development from both the theoretical and the application perspective.
In this talk, I will introduce some of our attempts at theoretically understanding deep learning, mainly focusing on analyzing its training methods and tricks, including stochastic gradient descent, batch normalization, and knowledge distillation. (1) We analyze the implicit regularization property of stochastic gradient descent (SGD), i.e., we interpret why SGD can find well-generalizing minima compared with other alternatives. (2) We comprehensively reveal the learning dynamics of batch normalization with weight decay, and show their benefits for avoiding vanishing/exploding gradients, for not being trapped in sharp minima, and for the loss drop observed when the learning rate is decayed. (3) We also show the underlying mechanism of knowledge distillation, including its transfer risk bound, data efficiency, and imperfect-teacher distillation. These new findings shed some light on understanding deep learning, a step toward opening this black box, and also inspire new algorithmic designs.
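As a concrete reference point for item (3), the snippet below is a generic PyTorch-style sketch of the standard knowledge-distillation objective (the temperature T and weight alpha are illustrative defaults of ours, not the speaker's); analyses of distillation such as those mentioned above study objectives of roughly this form.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Generic knowledge-distillation objective: a weighted sum of
    (i) the KL divergence between temperature-softened teacher and student
    predictions and (ii) ordinary cross-entropy on the ground-truth labels."""
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_soft_student = F.log_softmax(student_logits / T, dim=-1)
    # The KL term is scaled by T^2 so its gradient magnitude stays comparable
    # to the hard-label term when the temperature is large.
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```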
鬲融 (Rong Ge)
Guarantees for Tuning the Step Size Using a Learning-to-Learn Approach
Abstract
The learning-to-learn approach, which uses optimization algorithms to learn a new optimizer, has successfully trained efficient optimizers in practice. This approach relies on meta-gradient descent on a meta-objective based on the trajectory that the optimizer generates. However, there are few theoretical guarantees on how to avoid meta-gradient explosion/vanishing, or on how to train an optimizer with good generalization performance. In this work we study the learning-to-learn approach on a simple problem of tuning the step size for quadratic loss. Our results show that although there is a way to design the meta-objective so that the meta-gradient remains polynomially bounded, computing the meta-gradient directly using backpropagation leads to numerical issues that look similar to gradient explosion/vanishing. We also characterize when it is necessary to compute the meta-objective on a separate validation set instead of the original training set. Finally, we verify our results empirically and show that a similar phenomenon appears even for more complicated learned optimizers parametrized by neural networks. Based on joint work with Xiang Wang, Shuai Yuan, and Chenwei Wu.
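To make the toy setting concrete, here is a small NumPy sketch, entirely ours and not taken from the paper, of the meta-objective for tuning a single step size on a one-dimensional quadratic. The exact meta-gradient grows exponentially in the unrolling horizon T once the step size leaves the stable region eta < 2/lam, which is the kind of explosion issue discussed above.

```python
import numpy as np

def meta_objective(eta, lam=2.0, x0=1.0, T=50):
    """Final loss after T steps of gradient descent with step size eta
    on the scalar quadratic f(x) = 0.5 * lam * x**2 (toy setting of our choice)."""
    x = x0
    for _ in range(T):
        x = x - eta * lam * x          # gradient step: x <- (1 - eta*lam) * x
    return 0.5 * lam * x ** 2

def meta_gradient(eta, lam=2.0, x0=1.0, T=50):
    """Exact derivative of the meta-objective w.r.t. eta, obtained by the chain
    rule through the unrolled trajectory: x_T = (1 - eta*lam)**T * x0."""
    r = 1.0 - eta * lam
    return -T * lam ** 2 * x0 ** 2 * r ** (2 * T - 1)

for eta in [0.3, 0.9, 1.05]:           # the last value exceeds 2/lam = 1.0: divergent regime
    print(eta, meta_objective(eta), meta_gradient(eta))
```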
宋乐 (Le Song)
Deep Learning for Algorithm Design
Abstract
Algorithms are step-by-step instructions designed by human experts to solve a problem. Effective algorithms play central roles in modern computing and have impacted many industrial applications, such as recommendation and advertising on the internet, resource allocation in cloud computing, robotics and route planning, disease understanding, and drug design.
However, designing effective algorithms is a time-consuming and difficult task. It often requires lots of intuition and expertise to tailor algorithmic choices in particular applications. Furthermore, when complex application data are involved, it becomes even more challenging for human experts to reason about algorithm behavior.
Can we use deep learning and AI to help algorithm design? A number of recent advances have allowed algorithms from specific algorithmic families to be designed automatically from data, often leading to either state-of-the-art empirical performance or provable performance guarantees on observed instance distributions. In this talk, I will provide an introduction to this area and explain a few pieces of work in this direction.
马腾宇 (Tengyu Ma)
Domain Adaptation with Theoretical Guarantees
Abstract
In unsupervised domain adaptation, the existing theory focuses on situations where the source and target domains are close. In this talk, I will discuss a few algorithms with theoretical guarantees for larger domain shifts. First, I will show that, for linear models, self-training can provably avoid using spurious features that correlate with the source labels but not the target labels. Second, I will discuss algorithms with provable guarantees that leverage the sequential structure of the domain shifts. Based on the papers https://arxiv.org/abs/2006.10032, https://arxiv.org/abs/2002.11361, and https://arxiv.org/abs/2006.14481.
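To fix ideas, the following is a generic self-training (pseudo-labeling) loop. It is a common baseline of the kind analyzed in this line of work, not necessarily the exact algorithm from the papers above, and the function names, classifier choice, and confidence threshold are our own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(Xs, ys, Xt, rounds=5, threshold=0.9):
    """Generic pseudo-labeling loop: fit on labeled source data, then repeatedly
    add confidently pseudo-labeled target points and refit on the augmented set."""
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    for _ in range(rounds):
        probs = clf.predict_proba(Xt)
        confident = probs.max(axis=1) >= threshold   # trust only confident predictions
        if not confident.any():
            break
        pseudo = clf.classes_[probs.argmax(axis=1)[confident]]
        X_aug = np.vstack([Xs, Xt[confident]])
        y_aug = np.concatenate([ys, pseudo])
        clf = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
    return clf
```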
About IJTCS
Introduction → The International Joint Conference on Theoretical Computer Science Makes Its Debut
Recommended → Conference Invited Talks (I)
Recommended → Conference Invited Talks (II)
Program → Track: Algorithmic Game Theory
IJTCS Registration Information
Registration is now officially open to the public! Each participant may register for free to watch the online talks, or pay a fee to further interact with the speakers about their talks and take part in more of the conference activities.
Watching the online talks: free
Full registration:
(Regular) $100 / ¥700
(Student) $50 / ¥350*
Attend all sessions as a participant, ask questions and join discussions online, and take part in special interactive sessions.
Registration deadline: 23:59, August 15, 2020
*Student registration: after registering on the website, please photograph the page of your student ID that contains your personal and school information and send it to IJTCS@pku.edu.cn, with the email subject in the format "Student Registration + Name".
General Chairs
John Hopcroft
Foreign Member of the Chinese Academy of Sciences; Visiting Chair Professor, Peking University
林惠民 (Huimin Lin)
Member of the Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences
Conference Co-Chair
邓小铁 (Xiaotie Deng)
Professor, Peking University
Advisory Committee Chairs
高文 (Wen Gao)
Member of the Chinese Academy of Engineering; Professor, Peking University
梅宏 (Hong Mei)
Member of the Chinese Academy of Sciences; President of CCF
张平文 (Pingwen Zhang)
Member of the Chinese Academy of Sciences; President of CSIAM; Professor, Peking University
Organizers
Register Now
Conference website:
https://econcs.pku.edu.cn/ijtcs2020/IJTCS2020.html
Registration link:
https://econcs.pku.edu.cn/ijtcs2020/Registration.htm
Contact
For sponsorship, collaboration, and other inquiries, please contact: IJTCS@pku.edu.cn
— Copyright Notice —
All text, images, audio, and video on this WeChat official account that were created or collected by the WeChat account of the Center on Frontiers of Computing Studies, Peking University are copyrighted by the Center's WeChat account; text, images, audio, and video collected or compiled from public sources, or reposted with authorization, remain the copyright of their original authors. If an original author does not wish their content to appear on this account, please notify us promptly and it will be removed.