文章速览 | 联邦学习 x ICML'2023(上)
本文是由白小鱼博主整理的ICML 2023会议中,与联邦学习相关的论文合集及摘要翻译。
Authors: Junyi Zhu; Ruicong Yao; Matthew B. Blaschko
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhu23m.html
Abstract: In Federated Learning (FL) and many other distributed training frameworks, collaborators can hold their private data locally and only share the network weights trained with the local data after multiple iterations. Gradient inversion is a family of privacy attacks that recovers data from its generated gradients. Seemingly, FL can provide a degree of protection against gradient inversion attacks on weight updates, since the gradient of a single step is concealed by the accumulation of gradients over multiple local iterations. In this work, we propose a principled way to extend gradient inversion attacks to weight updates in FL, thereby better exposing weaknesses in the presumed privacy protection inherent in FL. In particular, we propose a surrogate model method based on the characteristic of two-dimensional gradient flow and low-rank property of local updates. Our method largely boosts the ability of gradient inversion attacks on weight updates containing many iterations and achieves state-of-the-art (SOTA) performance. Additionally, our method runs up to 100× faster than the SOTA baseline in the common FL scenario. Our work re-evaluates and highlights the privacy risk of sharing network weights. Our code is available at https://github.com/JunyiZhu-AI/surrogate_model_extension.
ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)和许多其他分布式训练框架中,协作者可以在本地保存自己的私有数据,并且仅在多次迭代后共享用本地数据训练的网络权重。梯度反转是一系列隐私攻击,可从生成的梯度中恢复数据。表面上,FL 可以在一定程度上防止权重更新时的梯度反转攻击,因为单个步骤的梯度被多次局部迭代的梯度累积所隐藏。在这项工作中,我们提出了一种原则性方法,将梯度反转攻击扩展到 FL 中的权重更新,从而更好地暴露 FL 固有的假定隐私保护的弱点。特别是,我们提出了一种基于二维梯度流特征和局部更新的低秩特性的代理模型方法。我们的方法极大地提高了对包含多次迭代的权重更新进行梯度反转攻击的能力,并实现了最先进的(SOTA)性能。此外,我们的方法在常见 FL 场景中的运行速度比 SOTA 基线快 100 倍。我们的工作重新评估并强调了共享网络权重的隐私风险。我们的代码可在 https://github.com/JunyiZhu-AI/surrogate_model_extension 获取。
Notes:
PUB (https://openreview.net/forum?id=Kz0IODB2kj)
PDF (https://arxiv.org/abs/2306.00127)
CODE (https://github.com/junyizhu-ai/surrogate_model_extension)
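为帮助理解上文"从梯度/权重更新反推训练数据"这一攻击族的基本原理,下面给出一个经典的单步梯度反转(DLG 式)最小示意:攻击者优化虚拟数据,使其产生的梯度与观测到的梯度匹配。注意这只是攻击族的通用演示,并非论文提出的 SME 方法(SME 针对的是多步本地迭代后的权重更新),模型结构与超参数均为假设。

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# 受害者模型与"真实"数据(仅作演示)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x_true = torch.randn(2, 8)
y_true = torch.tensor([1, 3])
loss_fn = nn.CrossEntropyLoss()

# 攻击者观测到的目标梯度(联邦场景中可视为一次更新的近似)
target_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), model.parameters())

# 攻击者优化虚拟数据与软标签,使其产生的梯度与观测梯度匹配
x_dummy = torch.randn(2, 8, requires_grad=True)
y_dummy = torch.randn(2, 4, requires_grad=True)
opt = torch.optim.Adam([x_dummy, y_dummy], lr=0.05)

for it in range(300):
    opt.zero_grad()
    dummy_loss = torch.sum(torch.softmax(y_dummy, -1) * -torch.log_softmax(model(x_dummy), -1), dim=-1).mean()
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, target_grads))
    grad_diff.backward()
    opt.step()

print("重建误差:", torch.norm(x_dummy.detach() - x_true).item())
```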
LeadFL: Client Self-Defense against Model Poisoning in Federated Learning
Authors: Chaoyi Zhu; Stefanie Roos; Lydia Y. Chen
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhu23j.html
Abstract: Federated Learning is highly susceptible to backdoor and targeted attacks as participants can manipulate their data and models locally without any oversight on whether they follow the correct process. There are a number of server-side defenses that mitigate the attacks by modifying or rejecting local updates submitted by clients. However, we find that bursty adversarial patterns with a high variance in the number of malicious clients can circumvent the existing defenses. We propose a client-self defense, LeadFL, that is combined with existing server-side defenses to thwart backdoor and targeted attacks. The core idea of LeadFL is a novel regularization term in local model training such that the Hessian matrix of local gradients is nullified. We provide the convergence analysis of LeadFL and its robustness guarantee in terms of certified radius. Our empirical evaluation shows that LeadFL is able to mitigate bursty adversarial patterns for both iid and non-iid data distributions. It frequently reduces the backdoor accuracy from more than 75% for state-of-the-art defenses to less than 10% while its impact on the main task accuracy is always less than for other client-side defenses.
ISSN: 2640-3498 abstractTranslation: 联邦学习很容易受到后门和有针对性的攻击,因为参与者可以在本地操纵他们的数据和模型,而无需监督他们是否遵循正确的流程。有许多服务器端防御措施可以通过修改或拒绝客户端提交的本地更新来减轻攻击。然而,我们发现恶意客户端数量差异较大的突发对抗模式可以绕过现有防御。我们提出了一种客户端自我防御 LeadFL,它与现有的服务器端防御相结合,以阻止后门和有针对性的攻击。LeadFL 的核心思想是局部模型训练中的一个新颖的正则化项,使得局部梯度的 Hessian 矩阵无效。我们提供 LeadFL 的收敛分析及其在认证半径方面的稳健性保证。我们的实证评估表明,LeadFL 能够减轻独立同分布和非独立同分布数据分布的突发对抗模式。它经常将后门准确率从最先进防御的 75% 以上降低到 10% 以下,而其对主要任务准确度的影响始终小于其他客户端防御。
Notes:
PUB (https://openreview.net/forum?id=2CiaH2Tq4G)
CODE (https://github.com/chaoyitud/LeadFL)
XTab: Cross-table Pretraining for Tabular Transformers
Authors: Bingzhao Zhu; Xingjian Shi; Nick Erickson; Mu Li; George Karypis; Mahsa Shoaran
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhu23k.html
Abstract: The success of self-supervised learning in computer vision and natural language processing has motivated pretraining methods on tabular data. However, most existing tabular self-supervised learning models fail to leverage information across multiple data tables and cannot generalize to new tables. In this work, we introduce XTab, a framework for cross-table pretraining of tabular transformers on datasets from various domains. We address the challenge of inconsistent column types and quantities among tables by utilizing independent featurizers and using federated learning to pretrain the shared component. Tested on 84 tabular prediction tasks from the OpenML-AutoML Benchmark (AMLB), we show that (1) XTab consistently boosts the generalizability, learning speed, and performance of multiple tabular transformers, (2) by pretraining FT-Transformer via XTab, we achieve superior performance than other state-of-the-art tabular deep learning models on various tasks such as regression, binary, and multiclass classification.
ISSN: 2640-3498 abstractTranslation: 计算机视觉和自然语言处理中自监督学习的成功激发了表格数据预训练方法的发展。然而,大多数现有的表格自监督学习模型无法利用多个数据表中的信息,并且无法推广到新表。在这项工作中,我们介绍了 XTab,一个用于在不同领域的数据集上对表格 Transformer 进行跨表预训练的框架。我们通过利用独立特征器并使用联邦学习来预训练共享组件,解决了表之间列类型和数量不一致的挑战。在 OpenML-AutoML Benchmark (AMLB) 的 84 个表格预测任务上进行测试,我们表明 (1) XTab 持续提高了多个表格 Transformer 的泛化性、学习速度和性能,(2) 通过 XTab 预训练 FT-Transformer,我们在回归、二元和多类分类等各种任务上取得比其他最先进的表格深度学习模型更优越的性能。
Notes:
PUB (https://openreview.net/forum?id=uGORNDmIdr)
PDF (https://arxiv.org/abs/2305.06090)
CODE (https://github.com/bingzhaozhu/xtab)
Addressing Budget Allocation and Revenue Allocation in Data Market Environments Using an Adaptive Sampling Algorithm
Authors: Boxin Zhao; Boxiang Lyu; Raul Castro Fernandez; Mladen Kolar
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhao23e.html
Abstract: High-quality machine learning models are dependent on access to high-quality training data. When the data are not already available, it is tedious and costly to obtain them. Data markets help with identifying valuable training data: model consumers pay to train a model, the market uses that budget to identify data and train the model (the budget allocation problem), and finally the market compensates data providers according to their data contribution (revenue allocation problem). For example, a bank could pay the data market to access data from other financial institutions to train a fraud detection model. Compensating data contributors requires understanding data’s contribution to the model; recent efforts to solve this revenue allocation problem based on the Shapley value are inefficient to lead to practical data markets. In this paper, we introduce a new algorithm to solve budget allocation and revenue allocation problems simultaneously in linear time. The new algorithm employs an adaptive sampling process that selects data from those providers who are contributing the most to the model. Better data means that the algorithm accesses those providers more often, and more frequent accesses corresponds to higher compensation. Furthermore, the algorithm can be deployed in both centralized and federated scenarios, boosting its applicability. We provide theoretical guarantees for the algorithm that show the budget is used efficiently and the properties of revenue allocation are similar to Shapley’s. Finally, we conduct an empirical evaluation to show the performance of the algorithm in practical scenarios and when compared to other baselines. Overall, we believe that the new algorithm paves the way for the implementation of practical data markets.
ISSN: 2640-3498 abstractTranslation: 高质量的机器学习模型取决于对高质量训练数据的访问。当数据尚不可用时,获取这些数据既乏味又昂贵。数据市场有助于识别有价值的训练数据:模型消费者支付训练模型的费用,市场使用该预算来识别数据并训练模型(预算分配问题),最后市场根据数据提供者的数据贡献对其进行补偿(收入分配问题)。例如,银行可以向数据市场付费以访问其他金融机构的数据来训练欺诈检测模型。补偿数据贡献者需要了解数据对模型的贡献;最近基于 Shapley 值解决这一收入分配问题的工作效率不足,难以支撑实际可用的数据市场。在本文中,我们引入了一种新算法,可以在线性时间内同时解决预算分配和收入分配问题。新算法采用自适应采样过程,从对模型贡献最大的提供商中选择数据。更好的数据意味着算法更频繁地访问这些提供者,而更频繁的访问对应着更高的补偿。此外,该算法既可以部署在集中式场景中,也可以部署在联邦场景中,增强了算法的适用性。我们为该算法提供了理论保证,表明预算得到了有效利用,并且收入分配的性质与 Shapley 值类似。最后,我们进行实证评估,以显示该算法在实际场景中以及与其他基线相比的性能。总的来说,我们相信新算法为实际数据市场的实施铺平了道路。
Notes:
PUB (https://openreview.net/forum?id=iAgQfF3atY)
PDF (https://arxiv.org/abs/2306.02543)
CODE (https://github.com/boxinz17/data-market-via-adaptive-sampling)
Towards Unbiased Training in Federated Open-world Semi-supervised Learning
Authors: Jie Zhang; Xiaosong Ma; Song Guo; Wenchao Xu
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhang23af.html
Abstract: Federated Semi-supervised Learning (FedSSL) has emerged as a new paradigm for allowing distributed clients to collaboratively train a machine learning model over scarce labeled data and abundant unlabeled data. However, existing works for FedSSL rely on a closed-world assumption that all local training data and global testing data are from seen classes observed in the labeled dataset. It is crucial to go one step further: adapting FL models to an open-world setting, where unseen classes exist in the unlabeled data. In this paper, we propose a novel Federated open-world Semi-Supervised Learning (FedoSSL) framework, which can solve the key challenge in distributed and open-world settings, i.e., the biased training process for heterogeneously distributed unseen classes. Specifically, since the advent of a certain unseen class depends on a client basis, the locally unseen classes (exist in multiple clients) are likely to receive differentiated superior aggregation effects than the globally unseen classes (exist only in one client). We adopt an uncertainty-aware suppressed loss to alleviate the biased training between locally unseen and globally unseen classes. Besides, we enable a calibration module supplementary to the global aggregation to avoid potential conflicting knowledge transfer caused by inconsistent data distribution among different clients. The proposed FedoSSL can be easily adapted to state-of-the-art FL methods, which is also validated via extensive experiments on benchmarks and real-world datasets (CIFAR-10, CIFAR-100 and CINIC-10).
ISSN: 2640-3498 abstractTranslation: 联邦半监督学习 (FedSSL) 已成为一种新范例,允许分布式客户端在稀缺的标记数据和大量未标记数据上协作训练机器学习模型。然而,FedSSL 的现有工作依赖于一个封闭世界的假设,即所有本地训练数据和全局测试数据都来自标记数据集中观察到的类。更进一步至关重要:使 FL 模型适应开放世界环境,其中未标记数据中存在未见的类。在本文中,我们提出了一种新颖的联邦开放世界半监督学习(FedoSSL)框架,它可以解决分布式和开放世界环境中的关键挑战,即异构分布的看不见的类的有偏差训练过程。具体而言,由于某一未见类的出现取决于客户端基础,因此局部未见类(存在于多个客户端中)可能比全局未见类(仅存在于一个客户端中)获得差异化的优越聚合效果。我们采用不确定性感知抑制损失来减轻局部未见和全局未见类之间的偏差训练。此外,我们启用了一个校准模块来补充全局聚合,以避免由于不同客户端之间的数据分布不一致而导致潜在的知识传输冲突。所提出的 FedoSSL 可以轻松适应最先进的 FL 方法,该方法也通过基准和真实数据集(CIFAR-10、CIFAR-100 和 CINIC-10)的大量实验得到验证。
Notes:
PUB (https://openreview.net/forum?id=gHfybro5Sj)
PDF (https://arxiv.org/abs/2305.00771)
SLIDES(https://icml.cc/media/icml-2023/Slides/25109.pdf)
Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction
Authors: Jianyi Zhang; Ang Li; Minxue Tang; Jingwei Sun; Xiang Chen; Fan Zhang; Changyou Chen; Yiran Chen; Hai Li
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhang23y.html
Abstract: Due to the often limited communication bandwidth of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training at each communication round. Compared with engaging all the available clients, such a random-selection mechanism could lead to significant performance degradation on non-IID (independent and identically distributed) data. In this paper, we present our key observation that the essential reason resulting in such performance degradation is the class-imbalance of the grouped data from randomly selected clients. Based on this observation, we design an efficient heterogeneity-aware client sampling mechanism, namely, Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce class-imbalance of the grouped dataset from the intentionally selected clients. We first propose a measure of class-imbalance which can be derived in a privacy-preserving way. Based on this measure, we design a computation-efficient client sampling strategy such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees. Experimental results show that Fed-CBS outperforms the status quo approaches in terms of test accuracy and the rate of convergence while achieving comparable or even better performance than the ideal setting where all the available clients participate in the FL training.
ISSN: 2640-3498 abstractTranslation: 由于边缘设备的通信带宽通常有限,大多数现有的联邦学习(FL)方法在每轮通信中仅随机选择设备的子集参与训练。与让所有可用客户端参与相比,这种随机选择机制可能会导致非 IID(独立同分布)数据的性能显着下降。在本文中,我们提出了我们的关键观察结果,即导致这种性能下降的根本原因是来自随机选择的客户端的分组数据的类不平衡。基于这一观察,我们设计了一种有效的异构感知客户端采样机制,即联邦类平衡采样(Fed-CBS),它可以有效地减少有意选择的客户端分组数据集的类不平衡。我们首先提出了一种可以通过隐私保护方式导出的类别不平衡度量。基于此度量,我们设计了一种计算高效的客户端采样策略,以便主动选择的客户端将生成具有理论保证的更加类平衡的分组数据集。实验结果表明,Fed-CBS 在测试准确性和收敛速度方面优于现状方法,同时实现了与所有可用客户端都参与 FL 训练的理想设置相当甚至更好的性能。
Notes:
PUB (https://openreview.net/forum?id=NcbY2UOfko)
PDF (https://arxiv.org/abs/2209.15245)
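Fed-CBS 的核心是度量"被选客户端合并后的类别不平衡程度"并据此选择客户端。下面用 numpy 给出一个贪心选择的示意,其中不平衡度量简化为"合并类别分布与均匀分布的平方距离";论文中的度量可以以隐私保护方式推导,此处仅为说明思路的假设性简化。

```python
import numpy as np

def imbalance(counts):
    """合并类别分布与均匀分布的平方距离,越小越平衡(示意性度量)。"""
    p = counts / counts.sum()
    return np.sum((p - 1.0 / len(p)) ** 2)

def greedy_select(client_counts, m):
    """贪心选择 m 个客户端,使选中客户端合并后的类别分布尽量平衡。"""
    selected, grouped = [], np.zeros_like(client_counts[0], dtype=float)
    remaining = list(range(len(client_counts)))
    for _ in range(m):
        best = min(remaining, key=lambda k: imbalance(grouped + client_counts[k]))
        selected.append(best)
        grouped = grouped + client_counts[best]
        remaining.remove(best)
    return selected, grouped

# 玩具例子:10 个客户端、4 个类别,每个客户端的样本类别计数
rng = np.random.default_rng(0)
client_counts = rng.integers(0, 50, size=(10, 4)).astype(float)
sel, grouped = greedy_select(client_counts, m=3)
print("选中的客户端:", sel, "合并类别计数:", grouped)
```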
FedCR: Personalized Federated Learning Based on Across-Client Common Representation with Conditional Mutual Information Regularization
Authors: Hao Zhang; Chenglin Li; Wenrui Dai; Junni Zou; Hongkai Xiong
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhang23w.html
Abstract: In personalized federated learning (PFL), multiple clients train customized models to fulfill their personal objectives, which, however, are prone to overfitting to local data due to the heterogeneity and scarcity of local data. To address this, we propose from the information-theoretic perspective a personalized federated learning framework based on the common representation learned across clients, named FedCR. Specifically, we introduce to the local client update a regularizer that aims at minimizing the discrepancy between local and global conditional mutual information (CMI), such that clients are encouraged to learn and exploit the common representation. Upon this, each client learns individually a customized predictor (head), while the extractor (body) remains to be aggregated by the server. Our CMI regularizer leads to a theoretically sound alignment between the local and global stochastic feature distributions in terms of their Kullback-Leibler (KL) divergence. More importantly, by modeling the global joint feature distribution as a product of multiple local feature distributions, clients can efficiently extract diverse information from the global data but without need of the raw data from other clients. We further show that noise injection via feature alignment and ensemble of local predictors in FedCR would help enhance its generalization capability. Experiments on benchmark datasets demonstrate a consistent performance gain and better generalization behavior of FedCR.
ISSN: 2640-3498 abstractTranslation: 在个性化联邦学习(PFL)中,多个客户训练定制模型来实现他们的个人目标,然而,由于本地数据的异构性和稀缺性,这些模型很容易过度拟合本地数据。为了解决这个问题,我们从信息论的角度提出了一种基于跨客户端学习的共同表示的个性化联邦学习框架,名为 FedCR。具体来说,我们在本地客户端更新中引入了一个正则化器,旨在最小化本地和全局条件互信息(CMI)之间的差异,从而鼓励客户端学习和利用共同表示。在此基础上,每个客户端单独学习一个定制的预测器(头部),而提取器(主体)仍然由服务器聚合。我们的 CMI 正则化器在理论上使局部和全局随机特征分布在 Kullback-Leibler (KL) 散度方面实现了良好的对齐。更重要的是,通过将全局联合特征分布建模为多个局部特征分布的乘积,客户端可以有效地从全局数据中提取不同的信息,而不需要来自其他客户端的原始数据。我们进一步表明,通过 FedCR 中的特征对齐和局部预测器集成进行噪声注入将有助于增强其泛化能力。基准数据集上的实验证明了 FedCR 具有一致的性能增益和更好的泛化行为。
Notes:
PUB (https://openreview.net/forum?id=YDC5jTS3LR)
CODE (https://github.com/haozzh/FedCR)
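FedCR 的 CMI 正则项可以理解为:让本地随机特征分布向全局特征分布对齐,其形式对应两个(对角协方差)高斯分布之间的 KL 散度。下面给出该 KL 项的最小示意(PyTorch);其中全局分布的构造方式与超参数 beta 均为假设,仅用于说明对齐项的形状。

```python
import torch

def kl_diag_gaussian(mu_q, logvar_q, mu_p, logvar_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ),各维独立(对角协方差)。"""
    var_q, var_p = logvar_q.exp(), logvar_p.exp()
    kl = 0.5 * (logvar_p - logvar_q + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)
    return kl.sum(dim=-1).mean()

# 本地特征分布(由本地提取器输出)与服务器聚合得到的全局特征分布
mu_local = torch.randn(32, 64, requires_grad=True)
logvar_local = torch.zeros(32, 64, requires_grad=True)
mu_global, logvar_global = torch.zeros(64), torch.zeros(64)

reg = kl_diag_gaussian(mu_local, logvar_local, mu_global, logvar_global)
# 训练时: total_loss = task_loss + beta * reg  (beta 为假设的超参数)
print(float(reg))
```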
Authors: Feilong Zhang; Xianming Liu; Shiyi Lin; Gang Wu; Xiong Zhou; Junjun Jiang; Xiangyang Ji
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/zhang23aa.html
Abstract: Federated learning suffers from a latency bottleneck induced by network stragglers, which hampers the training efficiency significantly. In addition, due to the heterogeneous data distribution and security requirements, simple and fast averaging aggregation is not feasible anymore. Instead, complicated aggregation operations, such as knowledge distillation, are required. The time cost for complicated aggregation becomes a new bottleneck that limits the computational efficiency of FL. In this work, we claim that the root cause of training latency actually lies in the aggregation-then-broadcasting workflow of the server. By swapping the computational order of aggregation and broadcasting, we propose a novel and efficient parallel federated learning (PFL) framework that unlocks the edge nodes during global computation and the central server during local computation. This fully asynchronous and parallel pipeline enables handling complex aggregation and network stragglers, allowing flexible device participation as well as achieving scalability in computation. We theoretically prove that synchronous and asynchronous PFL can achieve a similar convergence rate as vanilla FL. Extensive experiments empirically show that our framework brings up to 5.56× speedup compared with traditional FL. Code is available at: https://github.com/Hypervoyager/PFL.
ISSN: 2640-3498 abstractTranslation: 联邦学习受到网络落后者引起的延迟瓶颈的影响,这显着降低了训练效率。此外,由于异构数据分布和安全要求,简单快速的平均聚合已经不再可行。相反,需要复杂的聚合操作,例如知识蒸馏。复杂聚合的时间成本成为限制FL计算效率的新瓶颈。在这项工作中,我们声称训练延迟的根本原因实际上在于服务器的聚合然后广播工作流程。通过交换聚合和广播的计算顺序,我们提出了一种新颖且高效的并行联邦学习(PFL)框架,该框架在全局计算期间解锁边缘节点,在本地计算期间解锁中央服务器。这种完全异步和并行的管道能够处理复杂的聚合和网络落后者,允许灵活的设备参与以及实现计算的可扩展性。我们从理论上证明同步和异步 PFL 可以实现与 vanilla FL 相似的收敛速度。大量实验表明,与传统 FL 相比,我们的框架带来了 5.56 倍的加速。代码位于:https://github.com/Hypervoyager/PFL。
Notes:
PUB (https://openreview.net/forum?id=AMuNQEUmGr)
CODE (https://github.com/Hypervoyager/PFL)
Authors: Jialin Yi; Milan Vojnovic
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/yi23a.html
Abstract: We study a new non-stochastic federated multiarmed bandit problem with multiple agents collaborating via a communication network. The losses of the arms are assigned by an oblivious adversary that specifies the loss of each arm not only for each time step but also for each agent, which we call doubly adversarial. In this setting, different agents may choose the same arm in the same time step but observe different feedback. The goal of each agent is to find a globally best arm in hindsight that has the lowest cumulative loss averaged over all agents, which necessities the communication among agents. We provide regret lower bounds for any federated bandit algorithm under different settings, when agents have access to full-information feedback, or the bandit feedback. For the bandit feedback setting, we propose a near-optimal federated bandit algorithm called FEDEXP3. Our algorithm gives a positive answer to an open question proposed in (Cesa-Bianchi et al., 2016): FEDEXP3 can guarantee a sub-linear regret without exchanging sequences of selected arm identities or loss sequences among agents. We also provide numerical evaluations of our algorithm to validate our theoretical results and demonstrate its effectiveness on synthetic and real-world datasets.
ISSN: 2640-3498 abstractTranslation: 我们研究了一种新的非随机联邦多臂老虎机问题,其中多个代理通过通信网络进行协作。臂的损失是由一个不经意的对手分配的,该对手不仅指定每个时间步的损失,还指定每个代理的损失,我们称之为双重对抗。在这种情况下,不同的智能体可能会在相同的时间步长中选择相同的手臂,但会观察到不同的反馈。每个智能体的目标是事后找到一个全局最佳的手臂,该手臂在所有智能体中平均累积损失最低,这需要智能体之间的通信。当代理可以访问完整信息反馈或强盗反馈时,我们为不同设置下的任何联邦强盗算法提供后悔下限。对于强盗反馈设置,我们提出了一种接近最优的联邦强盗算法,称为 FEDEXP3。我们的算法对(Cesa-Bianchi et al., 2016)中提出的一个悬而未决的问题给出了肯定的答案:FEDEXP3 可以保证亚线性后悔,而无需在代理之间交换所选手臂身份的序列或损失序列。我们还提供算法的数值评估,以验证我们的理论结果并证明其在合成和现实数据集上的有效性。
Notes:
PUB (https://openreview.net/forum?id=FjOB0g7iRf)
PDF (https://arxiv.org/abs/2301.09223)
CODE (https://github.com/jialinyi94/doubly-stochastic-federataed-bandit)
Authors: Rui Ye; Mingkai Xu; Jianyu Wang; Chenxin Xu; Siheng Chen; Yanfeng Wang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/ye23f.html
Abstract: This work considers the category distribution heterogeneity in federated learning. This issue is due to biased labeling preferences at multiple clients and is a typical setting of data heterogeneity. To alleviate this issue, most previous works consider either regularizing local models or fine-tuning the global model, while they ignore the adjustment of aggregation weights and simply assign weights based on the dataset size. However, based on our empirical observations and theoretical analysis, we find that the dataset size is not optimal and the discrepancy between local and global category distributions could be a beneficial and complementary indicator for determining aggregation weights. We thus propose a novel aggregation method, Federated Learning with Discrepancy-Aware Collaboration (FedDisco), whose aggregation weights not only involve both the dataset size and the discrepancy value, but also contribute to a tighter theoretical upper bound of the optimization error. FedDisco can promote utility and modularity in a communication- and computation-efficient way. Extensive experiments show that our FedDisco outperforms several state-of-the-art methods and can be easily incorporated with many existing methods to further enhance the performance. Our code will be available at https://github.com/MediaBrain-SJTU/FedDisco.
ISSN: 2640-3498 abstractTranslation: 这项工作考虑了联邦学习中的类别分布异质性。此问题是由于多个客户的标签偏好存在偏差造成的,并且是数据异质性的典型设置。为了缓解这个问题,之前的大多数工作要么考虑对局部模型进行正则化,要么对全局模型进行微调,而忽略了聚合权重的调整,只是根据数据集大小分配权重。然而,根据我们的经验观察和理论分析,我们发现数据集大小并不是最优的,局部和全局类别分布之间的差异可能是确定聚合权重的有益和补充指标。因此,我们提出了一种新颖的聚合方法,即具有差异感知协作的联邦学习(FedDisco),其聚合权重不仅涉及数据集大小和差异值,而且有助于更严格的优化误差理论上限。FedDisco 可以通过高效通信和计算的方式提升实用性和模块化性。大量实验表明,我们的 FedDisco 优于多种最先进的方法,并且可以轻松与许多现有方法结合以进一步提高性能。我们的代码将在 https://github.com/MediaBrain-SJTU/FedDisco 上提供。
Notes:
PUB (https://openreview.net/forum?id=cHJ1VuZorx)
PDF (https://arxiv.org/abs/2305.19229)
CODE (https://github.com/MediaBrain-SJTU/FedDisco)
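FedDisco 的聚合权重同时考虑数据量 n_k 与本地/全局类别分布差异 d_k。下面是一个示意性的权重计算(numpy):差异取 L2 距离,组合方式写成 relu(n_k − a·d_k + b),其中 a、b 为假设的超参数,具体形式以论文为准。

```python
import numpy as np

def disco_weights(sizes, local_dists, global_dist, a=0.5, b=0.1):
    """根据数据量与类别分布差异计算聚合权重(示意)。"""
    sizes = np.asarray(sizes, dtype=float)
    n = sizes / sizes.sum()                                               # 归一化数据量 n_k
    d = np.array([np.linalg.norm(p - global_dist) for p in local_dists])  # 分布差异 d_k
    w = np.maximum(n - a * d + b, 0.0)                                    # relu(n_k - a*d_k + b)
    return w / w.sum()

local_dists = np.array([[0.7, 0.2, 0.1],
                        [0.3, 0.4, 0.3],
                        [0.1, 0.1, 0.8]])
global_dist = local_dists.mean(axis=0)
print(disco_weights([100, 300, 200], local_dists, global_dist))
```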
Authors: Rui Ye; Zhenyang Ni; Fangzhao Wu; Siheng Chen; Yanfeng Wang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/ye23b.html
Abstract: Personalized federated learning (FL) aims to collaboratively train a personalized model for each client. Previous methods do not adaptively determine who to collaborate at a fine-grained level, making them difficult to handle diverse data heterogeneity levels and those cases where malicious clients exist. To address this issue, our core idea is to learn a collaboration graph, which models the benefits from each pairwise collaboration and allocates appropriate collaboration strengths. Based on this, we propose a novel personalized FL algorithm, pFedGraph, which consists of two key modules: (1) inferring the collaboration graph based on pairwise model similarity and dataset size at server to promote fine-grained collaboration and (2) optimizing local model with the assistance of aggregated model at client to promote personalization. The advantage of pFedGraph is flexibly adaptive to diverse data heterogeneity levels and model poisoning attacks, as the proposed collaboration graph always pushes each client to collaborate more with similar and beneficial clients. Extensive experiments show that pFedGraph consistently outperforms the other 14 baseline methods across various heterogeneity levels and multiple cases where malicious clients exist. Code will be available at https://github.com/MediaBrain-SJTU/pFedGraph.
ISSN: 2640-3498 abstractTranslation: 个性化联邦学习(FL)旨在为每个客户协作训练个性化模型。以前的方法不能自适应地确定谁在细粒度的级别上进行协作,这使得它们难以处理不同的数据异构级别以及存在恶意客户端的情况。为了解决这个问题,我们的核心思想是学习协作图,该图对每次配对协作的好处进行建模并分配适当的协作强度。基于此,我们提出了一种新颖的个性化 FL 算法 pFedGraph,它由两个关键模块组成:(1)在服务器端基于成对模型相似性和数据集大小推断协作图,以促进细粒度协作;(2)在客户端借助聚合模型优化本地模型,以促进个性化。pFedGraph 的优点是可以灵活地适应不同的数据异构级别和模型中毒攻击,因为所提出的协作图总是促使每个客户端与相似且有益的客户端进行更多协作。大量实验表明,在各种异质性级别和存在恶意客户端的多种情况下,pFedGraph 始终优于其他 14 个基线方法。代码可在 https://github.com/MediaBrain-SJTU/pFedGraph 获取。
Notes:
PUB (https://openreview.net/forum?id=33fj5Ph3ot)
CODE (https://github.com/MediaBrain-SJTU/pFedGraph)
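pFedGraph 在服务器端依据成对模型相似度与数据集大小推断协作图,再按图中权重为每个客户端做个性化聚合。下面用余弦相似度给出一个高度简化的示意(numpy);论文中协作图是通过求解优化问题得到的,此处"相似度 + 数据量 → softmax 权重"的映射只是假设性的演示。

```python
import numpy as np

def collaboration_graph(flat_models, sizes, temperature=5.0):
    """由展平的模型参数与数据量构造行归一化的协作权重矩阵(示意)。"""
    X = np.stack(flat_models)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sim = X @ X.T                                              # 成对余弦相似度
    sizes = np.asarray(sizes, dtype=float)
    logits = temperature * sim + np.log(sizes / sizes.sum())   # 相似且数据多的客户端权重更大
    W = np.exp(logits)
    return W / W.sum(axis=1, keepdims=True)

def personalized_aggregate(flat_models, W):
    """第 k 个客户端的个性化模型 = 按第 k 行权重加权平均所有客户端模型。"""
    return W @ np.stack(flat_models)

models = [np.random.randn(10) for _ in range(4)]
W = collaboration_graph(models, sizes=[50, 80, 80, 120])
personalized = personalized_aggregate(models, W)
print(W.round(2))
```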
Authors: Peiyao Xiao; Kaiyi Ji
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/xiao23b.html
Abstract: Federated bilevel optimization has attracted increasing attention due to emerging machine learning and communication applications. The biggest challenge lies in computing the gradient of the upper-level objective function (i.e., hypergradient) in the federated setting due to the nonlinear and distributed construction of a series of global Hessian matrices. In this paper, we propose a novel communication-efficient federated hypergradient estimator via aggregated iterative differentiation (AggITD). AggITD is simple to implement and significantly reduces the communication cost by conducting the federated hypergradient estimation and the lower-level optimization simultaneously. We show that the proposed AggITD-based algorithm achieves the same sample complexity as existing approximate implicit differentiation (AID)-based approaches with much fewer communication rounds in the presence of data heterogeneity. Our results also shed light on the great advantage of ITD over AID in the federated/distributed hypergradient estimation. This differs from the comparison in the non-distributed bilevel optimization, where ITD is less efficient than AID. Our extensive experiments demonstrate the great effectiveness and communication efficiency of the proposed method.
ISSN: 2640-3498 abstractTranslation: 由于新兴的机器学习和通信应用,联邦双层优化引起了越来越多的关注。由于一系列全局 Hessian 矩阵的非线性和分布式构造,最大的挑战在于计算联邦设置中上层目标函数(即超梯度)的梯度。在本文中,我们通过聚合迭代微分(AggITD)提出了一种新颖的通信高效联邦超梯度估计器。AggITD 实现简单,并且通过同时进行联邦超梯度估计和较低级别的优化,显着降低了通信成本。我们表明,所提出的基于 AggITD 的算法实现了与现有的基于近似隐式微分 (AID) 的方法相同的样本复杂性,并且在存在数据异构性的情况下,通信轮次要少得多。我们的结果还揭示了在联邦/分布式超梯度估计中 ITD 相对于 AID 的巨大优势。这与非分布式双层优化中的比较不同,其中 ITD 的效率低于 AID。我们广泛的实验证明了该方法的巨大有效性和通信效率。
Notes:
PUB (https://openreview.net/forum?id=IYyhNudD9V)
PDF (https://arxiv.org/abs/2302.04969)
Personalized Federated Learning under Mixture of Distributions
Authors: Yue Wu; Shuaicheng Zhang; Wenchao Yu; Yanchi Liu; Quanquan Gu; Dawei Zhou; Haifeng Chen; Wei Cheng
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/wu23z.html
Abstract: The recent trend towards Personalized Federated Learning (PFL) has garnered significant attention as it allows for the training of models that are tailored to each client while maintaining data privacy. However, current PFL techniques primarily focus on modeling the conditional distribution heterogeneity (i.e. concept shift), which can result in suboptimal performance when the distribution of input data across clients diverges (i.e. covariate shift). Additionally, these techniques often lack the ability to adapt to unseen data, further limiting their effectiveness in real-world scenarios. To address these limitations, we propose a novel approach, FedGMM, which utilizes Gaussian mixture models (GMM) to effectively fit the input data distributions across diverse clients. The model parameters are estimated by maximum likelihood estimation utilizing a federated Expectation-Maximization algorithm, which is solved in closed form and does not assume gradient similarity. Furthermore, FedGMM possesses an additional advantage of adapting to new clients with minimal overhead, and it also enables uncertainty quantification. Empirical evaluations on synthetic and benchmark datasets demonstrate the superior performance of our method in both PFL classification and novel sample detection.
ISSN: 2640-3498 abstractTranslation: 最近的个性化联邦学习(PFL)趋势引起了人们的广泛关注,因为它允许训练针对每个客户量身定制的模型,同时保持数据隐私。然而,当前的 PFL 技术主要侧重于对条件分布异质性(即概念转变)进行建模,当客户端之间的输入数据分布发散(即协变量转变)时,这可能会导致性能不佳。此外,这些技术通常缺乏适应看不见的数据的能力,进一步限制了它们在现实场景中的有效性。为了解决这些限制,我们提出了一种新方法 FedGMM,它利用高斯混合模型 (GMM) 来有效地拟合不同客户之间的输入数据分布。模型参数是利用联邦期望最大化算法通过最大似然估计来估计的,该算法以封闭形式求解并且不假设梯度相似性。此外,FedGMM 还具有以最小的开销适应新客户的额外优势,并且还可以实现不确定性量化。对合成数据集和基准数据集的实证评估证明了我们的方法在 PFL 分类和新样本检测方面的卓越性能。
Notes:
PUB (https://openreview.net/forum?id=nmVOTsQGR9)
PDF (https://arxiv.org/abs/2305.01068)
CODE (https://github.com/zshuai8/FedGMM_ICML2023)
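FedGMM 用联邦期望最大化(EM)以闭式解估计高斯混合模型:客户端本地做 E 步并上传充分统计量,服务器聚合后完成 M 步。下面是一轮对角协方差 GMM 联邦 EM 的最小示意(numpy),仅演示统计量的聚合方式,未包含论文框架中的个性化与新客户端适配部分。

```python
import numpy as np

def local_e_step(X, pi, mu, var):
    """客户端本地 E 步:返回该客户端的充分统计量 (Nk, 加权和, 加权平方和)。"""
    diff = X[:, None, :] - mu[None, :, :]                                   # (n, K, d)
    logp = -0.5 * ((diff ** 2) / var + np.log(2 * np.pi * var)).sum(-1) + np.log(pi)
    logp -= logp.max(axis=1, keepdims=True)
    r = np.exp(logp); r /= r.sum(axis=1, keepdims=True)                     # 责任 (n, K)
    return r.sum(axis=0), r.T @ X, r.T @ (X ** 2)

def server_m_step(stats):
    """服务器聚合各客户端统计量并做闭式 M 步。"""
    Nk = sum(s[0] for s in stats); S1 = sum(s[1] for s in stats); S2 = sum(s[2] for s in stats)
    pi = Nk / Nk.sum()
    mu = S1 / Nk[:, None]
    var = S2 / Nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var

rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, size=(100, 2)) for c in (-2.0, 0.0, 2.0)]    # 3 个客户端的本地数据
pi, mu, var = np.ones(2) / 2, rng.normal(size=(2, 2)), np.ones((2, 2))     # K=2 的初始参数
for _ in range(10):                                                        # 若干轮联邦 EM
    stats = [local_e_step(X, pi, mu, var) for X in clients]
    pi, mu, var = server_m_step(stats)
print("混合权重:", pi.round(3), "\n均值:\n", mu.round(2))
```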
Anchor Sampling for Federated Learning with Partial Client Participation
Authors: Feijie Wu; Song Guo; Zhihao Qu; Shiqi He; Ziming Liu; Jing Gao
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/wu23e.html
Abstract: Compared with full client participation, partial client participation is a more practical scenario in federated learning, but it may amplify some challenges in federated learning, such as data heterogeneity. The lack of inactive clients’ updates in partial client participation makes it more likely for the model aggregation to deviate from the aggregation based on full client participation. Training with large batches on individual clients is proposed to address data heterogeneity in general, but their effectiveness under partial client participation is not clear. Motivated by these challenges, we propose to develop a novel federated learning framework, referred to as FedAMD, for partial client participation. The core idea is anchor sampling, which separates partial participants into anchor and miner groups. Each client in the anchor group aims at the local bullseye with the gradient computation using a large batch. Guided by the bullseyes, clients in the miner group steer multiple near-optimal local updates using small batches and update the global model. By integrating the results of the two groups, FedAMD is able to accelerate the training process and improve the model performance. Measured by ϵ-approximation and compared to the state-of-the-art methods, FedAMD achieves the convergence by up to O(1/ϵ) fewer communication rounds under non-convex objectives. Empirical studies on real-world datasets validate the effectiveness of FedAMD and demonstrate the superiority of the proposed algorithm: Not only does it considerably save computation and communication costs, but also the test accuracy significantly improves.
ISSN: 2640-3498 abstractTranslation: 与完全客户端参与相比,部分客户端参与是联邦学习中更实用的场景,但它可能会放大联邦学习中的一些挑战,例如数据异构性。部分客户参与中缺乏不活跃客户的更新使得模型聚合更有可能偏离基于完全客户参与的聚合。已有工作建议在单个客户端上进行大批量训练以应对一般的数据异构性,但其在部分客户参与下的有效性尚不清楚。受这些挑战的推动,我们建议开发一种新颖的联邦学习框架,称为 FedAMD,以供部分客户参与。核心思想是锚定抽样,将部分参与者分为锚定组和矿工组。锚定组中的每个客户端都针对本地靶心,使用大批量进行梯度计算。在靶心的引导下,矿工组中的客户使用小批量引导多个接近最优的本地更新,并更新全局模型。通过整合两组的结果,FedAMD 能够加速训练过程并提高模型性能。以 ϵ-近似来衡量,与最先进的方法相比,FedAMD 在非凸目标下最多可减少 O(1/ϵ) 个通信轮次即可收敛。对真实世界数据集的实证研究验证了FedAMD的有效性,并证明了所提出算法的优越性:不仅大大节省了计算和通信成本,而且测试精度也显着提高。
Notes:
PUB (https://openreview.net/forum?id=Ht9r3P6Lts)
PDF (https://arxiv.org/abs/2206.05891)
CODE (https://github.com/harliwu/fedamd)
The Blessing of Heterogeneity in Federated Q-Learning: Linear Speedup and Beyond
Authors: Jiin Woo; Gauri Joshi; Yuejie Chi
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/woo23a.html
Abstract: In this paper, we consider federated Q-learning, which aims to learn an optimal Q-function by periodically aggregating local Q-estimates trained on local data alone. Focusing on infinite-horizon tabular Markov decision processes, we provide sample complexity guarantees for both the synchronous and asynchronous variants of federated Q-learning. In both cases, our bounds exhibit a linear speedup with respect to the number of agents and sharper dependencies on other salient problem parameters. Moreover, existing approaches to federated Q-learning adopt an equally-weighted averaging of local Q-estimates, which can be highly sub-optimal in the asynchronous setting since the local trajectories can be highly heterogeneous due to different local behavior policies. Existing sample complexity scales inverse proportionally to the minimum entry of the stationary state-action occupancy distributions over all agents, requiring that every agent covers the entire state-action space. Instead, we propose a novel importance averaging algorithm, giving larger weights to more frequently visited state-action pairs. The improved sample complexity scales inverse proportionally to the minimum entry of the average stationary state-action occupancy distribution of all agents, thus only requiring the agents collectively cover the entire state-action space, unveiling the blessing of heterogeneity.
ISSN: 2640-3498 abstractTranslation: 在本文中,我们考虑联邦 Q 学习,其目的是通过定期聚合仅在本地数据上训练的本地 Q 估计来学习最优 Q 函数。我们专注于无限范围的表格马尔可夫决策过程,为联邦 Q 学习的同步和异步变体提供样本复杂性保证。在这两种情况下,我们的边界都表现出相对于代理数量的线性加速以及对其他显着问题参数的更清晰的依赖性。此外,现有的联邦 Q 学习方法采用局部 Q 估计的等权平均,这在异步设置中可能非常次优,因为由于不同的局部行为策略,局部轨迹可能高度异构。现有样本复杂性与所有智能体上静态状态动作占用分布的最小条目成反比,要求每个智能体覆盖整个状态动作空间。相反,我们提出了一种新颖的重要性平均算法,为更频繁访问的状态-动作对赋予更大的权重。改进的样本复杂度与所有智能体的平均静态状态动作占用分布的最小条目成反比,因此只需要智能体共同覆盖整个状态动作空间,揭示了异质性的好处。
Notes:
PUB (https://openreview.net/forum?id=WfI3I8OjHS)
PDF (https://arxiv.org/abs/2305.10697)
SLIDES(https://icml.cc/media/icml-2023/Slides/24679_ljO6pDE.pdf)
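摘要中"重要性平均"的做法是:聚合本地 Q 表时按各客户端对每个 (s, a) 的访问次数加权,而不是等权平均,从而只需所有客户端合起来覆盖状态-动作空间。下面是一个最小示意(numpy;环境与数值均为假设的玩具例子)。

```python
import numpy as np

def importance_average(Q_list, visit_list):
    """按各客户端对每个 (s, a) 的访问次数加权平均本地 Q 表。"""
    Q = np.stack(Q_list)                        # (K, S, A)
    N = np.stack(visit_list).astype(float)
    total = N.sum(axis=0)
    safe_total = np.where(total > 0, total, 1.0)
    # 无人访问的 (s, a) 退化为等权平均,其余按访问次数占比加权
    weights = np.where(total[None] > 0, N / safe_total, 1.0 / len(Q_list))
    return (weights * Q).sum(axis=0)

# 两个客户端、3 个状态、2 个动作的玩具例子
Q1 = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 0.0]])
Q2 = np.array([[0.0, 1.0], [0.5, 0.5], [2.0, 2.0]])
N1 = np.array([[10, 0], [5, 5], [0, 0]])
N2 = np.array([[0, 10], [5, 5], [8, 8]])
print(importance_average([Q1, Q2], [N1, N2]))
```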
FedHPO-Bench: A Benchmark Suite for Federated Hyperparameter Optimization
Authors: Zhen Wang; Weirui Kuang; Ce Zhang; Bolin Ding; Yaliang Li
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/wang23n.html
Abstract: Research in the field of hyperparameter optimization (HPO) has been greatly accelerated by existing HPO benchmarks. Nonetheless, existing efforts in benchmarking all focus on HPO for traditional learning paradigms while ignoring federated learning (FL), a promising paradigm for collaboratively learning models from dispersed data. In this paper, we first identify some uniqueness of federated hyperparameter optimization (FedHPO) from various aspects, showing that existing HPO benchmarks no longer satisfy the need to study FedHPO methods. To facilitate the research of FedHPO, we propose and implement a benchmark suite FedHPO-Bench that incorporates comprehensive FedHPO problems, enables flexible customization of the function evaluations, and eases continuing extensions. We conduct extensive experiments based on FedHPO-Bench to provide the community with more insights into FedHPO. We open-sourced FedHPO-Bench at https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench.
ISSN: 2640-3498 abstractTranslation: 现有的 HPO 基准极大地加速了超参数优化 (HPO) 领域的研究。尽管如此,现有的基准测试工作都集中在传统学习范式的 HPO 上,而忽略了联邦学习 (FL),这是一种有前途的基于分散数据的协作学习模型的范式。在本文中,我们首先从各个方面识别联邦超参数优化(FedHPO)的一些独特性,表明现有的HPO基准不再满足研究FedHPO方法的需要。为了促进FedHPO的研究,我们提出并实现了一个基准套件FedHPO-Bench,它包含了全面的FedHPO问题,能够灵活定制函数评估,并简化持续扩展。我们基于 FedHPO-Bench 进行了广泛的实验,为社区提供更多关于 FedHPO 的见解。我们在 https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench 开源了 FedHPO-Bench。
Notes:
PUB (https://openreview.net/forum?id=891ytYlYgB)
PDF (https://arxiv.org/abs/2206.03966)
CODE (https://github.com/fedhpo/icml2023)
CODE (https://github.com/alibaba/FederatedScope/tree/master/benchmark/FedHPOBench)
TabLeak: Tabular Data Leakage in Federated Learning
Authors: Mark Vero; Mislav Balunovic; Dimitar Iliev Dimitrov; Martin Vechev
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/vero23a.html
Abstract: While federated learning (FL) promises to preserve privacy, recent works in the image and text domains have shown that training updates leak private client data. However, most high-stakes applications of FL (e.g., in healthcare and finance) use tabular data, where the risk of data leakage has not yet been explored. A successful attack for tabular data must address two key challenges unique to the domain: (i) obtaining a solution to a high-variance mixed discrete-continuous optimization problem, and (ii) enabling human assessment of the reconstruction as unlike for image and text data, direct human inspection is not possible. In this work we address these challenges and propose TabLeak, the first comprehensive reconstruction attack on tabular data. TabLeak is based on two key contributions: (i) a method which leverages a softmax relaxation and pooled ensembling to solve the optimization problem, and (ii) an entropy-based uncertainty quantification scheme to enable human assessment. We evaluate TabLeak on four tabular datasets for both FedSGD and FedAvg training protocols, and show that it successfully breaks several settings previously deemed safe. For instance, we extract large subsets of private data at >90% accuracy even at the large batch size of 128. Our findings demonstrate that current high-stakes tabular FL is excessively vulnerable to leakage attacks.
ISSN: 2640-3498 abstractTranslation: 虽然联邦学习(FL)承诺保护隐私,但图像和文本领域的最新研究表明,训练更新会泄露私人客户数据。然而,大多数 FL 的高风险应用(例如医疗保健和金融领域)都使用表格数据,而数据泄露的风险尚未得到探索。对表格数据的成功攻击必须解决该领域特有的两个关键挑战:(i) 获得高方差混合离散连续优化问题的解决方案,以及 (ii) 实现对重建结果的人工评估,因为与图像和文本数据不同,表格数据无法直接进行人工检查。在这项工作中,我们解决了这些挑战并提出了 TabLeak,这是第一个针对表格数据的全面重建攻击。TabLeak 基于两个关键贡献:(i) 一种利用 softmax 松弛和池化集成来解决优化问题的方法,以及 (ii) 一种基于熵的不确定性量化方案,以实现人类评估。我们在 FedSGD 和 FedAvg 训练协议的四个表格数据集上评估 TabLeak,并表明它成功地打破了以前认为安全的几个设置。例如,即使在 128 的大批量大小下,我们也能以 90% 以上的准确率提取大量私有数据子集。我们的研究结果表明,当前高风险的表格 FL 非常容易受到泄漏攻击。
Notes:
PUB (https://openreview.net/forum?id=mRiDy4qGwB)
PDF (https://arxiv.org/abs/2210.01785)
CODE (https://github.com/eth-sri/tableak)
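针对类别型(离散)特征,TabLeak 用 softmax 松弛把离散取值变成可微变量参与梯度匹配优化,结束后再取 argmax 还原。下面是该松弛环节本身的最小示意(PyTorch;特征维度与解码方式为假设,不包含完整的攻击流程与池化集成)。

```python
import torch

n_rows, n_categories = 4, 5
# 每个类别型特征用一组可学习的 logits 表示,训练中对其做 softmax 松弛
cat_logits = torch.randn(n_rows, n_categories, requires_grad=True)
cont_vals = torch.randn(n_rows, 3, requires_grad=True)   # 连续特征直接优化

def reconstruct_input():
    """把松弛后的类别特征(one-hot 的软化版本)与连续特征拼成网络输入。"""
    soft_onehot = torch.softmax(cat_logits, dim=-1)
    return torch.cat([soft_onehot, cont_vals], dim=-1)

# 优化阶段:reconstruct_input() 可像普通张量一样参与"梯度匹配"损失并反向传播
x = reconstruct_input()
x.sum().backward()          # 仅演示可微性

# 优化结束后,取 argmax 得到离散的类别取值
decoded = cat_logits.argmax(dim=-1)
print(decoded)
```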
Private Federated Learning with Autotuned Compression
Authors: Enayat Ullah; Christopher A. Choquette-Choo; Peter Kairouz; Sewoong Oh
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/ullah23b.html
Abstract: We propose new techniques for reducing communication in private federated learning without the need for setting or tuning compression rates. Our on-the-fly methods automatically adjust the compression rate based on the error induced during training, while maintaining provable privacy guarantees through the use of secure aggregation and differential privacy. Our techniques are provably instance-optimal for mean estimation, meaning that they can adapt to the “hardness of the problem” with minimal interactivity. We demonstrate the effectiveness of our approach on real-world datasets by achieving favorable compression rates without the need for tuning.
ISSN: 2640-3498 abstractTranslation: 我们提出了减少私有联邦学习中通信的新技术,而无需设置或调整压缩率。我们的动态方法会根据训练过程中产生的误差自动调整压缩率,同时通过使用安全聚合和差分隐私来保持可证明的隐私保证。事实证明,我们的技术对于均值估计来说是实例最优的,这意味着它们可以以最小的交互性来适应“问题的难度”。我们通过无需调整即可实现良好的压缩率来证明我们的方法在现实数据集上的有效性。
Notes:
PUB (https://openreview.net/forum?id=y8qAZhWbNs)
PDF (https://arxiv.org/abs/2307.10999)
Dynamic Regularized Sharpness Aware Minimization in Federated Learning: Approaching Global Consistency and Smooth Landscape
Authors: Yan Sun; Li Shen; Shixiang Chen; Liang Ding; Dacheng Tao
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/sun23h.html
Abstract: In federated learning (FL), a cluster of local clients are chaired under the coordination of the global server and cooperatively train one model with privacy protection. Due to the multiple local updates and the isolated non-iid dataset, clients are prone to overfit into their own optima, which extremely deviates from the global objective and significantly undermines the performance. Most previous works only focus on enhancing the consistency between the local and global objectives to alleviate this prejudicial client drifts from the perspective of the optimization view, whose performance would be prominently deteriorated on the high heterogeneity. In this work, we propose a novel and general algorithm FedSMOO by jointly considering the optimization and generalization targets to efficiently improve the performance in FL. Concretely, FedSMOO adopts a dynamic regularizer to guarantee the local optima towards the global objective, which is meanwhile revised by the global Sharpness Aware Minimization (SAM) optimizer to search for the consistent flat minima. Our theoretical analysis indicates that FedSMOO achieves fast O(1/T) convergence rate with low generalization bound. Extensive numerical studies are conducted on the real-world dataset to verify its peerless efficiency and excellent generality.
ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)中,一组本地客户端在全局服务器的协调下主持,并合作训练一个具有隐私保护的模型。由于多次本地更新和孤立的非独立同分布数据集,客户端很容易过度拟合自己的最优值,这极大地偏离了全局目标并显着降低了性能。以前的大多数工作仅侧重于增强局部目标和全局目标之间的一致性,以从优化角度缓解这种有偏见的客户端漂移,其性能在高异质性下会显着恶化。在这项工作中,我们提出了一种新颖且通用的算法 FedSMOO,通过联合考虑优化和泛化目标来有效提高 FL 的性能。具体来说,FedSMOO采用动态正则化器来保证全局目标的局部最优,同时通过全局锐度感知最小化(SAM)优化器进行修正以搜索一致的平坦最小值。我们的理论分析表明,FedSMOO 实现了 O(1/T) 的快速收敛速度,且泛化界限较低。对现实世界的数据集进行了广泛的数值研究,以验证其无与伦比的效率和出色的通用性。
Notes:
PUB (https://openreview.net/forum?id=vD1R00hROK)
PDF (https://arxiv.org/abs/2305.11584)
SLIDES(https://icml.cc/media/icml-2023/Slides/24651.pdf)
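FedSMOO 在本地借助 Sharpness-Aware Minimization(SAM)寻找平坦极小值。通用的 SAM 单步可以概括为:先沿梯度方向做半径 ρ 的扰动,再在扰动点处求梯度并用它更新原参数。下面给出一个通用 SAM 步骤的示意(PyTorch;未包含论文中的动态正则项与全局一致性修正,ρ 等超参数为假设)。

```python
import torch
import torch.nn as nn

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """SAM 单步:扰动 -> 在扰动点求梯度 -> 还原参数 -> 用该梯度更新。"""
    # 第一次前向/反向,得到扰动方向
    loss = loss_fn(model(x), y)
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.add_(rho * g / norm)          # 爬升到邻域内损失更大的点
    # 第二次前向/反向,在扰动点求梯度
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    with torch.no_grad():
        for p, g in zip(model.parameters(), grads):
            p.sub_(rho * g / norm)          # 还原参数
    optimizer.step()                         # 用扰动点的梯度更新原参数
    optimizer.zero_grad()

model = nn.Linear(10, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
sam_step(model, nn.CrossEntropyLoss(), torch.randn(8, 10), torch.randint(0, 2, (8,)), opt)
```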
Momentum Ensures Convergence of SIGNSGD under Weaker Assumptions
Authors: Tao Sun; Qingsong Wang; Dongsheng Li; Bao Wang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/sun23l.html
Abstract: Sign Stochastic Gradient Descent (signSGD) is a communication-efficient stochastic algorithm that only uses the sign information of the stochastic gradient to update the model’s weights. However, the existing convergence theory of signSGD either requires increasing batch sizes during training or assumes the gradient noise is symmetric and unimodal. Error feedback has been used to guarantee the convergence of signSGD under weaker assumptions at the cost of communication overhead. This paper revisits the convergence of signSGD and proves that momentum can remedy signSGD under weaker assumptions than previous techniques; in particular, our convergence theory does not require the assumption of bounded stochastic gradient or increased batch size. Our results resonate with echoes of previous empirical results where, unlike signSGD, signSGD with momentum maintains good performance even with small batch sizes. Another new result is that signSGD with momentum can achieve an improved convergence rate when the objective function is second-order smooth. We further extend our theory to signSGD with major vote and federated learning.
ISSN: 2640-3498 abstractTranslation: 符号随机梯度下降(signSGD)是一种通信高效的随机算法,仅使用随机梯度的符号信息来更新模型的权重。然而,signSGD 现有的收敛理论要么需要在训练期间增加批量大小,要么假设梯度噪声是对称的和单峰的。错误反馈已被用来保证在较弱的假设下signSGD的收敛,但代价是通信开销。本文重新审视了signSGD的收敛性,并证明动量可以在比先前技术更弱的假设下弥补signSGD;特别是,我们的收敛理论不需要有界随机梯度或增加批量大小的假设。我们的结果与之前的实证结果相呼应,与signSGD不同,具有动量的signSGD即使在小批量的情况下也能保持良好的性能。另一个新结果是,当目标函数是二阶光滑时,具有动量的signSGD可以实现更高的收敛速度。我们进一步将我们的理论扩展到带多数投票(majority vote)的 signSGD 以及联邦学习。
Notes:
PUB (https://openreview.net/forum?id=a0kGwNUwil)
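本文研究的是带动量的 signSGD(signum):先对随机梯度做动量平滑,再只用符号更新参数,从而每步通信只需传每个坐标的符号。下面是更新规则的最小示意(numpy;学习率与动量系数为假设)。

```python
import numpy as np

def signum_update(w, grad, m, lr=0.01, beta=0.9):
    """带动量的 signSGD:m = beta*m + (1-beta)*grad; w -= lr * sign(m)。"""
    m = beta * m + (1 - beta) * grad
    w = w - lr * np.sign(m)
    return w, m

# 玩具例子:最小化 f(w) = ||w||^2 / 2,其梯度为 w(加噪声模拟随机梯度)
rng = np.random.default_rng(0)
w, m = rng.normal(size=5), np.zeros(5)
for t in range(200):
    stoch_grad = w + 0.1 * rng.normal(size=5)
    w, m = signum_update(w, stoch_grad, m)
print("||w|| =", np.linalg.norm(w).round(3))
```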
Sketching for First Order Method: Efficient Algorithm for Low-Bandwidth Channel and Vulnerability
Authors: Zhao Song; Yitan Wang; Zheng Yu; Lichen Zhang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/song23h.html
Abstract: Sketching is one of the most fundamental tools in large-scale machine learning. It enables runtime and memory saving via randomly compressing the original large problem into lower dimensions. In this paper, we propose a novel sketching scheme for the first order method in large-scale distributed learning setting, such that the communication costs between distributed agents are saved while the convergence of the algorithms is still guaranteed. Given gradient information in a high dimension d, the agent passes the compressed information processed by a sketching matrix R ∈ ℝ^{s×d} with s ≪ d, and the receiver de-compresses it via the de-sketching matrix R^⊤ to “recover” the information in original dimension. Using such a framework, we develop algorithms for federated learning with lower communication costs. However, such random sketching does not protect the privacy of local data directly. We show that the gradient leakage problem still exists after applying the sketching technique by presenting a specific gradient attack method. As a remedy, we prove rigorously that the algorithm will be differentially private by adding additional random noises in gradient information, which results in a both communication-efficient and differentially private first order approach for federated learning tasks. Our sketching scheme can be further generalized to other learning settings and might be of independent interest itself.
ISSN: 2640-3498 abstractTranslation: 草图绘制是大规模机器学习中最基本的工具之一。它通过将原始大问题随机压缩为较低维度来节省运行时间和内存。在本文中,我们为大规模分布式学习环境中的一阶方法提出了一种新颖的草图方案,从而节省了分布式代理之间的通信成本,同时仍然保证了算法的收敛性。给定 d 维的梯度信息,代理传递由草图矩阵 R ∈ ℝ^{s×d}(其中 s ≪ d)处理后的压缩信息,接收方再通过去草图矩阵 R^⊤ 进行解压缩,以"恢复"原始维度的信息。使用这样的框架,我们开发了具有较低通信成本的联邦学习算法。然而,这种随机草图并不能直接保护本地数据的隐私。我们通过提出一种特定的梯度攻击方法来证明应用草图技术后梯度泄漏问题仍然存在。作为补救措施,我们通过在梯度信息中添加额外的随机噪声来严格证明该算法将具有差分隐私性,从而为联邦学习任务提供通信高效且差分隐私的一阶方法。我们的草图方案可以进一步推广到其他学习环境,并且本身可能具有独立的兴趣。
Notes:
PUB (https://openreview.net/forum?id=uIzkbJgyqc)
PDF (https://arxiv.org/abs/2210.08371)
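摘要中的草图压缩可以概括为:发送端用随机矩阵 R ∈ ℝ^{s×d}(s ≪ d)把 d 维梯度压到 s 维,接收端用 R^⊤ 得到原梯度的无偏估计。下面是该压缩/解压过程的最小示意(numpy;R 的分布与缩放为假设,不包含论文中为差分隐私添加的噪声)。

```python
import numpy as np

rng = np.random.default_rng(0)
d, s = 1000, 100
R = rng.normal(size=(s, d)) / np.sqrt(s)   # 随机草图矩阵,满足 E[R^T R] = I

g = rng.normal(size=d)                     # 原始 d 维梯度
sketch = R @ g                             # 发送端:压缩到 s 维(通信量降为 s/d)
g_hat = R.T @ sketch                       # 接收端:去草图,得到 g 的无偏但带噪的估计

cos = g @ g_hat / (np.linalg.norm(g) * np.linalg.norm(g_hat))
print("压缩率 s/d =", s / d, " 单次还原的方向余弦相似度 ≈", round(float(cos), 3))
```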
FedAvg Converges to Zero Training Loss Linearly for Overparameterized Multi-Layer Neural Networks
Authors: Bingqing Song; Prashant Khanduri; Xinwei Zhang; Jinfeng Yi; Mingyi Hong
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/song23e.html
Abstract: Federated Learning (FL) is a distributed learning paradigm that allows multiple clients to learn a joint model by utilizing privately held data at each client. Significant research efforts have been devoted to develop advanced algorithms that deal with the situation where the data at individual clients have heterogeneous distributions. In this work, we show that data heterogeneity can be dealt from a different perspective. That is, by utilizing a certain overparameterized multi-layer neural network at each client, even the vanilla FedAvg (a.k.a. the Local SGD) algorithm can accurately optimize the training problem: When each client has a neural network with one wide layer of size N (where N is the number of total training samples), followed by layers of smaller widths, FedAvg converges linearly to a solution that achieves (almost) zero training loss, without requiring any assumptions on the clients’ data distributions. To our knowledge, this is the first work that demonstrates such resilience to data heterogeneity for FedAvg when trained on multi-layer neural networks. Our experiments also confirm that, neural networks of large size can achieve better and more stable performance for FL problems.
ISSN: 2640-3498 abstractTranslation: 联邦学习 (FL) 是一种分布式学习范例,允许多个客户端通过利用每个客户端的私有数据来学习联合模型。人们投入了大量的研究工作来开发先进的算法,以处理各个客户端的数据具有异构分布的情况。在这项工作中,我们表明可以从不同的角度处理数据异构性。也就是说,通过在每个客户端使用特定的过参数化多层神经网络,即使是普通的 FedAvg(也称为 Local SGD)算法也可以准确地优化训练问题:当每个客户端的神经网络具有一个大小为 N 的宽层(其中 N 是训练样本总数)、其后是宽度较小的层时,FedAvg 线性收敛到实现(几乎)零训练损失的解决方案,而不需要对客户的数据分布进行任何假设。据我们所知,这是第一个在多层神经网络上训练时证明 FedAvg 对数据异构性具有如此弹性的工作。我们的实验也证实,大尺寸的神经网络可以在 FL 问题上获得更好、更稳定的性能。
Notes:
PUB (https://openreview.net/forum?id=eqTWOzheZT)
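作为参照,摘要中提到的 vanilla FedAvg(即 Local SGD)在服务器端的聚合就是按样本量加权平均各客户端本地训练后的参数。下面是聚合步骤的最小示意(numpy;本地多步 SGD 的结果用现成的参数向量代替,属演示性假设)。

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """FedAvg 聚合:按样本量加权平均各客户端训练后的参数。"""
    sizes = np.asarray(client_sizes, dtype=float)
    coef = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coef, client_weights))

# 三个客户端本地训练后的(展平)参数与各自样本量
client_weights = [np.array([1.0, 2.0]), np.array([2.0, 0.0]), np.array([0.0, 1.0])]
print(fedavg_aggregate(client_weights, client_sizes=[10, 30, 60]))
```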
Improving the Model Consistency of Decentralized Federated Learning
Authors: Yifan Shi; Li Shen; Kang Wei; Yan Sun; Bo Yuan; Xueqian Wang; Dacheng Tao
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/shi23d.html
Abstract: To mitigate the privacy leakages and communication burdens of Federated Learning (FL), decentralized FL (DFL) discards the central server and each client only communicates with its neighbors in a decentralized communication network. However, existing DFL suffers from high inconsistency among local clients, which results in severe distribution shift and inferior performance compared with centralized FL (CFL), especially on heterogeneous data or sparse communication topologies. To alleviate this issue, we propose two DFL algorithms named DFedSAM and DFedSAM-MGS to improve the performance of DFL. Specifically, DFedSAM leverages gradient perturbation to generate local flat models via Sharpness Aware Minimization (SAM), which searches for models with uniformly low loss values. DFedSAM-MGS further boosts DFedSAM by adopting Multiple Gossip Steps (MGS) for better model consistency, which accelerates the aggregation of local flat models and better balances communication complexity and generalization. Theoretically, we present improved convergence rates in the non-convex setting for DFedSAM and DFedSAM-MGS, where 1−λ is the spectral gap of the gossip matrix and Q is the number of MGS. Empirically, our methods can achieve competitive performance compared with CFL methods and outperform existing DFL methods.
ISSN: 2640-3498 abstractTranslation: 为了减轻联邦学习(FL)的隐私泄露和通信负担,去中心化 FL(DFL)抛弃了中央服务器,每个客户端仅与去中心化通信网络中的邻居进行通信。然而,现有的DFL在本地客户端之间存在高度不一致的问题,这导致与集中式FL(CFL)相比严重的分布偏移和较差的性能,特别是在异构数据或稀疏通信拓扑上。为了缓解这个问题,我们提出了两种 DFL 算法 DFedSAM 和 DFedSAM-MGS 来提高 DFL 的性能。具体来说,DFedSAM 利用梯度扰动通过锐度感知最小化 (SAM) 生成局部平坦模型,该模型搜索具有一致低损失值的模型。DFedSAM-MGS 通过采用多个 Gossip Steps (MGS) 来进一步增强 DFedSAM,以实现更好的模型一致性,从而加速局部平坦模型的聚合并更好地平衡通信复杂性和泛化性。理论上,我们分别给出了 DFedSAM 和 DFedSAM-MGS 在非凸设置下改进的收敛速度,其中 1−λ 是 gossip 矩阵的谱间隙,Q 是 MGS 的数量。根据经验,与 CFL 方法相比,我们的方法可以实现具有竞争力的性能,并且优于现有的 DFL 方法。
Notes:
PUB (https://openreview.net/forum?id=fn2NFlYLBL)
PDF (https://arxiv.org/abs/2302.04083)
Conformal Prediction for Federated Uncertainty Quantification Under Label Shift
Authors: Vincent Plassier; Mehdi Makni; Aleksandr Rubashevskii; Eric Moulines; Maxim Panov
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/plassier23a.html
Abstract: Federated Learning (FL) is a machine learning framework where many clients collaboratively train models while keeping the training data decentralized. Despite recent advances in FL, the uncertainty quantification topic (UQ) remains partially addressed. Among UQ methods, conformal prediction (CP) approaches provides distribution-free guarantees under minimal assumptions. We develop a new federated conformal prediction method based on quantile regression and take into account privacy constraints. This method takes advantage of importance weighting to effectively address the label shift between agents and provides theoretical guarantees for both valid coverage of the prediction sets and differential privacy. Extensive experimental studies demonstrate that this method outperforms current competitors.
ISSN: 2640-3498 abstractTranslation: 联邦学习 (FL) 是一种机器学习框架,许多客户在其中协作训练模型,同时保持训练数据分散。尽管 FL 最近取得了进展,但不确定性量化主题 (UQ) 仍然得到部分解决。在 UQ 方法中,保形预测 (CP) 方法在最小假设下提供无分布保证。我们开发了一种基于分位数回归的新联邦共形预测方法,并考虑了隐私约束。该方法利用重要性加权有效解决智能体之间的标签转移问题,为预测集的有效覆盖和差分隐私提供了理论保证。大量的实验研究表明,这种方法优于当前的竞争对手。
Notes:
PUB (https://openreview.net/forum?id=ytpEqHYSEy)
PDF (https://arxiv.org/abs/2306.05131)
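保形预测的关键一步是对校准集的非一致性分数取 (1−α) 分位数;在标签偏移下需要按重要性权重(如标签频率比)对该分位数加权。下面是加权分位数的最小示意(numpy;分数与权重均为编造的玩具数据,不包含论文中基于分位数回归与隐私约束的完整方案)。

```python
import numpy as np

def weighted_quantile(scores, weights, alpha=0.1):
    """带重要性权重的 (1-alpha) 分位数:标签偏移下对校准分布做重加权。"""
    order = np.argsort(scores)
    s, w = np.asarray(scores)[order], np.asarray(weights)[order]
    cum = np.cumsum(w) / np.sum(w)
    idx = np.searchsorted(cum, 1 - alpha)
    return s[min(idx, len(s) - 1)]

# 校准集的非一致性分数与(按标签频率比得到的)重要性权重
scores = np.array([0.2, 0.5, 0.1, 0.9, 0.4, 0.7])
weights = np.array([1.0, 2.0, 1.0, 0.5, 1.5, 1.0])
tau = weighted_quantile(scores, weights, alpha=0.1)
print("预测集阈值 tau =", tau)   # 测试时将分数 <= tau 的标签放入预测集
```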
Authors: Kumar Kshitij Patel; Lingxiao Wang; Aadirupa Saha; Nathan Srebro
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/patel23a.html
Abstract: We study the problems of distributed online and bandit convex optimization against an adaptive adversary. We aim to minimize the average regret on M machines working in parallel over T rounds with R intermittent communications. Assuming the underlying cost functions are convex and can be generated adaptively, our results show that collaboration is not beneficial when the machines have access to the first-order gradient information at the queried points. This is in contrast to the case for stochastic functions, where each machine samples the cost functions from a fixed distribution. Furthermore, we delve into the more challenging setting of federated online optimization with bandit (zeroth-order) feedback, where the machines can only access values of the cost functions at the queried points. The key finding here is identifying the high-dimensional regime where collaboration is beneficial and may even lead to a linear speedup in the number of machines. We further illustrate our findings through federated adversarial linear bandits by developing novel distributed single and two-point feedback algorithms. Our work is the first attempt towards a systematic understanding of federated online optimization with limited feedback, and it attains tight regret bounds in the intermittent communication setting for both first and zeroth-order feedback. Our results thus bridge the gap between stochastic and adaptive settings in federated online optimization.
ISSN: 2640-3498 abstractTranslation: 我们研究针对自适应对手的分布式在线和强盗凸优化问题。我们的目标是最大限度地减少 M 台机器在 T 轮中通过 R 次间歇性通信并行工作时的平均遗憾。假设底层成本函数是凸的并且可以自适应生成,我们的结果表明,当机器可以访问查询点的一阶梯度信息时,协作是没有好处的。这与随机函数的情况相反,随机函数中每台机器从固定分布中采样成本函数。此外,我们深入研究了具有强盗(零阶)反馈的联邦在线优化的更具挑战性的设置,其中机器只能访问查询点处的成本函数的值。这里的关键发现是确定协作有益的高维状态,甚至可能导致机器数量的线性加速。我们通过开发新颖的分布式单点和两点反馈算法,通过联邦对抗性线性老虎机进一步说明我们的发现。我们的工作是系统地理解有限反馈的联邦在线优化的首次尝试,并且它在一阶和零阶反馈的间歇性通信设置中达到了严格的遗憾界限。因此,我们的结果弥合了联邦在线优化中随机设置和自适应设置之间的差距。
Notes:
Towards Understanding Ensemble Distillation in Federated Learning
Authors: Sejun Park; Kihun Hong; Ganguk Hwang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/park23e.html
Abstract: Federated Learning (FL) is a collaborative machine learning paradigm for data privacy preservation. Recently, a knowledge distillation (KD) based information sharing approach in FL, which conducts ensemble distillation on an unlabeled public dataset, has been proposed. However, despite its experimental success and usefulness, the theoretical analysis of the KD based approach has not been satisfactorily conducted. In this work, we build a theoretical foundation of the ensemble distillation framework in federated learning from the perspective of kernel ridge regression (KRR). In this end, we propose a KD based FL algorithm for KRR models which is related with some existing KD based FL algorithms, and analyze our algorithm theoretically. We show that our algorithm makes local prediction models as much powerful as the centralized KRR model (which is a KRR model trained by all of local datasets) in terms of the convergence rate of the generalization error if the unlabeled public dataset is sufficiently large. We also provide experimental results to verify our theoretical results on ensemble distillation in federated learning.
ISSN: 2640-3498 abstractTranslation: 联邦学习(FL)是一种用于数据隐私保护的协作机器学习范例。最近,人们提出了一种基于知识蒸馏(KD)的 FL 信息共享方法,该方法在未标记的公共数据集上进行集成蒸馏。然而,尽管实验取得了成功且有用,但基于 KD 的方法的理论分析尚未令人满意。在这项工作中,我们从核岭回归(KRR)的角度构建了联邦学习中集成蒸馏框架的理论基础。最后,我们针对KRR模型提出了一种基于KD的FL算法,与一些现有的基于KD的FL算法相关,并对我们的算法进行了理论上的分析。我们表明,如果未标记的公共数据集足够大,就泛化误差的收敛速度而言,我们的算法使本地预测模型与集中式 KRR 模型(这是由所有本地数据集训练的 KRR 模型)一样强大。我们还提供了实验结果来验证我们在联邦学习中集成蒸馏的理论结果。
Notes:
Secure Federated Correlation Test and Entropy Estimation
Authors: Qi Pang; Lun Wang; Shuai Wang; Wenting Zheng; Dawn Song
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/pang23a.html
Abstract: We propose the first federated correlation test framework compatible with secure aggregation, namely FED-χ². In our protocol, the statistical computations are recast as frequency moment estimation problems, where the clients collaboratively generate a shared projection matrix and then use stable projection to encode the local information in a compact vector. As such encodings can be linearly aggregated, secure aggregation can be applied to conceal the individual updates. We formally establish the security guarantee of FED-χ² by proving that only the minimum necessary information (i.e., the correlation statistics) is revealed to the server. We show that our protocol can be naturally extended to estimate other statistics that can be recast as frequency moment estimations. By accommodating Shannon's Entropy in FED-χ², we further propose the first secure federated entropy estimation protocol, FED-H. The evaluation results demonstrate that FED-χ² and FED-H achieve good performance with small client-side computation overhead in several real-world case studies.
ISSN: 2640-3498 abstractTranslation: 我们提出了第一个与安全聚合兼容的联邦相关性测试框架,即 FED-χ²。在我们的协议中,统计计算被重新定义为频率矩估计问题,其中客户端协作生成共享投影矩阵,然后使用稳定投影将局部信息编码在紧凑向量中。由于此类编码可以线性聚合,因此可以应用安全聚合来隐藏各个更新。我们通过证明只向服务器透露最少的必要信息(即相关统计信息)来正式建立 FED-χ² 的安全保证。我们表明,我们的协议可以自然地扩展到估计其他统计数据,这些统计数据可以重新转换为频率矩估计。通过在 FED-χ² 中容纳香农熵,我们进一步提出了第一个安全的联邦熵估计协议 FED-H。评估结果表明,FED-χ² 和 FED-H 在几个实际案例研究中以较小的客户端计算开销实现了良好的性能。
Notes:
Flash: Concept Drift Adaptation in Federated Learning
Authors: Kunjal Panchal; Sunav Choudhary; Subrata Mitra; Koyel Mukherjee; Somdeb Sarkhel; Saayan Mitra; Hui Guan
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/panchal23a.html
Abstract: In Federated Learning (FL), adaptive optimization is an effective approach to addressing the statistical heterogeneity issue but cannot adapt quickly to concept drifts. In this work, we propose a novel adaptive optimizer called Flash that simultaneously addresses both statistical heterogeneity and the concept drift issues. The fundamental insight is that a concept drift can be detected based on the magnitude of parameter updates that are required to fit the global model to each participating client’s local data distribution. Flash uses a two-pronged approach that synergizes client-side early-stopping training to facilitate detection of concept drifts and the server-side drift-aware adaptive optimization to effectively adjust effective learning rate. We theoretically prove that Flash matches the convergence rate of state-of-the-art adaptive optimizers and further empirically evaluate the efficacy of Flash on a variety of FL benchmarks using different concept drift settings.
ISSN: 2640-3498 abstractTranslation: 在联邦学习(FL)中,自适应优化是解决统计异质性问题的有效方法,但无法快速适应概念漂移。在这项工作中,我们提出了一种称为 Flash 的新型自适应优化器,它可以同时解决统计异质性和概念漂移问题。其基本洞察是,可以根据将全局模型拟合到每个参与客户端的本地数据分布所需的参数更新幅度来检测概念漂移。Flash 采用双管齐下的方法:客户端的提前停止训练用于促进概念漂移的检测,服务器端的漂移感知自适应优化用于调整有效学习率,两者协同工作。我们从理论上证明 Flash 与最先进的自适应优化器的收敛速度相匹配,并在多种概念漂移设置下的各类 FL 基准上进一步通过实验评估了 Flash 的有效性。
Notes:
PUB (https://openreview.net/forum?id=q5RHsg6VRw)
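To make the server-side half of the idea concrete, here is a toy drift-aware server optimizer: a FedAdam-style update whose moment estimates are reset when the aggregated client update suddenly grows in magnitude. This is only a crude stand-in for the drift-aware adjustment the abstract describes, not the actual Flash update rule, and it omits the client-side early-stopping prong entirely; all names and thresholds are assumptions.

```python
import numpy as np

class DriftAwareServerOptimizer:
    """Toy FedAdam-style server optimizer that treats a spike in update
    magnitude as a (possible) concept drift and forgets its stale moments."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-3, drift_ratio=3.0):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.drift_ratio = drift_ratio
        self.m = None          # first moment of the averaged client update
        self.v = None          # second moment of the averaged client update
        self.avg_norm = None   # running average of update magnitudes

    def step(self, params, delta):
        # delta: average of client updates (local model minus global model)
        norm = float(np.linalg.norm(delta))
        if self.avg_norm is not None and norm > self.drift_ratio * self.avg_norm:
            self.m, self.v = None, None   # magnitude spike: reset moment estimates
        self.avg_norm = norm if self.avg_norm is None else 0.9 * self.avg_norm + 0.1 * norm
        self.m = delta if self.m is None else self.beta1 * self.m + (1 - self.beta1) * delta
        self.v = delta ** 2 if self.v is None else self.beta2 * self.v + (1 - self.beta2) * delta ** 2
        return params + self.lr * self.m / (np.sqrt(self.v) + self.eps)

# usage (per communication round): params = opt.step(params, mean_client_delta)
```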
SRATTA: Sample Re-ATTribution Attack of Secure Aggregation in Federated Learning.
Authors: Tanguy Marchand; Regis Loeb; Ulysse Marteau-Ferey; Jean Ogier Du Terrail; Arthur Pignet
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/marchand23a.html
Abstract: We consider a federated learning (FL) setting where a machine learning model with a fully connected first layer is trained between different clients and a central server using FedAvg, and where the aggregation step can be performed with secure aggregation (SA). We present SRATTA, an attack relying only on aggregated models which, under realistic assumptions, (i) recovers data samples from the different clients, and (ii) groups data samples coming from the same client together. While sample recovery has already been explored in an FL setting, the ability to group samples per client, despite the use of SA, is novel. This poses a significant unforeseen security threat to FL and effectively breaks SA. We show that SRATTA is both theoretically grounded and can be used in practice on realistic models and datasets. We also propose counter-measures, and claim that clients should play an active role to guarantee their privacy during training.
ISSN: 2640-3498 abstractTranslation: 我们考虑联邦学习 (FL) 设置,其中使用 FedAvg 在不同客户端和中央服务器之间训练具有全连接第一层的机器学习模型,并且聚合步骤可以使用安全聚合 (SA) 来执行。我们提出了 SRATTA,这是一种仅依赖于聚合模型的攻击,在现实的假设下,它能(i)恢复来自不同客户端的数据样本,并且(ii)将来自同一客户端的数据样本分组在一起。虽然样本恢复已经在 FL 设置中得到探索,但尽管使用了 SA,仍能按客户端对样本进行分组的能力是新颖的。这对 FL 构成了重大且未曾预见的安全威胁,并实际上破坏了 SA。我们证明 SRATTA 既有理论依据,也可以在现实模型和数据集上实际使用。我们还提出了对策,并主张客户端应在训练期间发挥积极作用来保障自身隐私。
Notes:
PUB (https://openreview.net/forum?id=pRsJIVcjxD)
PDF (https://arxiv.org/abs/2306.07644)
CODE (https://github.com/owkin/sratta)
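SRATTA itself operates on aggregated FedAvg weight updates and additionally re-attributes recovered samples to clients; the well-known building block behind this family of attacks, recovering an input from the gradient of a fully connected first layer, can however be shown on a single-sample gradient. The snippet below is a textbook illustration of that building block under assumed shapes and names, not the paper's attack.

```python
import numpy as np

# For one sample x with pre-activation z = W x + b and loss L:
#   dL/dW[i, :] = (dL/dz)[i] * x   and   dL/db[i] = (dL/dz)[i],
# so x = dL/dW[i, :] / dL/db[i] for any neuron i with a non-zero bias gradient.

def recover_input(grad_W, grad_b, tol=1e-12):
    i = int(np.argmax(np.abs(grad_b)))
    if abs(grad_b[i]) < tol:
        return None          # no neuron carries a usable signal
    return grad_W[i] / grad_b[i]

# demo on a synthetic single-sample gradient of a squared loss
rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W, b = rng.standard_normal((4, 8)), rng.standard_normal(4)
dz = (W @ x + b) - rng.standard_normal(4)   # dL/dz for L = 0.5 * ||z - target||^2
grad_W, grad_b = np.outer(dz, x), dz
print(np.allclose(recover_input(grad_W, grad_b), x))  # True
```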
Vertical Federated Graph Neural Network for Recommender System
Authors: Peihua Mai; Yan Pang
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/mai23b.html
Abstract: Conventional recommender systems are required to train the recommendation model using a centralized database. However, due to data privacy concerns, this is often impractical when multiple parties are involved in recommender system training. Federated learning appears as an excellent solution to the data isolation and privacy problem. Recently, graph neural networks (GNNs) have become a promising approach for federated recommender systems. However, a key challenge is to conduct embedding propagation while preserving the privacy of the graph structure. Few studies have been conducted on federated GNN-based recommender systems. Our study proposes the first vertical federated GNN-based recommender system, called VerFedGNN. We design a framework to transmit: (i) the summation of neighbor embeddings using random projection, and (ii) gradients of the public parameters perturbed by a ternary quantization mechanism. Empirical studies show that VerFedGNN achieves prediction accuracy competitive with existing privacy-preserving GNN frameworks while enhancing privacy protection for users’ interaction information.
ISSN: 2640-3498 abstractTranslation: 传统的推荐系统需要使用集中式数据库来训练推荐模型。然而,由于数据隐私问题,当多方参与推荐系统训练时,这通常是不切实际的。联邦学习似乎是数据隔离和隐私问题的绝佳解决方案。最近,图神经网络(GNN)正在成为联邦推荐系统的一种有前景的方法。然而,一个关键的挑战是在保护图结构隐私的同时进行嵌入传播。关于基于 GNN 的联邦推荐系统的研究很少。我们的研究提出了第一个基于 GNN 的纵向联邦推荐系统,称为 VerFedGNN。我们设计了一个框架来传输:(i)使用随机投影的邻居嵌入的总和,以及(ii)由三元量化机制扰动的公共参数的梯度。实证研究表明,VerFedGNN 与现有的隐私保护 GNN 框架相比,具有竞争性的预测精度,同时增强了对用户交互信息的隐私保护。
Notes:
PUB (https://openreview.net/forum?id=NRnS6CtbaN)
PDF (https://arxiv.org/abs/2303.05786)
CODE (https://github.com/maiph123/verticalgnn)
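The two transmitted quantities named in the abstract, a random projection of the summed neighbor embeddings and ternary-quantized gradients, can be sketched generically as below. The projection size, the stochastic quantization rule, and the function names are assumptions for illustration and do not follow the VerFedGNN specification.

```python
import numpy as np

def project_neighbor_sum(neighbor_embeddings, R):
    # neighbor_embeddings: (num_neighbors, d); R: (d, k) shared random projection.
    # Only the projected sum is transmitted, never individual neighbor embeddings.
    return neighbor_embeddings.sum(axis=0) @ R

def ternary_quantize(grad, rng):
    # stochastic ternary quantization: each coordinate becomes -s, 0, or +s,
    # kept with probability proportional to its magnitude (unbiased in expectation)
    s = float(np.abs(grad).max())
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < (np.abs(grad) / s)
    return np.sign(grad) * s * keep

rng = np.random.default_rng(0)
emb = rng.standard_normal((5, 16))              # five neighbors, 16-dim embeddings
R = rng.standard_normal((16, 8)) / np.sqrt(8)   # shared projection to 8 dims
message = project_neighbor_sum(emb, R)          # compact, structure-hiding summary
q_grad = ternary_quantize(rng.standard_normal(100), rng)
```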
Federated Conformal Predictors for Distributed Uncertainty Quantification
Authors: Charles Lu; Yaodong Yu; Sai Praneeth Karimireddy; Michael Jordan; Ramesh Raskar
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/lu23i.html
Abstract: Conformal prediction is emerging as a popular paradigm for providing rigorous uncertainty quantification in machine learning since it can be easily applied as a post-processing step to already trained models. In this paper, we extend conformal prediction to the federated learning setting. The main challenge we face is data heterogeneity across the clients; this violates the fundamental tenet of exchangeability required for conformal prediction. We propose a weaker notion of partial exchangeability, better suited to the FL setting, and use it to develop the Federated Conformal Prediction (FCP) framework. We show FCP enjoys rigorous theoretical guarantees and excellent empirical performance on several computer vision and medical imaging datasets. Our results demonstrate a practical approach to incorporating meaningful uncertainty quantification in distributed and heterogeneous environments. We provide the code used in our experiments at https://github.com/clu5/federated-conformal.
ISSN: 2640-3498 abstractTranslation: 共形预测正在成为机器学习中提供严格不确定性量化的流行范式,因为它可以作为后处理步骤轻松应用于已训练好的模型。在本文中,我们将共形预测扩展到联邦学习设置。我们面临的主要挑战是客户端之间的数据异构性,这违反了共形预测所需的可交换性这一基本前提。我们提出了一个较弱的部分可交换性概念,它更适合 FL 设置,并用它来开发联邦共形预测 (FCP) 框架。我们证明 FCP 在多个计算机视觉和医学成像数据集上享有严格的理论保证和出色的实证性能。我们的结果展示了一种在分布式和异构环境中纳入有意义的不确定性量化的实用方法。我们提供了实验中使用的代码:https://github.com/clu5/federated-conformal。
Notes:
PUB (https://openreview.net/forum?id=YVTr9PzIrK)
PDF (https://arxiv.org/abs/2305.17564)
CODE (https://github.com/clu5/federated-conformal)
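For background, here is a much-simplified sketch of conformal prediction in a federated setting: clients compute nonconformity scores on local calibration data, the server forms a global threshold, and prediction sets keep every label whose score falls below it. The pooled-quantile shortcut below ignores the partial-exchangeability correction that FCP actually develops, so it is intuition only; score definition and names are assumptions.

```python
import numpy as np

def client_scores(probs, labels):
    # nonconformity score per calibration example: 1 - probability of the true class
    return 1.0 - probs[np.arange(len(labels)), labels]

def federated_threshold(per_client_scores, alpha=0.1):
    # naive variant: pool all calibration scores and take the conformal quantile
    s = np.sort(np.concatenate(per_client_scores))
    n = len(s)
    k = min(n - 1, int(np.ceil((n + 1) * (1.0 - alpha))) - 1)
    return s[k]

def prediction_set(probs, threshold):
    # labels whose nonconformity score is within the calibrated threshold
    return np.where(1.0 - probs <= threshold)[0]
```

Under exchangeable data this pooled construction gives the usual 1 - alpha coverage; handling heterogeneous client distributions is precisely where the paper's partial-exchangeability analysis comes in.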
Byzantine-Robust Learning on Heterogeneous Data via Gradient Splitting
Authors: Yuchen Liu; Chen Chen; Lingjuan Lyu; Fangzhao Wu; Sai Wu; Gang Chen
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/liu23d.html
Abstract: Federated learning has exhibited vulnerabilities to Byzantine attacks, where the Byzantine attackers can send arbitrary gradients to a central server to destroy the convergence and performance of the global model. A wealth of robust AGgregation Rules (AGRs) have been proposed to defend against Byzantine attacks. However, Byzantine clients can still circumvent robust AGRs when data is non-Identically and Independently Distributed (non-IID). In this paper, we first reveal the root causes of performance degradation of current robust AGRs in non-IID settings: the curse of dimensionality and gradient heterogeneity. In order to address this issue, we propose GAS, a GrAdient Splitting approach that can successfully adapt existing robust AGRs to non-IID settings. We also provide a detailed convergence analysis when the existing robust AGRs are combined with GAS. Experiments on various real-world datasets verify the efficacy of our proposed GAS. The implementation code is provided in https://github.com/YuchenLiu-a/byzantine-gas.
ISSN: 2640-3498 abstractTranslation: 联邦学习已暴露出易受拜占庭攻击的弱点:拜占庭攻击者可以向中央服务器发送任意梯度,以破坏全局模型的收敛性和性能。人们提出了大量鲁棒聚合规则(AGR)来防御拜占庭攻击。然而,当数据非独立同分布(非 IID)时,拜占庭客户端仍然可以规避鲁棒 AGR。在本文中,我们首先揭示了非 IID 设置下当前鲁棒 AGR 性能下降的根本原因:维数灾难和梯度异质性。为了解决这个问题,我们提出了 GAS,一种 GrAdient Splitting(梯度拆分)方法,可以成功地使现有的鲁棒 AGR 适应非 IID 设置。我们还提供了现有鲁棒 AGR 与 GAS 结合时的详细收敛分析。对各种现实世界数据集的实验验证了我们提出的 GAS 的有效性。实现代码见 https://github.com/YuchenLiu-a/byzantine-gas。
Notes:
PUB (https://openreview.net/forum?id=3DI6Kmw81p)
PDF (https://arxiv.org/abs/2302.06079)
CODE (https://github.com/YuchenLiu-a/byzantine-gas)
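The splitting idea in the abstract, running an existing robust aggregation rule on lower-dimensional sub-vectors of each gradient instead of on the full high-dimensional vectors, can be sketched as follows. The full GAS method additionally derives per-client scores from the per-chunk results to select which gradients to average, which this toy omits; the coordinate-wise median here is only a stand-in AGR, and all names are placeholders.

```python
import numpy as np

def coordinate_median(grads):
    # stand-in robust aggregation rule over a list of equal-length vectors
    return np.median(np.stack(grads), axis=0)

def split_then_aggregate(client_grads, num_chunks, robust_agr=coordinate_median):
    # split every client gradient into the same sub-vectors, aggregate chunk by
    # chunk, then concatenate: each AGR call sees a lower-dimensional problem
    splits = [np.array_split(g, num_chunks) for g in client_grads]
    agg = [robust_agr([s[j] for s in splits]) for j in range(num_chunks)]
    return np.concatenate(agg)

rng = np.random.default_rng(0)
grads = [rng.standard_normal(1000) for _ in range(10)]
grads[0] *= 100.0                          # one Byzantine client sends a scaled gradient
aggregated = split_then_aggregate(grads, num_chunks=8)
```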
Fair yet Asymptotically Equal Collaborative Learning
Authors: Xiaoqiang Lin; Xinyi Xu; See-Kiong Ng; Chuan-Sheng Foo; Bryan Kian Hsiang Low
Conference : International Conference on Machine Learning
Url: https://proceedings.mlr.press/v202/lin23l.html
Abstract: In collaborative learning with streaming data, nodes (e.g., organizations) jointly and continuously learn a machine learning (ML) model by sharing the latest model updates computed from their latest streaming data. For the more resourceful nodes to be willing to share their model updates, they need to be fairly incentivized. This paper explores an incentive design that guarantees fairness so that nodes receive rewards commensurate to their contributions. Our approach leverages an explore-then-exploit formulation to estimate the nodes’ contributions (i.e., exploration) for realizing our theoretically guaranteed fair incentives (i.e., exploitation). However, we observe a "rich get richer" phenomenon arising from the existing approaches to guarantee fairness and it discourages the participation of the less resourceful nodes. To remedy this, we additionally preserve asymptotic equality, i.e., less resourceful nodes achieve equal performance eventually to the more resourceful/“rich” nodes. We empirically demonstrate in two settings with real-world streaming data: federated online incremental learning and federated reinforcement learning, that our proposed approach outperforms existing baselines in fairness and learning performance while remaining competitive in preserving equality.
ISSN: 2640-3498 abstractTranslation: 在使用流数据的协作学习中,节点(例如组织)通过共享根据其最新流数据计算的最新模型更新,共同并持续地学习机器学习(ML)模型。为了让资源更丰富的节点愿意分享其模型更新,需要对它们给予公平的激励。本文探讨了一种保证公平性的激励设计,使节点获得与其贡献相称的奖励。我们的方法采用先探索后利用的思路:先估计节点的贡献(即探索),再据此实现我们理论上保证的公平激励(即利用)。然而,我们观察到现有保证公平性的方法会产生“富者愈富”的现象,从而阻碍资源较少的节点参与。为了解决这个问题,我们额外保持渐近平等,即资源较少的节点最终能达到与资源更丰富(“富有”)的节点相同的性能。我们在两种使用真实世界流数据的场景(联邦在线增量学习和联邦强化学习)中进行了实证验证:我们提出的方法在公平性和学习性能方面优于现有基线,同时在保持平等方面仍具有竞争力。
Notes:
PUB (https://openreview.net/forum?id=5VhltFPSO8)
PDF (https://arxiv.org/abs/2306.05764)
CODE (https://github.com/xqlin98/Fair-yet-Equal-CML)
作者: 白小鱼(上海交通大学计算机系博士生)
分享仅供学习参考,若有不当,请联系我们处理。
END