Introduction

Collaborative training, however, does disclose information via model updates that are based on the training data. The key question we investigate in this paper is: what can be inferred about a participant's training dataset from the model updates revealed during collaborative model training?

Of course, the purpose of ML is to discover new information about the data. Any useful ML model reveals something about the population from which the training data was drawn. For example, in addition to accurately classifying its inputs, a classifier model may reveal the features that characterize a given class or help construct data points that belong to this class. In this paper, we focus on inferring "unintended" features, i.e., properties that hold for certain subsets of the training data, but not generically for all class members. In other words, the inferred features are not those shared by the entire training set but those of particular subsets, and they are incidental to (or at most weakly related to) the learning objective itself.

The basic privacy violation in this setting is membership inference: given an exact data point, determine if it was used to train the model. Prior work described passive and active membership inference attacks against ML models [24, 53], but collaborative learning presents interesting new avenues for such inferences. For example, we show that an adversarial participant can infer whether a specific location profile was used to train a gender classifier on the FourSquare location dataset [64] with 0.99 precision and perfect recall.

We then investigate passive and active property inference attacks that allow an adversarial participant in collaborative learning to infer properties of other participants' training data that are not true of the class as a whole, or even independent of the features that characterize the classes of the joint model. We also study variations such as inferring when a property appears and disappears in the data during training—for example, identifying when a certain person first appears in the photos used to train a generic gender classifier.

Our key observation, concretely illustrated by our experiments, is that modern deep-learning models come up with separate internal representations of all kinds of features, some of which are independent of the task being learned. These "unintended" features leak information about participants' training data. We also demonstrate that an active adversary can use multi-task learning to trick the joint model into learning a better internal separation of the features that are of interest to him and thus extract even more information.

Federated learning with model averaging [35] does not reveal individual gradient updates, greatly reducing the information available to the adversary. We demonstrate successful attacks even in this setting, e.g., inferring that photos of a certain person appear in the training data.

Finally, we evaluate possible defenses—sharing fewer gradients, reducing the dimensionality of the input space, dropout—and find that they do not effectively thwart our attacks. We also attempt to use participant-level differential privacy [36], which, however, is geared to work with thousands of users, and the joint model fails to converge in our setting. In short, common defenses (sharing fewer gradients, reducing input dimensionality, dropout) are ineffective against the attacks, while participant-level DP prevents the final model from converging at this scale.
03 Background

This section compares collaborative learning (CL) and federated learning (FL); it is the clearest such comparison I have seen among the papers I have read, so I restate it here. The paper treats FL as one architecture of CL, dividing CL into "collaborative learning with synchronized gradient updates" and "federated learning with model averaging."

Collaborative learning may also involve participants who want to hide their training data from each other. We review two architectures for privacy-preserving collaborative learning based on, respectively, [52] and [35].

Collaborative learning with synchronized gradient updates
In each iteration, every participant downloads the global model from the server, computes a gradient update on a batch of its local training data, and sends that update to the server. The server waits until it has received the updates from all participants, aggregates the gradients, and applies an SGD step to the global model. Some defenses built into this scheme, such as adding differentially private noise to gradients before uploading, or uploading only a fraction of the gradients, are shown by the experiments later in the paper to be ineffective against the attacks.

Federated learning with model averaging

In this architecture [35], each participant trains the model locally (typically for several epochs) and uploads the resulting model parameters; the server averages the participants' parameters to form the new global model, so individual gradient updates are never revealed (both schemes are sketched below).
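To make the difference between the two update schemes concrete, here is a minimal sketch, assuming hypothetical participant objects with `train_batch_gradient` (one local batch gradient) and `local_train` (several local epochs) methods; none of this is the paper's code.

```python
# Minimal sketch of the two architectures, with model parameters as flat
# NumPy arrays. Participant methods are hypothetical placeholders.
import numpy as np

def synchronized_gradient_round(global_params, participants, lr=0.01):
    """Collaborative learning with synchronized gradient updates:
    every participant computes a gradient on one local batch; the server
    aggregates all gradients and applies a single SGD step."""
    grads = [p.train_batch_gradient(global_params) for p in participants]
    aggregated = np.mean(grads, axis=0)        # server-side aggregation
    return global_params - lr * aggregated     # SGD step on the joint model

def model_averaging_round(global_params, participants):
    """Federated learning with model averaging [35]: every participant
    trains locally for several epochs and uploads model parameters only;
    the server averages them, so raw gradients are never revealed."""
    local_models = [p.local_train(global_params.copy()) for p in participants]
    return np.mean(local_models, axis=0)       # parameter averaging
```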
Reasoning about Privacy in Machine Learning

This part defines what counts as a privacy violation. Merely learning something new about the training data does not qualify; at a minimum, the adversary must learn features beyond those that determine the classification. The paper therefore categorizes what an adversary may infer.

Inferring class representatives

Attack: given black-box access to a classifier model, infer the features that characterize each class, which lets the adversary construct representatives of those classes.

Effectiveness: only in the special case where all class members are similar does the result of model inversion resemble the training data.

Example 1: in a facial recognition model, each class corresponds to a single individual, so all class members depict the same person. The output of model inversion is therefore visually similar to any image of that person, including the training photos. If class members are not visually similar, the result of model inversion does not resemble the training data.

Example 2: for GAN-based reconstruction, the GAN reconstructs the training data well only when the dataset itself is highly homogeneous, e.g., the MNIST handwritten-digit dataset, where instances of the digits 0-9 resemble one another. Data points produced by model inversion and GANs are similar to the training inputs only when all class members are similar, which is the case for MNIST and facial recognition datasets. The reason is that a trained classifier reveals the relationship between features and labels, so an adversary can fairly easily recover the inverse relationship. In other words, the GAN can only naively learn the features that drive the classification: in the paper's illustration, when a GAN reconstructs the data labeled "female", it can figure out what makes an image female, but it cannot recover the other features of these images (it cannot tell the individuals apart).
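As an aside, the core of model inversion fits in a few lines: a minimal sketch, assuming white-box access to a differentiable classifier (the black-box variants discussed above instead rely on query-based optimization); `model`, `input_shape`, and the hyperparameters are placeholders.

```python
# Minimal sketch of model inversion by gradient ascent on the input:
# find an input that maximizes the classifier's confidence for a class.
# If all class members look alike (e.g., one person per class in face
# recognition), the result resembles the training data; otherwise it is
# only a generic class representative.
import torch

def invert_class(model, target_class, input_shape, steps=500, lr=0.1):
    x = torch.zeros(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        logits = model(x)
        # minimize negative log-probability of the target class
        loss = -torch.log_softmax(logits, dim=1)[0, target_class]
        loss.backward()
        opt.step()
    return x.detach()
```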
!!! Here comes the key part !!!

What follows is this paper's characterization of, and response to, the GAN attack; one could say the GAN attack relies on a highly contrived attack setting. The GAN attack against collaborative learning essentially works by overfitting the joint model to a single participant's training data: it assumes that all training data of a given class belongs to a single participant. This situation does not occur in realistic collaborative learning, where the training data of any given class is spread across the devices of many participants. The discussion in the rest of the paper is therefore based on this realistic experimental setting.

Inferring membership in training data

The simplest privacy leakage: given a model and an exact data point, infer whether this point was used to train the model.

Inferring properties of training data

Prior attacks recover features shared by the entire training dataset; this paper instead targets features that hold only for subsets of the data. For example, on a gender classification dataset, infer whether some of the images show people wearing glasses, or whether a specific person appears in the images.
05 Inference Attacks

Threat model

The basic attack setting matches the collaborative/federated learning setup above. For inferring properties of the training data, the adversary must additionally prepare auxiliary training data annotated both with the property he wants to infer and with each data point's main-task label.

Overview of the attacks
In every training round, the adversary saves a snapshot θ_t of the joint model's parameters. The difference between consecutive snapshots, ∆θ_t = θ_t − θ_{t−1}, equals the aggregated update from all participants, so ∆θ_t − ∆θ_t^adv (the aggregate minus the adversary's own update) is the aggregated update from all participants other than the adversary.

Leakage from the embedding layer

The embedding matrix is exchanged and optimized as part of the model parameters. The embedding layer's gradient is sparse with respect to the input words: given a batch of text, the embedding is updated only for the words that appear in that batch, while the gradient rows of all other words are zero. This difference leaks which words the participants used during training (see the first sketch below).

Leakage from the gradients

Gradient updates can be used to infer feature values, and feature values reveal private information about the training data.

Membership inference

The non-zero gradients of the embedding layer directly reveal which training data took part in training.

Passive property inference

Attack preparation: the adversary prepares auxiliary data consisting of two parts, one with the property to be inferred and one without it. These data points need to be sampled from the same class as the target participant's data, but can otherwise be unrelated.

Attack intuition: using snapshots of the global model, the adversary can generate aggregated updates based on data with the property and updates based on data without it. This yields labeled examples that let the adversary train a binary batch-property classifier which decides whether an observed update was computed on data with or without the property (see the second sketch below).
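To make the embedding-layer leakage and the resulting membership signal concrete, here is a minimal runnable sketch; the tiny bag-of-words model and the word indices are illustrative assumptions, not the paper's setup.

```python
# Embedding-layer leakage: the rows of the embedding gradient are non-zero
# only for words present in the victim's batch, so an observed update
# reveals the batch's vocabulary (and hence supports membership inference
# on records containing those words).
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 16
model = nn.Sequential(nn.EmbeddingBag(vocab_size, embed_dim),
                      nn.Linear(embed_dim, 2))

# The victim trains on a batch containing the words {3, 17, 42}.
batch = torch.tensor([[3, 17, 42]])
labels = torch.tensor([1])
loss = nn.functional.cross_entropy(model(batch), labels)
loss.backward()

# The adversary inspects the embedding-layer gradient of the update:
embed_grad = model[0].weight.grad                  # shape (vocab_size, embed_dim)
words_in_batch = (embed_grad.abs().sum(dim=1) > 0).nonzero().flatten()
print(words_in_batch.tolist())                     # -> [3, 17, 42]
```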
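And a minimal sketch of the passive property-inference pipeline, using a toy logistic-regression "joint model"; the data, the property's effect on feature 0, and all helper names are hypothetical assumptions chosen only to exercise the pipeline: simulate updates with/without the property against each snapshot, train a binary classifier on them, then classify observed updates.

```python
# Passive property inference on a toy logistic-regression model.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_features = 20

def simulate_update(snapshot, X_batch, y_batch):
    """One logistic-regression gradient from the global snapshot,
    i.e., the update the adversary can reproduce locally."""
    probs = 1 / (1 + np.exp(-(X_batch @ snapshot)))
    return X_batch.T @ (probs - y_batch) / len(y_batch)

def make_batch(with_property, size=32):
    """Hypothetical auxiliary data: the property shifts feature 0,
    independently of the main-task label (feature 1)."""
    X = rng.normal(size=(size, n_features))
    if with_property:
        X[:, 0] += 2.0
    y = (X[:, 1] > 0).astype(float)                # main-task label
    return X, y

snapshots = [rng.normal(size=n_features) for _ in range(5)]   # saved θ_t
X_train, y_train = [], []
for snap in snapshots:
    for prop in (1, 0):
        for _ in range(50):
            upd = simulate_update(snap, *make_batch(bool(prop)))
            X_train.append(upd)
            y_train.append(prop)                   # label: property present?

clf = RandomForestClassifier(n_estimators=100).fit(X_train, y_train)

# Attack time: classify an "observed" update from the victim.
victim_update = simulate_update(snapshots[-1], *make_batch(True))
print(clf.predict([victim_update]))                # -> [1]: property present
```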
Active property inference

The adversary performs active property inference via multi-task learning: he extends his local copy of the collaboratively trained model with an augmented property classifier connected to the last layer, and trains this model to simultaneously perform well on the main task and recognize the batch property. On training data where every record has a main label y and a property label p, the model's joint loss combines the main-task loss on y with the property-classification loss on p.
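A minimal sketch of this multi-task setup, assuming a small feed-forward trunk and a simple weighted sum of the two cross-entropy losses; the architecture and the weighting α here are illustrative assumptions, not the paper's exact configuration.

```python
# Multi-task setup for the active attack: a shared trunk feeds two heads,
# the main-task head and an augmented property head attached to the last
# (feature) layer. The joint loss is a weighted sum of both task losses.
import torch
import torch.nn as nn

class AugmentedModel(nn.Module):
    def __init__(self, in_dim, n_classes):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU())
        self.main_head = nn.Linear(64, n_classes)   # main task: label y
        self.prop_head = nn.Linear(64, 2)           # adversarial task: property p

    def forward(self, x):
        h = self.trunk(x)
        return self.main_head(h), self.prop_head(h)

model = AugmentedModel(in_dim=20, n_classes=2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
alpha = 0.5                                         # hypothetical weighting

x = torch.randn(32, 20)
y = torch.randint(0, 2, (32,))                      # main-task labels
p = torch.randint(0, 2, (32,))                      # property labels

logits_y, logits_p = model(x)
loss = (alpha * nn.functional.cross_entropy(logits_y, y)
        + (1 - alpha) * nn.functional.cross_entropy(logits_p, p))
opt.zero_grad()
loss.backward()
opt.step()
# The adversary then uploads the resulting update, nudging the joint model
# toward internal features that separate the property of interest.
```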