Fairness Indicators: Scalable Infrastructure for Machine Learning Fairness

Google TensorFlow, 2021-07-27


By Catherina Xu and Tulsee Doshi, Product Managers, Google Research

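While industry and academia continue to explore the benefits of using machine learning (ML) to improve products and tackle important problems, the algorithms and the datasets they are trained on can also reflect or reinforce unfair biases. For example, if a moderation system consistently flags benign text comments from certain groups as "spam" or "toxic", it excludes those groups' voices from the conversation.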

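In 2018, we shared how Google uses AI to improve its products, highlighting the AI Principles that guide our work. The second principle, "Avoid creating or reinforcing unfair bias", outlines our commitment to reducing unjust bias and minimizing its impact on people.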

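As part of this commitment, we recently released a beta version of Fairness Indicators at TensorFlow World. This suite of tools enables routine computation and visualization of fairness metrics for binary and multiclass classification, helping teams take a first step toward identifying unfair impacts.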

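Fairness Indicators can be used to generate metrics for transparency reporting, such as those used in Model Cards, helping developers make better decisions about how to deploy models responsibly. Because fairness concerns and their evaluation differ from case to case, this release also includes an interactive case study built on Jigsaw's Unintended Bias in Toxicity dataset. It illustrates how Fairness Indicators can be used to detect and remediate bias in a production ML model, taking into account the context in which the model is deployed. Fairness Indicators is now available in beta, and you can try it on your own use cases.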

What is ML fairness?

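Bias can appear in any part of a typical machine learning pipeline: an unrepresentative dataset, the representations a model learns, or the way results are presented to users. Errors caused by this bias can affect some users disproportionately more than others.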

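To detect this unequal impact, it is essential to evaluate individual slices, or groups of users, separately, because overall metrics can hide poor performance for certain groups. These groups may include, but are not limited to, those defined by sensitive characteristics such as race, ethnicity, gender, nationality, income, sexual orientation, ability, and religious belief. It is also important to remember, however, that fairness cannot be achieved through metrics and measurement alone; high performance, even across all slices, does not necessarily prove that a system is fair. Instead, evaluation should be seen as one of the first ways to identify performance gaps, especially for classification models.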
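To make the point about overall metrics concrete, here is a minimal, hand-rolled sketch (not part of Fairness Indicators) that computes the false positive rate overall and per slice. The slice names and all data are hypothetical and chosen only to show how a large, well-served slice can mask a small, poorly served one.

```python
from collections import defaultdict


def false_positive_rate(labels, preds):
    """FPR = FP / (FP + TN), i.e. the share of true negatives predicted positive."""
    negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not negatives:
        return float("nan")
    return sum(negatives) / len(negatives)  # preds are 0/1, so the sum counts FPs


def fpr_by_slice(rows):
    """rows: iterable of (slice_name, label, prediction), labels/predictions in {0, 1}."""
    grouped = defaultdict(lambda: ([], []))
    for slice_name, y, p in rows:
        grouped[slice_name][0].append(y)
        grouped[slice_name][1].append(p)
    return {name: false_positive_rate(ys, ps) for name, (ys, ps) in grouped.items()}


# Hypothetical data: 90 negatives in group_a (3 false positives),
# 10 negatives in group_b (5 false positives).
rows = (
    [("group_a", 0, 1)] * 3 + [("group_a", 0, 0)] * 87
    + [("group_b", 0, 1)] * 5 + [("group_b", 0, 0)] * 5
)

overall = false_positive_rate([y for _, y, _ in rows], [p for _, _, p in rows])
per_slice = fpr_by_slice(rows)
print(f"overall FPR: {overall:.3f}")
for name, value in sorted(per_slice.items()):
    print(f"{name}: FPR {value:.3f}")
```

With these made-up numbers the overall FPR is 0.08, which looks acceptable, while group_b's FPR is 0.50.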

The Fairness Indicators suite of tools

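The Fairness Indicators suite computes and visualizes fairness metrics commonly used for classification models, such as false positive rate and false negative rate, making it easy to compare performance across slices or against a baseline slice. The tool computes confidence intervals, which can surface statistically significant disparities, and performs evaluation over multiple thresholds. In the UI, users can switch the baseline slice and examine how various other metrics perform. Users can also add their own metrics for visualization, tailored to their use case.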
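The computations described above (per-slice false positive and false negative rates at several thresholds, compared against a baseline slice, with confidence intervals) can be sketched in plain Python as follows. This is an illustrative sketch, not TFMA's implementation: it uses a simple percentile bootstrap, which may differ from how Fairness Indicators actually computes its intervals, and the slice names, scores, and thresholds are hypothetical.

```python
import math
import random


def rates_at_threshold(labels, scores, threshold):
    """Return (FPR, FNR) when an example is predicted positive iff score >= threshold."""
    fp = tn = fn = tp = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 0:
            fp += predicted_positive
            tn += not predicted_positive
        else:
            tp += predicted_positive
            fn += not predicted_positive
    fpr = fp / (fp + tn) if fp + tn else math.nan
    fnr = fn / (fn + tp) if fn + tp else math.nan
    return fpr, fnr


def bootstrap_fpr_ci(labels, scores, threshold, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for FPR (resamples examples with replacement)."""
    rng = random.Random(seed)
    pairs = list(zip(labels, scores))
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(pairs) for _ in pairs]
        fpr, _ = rates_at_threshold([y for y, _ in sample], [s for _, s in sample], threshold)
        if not math.isnan(fpr):  # a resample may happen to contain no negatives
            estimates.append(fpr)
    if not estimates:
        return math.nan, math.nan
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi


# Hypothetical (labels, scores) per slice; "baseline" is the reference slice.
slices = {
    "baseline": ([0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
                 [0.1, 0.2, 0.3, 0.2, 0.6, 0.1, 0.8, 0.7, 0.4, 0.9]),
    "slice_x": ([0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
                [0.6, 0.7, 0.3, 0.8, 0.2, 0.5, 0.9, 0.8, 0.6, 0.7]),
}

for t in (0.25, 0.5, 0.75):  # evaluate at several decision thresholds
    base_fpr, _ = rates_at_threshold(*slices["baseline"], t)
    for name, (labels, scores) in slices.items():
        fpr, fnr = rates_at_threshold(labels, scores, t)
        lo, hi = bootstrap_fpr_ci(labels, scores, t)
        print(f"t={t:.2f} {name:>8}: FPR={fpr:.2f} (95% CI {lo:.2f}-{hi:.2f}, "
              f"{fpr - base_fpr:+.2f} vs baseline), FNR={fnr:.2f}")
```

With only ten examples per slice the intervals are very wide, which is exactly why reporting them matters: an apparent gap between slices may not be statistically meaningful.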

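In addition, Fairness Indicators is integrated with the What-If Tool (WIT): clicking a bar in a Fairness Indicators chart loads those specific data points into the WIT widget for further inspection, comparison, and counterfactual analysis. This is particularly useful for large datasets, where Fairness Indicators can first be used to identify problematic slices before WIT is used for deeper analysis.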

Visualizing fairness evaluation metrics with Fairness Indicators

Clicking a slice in Fairness Indicators loads all the data points in that slice into the What-If Tool widget. This example shows all data points with the "female" label.

The Fairness Indicators beta release includes the following:

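pip package: includes TensorFlow Model Analysis (TFMA), Fairness Indicators, TensorFlow Data Validation (TFDV), the What-If Tool, and example Colabs:

Fairness Indicators Example Colab: an introduction to using Fairness Indicators
Fairness Indicators for TensorBoard: an example of using the TensorBoard plugin
Fairness Indicators on TF Hub Text Embeddings: a Colab that studies how different embeddings affect downstream fairness metrics
Fairness Indicators on the Cloud Vision API's Face Detection Model: a Colab showing how Fairness Indicators can be used to generate evaluation results for Model Cards

GitHub repository: the source code
Guidance: fairness is highly contextual, so it is important to think carefully about each use case and its potential impact on users. This document guides you through selecting groups and metrics, and highlights best practices for evaluation.
Case study: an interactive case study using Fairness Indicators that shows how Jigsaw's Conversation AI team detects bias in a classification model using the Toxic Comment Classification dataset.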

How can I use it with my existing models?

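Fairness Indicators is built on top of TensorFlow Model Analysis, a component of TensorFlow Extended (TFX) used to compute and visualize model performance. Depending on your ML workflow, Fairness Indicators can be integrated into your system in one of the following ways. If you use TensorFlow models and tools such as TFX, you can:

Use Fairness Indicators as part of the Evaluator component in TFX
Use Fairness Indicators in TensorBoard while evaluating other real-time metrics

If you are not using the existing TensorFlow tools, you can:

Download the Fairness Indicators pip package and use TensorFlow Model Analysis as a standalone tool

For non-TensorFlow models:

Use Model-Agnostic TFMA to compute Fairness Indicators from the outputs of any model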
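As a rough sketch of the standalone and model-agnostic routes above, the following builds a TFMA `EvalConfig` that requests the `FairnessIndicators` metric at several thresholds and slices by one feature. The feature names (`label`, `prediction`, `gender`), the thresholds, and the file paths are hypothetical assumptions, and the evaluation data is assumed to be TFRecords of `tf.Example` that already contain each model's predictions. The TFMA imports are deferred into the functions so the config text can be inspected without the package installed; check the TFMA and Fairness Indicators guides for the exact options supported by your version.

```python
# Text-format EvalConfig. Feature names and thresholds are hypothetical.
EVAL_CONFIG_PBTXT = """
model_specs {
  label_key: "label"
  prediction_key: "prediction"   # model-agnostic: predictions are stored in the data
}
metrics_specs {
  metrics { class_name: "ExampleCount" }
  metrics {
    class_name: "FairnessIndicators"
    config: '{"thresholds": [0.25, 0.5, 0.75]}'
  }
}
slicing_specs {}                          # overall (unsliced) metrics
slicing_specs { feature_keys: "gender" }  # one slice per value of "gender"
"""


def build_eval_config():
    """Parse the text proto into a tfma.EvalConfig (requires tensorflow-model-analysis)."""
    import tensorflow_model_analysis as tfma
    from google.protobuf import text_format

    return text_format.Parse(EVAL_CONFIG_PBTXT, tfma.EvalConfig())


def run_model_agnostic_eval(data_location, output_path):
    """Evaluate pre-computed predictions; no model is loaded because none is passed."""
    import tensorflow_model_analysis as tfma

    return tfma.run_model_analysis(
        eval_config=build_eval_config(),
        data_location=data_location,
        output_path=output_path,
    )
```

A call such as `run_model_agnostic_eval("/tmp/eval_data.tfrecord", "/tmp/fi_output")` (hypothetical paths) would write the sliced metrics to `output_path`. For a TensorFlow model you would instead pass an `eval_shared_model` to `tfma.run_model_analysis`, and in a TFX pipeline the same `EvalConfig` would go to the Evaluator component's `eval_config` argument.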

Fairness Indicators case study

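We created a case study and an introductory video to show how Fairness Indicators can be combined with other tools to detect and mitigate bias in a model trained on Jigsaw's Unintended Bias in Toxicity dataset. The dataset was developed by Conversation AI, a team within Jigsaw that trains ML models to protect voices in conversation. The trained models predict whether a text comment is likely to be abusive along several dimensions, including toxicity, insults, and sexual explicitness.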

Introductory video

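The primary use case for models like these is content moderation. If a model systematically penalizes certain types of messages (for example, by often marking non-toxic comments as toxic, which leads to a high false positive rate), those voices are silenced. In the case study, we computed the false positive rate for subgroups sliced by gender identity keywords in the dataset, then used a combination of tools (Fairness Indicators, TFDV, and WIT) to detect and diagnose the underlying problem and take steps toward fixing it.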
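The slicing step described above, grouping comments by the identity keywords they mention and comparing false positive rates, can be illustrated with a small plain-Python sketch. The keyword list, comments, labels, and scores are all made up for illustration; they are not taken from the Jigsaw dataset or the case study, and the real case study uses Fairness Indicators rather than this hand-rolled code.

```python
import re

IDENTITY_TERMS = ["gay", "straight", "female", "male"]  # hypothetical subset of terms

# Hypothetical (comment text, true toxicity label 0/1, model score in [0, 1]).
examples = [
    ("I am a gay man", 0, 0.81),
    ("I am a straight man", 0, 0.22),
    ("She is a female engineer", 0, 0.35),
    ("He is a male nurse", 0, 0.30),
    ("My gay friends came over", 0, 0.64),
    ("You are an idiot", 1, 0.93),
]


def slice_by_keyword(rows, terms):
    """Map each identity term to the rows whose text contains it as a whole word."""
    slices = {}
    for term in terms:
        pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
        slices[term] = [row for row in rows if pattern.search(row[0])]
    return slices


def fpr(rows, threshold=0.5):
    """Share of non-toxic rows whose score is at or above the threshold."""
    negatives = [score for _, label, score in rows if label == 0]
    if not negatives:
        return None
    return sum(score >= threshold for score in negatives) / len(negatives)


overall = fpr(examples)
per_term = {term: fpr(rows) for term, rows in slice_by_keyword(examples, IDENTITY_TERMS).items()}
print(f"overall FPR: {overall:.2f}")
for term, value in per_term.items():
    print(f"{term}: FPR={value}")
```

The whole-word match is deliberate: a plain substring test for "male" would also match "female" and silently merge two slices. In this toy data, every non-toxic comment mentioning "gay" is flagged (FPR 1.0), while the overall FPR is only 0.4.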

What's next

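Fairness Indicators is only a first step. We plan to expand vertically by supporting more metrics (for example, metrics that evaluate classifiers without requiring a threshold), and horizontally by building remediation libraries that use methods such as active learning and min-diff. Because we believe it is important to learn from real examples, we will ground this work in more case studies, released over the coming months as more features become available.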

To get started, see the Fairness Indicators GitHub repository. To learn more about how to think about fairness evaluation in the context of your use case, follow this link.

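We would love to work with you to understand where Fairness Indicators is most useful and which added features would be most valuable. To share feedback on your experience, please contact us at tfx@tensorflow.org.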

Acknowledgements

The core team behind this work includes Christina Greer, Manasi Joshi, Huanming Fang, Shivam Jindal, Karan Shukla, Osman Aka, Sanders Kleinfeld, Alicia Chang, Alex Hanna, and Dan Nanas. We would also like to thank James Wexler, Mahima Pushkarna, Meg Mitchell, and Ben Hutchinson for their contributions to the project.

To learn more about the content mentioned in this article, see the following resources, which explore many of its topics in depth:

AI Principles
https://www.blog.google/technology/ai/ai-principles/
TensorFlow World
https://conferences.oreilly.com/tensorflow/tf-ca/schedule/2019-10-28
Fairness Indicators
https://github.com/tensorflow/fairness-indicators
Model Cards
https://modelcards.withgoogle.com/about?utm_source=googleaiblog&utm_medium=blog&utm_campaign=fairness-indicators-cs&utm_term=&utm_content=blog-mc
Interactive case study
https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=aiblog&utm_medium=blog&utm_campaign=fi-practicum&utm_term=&utm_content=blog-body
Jigsaw Unintended Bias in Toxicity dataset
https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification
Available in beta
https://github.com/tensorflow/fairness-indicators
What-If Tool
https://pair-code.github.io/what-if-tool/
pip package
https://pypi.org/project/fairness-indicators/
TensorFlow Model Analysis
https://tensorflow.google.cn/tfx/tutorials/model_analysis/tfma_basic
TensorFlow Data Validation
https://tensorflow.google.cn/tfx/data_validation/get_started
Example Colabs
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/examples/Fairness_Indicators_Example_Colab.ipynb
Fairness Indicators Example Colab
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/examples/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb
Fairness Indicators for TensorBoard
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/examples/Fairness_Indicators_TensorBoard_Plugin_Example_Colab.ipynb
TensorBoard
https://tensorflow.google.cn/tensorboard
Fairness Indicators on TF Hub Text Embeddings
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/examples/Fairness_Indicators_on_TF_Hub_Text_Embeddings.ipynb
Fairness Indicators on the Cloud Vision API's Face Detection Model
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/examples/Facessd_Fairness_Indicators_Example_Colab.ipynb
Model Cards
https://modelcards.withgoogle.com/face-detection
GitHub repository
https://github.com/tensorflow/fairness-indicators
Guidance
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/documentation/guidance.md
Case study
https://developers.google.com/machine-learning/practica/fairness-indicators
Toxic Comment Classification dataset
https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge
TensorFlow Extended
https://tensorflow.google.cn/tfx
Evaluator
https://tensorflow.google.cn/tfx/guide/evaluator
TensorBoard
https://github.com/tensorflow/tensorboard/blob/master/docs/fairness-indicators.md
Standalone tool
https://tensorflow.google.cn/tfx/guide/fairness_indicators
Model-Agnostic TFMA
https://tensorflow.google.cn/tfx/guide/fairness_indicators#model_agnostic_evaluation
Case study
https://developers.google.com/machine-learning/practica/fairness-indicators?utm_source=aiblog&utm_medium=aiblog&utm_campaign=fi-practicum&utm_content=fi-practicum
Unintended Bias in Toxicity dataset
https://www.kaggle.com/c/jigsaw-unintended-bias-in-toxicity-classification
min-diff
https://arxiv.org/pdf/1901.04562.pdf
Fairness Indicators GitHub repository
https://github.com/tensorflow/fairness-indicators
This link
https://github.com/tensorflow/fairness-indicators/blob/master/fairness_indicators/documentation/guidance.md
