Research progress on fairness in image recognition

Mei Wang1, Weihong Deng2, Sen Su2 (1. School of Artificial Intelligence, Beijing Normal University; 2. School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Abstract
Over the past few decades, image recognition technology has developed rapidly and has profoundly changed the course of human society. The purpose of developing image recognition technology is to benefit people by reducing manual labor and increasing convenience. However, recent research and applications show that image recognition systems may exhibit biased or even discriminatory behavior, with potential negative effects on individuals and society. Fairness in image recognition has therefore attracted wide attention: only by preventing the bias and discrimination that image recognition systems may impose on people can the technology be fully trusted and coexist harmoniously with humans. This paper presents a comprehensive review of research on fairness in image recognition. First, it briefly introduces the three sources of bias, namely data imbalance, spurious correlations between attributes, and group discrepancy. Second, it summarizes the commonly used datasets and evaluation metrics. Third, it divides existing debiasing algorithms into seven categories, i.e., reweighting (resampling), image augmentation, feature augmentation, feature disentanglement, metric learning, model adaptation, and post-processing, introduces each category, and discusses its advantages and disadvantages. Finally, it summarizes the future research directions, opportunities, and challenges of this field. Overall, academic research on fairness in image recognition has made considerable progress, but the field is still in its early stages: datasets and evaluation metrics remain to be improved, fairness algorithms for unknown biases are urgently needed, the trade-off dilemma between accuracy and fairness has yet to be overcome, distinctive development trends for specific sub-tasks are beginning to emerge, and debiasing algorithms for video data are receiving increasing attention.
Keywords
A review on fairness in image recognition

(School of Artificial Intelligence, Beijing Normal University; School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Abstract
In the past few decades, image recognition technology has undergone rapid development, has been integrated into our lives, and has profoundly changed the course of human society. However, recent research and applications indicate that image recognition systems may show human-like discriminatory bias, make unfair decisions toward certain groups or populations, and even degrade in performance for historically under-served populations. Consequently, there is an increasing need to guarantee fairness for image recognition systems and prevent discriminatory decisions, so that people can fully trust these systems and live in harmony with them. In this paper, we give a comprehensive overview of the cutting-edge research progress toward fairness in image recognition. First, we define fairness as achieving consistent performance across different groups regardless of peripheral attributes, e.g., color, background, gender, and race, and illustrate that bias arises from three aspects. 1) Data imbalance. In existing datasets, some groups are over-represented and others are under-represented. Deep models optimize for the over-represented groups because this boosts overall accuracy, while ignoring the under-represented ones during training. 2) Spurious correlations. Existing methods often capture unintended decision rules from spurious correlations between target variables and peripheral attributes, and fail to generalize to images in which such correlations do not hold. 3) Group discrepancy. There is a large discrepancy between different groups; when deep models cannot balance the specific requirements of various groups, they have to sacrifice performance on some subjects. Second, datasets (e.g., Colored MNIST, Corrupted CIFAR-10, CelebA, BAR, RFW) and evaluation metrics (e.g., equal opportunity and equalized odds) used for fairness in image recognition are introduced. These datasets enable researchers to study the bias of image recognition models with respect to color, background, image quality, gender, race, and age. Third, we divide the debiasing methods designed for image recognition into seven categories. 1) Sample reweighting (or resampling). This method assigns larger weights (or increases the sampling frequency) for minority groups and smaller weights (or decreases the sampling frequency) for majority groups, so that the model focuses more on the minority groups and the performance difference across groups is reduced. 2) Image augmentation. Generative adversarial networks (GANs) are introduced into debiasing methods to translate images of over-represented groups into images of under-represented groups. This method modifies the bias attributes of over-represented samples while keeping their target attributes unchanged, so that more samples are generated for under-represented groups and the problem of data imbalance is alleviated. 3) Feature augmentation. Considering that image augmentation suffers from mode collapse during GAN training, some works augment samples at the feature level. They encourage the recognition model to output consistent predictions for samples before and after the bias information in their features is perturbed or edited, which prevents the model from predicting target attributes based on bias information and thus improves model fairness. 4) Feature disentanglement. This is one of the most commonly used debiasing methods; it removes the spurious correlation between target and bias attributes in the feature space and learns target features that are independent of bias. 5) Metric learning. To encourage the model to make predictions based on target attributes rather than bias information, this method uses metric learning, e.g., contrastive learning, to pull samples of the same target class but different bias classes closer together and push samples of different target classes but the same bias class apart in the feature space. 6) Model adaptation. To address the problem of group discrepancy, some works adaptively change the network depth or hyperparameters for different groups according to their specific requirements, which improves performance on under-represented groups. 7) Post-processing. This method assumes black-box access to a biased model and modifies the final predictions output by the model to mitigate bias. We additionally discuss the advantages and limitations of these methods, and summarize competitive results and experimental comparisons on widely used benchmarks. Finally, we review and summarize the following future directions in this field. 1) In existing datasets, bias attributes are limited to color, background, image quality, race, age, and gender; diverse datasets need to be constructed to study more complex biases in the real world. 2) Most recent works on bias mitigation require annotations of the bias source. However, annotation requires expensive labor, and multiple biases may coexist; how to mitigate multiple unknown biases remains to be fully explored. 3) There is a trade-off dilemma between fairness and algorithm performance; it is challenging to reduce the effect of bias without hampering overall model performance. 4) Task-specific trends are emerging: causal intervention has been introduced into object classification to mitigate bias, while individual fairness has been proposed in face recognition to encourage models to give similar predictions to similar individuals. 5) Fairness on video data has also attracted attention recently.
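For reference, the two group fairness metrics named in the abstract, equal opportunity and equalized odds, have standard formal definitions. The LaTeX sketch below states them for a binary classifier with prediction \hat{Y}, ground-truth label Y, and sensitive (bias) attribute A; the notation is chosen here for illustration and is not taken from the paper itself.

% Equal opportunity: equal true positive rates across the two groups
\Pr(\hat{Y}=1 \mid Y=1, A=0) \;=\; \Pr(\hat{Y}=1 \mid Y=1, A=1)

% Equalized odds: equal true positive AND false positive rates across groups
\Pr(\hat{Y}=1 \mid Y=y, A=0) \;=\; \Pr(\hat{Y}=1 \mid Y=y, A=1), \qquad y \in \{0, 1\}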
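As a concrete illustration of the reweighting strategy in category 1), the following minimal PyTorch-style sketch assigns each training sample a weight inversely proportional to the size of its group before averaging the loss, under the assumption that integer group labels are available for every sample. The function names are hypothetical and do not correspond to any specific surveyed method.

import torch
import torch.nn.functional as F

def group_weights(group_ids: torch.Tensor) -> torch.Tensor:
    # Count samples per group and weight each group inversely to its frequency,
    # so that minority groups contribute more to the averaged loss.
    counts = torch.bincount(group_ids).float().clamp(min=1.0)
    weights_per_group = counts.sum() / (len(counts) * counts)
    return weights_per_group[group_ids]

def reweighted_cross_entropy(logits, targets, group_ids):
    # Per-sample cross-entropy, up-weighted for under-represented groups.
    per_sample = F.cross_entropy(logits, targets, reduction="none")
    weights = group_weights(group_ids).to(per_sample.device)
    return (weights * per_sample).mean()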
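The metric-learning idea in category 5) can likewise be sketched as a bias-aware supervised contrastive objective: for each anchor, samples sharing its target class but not its bias class act as positives, while samples sharing its bias class but not its target class act as negatives. This is a rough illustrative sketch with assumed inputs (embeddings, target labels, bias labels), not the exact loss of any method cited in the survey.

import torch
import torch.nn.functional as F

def bias_contrastive_loss(features, targets, bias_labels, temperature=0.1):
    # Positives: same target class, different bias class (bias-conflicting pairs).
    # Negatives: different target class, same bias class (bias-aligned pairs).
    z = F.normalize(features, dim=1)
    sim = z @ z.t() / temperature

    same_target = targets.unsqueeze(0) == targets.unsqueeze(1)
    same_bias = bias_labels.unsqueeze(0) == bias_labels.unsqueeze(1)
    eye = torch.eye(len(targets), dtype=torch.bool, device=z.device)

    pos = same_target & ~same_bias & ~eye
    neg = ~same_target & same_bias

    loss, valid = torch.zeros((), device=z.device), 0
    for i in range(len(targets)):
        if pos[i].any() and neg[i].any():
            logits = torch.cat([sim[i][pos[i]], sim[i][neg[i]]])
            log_prob = F.log_softmax(logits, dim=0)
            k = int(pos[i].sum())
            loss = loss - log_prob[:k].mean()  # pull positives closer than negatives
            valid += 1
    return loss / max(valid, 1)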
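Finally, the post-processing idea in category 7) can be illustrated by adjusting decision thresholds per group on the scores of a frozen, possibly biased model, choosing each threshold so that all groups reach roughly the same true positive rate. This is a simplified NumPy sketch under assumed inputs (scores, binary labels, group ids), not the specific post-processing procedure of any surveyed work.

import numpy as np

def per_group_thresholds(scores, labels, groups, target_tpr=0.9):
    # Treat the recognition model as a black box: only its output scores are used.
    thresholds = {}
    for g in np.unique(groups):
        pos_scores = np.sort(scores[(groups == g) & (labels == 1)])
        if len(pos_scores) == 0:
            continue  # no positives observed for this group
        # Choose the threshold so that roughly target_tpr of positives score above it.
        k = int(np.floor((1.0 - target_tpr) * len(pos_scores)))
        thresholds[g] = pos_scores[min(k, len(pos_scores) - 1)]
    return thresholds

def debiased_predict(scores, groups, thresholds):
    # Apply the group-specific thresholds to produce final decisions.
    return np.array([s >= thresholds[g] for s, g in zip(scores, groups)])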
Keywords
