
Wang Mei1, Deng Weihong2, Su Sen2 (1. School of Artificial Intelligence, Beijing Normal University; 2. School of Artificial Intelligence, Beijing University of Posts and Telecommunications)

Abstract
A review on fairness in image recognition


In the past few decades, image recognition technology has undergone rapid development and become integrated into our lives, profoundly changing the course of human society. However, recent research and applications indicate that image recognition systems can exhibit human-like discriminatory bias, make unfair decisions toward certain groups or populations, and even perform worse on historically under-served populations. Consequently, there is a growing need to guarantee fairness in image recognition systems and prevent discriminatory decisions, so that people can fully trust them and live in harmony with them. In this paper, we give a comprehensive overview of cutting-edge research progress toward fairness in image recognition. First, we define fairness as achieving consistent performance across different groups regardless of peripheral attributes, e.g., color, background, gender, and race, and show that bias arises from three sources. 1) Data imbalance. In existing datasets, some groups are over-represented and others are under-represented. During training, deep models optimize for the over-represented groups, because doing so boosts overall accuracy, while neglecting the under-represented ones. 2) Spurious correlations. Existing methods often capture unintended decision rules from spurious correlations between target variables and peripheral attributes, and fail to generalize to images in which these correlations do not hold. 3) Group discrepancy. When the discrepancy between groups is large and deep models cannot trade off the specific requirements of the various groups, they have to sacrifice performance on some of them. Second, we introduce the datasets (e.g., Colored MNIST, Corrupted CIFAR-10, CelebA, BAR, RFW) and evaluation metrics (e.g., equal opportunity and equalized odds) used to study fairness in image recognition.
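As a concrete illustration, the two metrics named above can be computed from binary predictions roughly as follows. This is a minimal NumPy sketch; the function names and the two-group setup are illustrative choices of ours, not taken from any specific benchmark:

```python
import numpy as np

def group_rates(y_true, y_pred, mask):
    """True-positive and false-positive rates within one group."""
    yt, yp = y_true[mask], y_pred[mask]
    tpr = np.mean(yp[yt == 1]) if np.any(yt == 1) else 0.0
    fpr = np.mean(yp[yt == 0]) if np.any(yt == 0) else 0.0
    return tpr, fpr

def fairness_gaps(y_true, y_pred, group):
    """Equal-opportunity and equalized-odds gaps between two groups (0 and 1)."""
    tpr_a, fpr_a = group_rates(y_true, y_pred, group == 0)
    tpr_b, fpr_b = group_rates(y_true, y_pred, group == 1)
    eq_opportunity = abs(tpr_a - tpr_b)                     # TPR gap only
    eq_odds = max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b))   # TPR and FPR gaps
    return eq_opportunity, eq_odds
```

A perfectly fair model under these definitions would drive both gaps to zero; equalized odds is the stricter criterion because it also constrains false-positive rates.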
These datasets enable researchers to study the bias of image recognition models with respect to color, background, image quality, gender, race, and age. Third, we divide the debiasing methods designed for image recognition into seven categories. 1) Sample reweighting (or resampling). This method assigns larger weights (or increases the sampling frequency) for the minority groups and smaller weights (or decreases the sampling frequency) for the majority groups, so that the model focuses more on the minority groups and the performance difference across groups shrinks. 2) Image augmentation. Generative adversarial networks (GANs) are used to translate images of over-represented groups into images of under-represented groups. This approach modifies the bias attributes of over-represented samples while leaving their target attributes unchanged, so that more samples are generated for under-represented groups and the data imbalance is addressed. 3) Feature augmentation. Because image augmentation suffers from mode collapse during GAN training, some works instead augment samples at the feature level. They encourage the recognition model to output consistent predictions for samples before and after the bias information in their features is perturbed or edited, which prevents the model from predicting target attributes from bias information and thus improves fairness. 4) Feature disentanglement. One of the most commonly used debiasing methods, it removes the spurious correlation between target and bias attributes in the feature space and learns target features that are independent of bias. 5) Metric learning.
To encourage the model to make predictions based on target attributes rather than bias information, this method uses metric learning, e.g., contrastive learning, to pull samples with the same target class but different bias classes closer together, and to push samples with different target classes but the same bias class apart, in the feature space. 6) Model adaptation. To address group discrepancy, some works adaptively change the network depth or hyperparameters for different groups according to their specific requirements, which improves performance on under-represented groups. 7) Post-processing. This method assumes only black-box access to a biased model and modifies the model's final predictions to mitigate bias. We additionally discuss the advantages and limitations of these methods, and summarize competitive results and experimental comparisons on widely used benchmarks. Finally, we review and summarize future directions in this field. 1) In existing datasets, bias attributes are limited to color, background, image quality, race, age, and gender; more diverse datasets need to be constructed to study the more complex biases of the real world. 2) Most recent bias-mitigation works require annotations of the bias source. However, such annotations are labor-intensive, and multiple biases may coexist; how to mitigate multiple unknown biases remains to be fully explored. 3) There is a trade-off between fairness and algorithm performance: it is challenging to reduce the effect of bias without hampering overall model performance. 4) Causal intervention has been introduced into object classification to mitigate bias, while individual fairness has been proposed to encourage face recognition models to give similar predictions to similar individuals. 5) Fairness on video data has also attracted attention recently.
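The sample-reweighting idea in category 1 can be sketched as inverse-frequency weighting, under which every group contributes the same total weight to the training loss regardless of its size. This is a minimal illustration; the helper name is ours:

```python
from collections import Counter

def inverse_frequency_weights(group_labels):
    """Per-sample weights proportional to 1 / group frequency.

    With weight = n / (k * count_g), each of the k groups receives the
    same total weight n / k, so minority samples count for more.
    """
    counts = Counter(group_labels)
    n, k = len(group_labels), len(counts)
    return [n / (k * counts[g]) for g in group_labels]
```

In practice these weights would multiply the per-sample loss, or feed a weighted sampler so minority-group images are drawn more often.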
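The pull/push rule of the metric-learning methods in category 5 can be written as a toy pairwise penalty. This is a sketch of the general idea, not any particular published loss; the margin value and function name are our own choices:

```python
import numpy as np

def debiasing_contrastive_penalty(feats, target, bias, margin=1.0):
    """Average pairwise penalty over the two informative pair types:
    pull together same-target / different-bias pairs, and push apart
    different-target / same-bias pairs (up to a margin)."""
    n = len(feats)
    penalty, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(feats[i] - feats[j])
            if target[i] == target[j] and bias[i] != bias[j]:
                penalty += d ** 2                      # pull closer
                pairs += 1
            elif target[i] != target[j] and bias[i] == bias[j]:
                penalty += max(0.0, margin - d) ** 2   # push apart
                pairs += 1
    return penalty / max(pairs, 1)
```

Minimizing this penalty makes the bias attribute uninformative about distance in feature space, which is exactly why the model can no longer use it as a shortcut for the target class.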
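For the post-processing family in category 7, one simple strategy among several in the literature is to pick a separate decision threshold per group so that true-positive rates are equalized, using only the black-box model's scores. A sketch, assuming binary classification and our own function name:

```python
import numpy as np

def per_group_thresholds(scores, y_true, group, target_tpr=0.8):
    """For each group, choose the score threshold at which roughly a
    target_tpr fraction of that group's positives scores at or above it."""
    thresholds = {}
    for g in np.unique(group):
        pos = np.sort(scores[(group == g) & (y_true == 1)])
        k = int(round(target_tpr * len(pos)))  # positives that should pass
        idx = max(len(pos) - k, 0)
        thresholds[g] = pos[min(idx, len(pos) - 1)]
    return thresholds
```

Because only the output scores are touched, the biased model itself never needs to be retrained, which is the defining advantage (and limitation) of post-processing.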