Generative Zero-Shot Image Recognition Combined with Double Contrastive Embedding Learning
Pages: 1-15 (2024)
Published Online: 02 December 2024
DOI: 10.11834/jig.240526
ZHANG Guimei, YAN Wenshang, HUANG Junyang. Generative Zero-Shot Image Recognition Combined with Double Contrastive Embedding Learning[J]. Journal of Image and Graphics, 2024: 1-15.
Objective
With the rapid advancement of deep learning, image recognition accuracy has improved significantly. However, traditional deep learning models rely heavily on large-scale, well-labeled datasets. This dependence on manual annotation not only incurs high costs but also makes it difficult for models to adapt to new data or data-scarce situations. Zero-shot learning (ZSL) has emerged as an effective solution: it aims to recognize unlabeled data from unseen classes by training on labeled data from seen classes. According to the task setting, zero-shot learning is divided into Conventional Zero-Shot Learning (CZSL) and Generalized Zero-Shot Learning (GZSL). In CZSL, test samples come only from unseen classes, whereas in GZSL they may come from both seen and unseen classes. Because the generalized setting is closer to real-world conditions, improving the accuracy of generalized zero-shot image recognition is of greater practical significance. Generative methods are among the most effective approaches to GZSL: by synthesizing visual features for unseen classes, they transform the zero-shot learning problem into a conventional supervised learning task. However, generative zero-shot recognition still suffers from insufficient discriminative information in the generated features, inconsistency between pseudo-visual features and semantic information, and domain shift, all of which reduce recognition accuracy. To address these issues, we propose a generative zero-shot image recognition method that incorporates double contrastive embedding learning.
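The practical difference between the two settings can be made concrete with a small sketch. The snippet below is illustrative only: the class lists and the `score` function are hypothetical stand-ins for a trained compatibility model. CZSL ranks a test sample against unseen-class labels alone, while GZSL ranks it against the union of seen and unseen labels, which is where the bias toward seen classes arises.

```python
import numpy as np

# Hypothetical label sets: seen classes have labeled training data;
# unseen classes are described only by semantic attributes/prototypes.
seen_classes = [0, 1, 2]
unseen_classes = [3, 4]

def score(x, c):
    """Hypothetical compatibility score between image feature x and class c."""
    rng = np.random.default_rng(c)  # stand-in for a learned scorer
    return float(x @ rng.standard_normal(x.shape[0]))

x = np.random.randn(2048)  # a test image feature (e.g. from ResNet-101)

# CZSL: the test sample is known to come from an unseen class,
# so prediction searches the unseen labels only.
czsl_pred = max(unseen_classes, key=lambda c: score(x, c))

# GZSL: the test sample may come from any class, so prediction searches
# the full label space; a model biased toward seen classes tends to
# misroute unseen-class samples here.
gzsl_pred = max(seen_classes + unseen_classes, key=lambda c: score(x, c))
```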
Method
To address the core challenge of insufficient discriminability in the generated features, we construct a generative framework based on the Variational Autoencoder-Generative Adversarial Network (VAE-GAN) and integrate a contrastive embedding module; the multiple network components are trained collaboratively, which improves the quality of the pseudo-features and, in turn, the accuracy of zero-shot image recognition. Building on the conditional VAE-GAN as the core generative network, we further propose a dual contrastive learning strategy that integrates intra-domain and cross-domain information. First, on top of the existing contrastive learning over seen classes, we introduce intra-domain contrastive learning between unseen-class pseudo-sample instances and their semantic prototypes, ensuring that the generated pseudo-visual features remain consistent with the semantic information and mitigating semantic confusion between seen and unseen classes. Second, we propose cross-domain center-prototype contrastive learning, which aligns visual class centers with semantic prototypes across domains; this alignment facilitates cross-domain knowledge transfer, reduces the model's bias toward seen classes, and alleviates domain shift to some extent.
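To make the dual contrastive strategy concrete, the sketch below implements two InfoNCE-style losses in PyTorch. This is a minimal sketch under our own assumptions (tensor shapes, a shared temperature, and features already passed through the contrastive embedding); the paper's exact formulation, batching, and loss weighting may differ.

```python
import torch
import torch.nn.functional as F

def instance_prototype_loss(feats, labels, prototypes, tau=0.1):
    """Intra-domain contrastive loss: each (pseudo-)feature is pulled toward
    the embedded semantic prototype of its own class and pushed away from the
    prototypes of all other classes (InfoNCE with prototypes as keys).

    feats:      (N, d) embedded visual or pseudo-visual features
    labels:     (N,)   class indices into `prototypes`
    prototypes: (C, d) embedded semantic prototypes (e.g. from attributes)
    """
    feats = F.normalize(feats, dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = feats @ protos.t() / tau  # (N, C) scaled similarities
    return F.cross_entropy(logits, labels)

def center_prototype_loss(feats, labels, prototypes, tau=0.1):
    """Cross-domain contrastive loss: per-class visual centers (means of the
    features available for each class in the batch) are aligned with their
    semantic prototypes, so seen and unseen classes share one embedding space.
    """
    C, d = prototypes.shape
    onehot = F.one_hot(labels, C).float()              # (N, C)
    counts = onehot.sum(0).clamp(min=1)                # samples per class
    centers = (onehot.t() @ feats) / counts[:, None]   # (C, d) class centers
    present = onehot.sum(0) > 0                        # classes seen in batch
    centers = F.normalize(centers[present], dim=1)
    protos = F.normalize(prototypes, dim=1)
    logits = centers @ protos.t() / tau                # (C_present, C)
    targets = torch.arange(C)[present]                 # each center's own class
    return F.cross_entropy(logits, targets)
```

In a full pipeline, these two terms would be weighted against the conditional VAE-GAN reconstruction and adversarial losses; the weights are hyperparameters not specified here.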
Result
The proposed method was evaluated on zero-shot and generalized zero-shot recognition tasks on four datasets (AWA1, AWA2, CUB, and SUN) and compared with recent methods. In zero-shot recognition, it achieved state-of-the-art performance on the AWA1 and CUB datasets, improving the T1 value by 2.2% and 2.7%, respectively, over the second-best models, and it obtained the second-best results on AWA2 and SUN, indicating robustness across diverse data environments. In generalized zero-shot recognition, it achieved the highest H values on the AWA1, AWA2, and CUB datasets, with improvements of 0.6%, 0.8%, and 2.8%, respectively, over the second-best methods; notably, it is the only compared method whose accuracy exceeds 70% on all three datasets, underscoring its generalization capability. On the SUN dataset it again ranked second-best. We attribute these gains to the generated pseudo-features, which closely resemble the real features of the seen classes; this mitigates semantic confusion and helps alleviate domain shift. Ablation studies on the AWA1 and CUB datasets provide empirical evidence for the contribution of each component. We also examined how the number of generated samples affects performance, varying the sample size in both zero-shot and generalized zero-shot settings on all four datasets, which offers guidance for tuning the generation process. Finally, to assess the benefit of the low-dimensional embedding space, we performed t-SNE dimensionality reduction and visualization on randomly selected unseen classes from the coarse-grained AWA2 dataset and the fine-grained CUB dataset; the visualizations show that our approach captures discriminative and meaningful representations in the low-dimensional space.
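For reference, the metrics quoted above follow the standard evaluation protocol of Xian et al. (2018): T1 is the average of per-class top-1 accuracies, and H is the harmonic mean of the per-class accuracies on seen and unseen test classes. A minimal sketch (the variable names are ours):

```python
import numpy as np

def per_class_accuracy(y_true, y_pred):
    """Average of per-class top-1 accuracies (the T1 metric)."""
    classes = np.unique(y_true)
    accs = [(y_pred[y_true == c] == c).mean() for c in classes]
    return float(np.mean(accs))

def harmonic_mean(acc_seen, acc_unseen):
    """The H metric used to rank GZSL methods."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Usage: S = per_class_accuracy on seen-class test samples,
#        U = per_class_accuracy on unseen-class test samples,
#        H = harmonic_mean(S, U).
```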
Conclusion
The experimental results indicate that the proposed method improves the accuracy of both zero-shot and generalized zero-shot image recognition, addresses the problems faced by current contrastive-learning-based generative zero-shot methods, and generalizes well. In future work, we will weigh recognition accuracy against efficiency by designing lightweight networks that reduce model complexity and improve recognition efficiency.
Zero-Shot Learning; Generalized Zero-Shot Learning; GAN; Embedding space; Contrastive learning
Cavazza J, Murino V, Del Bue A. No adversaries to zero-shot learning: distilling an ensemble of Gaussian feature generators[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(10): 12167-12178.
Cetin S, Baran O B, Cinbiş R G. Closed-form sample probing for learning generative models in zero-shot learning[C]//International Conference on Learning Representations. 2022.
Chao W L, Changpinyo S, Gong B, et al. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer International Publishing, 2016: 52-68.
Chen Y, Zhou Y. Incorporating attribute-level aligned comparative network for generalized zero-shot learning[J]. Neurocomputing, 2024, 573: 127188.
Chen S, Hong Z, Hou W, et al. TransZero++: cross attribute-guided transformer for zero-shot learning[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(11): 12844-12861.
Chen S, Xie G, Liu Y, et al. HSVA: hierarchical semantic-visual adaptation for zero-shot learning[J]. Advances in Neural Information Processing Systems, 2021, 34: 16622-16634.
Chen S, Hou W, Hong Z, et al. Evolving semantic prototype improves generative zero-shot learning[C]//International Conference on Machine Learning. PMLR, 2023: 4611-4622.
Chen S, Wang W, Xia B, et al. FREE: feature refinement for generalized zero-shot learning[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021: 122-131.
Chen Z, Huang Y, Chen J, et al. DUET: cross-modal semantic grounding for contrastive zero-shot learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(1): 405-413.
Du Y, Shi M, Wei F, et al. Boosting zero-shot learning via contrastive optimization of attribute representations[J]. IEEE Transactions on Neural Networks and Learning Systems, 2023.
Feng Y, Huang X, Yang P, et al. Non-generative generalized zero-shot learning via task-correlated disentanglement and controllable samples synthesis[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 9346-9355.
Gao Y, Tang C, Lv J. Cluster-based contrastive disentangling for generalized zero-shot learning[J]. arXiv preprint arXiv:2203.02648, 2022.
Gulrajani I, Ahmed F, Arjovsky M, et al. Improved training of Wasserstein GANs[J]. Advances in Neural Information Processing Systems, 2017, 30: 5769-5779.
Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27: 2672-2680.
Han Z, Fu Z, Chen S, et al. Semantic contrastive embedding for generalized zero-shot learning[J]. International Journal of Computer Vision, 2022, 130(11): 2606-2622.
Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90.
Lampert C H, Nickisch H, Harmeling S. Learning to detect unseen object classes by between-class attribute transfer[C]//2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009: 951-958.
Larochelle H, Erhan D, Bengio Y. Zero-data learning of new tasks[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2008, 1(2): 3.
Li B, Drozd A, Guo Y, et al. Scaling word2vec on big corpus[J]. Data Science and Engineering, 2019, 4(2): 157-175.
Li Q, Zhan Z, Shen Y, et al. Co-GZSL: Feature Contrastive Optimization for Generalized Zero-Shot Learning[J]. Neural Processing Letters, 2024, 56(2): 99.
Paul R, Vora S, Li B. Instance adaptive prototypical contrastive embedding for generalized zero shot learning[J]. arXiv preprint arXiv:2309.06987, 2023.
Patterson G, Hays J. Sun attribute database: Discovering, annotating, and recognizing scene attributes[C]//2012 IEEE conference on computer vision and pattern recognition. IEEE, 2012: 2751-2758.
Schonfeld E, Ebrahimi S, Sinha S, et al. Generalized zero- and few-shot learning via aligned variational autoencoders[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 8247-8255.
Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations[C]//International Conference on Machine Learning. PMLR, 2020: 1597-1607.
Wang C, Min S, Chen X, et al. Dual progressive prototype network for generalized zero-shot learning[J]. Advances in Neural Information Processing Systems, 2021, 34: 2936-2948.
Wang Y, Hong M, Huangfu L, et al. Data Distribution Distilled Generative Model for Generalized Zero-Shot Recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(6): 5695-5703.
Wang Z, Liang J, He R, et al. Improving zero-shot generalization for clip with synthesized prompts[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2023: 3032-3042.
Wah C, Branson S, Welinder P, et al. The Caltech-UCSD Birds-200-2011 dataset[R]. California Institute of Technology, 2011.
Wei Xiushen, Xu Yuyan, Yang Jian. Review of webly-supervised fine-grained image recognition[J]. Journal of Image and Graphics, 2022, 27(7): 2057-2077.
Xian Y, Sharma S, Schiele B, et al. f-VAEGAN-D2: a feature generating framework for any-shot learning[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10275-10284.
Xian Y, Lampert C H, Schiele B, et al. Zero-shot learning—a comprehensive evaluation of the good, the bad and the ugly[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 41(9): 2251-2265.
Yang G, Ye Z, Zhang R, et al. A comprehensive survey of zero-shot image classification: methods, implementation, and fair evaluation[J]. Applied Computing and Intelligence, 2022, 2(1): 1-31.
Zhang C, Jin M, Yu Q, et al. Metric learning for projections bias of generalized zero-shot learning[J]. arXiv preprint arXiv:2309.01390, 2023.
Zhai Z, Li X, Chang Z. Center-VAE with discriminative and semantic-relevant fine-tuning features for generalized zero-shot learning[J]. Signal Processing: Image Communication, 2023, 111: 116897.
Zhang Guimei, Long Bangyao, Lu Feifei. Zero-shot text recognition combining transfer guidance and bidirectional recurrent structure GANs[J]. Pattern Recognition and Artificial Intelligence, 2020, 33(12): 1083-1096.