Discriminative deep feature learning method by fusing linear discriminant analysis for image recognition
2018, Vol. 23, No. 4, pp. 510-518
Received: 2017-07-07; Revised: 2017-11-14; Published in print: 2018-04-16
DOI: 10.11834/jig.170336

Objective
Convolutional neural networks have been widely applied to image recognition. However, the features learned by conventional convolutional neural networks often lack sufficient discriminative power, which degrades recognition performance. To address this problem, we propose a loss function, LDloss (linear discriminant loss), that incorporates the idea of linear discriminant analysis, and apply it to deep feature extraction for image recognition, thereby improving the discriminative power of the features and, in turn, the recognition performance.
Method
A deep network for feature extraction is first built with a convolutional neural network. Then, on top of minimizing the sample classification error, the idea of LDA (linear discriminant analysis) is introduced for multi-class image classification to construct a new loss function that participates in training the network: it minimizes intra-class feature distances and maximizes inter-class feature distances, improving the discriminative power of the features and thus the recognition performance. Analysis shows that the proposed algorithm yields features better suited to sample classification. During learning, the class means are updated smoothly through a mini-batch iterative averaging strategy.
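The abstract does not spell out the mini-batch mean-update strategy; one plausible reading is an exponential moving average of each class center over the samples that appear in the current batch, so that centers drift smoothly between batches. The update rate `alpha` and the per-class form below are assumptions, not the authors' formulation:

```python
import numpy as np

def update_centers(centers, features, labels, alpha=0.5):
    """Move each class center toward the mean of its samples in the batch.

    Classes absent from the batch keep their current center, which is what
    keeps the update stable across mini-batches.
    """
    new_centers = centers.copy()
    for c in np.unique(labels):
        batch_mean = features[labels == c].mean(axis=0)
        new_centers[c] = (1 - alpha) * centers[c] + alpha * batch_mean
    return new_centers
```

With `alpha = 1` this degenerates to replacing a center with the current batch mean, which is noisy for small batches; a smaller `alpha` trades responsiveness for stability.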
Result
The algorithm achieves average recognition rates of 99.53% on the MNIST dataset and 94.73% on the CK+ database, an improvement over existing algorithms. Compared with the conventional softmax loss and hinge loss, the deep network trained with LDloss improves by 0.2% and 0.3% on MNIST and by 9.21% and 24.28% on CK+, respectively.
Conclusion
This paper proposes a new discriminative deep feature learning algorithm that effectively improves the discriminative power of deep networks and thus the accuracy of image recognition. Moreover, at the testing stage it requires no extra computation compared with softmax loss.
Objective
Feature extraction plays an important role in image recognition, including face recognition and character recognition. However, feature extraction usually depends on domain knowledge or prior experience. Convolutional neural networks (CNNs) are attracting the attention of researchers because they can automatically extract efficient features and simultaneously achieve image recognition via their self-learning capability. However, the features learned by classical CNNs often have poor discriminant capability for image recognition. This study introduces a linear discriminant analysis loss (LDloss) into CNNs and develops a discriminative deep feature learning method by fusing linear discriminant analysis (LDA) for image recognition. Therefore, CNNs can provide discriminant features to improve image recognition performance.
Method
A deep CNN for feature extraction is constructed to automatically extract efficient features for image classification tasks via a multilayer perceptron. LDA is introduced, and a new linear discriminant loss function (LDloss) is developed via a variant form of the Fisher criterion. The new LDloss and the softmax loss are integrated into a unified loss function for deep feature learning. The learned features minimize classification error and simultaneously achieve inter-class dispersion and intra-class compactness. An averaging strategy based on mini-batches is used to update the class centers during learning.
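The unified objective described above can be sketched as softmax cross-entropy plus a discriminant term that shrinks intra-class scatter around class centers while enlarging the scatter between centers. The weighting `lam` and the exact form of the ratio below are assumptions for illustration, not the authors' published formulation:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy, averaged over the batch.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def ld_loss(features, labels, centers, lam=0.1, eps=1e-8):
    """Hypothetical LD term: intra-class scatter over inter-class scatter."""
    # Mean squared distance of each feature to its own class center.
    intra = ((features - centers[labels]) ** 2).sum(axis=1).mean()
    # Mean squared distance of each class center to the overall mean center.
    mean_center = centers.mean(axis=0)
    inter = ((centers - mean_center) ** 2).sum(axis=1).mean()
    return lam * intra / (inter + eps)

def fused_loss(logits, features, labels, centers, lam=0.1):
    # Unified objective: classification error plus the discriminant term.
    return softmax_cross_entropy(logits, labels) + ld_loss(features, labels, centers, lam)
```

Minimizing the ratio pulls features toward their class centers (intra-class compactness) and pushes the centers apart (inter-class dispersion), which is the Fisher-criterion behavior the text describes.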
Result
Experimental results on the MNIST and CK+ databases show that the proposed algorithm achieves average recognition rates of 99.53% and 94.73%, respectively. Compared with softmax and hinge losses under the same network structure, LDloss yields increases of 0.2% and 0.3% on the MNIST database and of 9.21% and 24.28% on the CK+ database, respectively. The proposed method achieves a 100% recognition rate for some classes in the MNIST and CK+ databases.
Conclusion
This study proposes a new discriminant deep feature learning algorithm for image recognition. Experimental results on different databases show that the proposed method explicitly achieves intra-class compactness and inter-class separability among the learned features, efficiently improving their discriminative capability. Therefore, the proposed method achieves a higher recognition rate than some existing methods. The proposed method also requires no additional computation compared with softmax loss during the testing stage.
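The claim of no extra test-time cost follows because the LD term and the class centers are used only during training; at inference the prediction is just a forward pass followed by an argmax over the logits, exactly as with plain softmax. A minimal sketch, where the stand-in linear classifier head is an assumption:

```python
import numpy as np

def predict(weights, bias, features):
    # At test time only the forward pass is needed; the LD term and the
    # class centers play no role, so inference cost matches plain softmax.
    logits = features @ weights + bias
    return logits.argmax(axis=1)
```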