Image classification method based on end-to-end dual feature reweight DenseNet
2020, Vol. 25, No. 3, pp. 486-497
Received: 2019-06-18
Revised: 2019-07-23
Accepted: 2019-07-30
Published in print: 2020-03-16
DOI: 10.11834/jig.190290
Objective
To address the shortcoming that the densely connected convolutional network (DenseNet) does not fully consider channel feature correlation and inter-layer feature correlation, this paper combines a soft attention mechanism and proposes an end-to-end dual feature reweight densely connected convolutional network.
Method
The proposed network simultaneously realizes channel feature reweighting and inter-layer feature reweighting for DenseNet. Channel feature reweighting and inter-layer feature reweighting methods for DenseNet are first presented. An end-to-end dual feature reweight densely connected convolutional network (DFR-DenseNet) is then constructed, in which the output feature map of every convolutional layer passes through two parallel branches that perform channel feature reweighting and inter-layer feature reweighting, respectively, and the two reweighted feature maps are then fused.
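To make the channel feature reweighting concrete, here is a minimal sketch of a squeeze-and-excitation style channel recalibration block, assuming PyTorch; the class and argument names (ChannelReweight, reduction) are illustrative and not the authors' reference implementation.

```python
import torch
import torch.nn as nn

class ChannelReweight(nn.Module):
    """Squeeze-and-excitation style channel recalibration (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squeeze: global average pooling over the spatial dimensions.
        w = self.fc(x.mean(dim=(2, 3)))
        # Excitation: rescale each channel by its learned weight.
        return x * w.view(x.size(0), -1, 1, 1)

# Example: recalibrate a batch of 12-channel feature maps.
feats = torch.randn(8, 12, 32, 32)
reweighted = ChannelReweight(12)(feats)  # same shape, channel-wise rescaled
```

The sigmoid gate keeps the learned weights in (0, 1), so the block attenuates or preserves channels rather than amplifying them arbitrarily, which is the usual soft-attention behavior.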
Result
To verify the effectiveness and adaptability of the proposed method on different image classification datasets, experiments were conducted on the image classification datasets CIFAR-10/100 and the facial age datasets MORPH and Adience; classification accuracy improved, and the parameter count, training time, and test time of the models were analyzed to verify the practicality of the method. Compared with DenseNet, the 40-layer and 64-layer dual feature reweight DenseNet (DFR-DenseNet) increase the number of parameters by only 1.87% and 1.23%, respectively, while reducing the error rate by 12% and 9.11% on CIFAR-10 and by 5.56% and 5.41% on CIFAR-100. Compared with the 121-layer DenseNet, the 121-layer DFR-DenseNet reduces the mean absolute error (MAE) by 7.33% on MORPH and improves age-group estimation accuracy by 2% on Adience. Compared with the multiple feature reweight DenseNet (MFR-DenseNet), DFR-DenseNet halves the number of parameters and shortens test time to approximately 61% of that of MFR-DenseNet.
Conclusion
Experimental results show that the proposed end-to-end dual feature reweight densely connected convolutional network enhances the learning ability of the network, improves image classification accuracy, and adapts well to different image classification datasets in practice.
Objective
Image classification is one of the key research topics in computer vision. The development of deep learning and convolutional neural networks (CNNs) has laid the technical foundation for image classification, and in recent years image classification methods based on deep CNNs have become an important research direction. DenseNet is one of the most widely applied deep CNNs for image classification; it encourages feature reuse and alleviates the vanishing-gradient problem. However, it has two notable limitations. First, each layer simply combines the feature maps of all preceding layers by concatenation without considering the interdependencies between channels; the network representation could be further improved by modeling channel correlation and recalibrating channel features. Second, the correlation between inter-layer feature maps is not explicitly modeled, so adaptively learning correlation coefficients by modeling the correlation of feature maps across layers is important.
Method
Conventional DenseNet does not adequately consider channel feature correlation or inter-layer feature correlation. To address these limitations, multiple feature reweight DenseNet (MFR-DenseNet) combines channel feature reweight DenseNet (CFR-DenseNet) and inter-layer feature reweight DenseNet (ILFR-DenseNet) through ensemble learning, improving the representation power of DenseNet by adaptively recalibrating channel-wise feature responses and explicitly modeling the interdependencies between the features of different convolutional layers. However, MFR-DenseNet uses two independent parallel networks for image classification and therefore cannot be trained end to end. First, the CFR-DenseNet and ILFR-DenseNet models must be trained and saved separately, and their models and weights must then be loaded to build MFR-DenseNet, so training requires multiple save and load steps and becomes cumbersome. Second, the parameter count and computation are large, so training takes a long time. At test time, the final prediction of MFR-DenseNet is the average of the predictions of the two models, so its parameters and test time are almost double those of a single channel feature reweight or inter-layer feature reweight network. Consequently, MFR-DenseNet places high demands on the storage space and computing performance of the device, which limits its practical application. To address these limitations, this paper proposes an end-to-end dual feature reweight DenseNet (DFR-DenseNet) based on the soft attention mechanism; the network realizes both channel feature reweighting and inter-layer feature reweighting within a single DenseNet. First, the channel feature reweight and inter-layer feature reweight methods are integrated into DenseNet. A squeeze-and-excitation module (SEM) is introduced after each 3×3 convolutional layer to exploit channel dependencies: each feature map of each layer obtains a weight through a squeeze-and-excitation operation, and the representation of the network is improved by explicitly modeling the interdependencies between channels. The output feature map of each convolutional layer is also subjected to two squeeze-and-excitation operations, from which a weight for each layer is obtained to reweight the inter-layer features. Then, DFR-DenseNet is constructed: the output feature map of each convolutional layer passes through two branches that complete channel feature reweighting and inter-layer feature reweighting, and concatenation and convolution operations are used to fuse the two types of reweighted feature maps.
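As a rough, simplified illustration of the dual-branch construction described above (PyTorch assumed; DFRDenseLayer and all parameter names are hypothetical, and the inter-layer branch is reduced to a single per-layer weight), one dense layer could be sketched as follows; it is not the authors' implementation.

```python
import torch
import torch.nn as nn

def se_weights(x: torch.Tensor, fc: nn.Module) -> torch.Tensor:
    """Squeeze (global average pooling) followed by an excitation MLP."""
    return fc(x.mean(dim=(2, 3)))

class DFRDenseLayer(nn.Module):
    """Simplified dual-branch dense layer: channel reweighting plus a per-layer
    weight, fused by concatenation and a 1x1 convolution (sketch only)."""
    def __init__(self, in_channels: int, growth_rate: int, reduction: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1, bias=False),
        )
        hidden = max(growth_rate // reduction, 1)
        # Branch 1: one weight per channel of the new feature map.
        self.channel_fc = nn.Sequential(
            nn.Linear(growth_rate, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, growth_rate), nn.Sigmoid(),
        )
        # Branch 2: two stacked squeeze-and-excitation steps producing a single
        # weight for the whole layer (a simplified inter-layer reweighting).
        self.layer_fc1 = nn.Sequential(
            nn.Linear(growth_rate, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, growth_rate), nn.Sigmoid(),
        )
        self.layer_fc2 = nn.Sequential(
            nn.Linear(growth_rate, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )
        # Fusion: concatenate the two reweighted maps and mix with a 1x1 conv.
        self.fuse = nn.Conv2d(2 * growth_rate, growth_rate, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        new_feat = self.conv(x)                                     # (N, k, H, W)
        n, c = new_feat.size(0), new_feat.size(1)
        cw = se_weights(new_feat, self.channel_fc)                  # (N, k)
        channel_branch = new_feat * cw.view(n, c, 1, 1)
        lw = self.layer_fc2(se_weights(new_feat, self.layer_fc1))   # (N, 1)
        layer_branch = new_feat * lw.view(n, 1, 1, 1)
        fused = self.fuse(torch.cat([channel_branch, layer_branch], dim=1))
        # As in DenseNet, the fused new features are concatenated with the input.
        return torch.cat([x, fused], dim=1)

# Example: one layer growing a 24-channel input by growth_rate = 12.
out = DFRDenseLayer(24, 12)(torch.randn(4, 24, 32, 32))  # -> (4, 36, 32, 32)
```

Because both branches and the fusion convolution sit inside one module, the whole network can be trained end to end with a single forward and backward pass, in contrast to the two separately trained models of MFR-DenseNet.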
Result
First, DFR-DenseNet is compared with serial fusion and parallel-addition fusion methods on the image classification dataset CIFAR-10, which shows that the proposed dual-branch fusion is the most effective. Second, to demonstrate the advantages of DFR-DenseNet, we performed experiments on the image classification datasets CIFAR-10/100. To show the effectiveness of the method on high-resolution data, we conducted age estimation experiments on the face dataset MORPH and an age-group classification comparison on the unconstrained Adience dataset. Image classification accuracy was significantly improved. The 40-layer DFR-DenseNet achieved a 4.69% error rate and outperformed the 40-layer DenseNet by 12% on CIFAR-10 with only 1.87% more parameters, and the 64-layer DFR-DenseNet achieved a 4.29% error rate on CIFAR-10, outperforming the 64-layer DenseNet by 9.11%. On CIFAR-100, the 40-layer and 64-layer DFR-DenseNet achieved 24.29% and 21.86% test errors, outperforming the 40-layer and 64-layer DenseNet by 5.56% and 5.41%, respectively. Age estimation from a single face image is an essential task in human-computer interaction and computer vision with a wide range of practical applications; it comprises two categories, age-group classification and age regression. On Adience, which is used for age-group classification, DFR-DenseNet obtained 58.79% accuracy. MORPH Album 2 is used for age regression: the 121-layer DFR-DenseNet achieved a mean absolute error (MAE) of 3.16 and outperformed the 121-layer DenseNet by 7.33%. Compared with MFR-DenseNet, DFR-DenseNet halves the number of parameters, and its test time is shortened to approximately 61% of that of MFR-DenseNet.
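For reference, the mean absolute error (MAE) used on MORPH Album 2 is the standard age-regression metric, where $\hat{y}_i$ denotes the predicted age and $y_i$ the ground-truth age of the $i$-th of $N$ test images:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right|$$

so the reported value of 3.16 means the predicted age deviates from the true age by about 3.16 years on average.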
Conclusion
The experimental results show that the end-to-end dual feature reweight DenseNet can enhance the learning ability of the network and improve the accuracy of image classification.