Art image classification with double kernel squeeze-and-excitation neural network
2020, Vol. 25, No. 5, Pages: 967-976
Received: 2019-06-03
Revised: 2019-10-16
Accepted: 2019-10-23
Published in print: 2020-05-16
DOI: 10.11834/jig.190245
Objective
To fully extract the overall style and local detail features of art images such as engravings, Chinese paintings, oil paintings, watercolor paintings, and gouache paintings, and to meet the need for automatic classification and retrieval of art images by computer, this paper proposes a convolutional neural network built from double kernel squeeze-and-excitation (DKSE) modules and depthwise separable convolutions to classify art images.
Method
The DKSE module is constructed by combining the structural property of SKNet (selective kernel networks), which adaptively adjusts the receptive field to extract both global and detail features of an image, with the property of SENet (squeeze-and-excitation networks), which enhances channel features. The convolutional kernels on the branches of the DKSE module extract the overall features and local detail features of the input image; the branch feature maps are fused, and the fused feature map is then squeezed and excited; the resulting weights are mapped back onto the feature maps of the different branches, which are fused again. A convolutional neural network built from DKSE modules and depthwise separable convolutions classifies the art images.
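The squeeze-excitation-rescale step of the DKSE module described above can be sketched in NumPy. This is a simplified illustration, not the authors' implementation: the branch convolutions (e.g. 1×1 and 5×5) are assumed to have already produced the feature maps `u1` and `u5`, and the use of one sigmoid gate per branch is an assumption (the abstract specifies sigmoid gating but not its exact per-branch form).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dkse_fuse(u1, u5, w_down, w_up1, w_up5):
    """Channel recalibration of a DKSE-style block (hypothetical sketch).

    u1, u5 : (H, W, C) feature maps, assumed to be the outputs of the
             two convolution branches (not implemented here).
    w_down : (C, C//r) squeeze weights (r = reduction ratio).
    w_up1, w_up5 : (C//r, C) excitation weights, one gate per branch.
    """
    u = u1 + u5                      # fuse the two branch feature maps
    s = u.mean(axis=(0, 1))          # squeeze: global average pooling -> (C,)
    z = np.maximum(s @ w_down, 0.0)  # compress channels C -> C//r, ReLU
    a1 = sigmoid(z @ w_up1)          # sigmoid gate in (0, 1) for branch 1
    a5 = sigmoid(z @ w_up5)          # sigmoid gate for branch 2
    return u1 * a1 + u5 * a5         # rescale each branch, then fuse
```

The per-channel gates broadcast over the spatial dimensions, so each branch is reweighted channel-wise before the final fusion.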
Result
Using the proposed network to classify the data with and without augmentation (25 634 images in total across the five categories after augmentation), the classification accuracy with augmentation is 9.21% higher than without. Compared with other network models and traditional classification methods, the proposed method reaches an accuracy of 86.55%, 26.35% higher than the traditional methods. When the DKSE branch kernels are 1×1 and 5×5 and the module is placed after the third depthwise separable convolution of the network, the accuracy reaches 87.58%.
Conclusion
The DKSE module effectively improves classification performance and fully extracts the overall and local detail features of art images, achieving better classification accuracy than traditional network models.
Objective
The development of online digital media technology has promoted the sharing and spreading of art images. However, given the increasing number of art images, effective classification and retrieval are urgent problems. Faced with massive art image data, traditional manual feature extraction suffers from tagging errors and subjective tagging, and it places high professional demands on the people doing the classification. Convolutional neural networks (CNNs) are widely used in image classification because of their automatic feature extraction. Most such network models extract features from the key regions of photographed images. Art images, however, differ from photographs: their overall style features and local detail features are distributed fairly uniformly across the image. Selective kernel networks (SKNet) can adaptively adjust the receptive field size according to the input image to select multi-scale spatial information, but the softmax gating mechanism in the module only strengthens the dependence between channels of the feature map produced by the receptive field that responds most strongly to the stimulus, and it ignores the role of local detail features. Squeeze-and-excitation networks (SENet) can enhance the features in different channels but cannot extract the overall and local detail features of the input. To fully extract and enhance the overall style features and local detail features of art images and to realize their automatic classification and retrieval, we combine the characteristics of SKNet and SENet to build a block called the double kernel squeeze-and-excitation (DKSE) module. DKSE blocks and depthwise separable convolutions are then used to construct a CNN that classifies art images.
Method
SKNet can capture overall features and local detail features at different scales. Following the multi-scale structure of SKNet, we build the DKSE module with two branches. Each branch has a different convolutional kernel to extract the overall features and the local detail features, and the feature maps obtained by the convolution operations are fused. Then, following the squeeze-and-excitation idea of SENet, the spatial information of the fused feature map is compressed into a channel descriptor by global average pooling (GAP). After the GAP operation, 1×1 convolutions compress and excite the feature descriptor, and the sigmoid gating mechanism produces weights normalized to (0, 1). The weights are used to rescale the feature maps of the different branches, and the final output of the block is obtained by fusing the rescaled feature maps. In this way, more representative art image features are extracted. In this study, we choose engraving, Chinese painting, oil painting, opaque watercolor (gouache) painting, and watercolor painting for classification. To augment the data, high-resolution art images are manually selected and randomly cropped into 299×299 pixel patches, and the patches with rich style information are kept. After augmentation, a total of 25 634 images across the five kinds of art images are obtained. The CNN constructed from multiple DKSE modules and depthwise separable convolutions classifies the five kinds of art images. In all experiments, 80% of the art images of each kind are randomly selected as the training set, and the remaining 20% are used as the validation set. Our CNN is implemented in the Keras framework. The input images are resized to 299×299 pixels for training. The Adam optimizer is used, with an initial learning rate of 0.001, a mini-batch size of 32, and 120 training epochs in total. During training, the training images are randomly rotated between 0° and 20°, randomly shifted horizontally or vertically by up to 10%, and randomly flipped to improve the generalization ability of the proposed CNN. The learning rate is decreased by a factor of 10 if the training accuracy does not improve for three consecutive epochs.
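The learning-rate rule at the end of the Method section (divide by 10 when training accuracy stalls for three epochs) can be sketched in plain Python; in Keras this behavior is provided by the ReduceLROnPlateau callback. The function below is a hypothetical re-implementation for illustration, not the authors' code.

```python
def schedule_lr(train_acc_history, lr0=0.001, factor=0.1, patience=3):
    """Reduce-on-plateau sketch: divide the learning rate by 10 whenever
    training accuracy fails to improve for `patience` consecutive epochs.
    Returns the learning rate in effect at each epoch."""
    lr, best, wait, lrs = lr0, float("-inf"), 0, []
    for acc in train_acc_history:
        if acc > best:
            best, wait = acc, 0   # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:  # plateau reached: decay and reset
                lr *= factor
                wait = 0
        lrs.append(lr)
    return lrs
```

For example, an accuracy trace that stops improving after the second epoch keeps the initial rate of 0.001 for three stalled epochs and then drops to 0.0001.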
Result
Our network model is used to classify the data with and without data augmentation. The classification accuracy after augmentation is 9.21% higher than without. Compared with other network models and traditional art image classification methods, our method reaches a classification accuracy of 86.55%, which is 26.35% higher than the traditional methods. Compared with Inception-V4, our network uses approximately 33% of the parameters and approximately 25% of the time. We place the proposed DKSE module at three different positions in the network to verify its influence on the classification results. When the module is placed after the third depthwise separable convolution, the reduction ratio is set to 4, and the branch kernel sizes are 1×1 and 5×5, the classification accuracy is 87.58%, which is 1.58% higher than that of the other eight state-of-the-art network models. A reduction ratio of 4 outperforms a reduction ratio of 16. We use the gradient-weighted class activation mapping (Grad-CAM) algorithm to visualize the overall and local detail features that our network model, ours + SK, and ours + SE extract from each kind of art image. The results show that, compared with the other two models, our network model extracts the overall and local detail features of art images more fully.
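The parameter savings reported above come largely from depthwise separable convolutions, which replace one k×k convolution with a k×k depthwise convolution followed by a 1×1 pointwise convolution. A small sketch of the weight counts (bias terms ignored; the layer sizes in the example are illustrative, not taken from the paper):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out

def sepconv_params(k, c_in, c_out):
    """Weights in a depthwise separable convolution:
    a k x k depthwise stage plus a 1 x 1 pointwise stage (bias ignored)."""
    return k * k * c_in + c_in * c_out
```

For a 3×3 layer with 256 input and 256 output channels, the standard convolution needs 589 824 weights while the separable version needs 67 840, about 11.5%, matching the theoretical ratio 1/c_out + 1/k².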
Conclusion
Experimental results show that the proposed DKSE module effectively improves the classification performance of the network and fully extracts the overall and local detail features of art images. The proposed network model achieves better classification accuracy than the other CNN models.
Chollet F. 2017. Xception: deep learning with depthwise separable convolutions//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE: 1800-1807 [DOI: 10.1109/CVPR.2017.195]
Gao F, Nie J, Huang L, Duan L Y and Li X M. 2017. Traditional Chinese painting classification based on painting techniques. Chinese Journal of Computers, 40(12): 2871-2882 [DOI: 10.11897/SP.J.1016.2017.02871]
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE: 770-778 [DOI: 10.1109/CVPR.2016.90]
Howard A G, Zhu M L, Chen B and Kalenichenko D. 2017. MobileNets: efficient convolutional neural networks for mobile vision applications [EB/OL]. (2017-04-17) [2019-05-25]. https://arxiv.org/pdf/1704.04861.pdf
Hu J, Shen L, Albanie S, Sun G and Wu E H. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 7132-7141
Ioffe S and Szegedy C. 2015. Batch normalization: accelerating deep network training by reducing internal covariate shift [EB/OL]. (2015-03-02) [2019-05-25]. https://arxiv.org/pdf/1502.03167.pdf
Jiang S Q, Huang Q M, Ye Q X and Gao W. 2006. An effective method to detect and categorize digitized traditional Chinese paintings. Pattern Recognition Letters, 27(7):734-746[DOI:10.1016/j.patrec.2005.10.017]
Kingma D P and Ba J L. 2017. Adam: a method for stochastic optimization [EB/OL]. (2017-01-30) [2019-05-25]. https://arxiv.org/pdf/1412.6980.pdf
Krizhevsky A, Sutskever I and Hinton G E. 2012. ImageNet classification with deep convolutional neural networks//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada: ACM: 1097-1105
Li J and Wang J Z. 2004. Studying digital imagery of ancient paintings by mixtures of stochastic models. IEEE Transactions on Image Processing, 13(3):340-353[DOI:10.1109/TIP.2003.821349]
Li X, Wang W H, Hu X L and Yang J. 2019. Selective kernel networks [EB/OL]. (2019-03-18) [2019-05-25]. https://arxiv.org/pdf/1903.06586.pdf
Nair V and Hinton G E. 2010. Rectified linear units improve restricted Boltzmann machines//Proceedings of the 27th International Conference on International Conference on Machine Learning. Haifa, Israel: Omnipress: 807-814
Qi Y Q. 2009. The creation and appreciation of Chinese painting: Xie He's "The Six Laws". Science and Technology Information, (14): 231 [DOI: 10.3969/j.issn.1672-3791.2009.14.192]
Sandler M, Howard A, Zhu M L, Zhmoginov A and Chen L C. 2018. MobileNetV2: inverted residuals and linear bottlenecks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 4510-4520 [DOI: 10.1109/CVPR.2018.00474]
Selvaraju R R, Cogswell M, Das A, Vedantam R, Parikh D and Batra D. 2017. Grad-CAM: visual explanations from deep networks via gradient-based localization//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice: IEEE: 618-626 [DOI: 10.1109/ICCV.2017.74]
Shen J L. 2009. Stochastic modeling western paintings for effective classification. Pattern Recognition, 42(2): 293-301 [DOI: 10.1016/j.patcog.2008.04.016]
Sheng J C and Jiang J M. 2013. Style-based classification of Chinese ink and wash paintings. Optical Engineering, 52(9):#093101[DOI:10.1117/1.oe.52.9.093101]
Sun M J, Zhang D, Wang Z, Ren J C and Jin J S. 2016. Monte Carlo convex hull model for classification of traditional Chinese paintings. Neurocomputing, 171:788-797[DOI:10.1016/j.neucom.2015.08.013]
Szegedy C, Ioffe S, Vanhoucke V and Alemi A A. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning//Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI-17). San Francisco: AAAI Press: 4278-4284
Wang Z, Sun M J, Han Y H and Zhang D. 2013. Supervised heterogeneous sparse feature selection for Chinese paintings classification. Journal of Computer-Aided Design and Computer Graphics, 25(12):1848-1855
Yu F and Koltun V. 2016. Multi-scale context aggregation by dilated convolutions [EB/OL]. (2016-04-30) [2019-05-25]. https://arxiv.org/pdf/1511.07122.pdf
Zhang X Y, Zhou X Y, Lin M X and Sun J. 2018. ShuffleNet: an extremely efficient convolutional neural network for mobile devices//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE: 6848-6856 [DOI: 10.1109/CVPR.2018.00716]