Flower image classification based on selective convolutional feature fusion

Yin Hong, Fu Xiang, Zeng Jiexian, Duan Bin, Chen Ying (School of Software, Nanchang Hangkong University, Nanchang 330063)

Abstract
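Objective Labeled flower image samples are scarce and costly to annotate, and conventional deep-learning-based fine-grained image classification methods do not localize the flower region well. To address these problems, an unsupervised flower image classification method based on selective deep convolutional feature fusion is proposed. Method A flower image classification network based on selective deep convolutional feature fusion is constructed. First, flower images are preprocessed with an aspect-ratio-preserving size normalization method so that all images have the same size while the target is not deformed and no image detail is lost. Next, the VGG-16 deep convolutional neural network pre-trained on ImageNet is used to learn features from the preprocessed flower images; effective deep convolutional features are selected according to the distribution of response values in the feature maps, and deep convolutional features from multiple layers are fused. Finally, a softmax classification layer performs the classification. Result Comparative experiments were conducted on the Oxford 102 Flowers dataset against conventional flower image classification methods based on deep learning models. The proposed method reaches a classification accuracy of 85.55%, which is 27.67% higher than that of the deep learning model Xception. Conclusion A flower image classification method based on selective convolutional feature fusion is proposed. The method localizes the salient region of a flower image in an unsupervised manner, removes the interference of background and noise with the flower target, improves classification accuracy, and is suited to flower image classification when labeled samples are scarce.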
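The aspect-ratio-preserving size normalization can be illustrated with a minimal sketch of the geometry involved. The abstract does not state how the area left over after scaling is handled, so this sketch assumes the longer side is scaled to the target size and the shorter side is padded symmetrically. The function name `fit_with_padding` and its return layout are hypothetical.

```python
def fit_with_padding(width, height, target=224):
    """Return the scaled size and the padding needed to reach a square.

    Assumption (not stated in the source): the longer side is scaled to
    `target` and the shorter side is padded symmetrically, so the flower
    is neither stretched nor cropped.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    # Same scale factor on both axes keeps the aspect ratio.
    new_w = max(1, round(width * target / longest))
    new_h = max(1, round(height * target / longest))
    pad_x = target - new_w
    pad_y = target - new_h
    left, top = pad_x // 2, pad_y // 2
    # Padding is returned as (left, top, right, bottom) in pixels.
    return (new_w, new_h), (left, top, pad_x - left, pad_y - top)
```

Calling `fit_with_padding(500, 375)` gives the scaled size and the border widths that an image library would then apply when resizing and padding the actual pixels.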
Keywords

Flower image classification with selective convolutional descriptor aggregation

Yin Hong, Fu Xiang, Zeng Jiexian, Duan Bin, Chen Ying (School of Software, Nanchang Hangkong University, Nanchang 330063, China)

Abstract
Objective Flower image classification is a fine-grained image classification task. Its main challenges are large intra-class differences and high inter-class similarity: flowers of different species can closely resemble one another in morphology, color, and other attributes, whereas flowers of the same species can vary widely in color, shape, and other attributes. Current flower image classification methods fall into two categories: methods based on handcrafted features and methods based on deep learning. The former usually obtain the flower region by image segmentation, extract or design features manually, and then combine the extracted features with a traditional machine learning algorithm to complete the classification. These methods rely on the design experience of researchers. By contrast, methods based on deep learning use deep networks to learn flower features automatically. Bounding boxes and part annotations are used to define accurate target positions, and different convolutional neural network models are then fine-tuned to obtain the features of the targets. Because currently available flower image datasets lack annotations such as bounding boxes and part annotations, these strongly supervised methods are difficult to apply. Furthermore, annotating bounding boxes and parts for many flower images is costly. To solve these problems, this study proposes an unsupervised flower image classification method based on selective convolutional descriptor aggregation. Method A flower image classification network is constructed on the basis of selective deep convolutional descriptor aggregation. The proposed method consists of four phases: flower image preprocessing, selection and aggregation of convolutional features in the Pool5 layer, selection and aggregation of convolutional features in the Relu5-2 layer, and multi-layer feature fusion and classification.
In the first phase, flower images are preprocessed with a normalization method that retains the aspect ratio so that all flower images have the same size; thus, the features generated by the deep convolutional neural network have the same dimension for every image. The input image size is set to 224×224 pixels in this study. In the second phase, the features of the preprocessed flower images are learned by VGG-16, a deep convolutional neural network model pre-trained on ImageNet. The salient region is then located according to the high response values in the feature map of the Pool5 layer. However, some background regions also have high response values. Because a background region with high response values is smaller in area than the target region, the flood fill algorithm is used to compute the largest connected region of the salient region. On the basis of the location of this region, the deep convolutional features within it are selected and aggregated to form a low-dimensional feature of the flower image. In the third phase, deep convolutional features in the Relu5-2 layer are selected and aggregated to form another low-dimensional feature of the flower image. Multi-layer convolutional features have been shown to help the network learn features and complete the classification task; thus, the deep convolutional features in both the Pool5 and Relu5-2 layers are used in this study. Similarly, a saliency region map is obtained from the Relu5-2 layer on the basis of the response values. The saliency map from the Relu5-2 layer locates the flower region more accurately than the saliency map from the Pool5 layer, but it contains numerous noise regions and little semantic information. Thus, the saliency region map from the Relu5-2 layer is combined with the largest connected region map from the Pool5 layer to produce a true saliency region map with little noise.
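The localization steps described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract says only that regions with high response values are selected, so thresholding the channel-summed map at its mean, using 4-connectivity in the flood fill, and combining the two layers by nearest-neighbour upsampling followed by a logical AND are all assumptions. All function names are hypothetical, and feature tensors are represented as nested lists of shape C x H x W.

```python
from collections import deque


def activation_map(features):
    """Sum a C x H x W feature tensor over channels, giving an H x W map."""
    h, w = len(features[0]), len(features[0][0])
    return [[sum(ch[y][x] for ch in features) for x in range(w)]
            for y in range(h)]


def threshold_mask(act):
    """Mark positions whose response exceeds the map mean (assumed rule)."""
    values = [v for row in act for v in row]
    mean = sum(values) / len(values)
    return [[v > mean for v in row] for row in act]


def largest_connected_region(mask):
    """Keep only the largest 4-connected True region, found by flood fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill starting from an unvisited seed.
            region, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    keep = set(best)
    return [[(y, x) in keep for x in range(w)] for y in range(h)]


def combine_masks(fine_mask, coarse_region):
    """AND a fine mask with a coarse region upsampled by nearest neighbour."""
    fh, fw = len(fine_mask), len(fine_mask[0])
    ch, cw = len(coarse_region), len(coarse_region[0])
    return [[fine_mask[y][x] and coarse_region[y * ch // fh][x * cw // fw]
             for x in range(fw)] for y in range(fh)]
```

With a 224×224 input, VGG-16 yields a 7×7 map at Pool5 and a 14×14 map at Relu5-2, so `combine_masks` would take the 14×14 Relu5-2 mask as `fine_mask` and the 7×7 Pool5 region as `coarse_region`. Small isolated background responses are dropped by `largest_connected_region` because they form smaller components than the flower.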
Finally, on the basis of the location information in the true saliency region map, deep convolutional features from the Relu5-2 layer are selected and aggregated to form the low-dimensional Relu5-2 feature of the flower image. In the final phase, the two low-dimensional features are aggregated to form the final flower feature, which is then fed into the softmax layer for classification. Result To evaluate the proposed selective convolutional descriptor aggregation method, we perform the following experiment on Oxford 102 Flowers. The preprocessed flower images are fed into the AlexNet, VGG-16, and Xception models, all of which are pre-trained on ImageNet. Experimental results show that the classification accuracy of the proposed method is higher than that of these models. Experiments are also conducted to compare the proposed method with other current flower image classification methods in the literature. Results indicate that the classification accuracy of the proposed method is higher than that of methods based on handcrafted features and of other methods based on deep learning. Conclusion A method for classifying flower images using selective convolutional descriptor aggregation is proposed. The features of a flower image are learned by transfer learning from a pre-trained network. Effective deep convolutional features are selected according to the response value distribution in the feature map, and multi-layer deep convolutional features are then fused. Finally, the softmax layer is used for classification. The advantages of this method are that it locates the salient region in the flower image in an unsupervised manner and selects only the deep convolutional features in the located region, excluding invalid parts such as background and noise. The accuracy of flower image classification is therefore improved by reducing the interference from these invalid parts.
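The selection, aggregation, fusion, and classification steps summarized in the Method can be sketched as follows. The abstract does not specify the pooling or fusion operators, so this sketch assumes that the descriptors inside the mask are average-pooled and max-pooled per channel and that the two layer features are fused by concatenation. The function names and the `weights` and `biases` parameters of the softmax layer are hypothetical.

```python
import math


def aggregate(features, mask):
    """Pool the descriptors selected by `mask` into one vector.

    `features` is a C x H x W nested list and `mask` an H x W boolean grid.
    Assumption: per-channel average and max pooling are concatenated,
    giving a vector of length 2 * C.
    """
    positions = [(y, x) for y, row in enumerate(mask)
                 for x, on in enumerate(row) if on]
    if not positions:
        raise ValueError("mask selects no positions")
    avg = [sum(ch[y][x] for y, x in positions) / len(positions)
           for ch in features]
    mx = [max(ch[y][x] for y, x in positions) for ch in features]
    return avg + mx


def fuse(pool5_vec, relu52_vec):
    """Fuse the two layer features by concatenation (assumed operator)."""
    return list(pool5_vec) + list(relu52_vec)


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature, weights, biases):
    """Linear layer followed by softmax; `weights` has one row per class."""
    logits = [sum(w * f for w, f in zip(row, feature)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)
```

In use, `aggregate` would be called once with the Pool5 features and their mask and once with the Relu5-2 features and the combined mask, and the fused vector would be passed to `classify` with weights trained on the labeled flower classes.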
Keywords
