Abstract
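Objective Salient object detection algorithms fall mainly into traditional methods based on low-level features and newer methods based on deep learning. Traditional methods struggle to capture the high-level semantic information of objects, whereas deep learning methods capture high-level semantics but neglect edge features. To exploit the strengths of both, and following the idea of combining traditional methods with deep learning, this paper draws on the ability of sparsity to concentrate the response on salient objects and proposes a method based on sparse autoencoding and saliency result optimization. Method The feature maps of the fourth pooling layer of the VGG network are processed by sparse autoencoding to obtain five sparse saliency feature maps, which are then fed, together with the saliency map produced by a traditional method, into a convolutional neural network that optimizes the saliency result. Result Experiments on public datasets show that the proposed algorithm effectively raises the F value and lowers the MAE value of salient object detection, and that it adapts well and has good application prospects in tasks such as object recognition, which demand increasingly accurate salient object detection. The model is also structurally simple, has few parameters, and is computationally efficient. Conclusion This paper proposes a saliency result optimization algorithm. The experimental results show that it effectively raises the F value and lowers the MAE value of salient object detection, and that it adapts well and has good application prospects in tasks such as object recognition, which demand increasingly accurate salient object detection.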
Significant object detection based on sparse and image fusion

Shizhan Hong

Objective As a preprocessing step in computer vision, saliency detection has received growing attention in object relocation, scene classification, semantic segmentation, and visual tracking. Although salient object detection has advanced considerably, it remains challenging because of practical factors such as background complexity and the attention mechanism. Existing salient object detection methods fall mainly into traditional methods and newer methods based on deep learning. Traditional methods locate salient objects through low-level hand-crafted features such as contrast, color, and texture. These general techniques have proven effective at preserving image structure and reducing computational cost. However, low-level features struggle to capture high-level semantic knowledge about an object and its surroundings, so methods built on them perform poorly when the salient object must be separated from a cluttered background. Deep learning methods instead locate salient objects by automatically extracting high-level features. However, most state-of-the-art models focus on nonlinear combinations of the high-level features extracted from the final convolutional layer. Lacking low-level visual information such as edges, they often produce salient objects with blurry boundaries. Moreover, these works apply CNN features directly to the model without any processing. CNN features are generally high-dimensional and noisy, which lowers the efficiency with which they are used and may even be counterproductive. Sparsity can effectively concentrate the salient objects in a feature map and suppress part of the noise, and sparse autoencoding (sparse self-encoding) is one way to impose it.
To address these problems, we propose a saliency detection method based on sparse autoencoding and image fusion that combines traditional saliency methods, which rely on background priors and contrast analysis, with VGG-based saliency computation. Method The proposed algorithm consists of four parts: saliency map extraction by a traditional method, VGG feature extraction, sparse autoencoding, and saliency result optimization. First, we select the traditional method to be improved and compute its saliency map. In our experiments we selected four traditional methods with strong results: DRFI, HDCT, RRW, and CGVS. Next, we use the VGG network to extract feature maps. The feature maps from each pooling layer are sparsely autoencoded, yielding twenty-five sparse saliency feature maps in total. When choosing among them, we discard the maps from the first three pooling layers: these layers extract mainly low-level features and retain so much edge and texture information that they duplicate the saliency map obtained by the traditional method. Comparing the 4th and 5th pooling layers, we find that the 5th loses too much feature information, and experiments confirm that its feature maps interfere with the result. We therefore use the feature maps extracted from the 4th pooling layer. These feature maps are passed through the sparse autoencoder to obtain five sparse feature maps, which are combined with the corresponding saliency map computed earlier by the traditional method. Finally, a convolutional neural network processes this combined input and computes the final saliency map. Result Our experiments use four public datasets: DUT-OMRON, ECSSD, HKU-IS, and MSRA. We take half of the images from the four datasets to form the training set, and the remaining images form the four test sets. Evaluating on held-out images from all four datasets makes the results more credible.
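As a rough illustration of the sparse autoencoding (sparse self-encoding) step in the Method above, the following NumPy sketch encodes each spatial position of a stack of feature maps into five activations and scores the encoding with a reconstruction loss plus a KL-divergence sparsity penalty. The abstract does not specify the autoencoder's architecture or loss, so the sigmoid encoder, the KL penalty, the target sparsity `rho`, the weight `beta`, the 512×14×14 input shape, and all function and variable names are assumptions made for illustration. Random data stands in for the VGG features, and no training loop is shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(n_channels, n_hidden, rng):
    """Small random weights for a one-hidden-layer autoencoder (hypothetical helper)."""
    scale = 0.01
    return {
        "W_enc": rng.normal(0.0, scale, (n_channels, n_hidden)),
        "b_enc": np.zeros(n_hidden),
        "W_dec": rng.normal(0.0, scale, (n_hidden, n_channels)),
        "b_dec": np.zeros(n_channels),
    }


def sparse_encode(features, params, rho=0.05, beta=3.0):
    """Encode (C, H, W) feature maps into (K, H, W) maps and return the loss.

    Each pixel's C-dimensional feature vector is treated as one sample.
    rho (target mean activation) and beta (penalty weight) are assumed values.
    """
    n_channels, height, width = features.shape
    x = features.reshape(n_channels, height * width).T        # (N, C)
    hidden = sigmoid(x @ params["W_enc"] + params["b_enc"])    # (N, K)
    recon = hidden @ params["W_dec"] + params["b_dec"]         # (N, C)

    # Reconstruction term: mean squared error per sample.
    recon_loss = 0.5 * np.mean(np.sum((recon - x) ** 2, axis=1))

    # Sparsity term: KL divergence between rho and each unit's mean activation.
    rho_hat = np.clip(hidden.mean(axis=0), 1e-8, 1.0 - 1e-8)
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

    maps = hidden.T.reshape(-1, height, width)                 # (K, H, W)
    return maps, recon_loss + beta * kl


rng = np.random.default_rng(0)
features = rng.random((512, 14, 14))      # stand-in for 4th-pooling-layer features
params = init_params(512, 5, rng)         # five hidden units -> five maps
sparse_maps, loss = sparse_encode(features, params)
print(sparse_maps.shape, float(loss))
```

In a full pipeline the weights would be trained by minimizing this loss, and the five resulting maps would be stacked with the traditional saliency map as input to the optimization network.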
The experiments lead to the following conclusions: (1) The proposed model substantially improves the F value for the four methods on the four datasets; for example, the F value of the DRFI method on the HKU-IS dataset increases by 24.53%. (2) The MAE value is also substantially reduced: the smallest reduction is 12.78%, for the CGVS method on the ECSSD dataset, and the largest is nearly 50%. (3) The proposed network has few layers and few parameters and is fast to compute: training takes about 2 hours, and the average test time per image is about 0.2 seconds. By comparison, the image saliency optimization scheme using adaptive fusion in [11] takes about 47 hours to train and 56.95 seconds per image on average at test time. The proposed model is therefore far more computationally efficient. (4) The proposed model yields a marked improvement on all four datasets, especially HKU-IS and MSRA. These datasets contain more difficult images, which demonstrates the effectiveness of the method more convincingly. Conclusions This paper proposes optimizing saliency results by combining low-level feature maps from traditional models, based on cues such as texture, with high-level feature maps from a sparsely autoencoded VGG network, which can greatly improve salient object recognition. The traditional methods DRFI, HDCT, RRW, and CGVS were each tested on the public salient object detection datasets DUT-OMRON, ECSSD, HKU-IS, and MSRA. The resulting F and MAE values improve significantly, which demonstrates the effectiveness of the proposed method. Moreover, the method involves simple steps, the network structure is simple and easy to understand, and training takes little time and is not difficult, so the method is easy to adopt widely. The shortcoming of the paper is that some of the extracted feature maps lose too much information. In practice, only the feature maps of the fourth VGG pooling layer are used, so not all of the useful information is exploited.
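For readers unfamiliar with the two scores reported above, the following sketch shows one common way to compute the F value (F-measure) and MAE for a single saliency map against a binary ground-truth mask. The abstract does not state the exact evaluation protocol, so the weighting `beta_sq = 0.3` and the adaptive threshold of twice the mean saliency are assumptions taken from common practice in the saliency literature, and the function names are hypothetical.

```python
import numpy as np


def mae(saliency, gt):
    """Mean absolute error between a saliency map in [0, 1] and a binary mask."""
    return float(np.mean(np.abs(saliency.astype(float) - gt.astype(float))))


def f_measure(saliency, gt, beta_sq=0.3, threshold=None):
    """F-measure of the thresholded saliency map.

    If no threshold is given, use twice the mean saliency, capped at 1
    (an assumed, commonly used adaptive threshold).
    """
    if threshold is None:
        threshold = min(2.0 * float(saliency.mean()), 1.0)
    pred = saliency >= threshold
    gt_mask = gt > 0.5

    true_pos = np.logical_and(pred, gt_mask).sum()
    precision = true_pos / max(pred.sum(), 1)
    recall = true_pos / max(gt_mask.sum(), 1)

    denom = beta_sq * precision + recall
    if denom == 0:
        return 0.0
    return float((1.0 + beta_sq) * precision * recall / denom)


# Toy example: a 4x4 image whose salient object is a 2x2 block.
gt = np.zeros((4, 4))
gt[1:3, 1:3] = 1.0
perfect = gt.copy()        # prediction identical to the ground truth
inverted = 1.0 - gt        # prediction that highlights only the background

print(f_measure(perfect, gt), mae(perfect, gt))
print(f_measure(inverted, gt), mae(inverted, gt))
```

A higher F value and a lower MAE both indicate a better saliency map, which is why the results above report an increase in the former and a reduction in the latter.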