Multi-scale feature map fusion algorithm for target detection
Jiang Wentao, Zhang Chi, Zhang Shengchong, Liu Wanjun (Liaoning Technical University; Graduate School, Liaoning Technical University, Huludao; Key Laboratory of Optoelectronic Information Control and Security Technology; School of Software, Liaoning Technical University)
Objective: In natural scene images, the quality of feature extraction is a key factor in target detection performance. Most current detection algorithms rely on the strong learning ability of convolutional neural networks (CNNs) to acquire prior knowledge of targets and detect them on that basis. Low-level CNN features have high resolution but weak abstract semantics and little position information, so they are poorly representative; high-level features are highly discriminative but have low resolution, so they detect small-scale targets poorly. Method: To address these problems, this paper proposes a new target detection algorithm. The network model adopts a top-down feature pyramid structure: two feature maps from different information flows are fused by element-wise addition, and a 3×3 convolutional layer is then applied to reduce the aliasing effect of the fused feature map. These steps construct new feature maps with strong semantic information while retaining the detail of the original feature maps. The fused feature maps are used for the final target classification and localization. Result: In experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 data sets, the model achieves a mean Average Precision (mAP) of 78.9% and 76.7%, respectively, which is 1.4 and 0.9 percentage points higher than the classic SSD algorithm. The method also detects small-scale targets markedly better than the classic SSD model. Conclusion: This paper proposes an effective target detection algorithm that extends semantic information in a top-down manner and constructs semantically strong feature maps to achieve accurate target detection.
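The fusion step described under Method (bring a high-level feature map to the resolution of a lower-level one, add the two element by element, then smooth the sum with a 3×3 convolution) can be sketched as follows. This is a minimal, dependency-free illustration, not the authors' implementation. The function names, the 2× nearest-neighbour upsampling, the single-channel maps, and the fixed averaging kernel are all assumptions made for the example; in a real network the 3×3 kernel is learned and the maps have many channels.

```python
# Illustrative sketch only: single-channel feature maps stored as nested lists.
# All names here are hypothetical and do not come from the paper.

def upsample_nearest_2x(fmap):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_elementwise(a, b):
    """Fuse two same-shaped feature maps by element-wise addition."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("feature maps must have the same shape")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv3x3_same(fmap, kernel):
    """3x3 cross-correlation with zero padding; output keeps the input shape."""
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # outside the map counts as zero
                        acc += kernel[di + 1][dj + 1] * fmap[y][x]
            out[i][j] = acc
    return out


def fuse(high_level, low_level, kernel):
    """Top-down fusion: upsample, add element-wise, then apply a 3x3 convolution."""
    upsampled = upsample_nearest_2x(high_level)
    merged = add_elementwise(upsampled, low_level)
    return conv3x3_same(merged, kernel)


# Assumption: a fixed averaging kernel stands in for the learned 3x3 weights.
BOX_KERNEL = [[1.0 / 9.0] * 3 for _ in range(3)]
# Identity kernel, useful for inspecting the raw element-wise sum.
IDENTITY_KERNEL = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

high = [[1.0, 2.0], [3.0, 4.0]]        # coarse, semantically strong map (2x2)
low = [[1.0] * 4 for _ in range(4)]    # fine, detail-rich map (4x4)

fused = fuse(high, low, BOX_KERNEL)
for row in fused:
    print([round(v, 3) for v in row])
```

The fused map has the resolution of the lower-level input, which is what lets the detector keep fine spatial detail while inheriting semantics from the coarser layer. Passing `IDENTITY_KERNEL` instead of `BOX_KERNEL` returns the plain element-wise sum, which makes the smoothing effect of the 3×3 step easy to compare.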