
Li Fengyong1, Ye Bin1, Qin Chuan2 (1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China; 2. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Abstract
Objective In image semantic segmentation, the fusion of detail features and semantic features is a difficult problem in the field. Special-purpose fusion modules designed for specific network architectures lack scalability and universality; self-attention can capture global information but cannot fuse different features, and other attention mechanisms lack interpretability when computing their masks. This paper models the correlation between feature maps and proposes a segmentation module driven by a mutual attention mechanism. Method The module takes detail feature maps and semantic feature maps from different stages, builds an association model between each point on the detail feature map and the semantic feature map, and, guided by this association model, aggregates features on the semantic feature map as a supplement to that point on the detail feature map, thereby fusing information from the semantic feature map into the detail feature map. The same operation is then applied to fuse information from the detail feature map into the semantic feature map, achieving mutual fusion of feature maps from different stages. Result Five semantic segmentation models were selected for experiments. The results show that after modifying BiSeNet V2 (bilateral segmentation network) by replacement, the number of floating-point operations, memory usage, and number of model parameters dropped by 8.6%, 8.5%, and 2.6%, respectively, while the mean intersection over union (mIoU) still improved. After modifying the other four networks by insertion, the mIoU of all networks improved to varying degrees. Conclusion The proposed mutual attention module generally improves the semantic segmentation accuracy of models and enables plug-and-play use across different network models, showing high universality.
Mutual attention mechanism-driven lightweight semantic segmentation network

Li Fengyong1, Ye Bin1, Qin Chuan2 (1. College of Computer Science and Technology, Shanghai University of Electric Power, Shanghai 201306, China; 2. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Objective The fusion of detail features and semantic features is a challenging problem in image semantic segmentation. Image semantic segmentation is widely applied in domains such as medical image analysis, remote sensing mapping, automatic driving, and other related contexts. Lightweight semantic segmentation networks can meet the requirements of real applications by balancing the competing constraints of speed and accuracy. An image semantic segmentation network typically relies on a convolutional neural network (CNN) to extract the texture, contour, color, gray-level, and other low-level features of objects, which are then aggregated into semantic features. However, an efficient mechanism for fusing the extracted detail and semantic features remains an open problem. For lightweight semantic segmentation networks, the fusion effect is further restricted by the limited feature representation that results from having fewer channels. Fusion modules designed for a specific network architecture can perform well but lack scalability and universality. Although attention mechanisms have been widely used in image semantic segmentation networks, current research still focuses on self-attention, which operates within a single feature map, so information interaction between different feature maps is difficult to achieve. To meet the requirement of feature fusion, we design a mutual attention mechanism-driven semantic segmentation module based on the correlations between feature maps. In this module, the attention mechanism is introduced at the feature fusion stage so that information exchange can be realized between different feature maps. Method First, we reconstruct the attention calculation of the non-local module: by changing the mapping between query, key, and value, we obtain a mutual attention module that takes a detail feature map and a semantic feature map as its inputs.
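One direction of the mutual attention described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function and parameter names (`mutual_attention`, `Wq`, `Wk`, `Wv`) and the residual addition are assumptions; the essential point is that queries come from the detail map while keys and values come from the semantic map, so each detail point aggregates semantic context.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(detail, semantic, Wq, Wk, Wv):
    """One fusion direction: each point on the flattened detail map
    (Nd, C) queries the flattened semantic map (Ns, C) and aggregates
    its values. Projections Wq/Wk (C, Ck) and Wv (C, C) are learnable
    in a real network; here they are plain matrices."""
    q = detail @ Wq                 # queries from the detail branch, (Nd, Ck)
    k = semantic @ Wk               # keys from the semantic branch, (Ns, Ck)
    v = semantic @ Wv               # values from the semantic branch, (Ns, C)
    # Association model: scaled dot-product affinity between every
    # detail point and every semantic point, (Nd, Ns).
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)
    # Aggregated semantic context is added back to the detail features,
    # so the output keeps the input's shape and can replace it in place.
    return detail + attn @ v
```

Because the output has the same shape as the detail input, the symmetric direction (semantic queries over detail keys/values) is obtained by simply swapping the two map arguments with their own projections.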
Then, an attention-derived association model is built between each feature point on the detail feature map and the whole semantic feature map. Guided by this association model, features on the semantic feature map are aggregated and added to the corresponding point on the detail feature map, so that semantic information is fused into the detail feature map. The same operation is then applied in the opposite direction to fuse information from the detail feature map into the semantic feature map, finally realizing mutual attention-based feature fusion. Since the fused features keep the same type and shape as the input features, the module can be easily embedded into existing image semantic segmentation models. The proposed mutual attention module can also share queries and keys between the two fusion directions, which effectively reduces model complexity, while the shared projections strengthen the connection between the branches. The transmission of information between branches is coordinated through the mutual attention mechanism, further improving model performance. Because the association model guides feature fusion and the cross-shared operation splices the two branches effectively, the representation capability of the fused features is significantly enhanced, optimizing both computational cost and segmentation efficiency. Result To verify the effectiveness of the proposed mutual attention module, comparative experiments based on the bilateral segmentation network (BiSeNet V2) demonstrate its feature fusion capability. To examine the generality of the module, five semantic segmentation models are selected for the experiments, all trained on the public CamVid dataset.
The averaged quantitative results of five experiments compare the number of floating-point operations, memory usage, number of model parameters, and mean intersection over union (mIoU) when the mutual attention module is added to each original network. The experimental results demonstrate that the modified BiSeNet V2, in which the original fusion module is replaced by ours, is more lightweight: the three cost metrics decrease by 8.6%, 8.5%, and 2.6%, respectively, while the mIoU still improves. When the module is inserted into the other four networks, the mIoU of every network improves to some degree, with the highest gains reaching 1.14% and 0.74%. Additionally, we analyze the influence of the number of channels of the queries, keys, and values, the number of attention heads, and the association modeling on model complexity. Conclusion A mutual attention mechanism-driven lightweight semantic segmentation network is developed and demonstrated. The proposed mutual attention mechanism guides feature fusion via the correlations between different feature maps. Experimental results on public datasets show that the mutual attention mechanism can be integrated into different lightweight semantic segmentation models and improves their segmentation accuracy. Furthermore, the proposed mutual attention module is plug-and-play for different network models, demonstrating high universality.
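The cross-shared variant mentioned in the Method section, in which both fusion directions reuse one set of query/key projections, might look like the following NumPy sketch. The sharing scheme shown here (computing the affinity matrix once and using it and its transpose for the two directions) is a hypothetical reading of "share queries and keys", not the paper's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_shared_fusion(detail, semantic, Wq, Wk, Wv_d, Wv_s):
    """Bidirectional fusion with shared query/key projections.
    detail: (Nd, C), semantic: (Ns, C); Wq/Wk: (C, Ck); Wv_d/Wv_s: (C, C).
    The affinity logits are computed once and reused for both directions,
    halving the projection cost relative to two independent modules."""
    q = detail @ Wq                            # shared queries, (Nd, Ck)
    k = semantic @ Wk                          # shared keys, (Ns, Ck)
    logits = q @ k.T / np.sqrt(k.shape[1])     # one affinity matrix, (Nd, Ns)
    a_d = softmax(logits, axis=-1)             # detail points attend over semantic map
    a_s = softmax(logits.T, axis=-1)           # semantic points attend over detail map
    fused_detail = detail + a_d @ (semantic @ Wv_s)
    fused_semantic = semantic + a_s @ (detail @ Wv_d)
    return fused_detail, fused_semantic
```

Both outputs keep their inputs' shapes, which is what allows the module to be inserted into, or swapped for, an existing fusion block without changing the surrounding architecture.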