Mutual attention mechanism-driven lightweight semantic segmentation network
2023, Vol. 28, No. 7, pp. 2068-2080
Print publication date: 2023-07-16
DOI: 10.11834/jig.211127
Li Fengyong, Ye Bin, Qin Chuan. 2023. Mutual attention mechanism-driven lightweight semantic segmentation network. Journal of Image and Graphics, 28(07):2068-2080
Objective
The fusion of detail features and semantic features is a difficult problem in image semantic segmentation. Special-purpose fusion modules designed for specific network architectures lack scalability and universality; self-attention can capture global information but cannot fuse different features, and other attention mechanisms lack interpretability when computing attention masks. This paper models the correlation between feature maps and proposes a mutual attention mechanism-driven segmentation module.
Method
The module takes the detail feature map and the semantic feature map from different stages, builds an association model between any point on the detail feature map and the semantic feature map, and, guided by this model, aggregates features on the semantic feature map as a supplement to that point on the detail feature map, thereby fusing information from the semantic feature map into the detail feature map. The same operation is then applied in the reverse direction to fuse information from the detail feature map into the semantic feature map, achieving mutual fusion of feature maps from different stages.
Result
Five semantic segmentation models were selected for experiments. The results show that after modifying BiSeNet V2 (bilateral segmentation network V2) by replacement, the floating-point operations, memory usage, and number of model parameters dropped by 8.6%, 8.5%, and 2.6%, respectively, while the mean intersection over union (mIoU) nevertheless improved. After modifying the other four networks by insertion, the mIoU of all networks improved to varying degrees.
Conclusion
The proposed mutual attention module generally improves the semantic segmentation accuracy of models and can be used plug-and-play in different network models, showing high universality.
Objective
The fusion of detail and semantic features is a challenging issue in image semantic segmentation. Image semantic segmentation serves domains such as medical image analysis, remote sensing mapping, autonomous driving, and other related contexts. Lightweight semantic segmentation networks can meet the requirements of real applications by balancing the competing constraints of speed and accuracy. An image semantic segmentation network relies on a convolutional neural network (CNN) to extract texture, contour, color, gray-level, and other features of the objects, which are then aggregated into semantic features. An efficient fusion mechanism for detail and semantic features remains an open problem. For lightweight networks, the fusion effect is further restricted by the insufficient feature representation that few channels provide. Fusion modules designed for a specific network architecture perform well there but lack scalability and universality. Although attention mechanisms have been widely used in image semantic segmentation networks, current research still focuses on self-attention, which is limited to a single feature map and cannot realize information interaction between different feature maps. To meet the requirement of feature fusion, we design a mutual attention mechanism-driven semantic segmentation module based on the correlation between features. The module introduces attention into the fusion stage so that information can be exchanged between different feature maps.
Method
First, we reconstruct the attention calculation mechanism of the non-local module, changing the mapping between queries, keys, and values to obtain the mutual attention module, which takes a detail feature map and a semantic feature map as input. Then, an association model is built between any feature point on the detail feature map and the semantic feature map. Guided by this association model, features on the semantic feature map are aggregated and added to the corresponding feature point on the detail feature map, so that semantic information is fused into the detail feature map. The same operation is then applied in the reverse direction to fuse information from the detail feature map into the semantic feature map, realizing mutual attention-based feature fusion. Since the fused feature keeps the same type as the input feature, it can easily be embedded into existing image semantic segmentation models. The proposed mutual attention module can also share queries and keys across branches, which effectively reduces model complexity and strengthens the connection between the branches. The mutual attention mechanism further coordinates the transmission of information between branches to improve performance. Because the association model guides feature fusion and the cross-shared queries and keys splice the two branches together, the representation capability of the fused features is significantly enhanced, optimizing both computational cost and semantic segmentation efficiency.
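The mutual fusion described above can be sketched in plain NumPy. The single-head form, the shared projection shapes, and the residual addition below are illustrative assumptions, not the paper's exact implementation: each point on one feature map queries the other map, and the correlation-weighted aggregation is added back as a supplement.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable row-wise softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(detail, semantic, w_q, w_k, w_v):
    """Cross-fuse two flattened feature maps (N_d, C) and (N_s, C).

    w_q, w_k, w_v are (C, C) projections shared by both branches,
    mirroring the query/key sharing described in the text.
    """
    q_d, k_d, v_d = detail @ w_q, detail @ w_k, detail @ w_v
    q_s, k_s, v_s = semantic @ w_q, semantic @ w_k, semantic @ w_v
    scale = np.sqrt(k_d.shape[1])
    # Each detail point aggregates semantic features weighted by correlation.
    fused_detail = detail + softmax(q_d @ k_s.T / scale) @ v_s
    # Symmetrically, each semantic point aggregates detail features.
    fused_semantic = semantic + softmax(q_s @ k_d.T / scale) @ v_d
    return fused_detail, fused_semantic
```

Because the fused outputs keep the shapes of the inputs, such a module can replace or sit after an existing fusion step without altering the surrounding network.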
Result
To verify the effectiveness of the proposed mutual attention module, comparative experiments based on the bilateral segmentation network (BiSeNet V2) demonstrate its feature fusion potential. To assess the generality of the module, five semantic segmentation models are selected for experiments, all trained on the public CamVid dataset. The quantitative results, averaged over five runs, compare the amount of floating-point operations, memory usage, number of model parameters, and mean intersection over union (mIoU) when the mutual attention module is added to the original network. The experimental results demonstrate that the BiSeNet V2 network modified by replacement is more lightweight: floating-point operations, memory usage, and parameters are reduced by 8.6%, 8.5%, and 2.6%, respectively, while the mIoU still improves. When the module is embedded into the other four networks, the mIoU of all networks improves, with the largest gains reaching 1.14% and 0.74%. Additionally, we analyze the influence of the number of channels of the queries, keys, and values and of the number of attention heads on model complexity, as well as the generation of the association model.
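The mIoU metric used throughout these comparisons is standard; a minimal sketch of its per-class computation (class indices and the absent-class convention are assumptions for illustration) is:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent in both maps; skip it
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return float(np.mean(ious))
```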
Conclusion
A mutual attention mechanism-driven lightweight semantic segmentation network is developed and demonstrated. The proposed mutual attention mechanism guides feature fusion through the correlation between multiple features. Experimental results on public datasets show that the mutual attention mechanism can be integrated into different lightweight semantic segmentation models and improves their segmentation accuracy. The proposed mutual attention module is plug-and-play for different network models, demonstrating strong universality.
Keywords: image semantic segmentation; lightweight network; mutual attention module; feature fusion; association model
Chen L C, Papandreou G, Kokkinos I, Murphy K and Yuille A L. 2018a. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4): 834-848 [DOI: 10.1109/TPAMI.2017.2699184]
Chen L C, Zhu Y K, Papandreou G, Schroff F and Adam H. 2018b. Encoder-decoder with atrous separable convolution for semantic image segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 833-851 [DOI: 10.1007/978-3-030-01234-2_49]
Gao X, Li C G and An J B. 2021. Real-time image semantic segmentation based on attention mechanism and multi-label classification. Journal of Computer-Aided Design & Computer Graphics, 33(1): 59-67 [DOI: 10.3724/SP.J.1089.2021.18233]
Hu J, Shen L and Sun G. 2018. Squeeze-and-excitation networks//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7132-7141 [DOI: 10.1109/cvpr.2018.00745]
Jiang W H, Xie Z Z, Li Y Y, Liu C and Lu H T. 2020. LRNNET: a light-weighted network with efficient reduced non-local operation for real-time semantic segmentation//Proceedings of 2020 IEEE International Conference on Multimedia & Expo Workshops (ICMEW). London, UK: IEEE: 1-6 [DOI: 10.1109/icmew46912.2020.9106038]
Li G, Yun I, Kim J and Kim J. 2019a. DABNet: depth-wise asymmetric bottleneck for real-time semantic segmentation [EB/OL]. [2019-10-01]. https://arxiv.org/pdf/1907.11357v2.pdf
Li H C, Xiong P F, Fan H Q and Sun J. 2019b. DFANet: deep feature aggregation for real-time semantic segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9514-9523 [DOI: 10.1109/cvpr.2019.00975]
Li X, Zhou Y M, Pan Z and Feng J S. 2019c. Partial order pruning: for best speed/accuracy trade-off in neural architecture search//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9137-9145 [DOI: 10.1109/cvpr.2019.00936]
Li Y W, Song L, Chen Y K, Li Z M, Zhang X Y, Wang X G and Sun J. 2020. Learning dynamic routing for semantic segmentation//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8550-8559 [DOI: 10.1109/cvpr42600.2020.00858]
Long J, Shelhamer E and Darrell T. 2015. Fully convolutional networks for semantic segmentation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 3431-3440 [DOI: 10.1109/cvpr.2015.7298965]
Luo K K, Wang T and Ye F F. 2021. U-Net segmentation model of brain tumor MR image based on attention mechanism and multi-view fusion. Journal of Image and Graphics, 26(9): 2208-2218 [DOI: 10.11834/jig.200584]
Mehta S, Rastegari M, Caspi A, Shapiro L and Hajishirzi H. 2018. ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 561-580 [DOI: 10.1007/978-3-030-01249-6_34]
Mehta S, Rastegari M, Shapiro L and Hajishirzi H. 2019. ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 9190-9200 [DOI: 10.1109/cvpr.2019.00941]
Qin F W, Shen X Y, Peng Y, Shao Y L, Yuan W Q, Ji Z P and Bai J. 2021. A real-time semantic segmentation approach for autonomous driving scenes. Journal of Computer-Aided Design & Computer Graphics, 33(7): 1026-1037 [DOI: 10.3724/SP.J.1089.2021.18631]
Wang X L, Girshick R, Gupta A and He K M. 2018. Non-local neural networks//Proceedings of 2018 IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 7794-7803 [DOI: 10.1109/cvpr.2018.00813]
Wu T Y, Tang S, Zhang R, Cao J and Zhang Y D. 2021. CGNet: a light-weight context guided network for semantic segmentation. IEEE Transactions on Image Processing, 30: 1169-1179 [DOI: 10.1109/tip.2020.3042065]
Xiong C Z and Zhi H. 2019. Weakly supervised semantic segmentation and optimization algorithm based on multi-scale feature model. Journal on Communications, 40(1): 163-171 [DOI: 10.11959/j.issn.1000-436x.2019004]
Yu C N, Gao C X, Wang J B, Yu G, Shen C H and Sang N. 2021. BiSeNet V2: bilateral network with guided aggregation for real-time semantic segmentation. International Journal of Computer Vision, 129(11): 3051-3068 [DOI: 10.1007/s11263-021-01515-2]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018a. BiSeNet: bilateral segmentation network for real-time semantic segmentation//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 334-349 [DOI: 10.1007/978-3-030-01261-8_20]
Yu C Q, Wang J B, Peng C, Gao C X, Yu G and Sang N. 2018b. Learning a discriminative feature network for semantic segmentation//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1857-1866 [DOI: 10.1109/cvpr.2018.00199]
Zhang A Q, Kang Y X, Wu Z Y, Cui L and Bo Q R. 2021. Semantic segmentation network of pathological images of liver tissue based on multi-scale feature and attention mechanism. Pattern Recognition and Artificial Intelligence, 34(4): 375-384 [DOI: 10.16451/j.cnki.issn1003-6059.202104010]
Zhao H S, Qi X J, Shen X Y, Shi J P and Jia J Y. 2018. ICNet for real-time semantic segmentation on high-resolution images//Proceedings of the 15th European Conference on Computer Vision (ECCV). Munich, Germany: Springer: 418-434 [DOI: 10.1007/978-3-030-01219-9_25]
Zhuang J T, Yang J L, Gu L and Dvornek N. 2019. ShelfNet for fast semantic segmentation//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop. Seoul, Korea (South): IEEE: 847-856 [DOI: 10.1109/iccvw.2019.00113]