Contour refinement instance segmentation for occluded objects

Li Wei; Huang Ya; Zhang Xinyuan; Han Guijin

Published: 2024-05-20
DOI: 10.11834/jig.230194
2024 | Volume 29 | Number 5

Li Wei, Huang Ya, Zhang Xinyuan, Han Guijin (School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121)

Abstract

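Objective The quality of instance segmentation for occluded objects is closely tied to the predicted object contours. However, the contours predicted by current algorithms are not refined enough, which makes the segmentation masks coarse and the segmentation along object boundaries poor. To address this, a contour refinement instance segmentation algorithm for occluded objects is proposed on the basis of BCNet (bilayer convolutional network); it predicts finer object contours and more complete segmentation masks.
Method 1) A balance pooling attention module is proposed for feature extraction. On top of conventional 1D average pooling, it adds a 1D max pooling operation to highlight detail features, and it extracts features through a weighted fusion of the max pooling and average pooling results, so that the extracted features better account for both the object as a whole and its edge details. 2) Contour prediction and mask prediction in the BCNet mask head are split into two branches. Region of interest (RoI) features for contour prediction are extracted from the highest-resolution level of the feature pyramid, and an adaptive feature fusion module is proposed to fuse the features of the contour prediction branch with those of the mask prediction branch. In the contour prediction branch, the fused mask-branch features help determine the object category to which a contour belongs; in the mask prediction branch, the fused contour-branch features help localize the mask.
Result On the COCO 2017 (common objects in context 2017) dataset, compared with BCNet, currently the best-performing network of its kind, the proposed method improves average precision (AP) by 1.7% and 2.1% with ResNet-50 and ResNet-101 (deep residual network) backbones, respectively. The visualization results show that the proposed algorithm segments the contours of occluded objects more finely and effectively produces more complete and finer masks.
Conclusion The proposed contour refinement instance segmentation algorithm for occluded objects clearly improves instance segmentation of occluded objects.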
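The balance pooling attention module can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, by analogy with coordinate attention, that "1D pooling" means pooling a (C, H, W) feature map along the width and along the height separately; it assumes a single scalar weight `alpha` for fusing max and average pooling (the paper may learn this weight); and it uses plain matrices (`w_reduce`, `w_h`, `w_w`, all hypothetical names) in place of learned 1×1 convolutions, with batch normalization omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def balanced_pool(x, alpha):
    """Weighted fusion of 1D max pooling and 1D average pooling (assumed form).

    x: feature map of shape (C, H, W); alpha in [0, 1] is the weight on max pooling.
    Returns per-row descriptors (C, H) and per-column descriptors (C, W).
    """
    pooled_h = alpha * x.max(axis=2) + (1.0 - alpha) * x.mean(axis=2)  # pool over width
    pooled_w = alpha * x.max(axis=1) + (1.0 - alpha) * x.mean(axis=1)  # pool over height
    return pooled_h, pooled_w


def balanced_pooling_attention(x, w_reduce, w_h, w_w, alpha=0.5):
    """Coordinate-attention-style reweighting driven by balanced pooling (hypothetical).

    w_reduce: (C_r, C) shared projection standing in for a 1x1 conv.
    w_h, w_w: (C, C_r) per-direction projections standing in for 1x1 convs.
    """
    _, h, _ = x.shape
    pooled_h, pooled_w = balanced_pool(x, alpha)
    # Shared channel-reducing projection + ReLU on concatenated row/column descriptors.
    y = np.maximum(w_reduce @ np.concatenate([pooled_h, pooled_w], axis=1), 0.0)  # (C_r, H+W)
    a_h = sigmoid(w_h @ y[:, :h])  # (C, H) attention weights along the height axis
    a_w = sigmoid(w_w @ y[:, h:])  # (C, W) attention weights along the width axis
    # Element (c, i, j) is scaled by a_h[c, i] * a_w[c, j], each factor in (0, 1).
    return x * a_h[:, :, None] * a_w[:, None, :]


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
out = balanced_pooling_attention(
    x, rng.standard_normal((2, 4)), rng.standard_normal((4, 2)), rng.standard_normal((4, 2))
)
print(out.shape)
```

With `alpha = 0` the descriptors reduce to average pooling only; raising `alpha` lets the per-row and per-column maxima influence the attention weights, which is the stated motivation for adding max pooling (retaining detail that averaging washes out).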

Keywords

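occluded object instance segmentation; balance pooling attention module (BPAM); adaptive feature fusion module (AFFM); BCNet; contour prediction branch; mask prediction branch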

Contour refinement instance segmentation for occluded objects

Li Wei, Huang Ya, Zhang Xinyuan, Han Guijin(School of Automation, Xi'an University of Posts and Telecommunications, Xi'an 710121, China)

Abstract

Objective Instance segmentation is a popular topic in image processing and computer vision and is widely used in autonomous driving, medical image segmentation, and other fields. Heavy overlap between objects in practical application scenarios makes instance segmentation more challenging. Instance segmentation models are either two-stage or single-stage. Among two-stage models, the bilayer convolutional network (BCNet) is one of the most successful networks for instance segmentation of occluded objects. Unlike previous two-stage instance segmentation methods, BCNet extracts regions of interest (RoI) and simultaneously regresses the occluder and occludee regions: it groups the pixels that belong to the occluder region and treats them as pixels of the occludee region, but places them in two independent image layers. As a result, the boundaries of the two objects are naturally decoupled, while the interaction between them is still considered during mask regression. In this bilayer network, the top graph convolutional network (GCN) layer detects occluder objects and the bottom GCN layer infers occludee instances; explicitly modeling occlusion relationships in this way gives BCNet a new structure for segmenting highly overlapping objects. The segmentation performance on occluded object instances is closely related to the predicted object contours. However, the contours predicted by current algorithms are not refined enough, resulting in coarse segmentation masks and poor segmentation along object boundaries. Therefore, this study presents a contour refinement instance segmentation method for occluded objects based on BCNet, which predicts more precise object contours and more complete segmentation masks.
Method Plain average pooling loses a significant amount of feature information and cannot produce fine-grained features, whereas max pooling retains more salient feature information. On this basis, this study proposes a balance pooling attention module, which collects global context information through 1D max pooling and 1D average pooling in parallel. Weighted feature fusion balances the contributions of max pooling and average pooling, producing more detailed information without losing overall features. The balance pooling attention module is added to ResNet so that the feature extraction stage yields features with more detail. Features at different scales carry different information: high-resolution features contain more detail, while low-resolution features contain richer semantics. Unlike the mask head input in BCNet, this study separates the contour prediction branch from the mask prediction branch and feeds them different inputs: RoI features extracted from the highest-resolution level of the feature pyramid network (FPN) serve as input to the contour prediction branch, and RoI features extracted from the cascaded low-resolution features serve as input to the mask prediction branch. The contour prediction branch therefore has more detailed and spatial information, while the mask prediction branch has richer semantic information. To further improve segmentation, an adaptive feature fusion module is proposed to fuse the features of the contour prediction branch with those of the mask prediction branch. In the contour prediction branch, the fused mask-branch features help determine the object category to which a contour belongs; in the mask prediction branch, the fused contour-branch features help localize the mask. A criss-cross attention module is used to infer the contours and masks of the occluder and occludee objects; compared with the non-local-operator GCN module adopted in BCNet, it maintains prediction accuracy while effectively reducing computational complexity.
Result Segmentation accuracy is compared with classic instance segmentation models, such as mask region-based convolutional neural network (Mask R-CNN), CenterMask, and BCNet, on the common objects in context (COCO) 2017 validation set. Compared with the baseline BCNet, average precision (AP) increases by 1.7% and 2.1% with ResNet-50 and ResNet-101 backbones, respectively. Segmentation accuracy improves noticeably for targets at multiple scales, indicating that the model effectively improves on the baseline network. The proposed method achieves higher segmentation accuracy than two-stage instance segmentation algorithms such as Mask R-CNN and CenterMask, and it also shows advantages over single-stage algorithms such as you only look at coefficients (YOLACT). The binary contour maps show that the boundaries extracted by this method are more precise. The experimental results indicate that the proposed method generalizes well and is robust.
Conclusion The occluded object instance segmentation method based on balance pooling attention and adaptive feature fusion proposed in this study effectively improves instance segmentation of occluded objects.
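The abstract does not specify the internal structure of the adaptive feature fusion module (AFFM). The NumPy sketch below shows one plausible, hypothetical form: bidirectional, channel-wise gated fusion, in which global-average-pooled descriptors of both branches produce per-channel gates that control how much of each branch's features is added to the other. It assumes both branches' RoI features have the same shape (C, S, S) after RoI alignment; the function name and the gate matrices `w_gate_c` and `w_gate_m` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def adaptive_feature_fusion(f_contour, f_mask, w_gate_c, w_gate_m):
    """Bidirectional gated fusion of contour-branch and mask-branch RoI features.

    Hypothetical form. f_contour, f_mask: (C, S, S) RoI features of the two branches
    (same size assumed). w_gate_c, w_gate_m: (C, 2C) matrices that map the joint
    descriptor to per-channel gates (stand-ins for learned layers).
    """
    # Global average pooling summarizes each branch as a length-C vector.
    desc = np.concatenate([f_contour.mean(axis=(1, 2)), f_mask.mean(axis=(1, 2))])  # (2C,)
    # Per-channel gates in (0, 1), computed from both branches, set the mixing amount.
    g_into_contour = sigmoid(w_gate_c @ desc)  # weight on mask features entering the contour branch
    g_into_mask = sigmoid(w_gate_m @ desc)     # weight on contour features entering the mask branch
    fused_contour = f_contour + g_into_contour[:, None, None] * f_mask
    fused_mask = f_mask + g_into_mask[:, None, None] * f_contour
    return fused_contour, fused_mask


rng = np.random.default_rng(1)
fc = rng.standard_normal((3, 4, 4))
fm = rng.standard_normal((3, 4, 4))
oc, om = adaptive_feature_fusion(fc, fm, np.zeros((3, 6)), np.zeros((3, 6)))
print(oc.shape, om.shape)
```

Because the gates are computed from the features themselves, the amount of mask-branch context injected into the contour branch (and of contour detail injected into the mask branch) can vary per RoI and per channel, which is one way to make the fusion adaptive rather than a fixed sum.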

Keywords

occluded objects instance segmentation; balance pooling attention module (BPAM); adaptive feature fusion module (AFFM); bilayer convolutional network (BCNet); contour prediction branch; mask prediction branch