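Contour refinement instance segmentation for occluded objects
Li Wei, Huang Ya, Zhang Xinyuan, Han Guijin (Xi'an University of Posts and Telecommunications)
Abstract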
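Objective The quality of instance segmentation for occluded objects is closely tied to how well object contours are predicted. However, the contours predicted by current algorithms are not fine enough, which yields coarse segmentation masks and poor segmentation at object boundaries. To address this, building on the bilayer convolutional network (BCNet), this paper proposes a contour refinement instance segmentation algorithm for occluded objects that predicts finer contours and more complete masks. Method First, a balance pooling attention module is proposed for feature extraction: on top of conventional one-dimensional average pooling, a one-dimensional max pooling operation is added to highlight detail features, and the max-pooled and average-pooled results are fused by weighting, so that the extracted features better balance the object as a whole and its edge details. Second, contour prediction and mask prediction in the BCNet mask head are split into two branches. Region of interest (RoI) features for contour prediction are extracted from the highest-resolution level of the feature pyramid, and an adaptive feature fusion module is proposed to fuse the features of the contour prediction branch with those of the mask prediction branch. In the contour prediction branch, fusing in the mask-branch features helps determine the object category to which a contour belongs; in the mask prediction branch, fusing in the contour-branch features helps localize the mask. Result On the COCO 2017 (common objects in context 2017) dataset, compared with BCNet, currently the best-performing network of its kind, the proposed method improves average precision (AP) by 1.7% and 2.1% with ResNet-50 and ResNet-101 backbones, respectively. Visualization results show that the algorithm segments the contours of occluded objects more finely and produces more complete and more precise masks. Conclusion The proposed contour refinement instance segmentation algorithm for occluded objects clearly improves instance segmentation of occluded objects.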
Keywords
Contour refinement instance segmentation for occluded objects
Li Wei, Huang Ya, Zhang Xinyuan, Han Guijin (Xi’an University of Posts & Telecommunications)
Abstract
Objective Instance segmentation is an active research topic in image processing and computer vision and is widely used in autonomous driving, medical image segmentation, and other fields. In practical scenes, objects often overlap heavily, which makes instance segmentation more challenging. Instance segmentation models fall into two groups: two-stage models and single-stage models. Among two-stage models, the bilayer convolutional network (BCNet) is one of the most successful networks for instance segmentation of occluded objects. Unlike earlier two-stage methods, BCNet extracts regions of interest and regresses the occluder and occludee regions simultaneously. Pixels belonging to the occluder are grouped and treated in the same way as the pixels of the occludee, but they are assigned to a separate image layer, so each region of interest is modeled as two independent layers: the top graph convolutional network (GCN) layer detects the occluder objects, and the bottom GCN layer infers the occludee instances. Explicitly modeling the occlusion relationship with this bilayer structure naturally decouples the boundaries of the occluder and occludee instances and accounts for the interaction between them during mask regression, which gives BCNet a structure suited to scenes with highly overlapping objects. The segmentation quality for occluded instances is closely related to how well object contours are predicted. However, the contours predicted by current algorithms are not sufficiently refined, which results in coarse segmentation masks and poor segmentation at object boundaries.
To address this, building on BCNet, this paper proposes an instance segmentation method for occluded objects based on balance pooling attention and adaptive feature fusion, which predicts more precise object contours and more complete segmentation masks. Method Average pooling alone loses salient feature information and cannot produce fine-grained detail, whereas max pooling retains more of the salient information. Based on this observation, this paper proposes a balance pooling attention module. Global context is collected by one-dimensional max pooling and one-dimensional average pooling in parallel, and the contributions of the two are balanced through weighted feature fusion, so that more detail is generated without losing the overall features. The module is added to ResNet so that the feature extraction stage yields features with more detail. Features at different scales carry different information: high-resolution features contain more detail, while low-resolution features contain richer semantic information. In contrast to the mask head input used in BCNet, this paper separates the contour prediction branch from the mask prediction branch and feeds them different features: RoI features extracted from the highest-resolution level of the feature pyramid network (FPN) serve as input to the contour prediction branch, and RoI features extracted from the cascaded low-resolution features serve as input to the mask prediction branch. The contour prediction branch therefore works with more detail and spatial information, while the mask prediction branch works with richer semantic information. To further improve segmentation, an adaptive feature fusion module is proposed to fuse the features of the contour prediction branch with those of the mask prediction branch.
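As an illustration of the balance pooling attention idea, the following minimal Python sketch blends one-dimensional max pooling and one-dimensional average pooling along the rows and columns of a single-channel feature map and uses the result as an attention gate. It is a simplified sketch under stated assumptions, not the module used in the experiments: the function names, the fixed scalar weight alpha (a stand-in for the weighted fusion, whose exact form is not given here), and the sigmoid gate are all hypothetical, and the convolution layers of a real attention module are omitted.

```python
import math


def sigmoid(v):
    """Logistic function used as the attention gate (an assumption of this sketch)."""
    return 1.0 / (1.0 + math.exp(-v))


def balanced_pool_1d(vectors, alpha):
    """Blend 1-D max pooling and 1-D average pooling of each vector.

    alpha is a hypothetical fixed balance weight: 0 gives pure average
    pooling, 1 gives pure max pooling.
    """
    return [alpha * max(v) + (1.0 - alpha) * (sum(v) / len(v)) for v in vectors]


def balance_pooling_attention(feature, alpha=0.5):
    """Reweight a single-channel H x W map with row-wise and column-wise gates."""
    rows = feature
    cols = [list(c) for c in zip(*feature)]  # transpose to pool along the height
    row_gate = [sigmoid(d) for d in balanced_pool_1d(rows, alpha)]
    col_gate = [sigmoid(d) for d in balanced_pool_1d(cols, alpha)]
    return [[feature[i][j] * row_gate[i] * col_gate[j]
             for j in range(len(cols))]
            for i in range(len(rows))]


# A flat map with one sharp response, standing in for an edge detail.
feature = [[0.0, 0.0, 0.0],
           [0.0, 4.0, 0.0],
           [0.0, 0.0, 0.0]]
avg_only = balance_pooling_attention(feature, alpha=0.0)
balanced = balance_pooling_attention(feature, alpha=0.5)
```

With alpha set to 0 the gate reduces to average pooling only. Raising alpha lets the isolated peak at the center keep more of its response, which is the detail-preserving effect the module aims for.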
In the contour prediction branch, fusing in the mask-branch features helps determine the object category to which a contour belongs. In the mask prediction branch, fusing in the contour-branch features helps localize the mask. The contours and masks of occluder and occludee objects are inferred with a Criss-Cross Attention module, which maintains prediction accuracy while effectively reducing computational complexity compared with the non-local-operator GCN module used in BCNet. Result Segmentation accuracy is compared with classic instance segmentation models such as Mask R-CNN, CenterMask, and BCNet on the COCO 2017 validation set. Compared with the baseline BCNet, AP increases by 1.7% and 1.6% with ResNet-50 and ResNet-101 backbones, respectively, and segmentation accuracy improves clearly for targets at multiple scales, which shows that the proposed model effectively improves on the baseline. The segmentation accuracy of the proposed method is higher than that of two-stage instance segmentation algorithms such as Mask R-CNN and CenterMask, and it also has advantages over single-stage algorithms such as YOLACT. The binary contour maps show that the boundaries extracted by this method are more precise. The experimental results indicate that the method generalizes well and is robust. Conclusion The proposed instance segmentation method for occluded objects, based on balance pooling attention and adaptive feature fusion, effectively improves instance segmentation of occluded objects.
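The exchange of features between the contour prediction branch and the mask prediction branch can be illustrated with a small Python sketch. The abstract does not specify the fusion rule, so the sketch assumes one plausible form: a weighted sum whose two weights are softmax-normalized. The function names and the logits own_logit and other_logit (stand-ins for parameters that would be learned) are hypothetical, and features are reduced to short vectors for readability.

```python
import math


def softmax(logits):
    """Normalize logits into positive weights that sum to 1."""
    m = max(logits)
    exps = [math.exp(s - m) for s in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(own, other, own_logit, other_logit):
    """Fuse a branch's own features with the other branch's features.

    own_logit and other_logit are hypothetical learnable scalars; the
    softmax keeps the fused output on the same scale as the inputs.
    """
    w_own, w_other = softmax([own_logit, other_logit])
    return [w_own * a + w_other * b for a, b in zip(own, other)]


# Toy RoI features for the two branches (values are illustrative only).
contour_feat = [1.0, 0.0, 2.0]
mask_feat = [0.0, 1.0, 2.0]

# Each branch keeps its own features dominant and mixes in the other branch.
fused_for_contour = adaptive_fuse(contour_feat, mask_feat, 1.0, 0.0)
fused_for_mask = adaptive_fuse(mask_feat, contour_feat, 1.0, 0.0)
```

Because the weights sum to 1, positions where both branches agree are left unchanged, and positions where they differ move toward the branch with the larger weight.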
Keywords
occluded objects instance segmentation; balance pooling attention module (BPAM); adaptive feature fusion module (AFFM); bilayer convolutional network (BCNet); contour prediction branch; mask prediction branch