
Guo Jing1, Wang Fei2 (1. Jinzhong Vocational and Technical College; 2. University of Brighton, UK)

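Abstract
Objective Information interaction between the support branch and the query branch is important for improving few-shot semantic segmentation, so a few-shot semantic segmentation algorithm based on a cross-branch cross-guidance network is proposed. Method First, a set of weight-sharing backbone networks maps the input images of both branches into a deep feature space, and the output low-level, middle-level, and high-level features are fused across scales to build multi-scale features. Second, the mask of the support branch is used to decompose the support features into target foreground and background feature maps. Third, a feature interaction module is designed to establish information interaction between the target foreground of the support branch and the entire feature map of the query branch, strengthening the representation of task-relevant features, and masked average pooling is used to generate prototype sets for the target foreground and background regions. Finally, a parameter-free metric computes the cosine similarity between the support features and the prototype sets and between the query features and the prototype sets, and the mask of each image is produced from the similarity values. Result Experiments on the open PASCAL-5i and COCO-20i datasets show that, with VGG-16, ResNet-50, and ResNet-101 as backbone networks, the proposed model obtains, on PASCAL-5i and COCO-20i respectively, mIoU of 50.2%/53.2%/57.1% and 23.9%/35.1%/36.4% and FB-IoU of 68.3%/69.4%/72.3% and 60.1%/62.4%/64.1% in the 1-way 1-shot task, and mIoU of 52.9%/55.7%/59.7% and 32.5%/37.3%/38.3% and FB-IoU of 69.7%/72.5%/74.6% and 64.2%/66.2%/66.7% in the 1-way 5-shot task. Conclusion Compared with current mainstream few-shot semantic segmentation models, the proposed model obtains higher mIoU and FB-IoU in both the 1-way 1-shot and 1-way 5-shot tasks, a significant improvement in overall performance.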
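The prototype step described in the method can be illustrated with a minimal sketch. It covers only masked average pooling and the parameter-free cosine-similarity labeling; the shared backbone, multi-scale fusion, and feature interaction module are omitted. The function names, the toy 2x2 feature maps, and the use of a single prototype per region are assumptions made for illustration, not the authors' implementation.

```python
import math

def masked_average_pooling(features, mask):
    """Average the feature vectors at positions where mask == 1.

    features: H x W x C nested lists; mask: H x W nested lists of 0/1.
    """
    channels = len(features[0][0])
    total = [0.0] * channels
    count = 0
    for feat_row, mask_row in zip(features, mask):
        for vec, m in zip(feat_row, mask_row):
            if m:
                count += 1
                for c in range(channels):
                    total[c] += vec[c]
    if count == 0:
        return total  # empty region: fall back to a zero prototype
    return [t / count for t in total]

def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity of two vectors; eps guards against zero norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def predict_mask(features, prototypes):
    """Label each position with the index of its most similar prototype."""
    indices = range(len(prototypes))
    return [
        [max(indices, key=lambda k: cosine_similarity(vec, prototypes[k]))
         for vec in row]
        for row in features
    ]

# Hypothetical support features (2 x 2 positions, 2 channels) and support mask.
support_features = [[[1.0, 0.0], [0.9, 0.1]],
                    [[0.0, 1.0], [0.1, 0.9]]]
support_mask = [[1, 1],
                [0, 0]]
background_mask = [[1 - m for m in row] for row in support_mask]

fg_prototype = masked_average_pooling(support_features, support_mask)
bg_prototype = masked_average_pooling(support_features, background_mask)

# Prototype index 0 is background and index 1 is foreground.
prototypes = [bg_prototype, fg_prototype]

query_features = [[[0.8, 0.2], [0.2, 0.8]],
                  [[0.7, 0.1], [0.0, 1.0]]]
query_mask = predict_mask(query_features, prototypes)
support_pred = predict_mask(support_features, prototypes)
```

Applying the same labeling to the support features mirrors the abstract's statement that similarities are computed for both branches; in this toy case the predicted support mask reproduces the given support mask.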
Cross-Branch Cross-Guidance Network for Few-Shot Semantic Segmentation


Objective Establishing information interaction between the support and query branches plays an important role in improving the performance of few-shot semantic segmentation. To this end, a few-shot semantic segmentation algorithm based on a cross-branch cross-guidance network is proposed. Method First, a set of backbone networks with shared weights is used to map the images of the two branches into a deep feature space, and the low-level, middle-level, and high-level features are fused to construct multi-scale features. Second, the ground-truth mask of the support image is used to decompose the support features into foreground and background feature maps. Third, a feature interaction module is designed to establish information interaction between the target foreground of the support branch and the feature map of the entire query branch, enhancing the expressive ability of task-related features, while masked average pooling is used to generate prototype sets for the target foreground and background regions. Finally, a non-parametric metric is used to calculate the cosine similarity between the support features and the prototype sets, as well as between the query features and the prototype sets, and the corresponding image masks are generated from the similarity values. Result Experiments on the open PASCAL-5i and COCO-20i datasets demonstrate that, with VGG-16, ResNet-50, and ResNet-101 as backbone networks, the proposed model achieves, on PASCAL-5i and COCO-20i respectively, mIoU of 50.2%/53.2%/57.1% and 23.9%/35.1%/36.4% and FB-IoU of 68.3%/69.4%/72.3% and 60.1%/62.4%/64.1% in the 1-way 1-shot task, as well as mIoU of 52.9%/55.7%/59.7% and 32.5%/37.3%/38.3% and FB-IoU of 69.7%/72.5%/74.6% and 64.2%/66.2%/66.7% in the 1-way 5-shot task. Conclusion Compared with current mainstream few-shot semantic segmentation models, the proposed model achieves higher mIoU and FB-IoU in both the 1-way 1-shot and 1-way 5-shot tasks, demonstrating a significant overall performance improvement.
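For readers unfamiliar with the two reported metrics, the sketch below shows one common way they are computed in the few-shot segmentation literature. The abstract does not state the exact evaluation protocol, so the definitions here are assumptions: mIoU averages the foreground IoU over classes, with intersections and unions accumulated per class, while FB-IoU ignores classes and averages the foreground IoU and the background IoU. The function names and the toy episodes are hypothetical.

```python
from collections import defaultdict

def intersection_union(pred, target, label):
    """Count intersection and union pixels for one label (0 = background, 1 = foreground)."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_hit, t_hit = (p == label), (t == label)
            inter += int(p_hit and t_hit)
            union += int(p_hit or t_hit)
    return inter, union

def evaluate(episodes):
    """episodes: iterable of (class_name, predicted_mask, ground_truth_mask)."""
    per_class = defaultdict(lambda: [0, 0])  # class -> [intersection, union], foreground only
    fg = [0, 0]  # class-agnostic foreground [intersection, union]
    bg = [0, 0]  # class-agnostic background [intersection, union]
    for cls, pred, target in episodes:
        i_fg, u_fg = intersection_union(pred, target, 1)
        i_bg, u_bg = intersection_union(pred, target, 0)
        per_class[cls][0] += i_fg
        per_class[cls][1] += u_fg
        fg[0] += i_fg
        fg[1] += u_fg
        bg[0] += i_bg
        bg[1] += u_bg
    class_ious = [i / u for i, u in per_class.values() if u > 0]
    miou = sum(class_ious) / len(class_ious) if class_ious else 0.0
    fg_iou = fg[0] / fg[1] if fg[1] else 0.0
    bg_iou = bg[0] / bg[1] if bg[1] else 0.0
    return miou, (fg_iou + bg_iou) / 2

# Two hypothetical test episodes with 2 x 2 masks.
episodes = [
    ("dog", [[1, 1], [0, 0]], [[1, 0], [0, 0]]),  # one false-positive pixel
    ("cat", [[1, 1], [0, 0]], [[1, 1], [0, 0]]),  # perfect prediction
]
miou, fb_iou = evaluate(episodes)
```

In this toy case the per-class foreground IoUs are 1/2 and 2/2, so mIoU is 0.75, while the class-agnostic foreground and background IoUs are 3/4 and 4/5, so FB-IoU is 0.775. The gap illustrates why FB-IoU values are typically higher than mIoU values, as in the results above: the background region is large and comparatively easy to label.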