RGB-D salient object detection using few-shot learning
Vol. 27, Issue 10, Pages: 2860-2872 (2022)
Received: 24 November 2021; Revised: 8 April 2022; Accepted: 15 April 2022; Published: 16 October 2022
DOI: 10.11834/jig.211068
Objective
Salient object detection (SOD) mainly serves as a pre-processing step in computer vision, for tasks such as video/image segmentation, visual tracking, and video/image compression. Current RGB-depth (RGB-D) SOD methods can be categorized into fully supervised and self-supervised ones. Fully supervised RGB-D SOD effectively fuses the complementary information of the two modalities, the input RGB image and its corresponding depth map, by means of three types of fusion (early, middle, or late). To capture contextual information, self-supervised SOD uses a small number of unlabeled samples for pre-training. However, existing RGB-D SOD methods are mostly trained on a small RGB-D training set in a fully supervised manner, so their generalization ability is greatly restricted. Inspired by emerging few-shot learning methods, we treat RGB-D salient object detection as a few-shot problem and employ two types of few-shot learning methods, model hypothesis space optimization and training sample augmentation, to explore and solve RGB-D salient object detection under the few-shot setting.
Method
For model hypothesis space optimization, the knowledge learned from an extra RGB salient object detection task is transferred to the RGB-D salient object detection task through multi-task learning of the two tasks, and the hypothesis space of the model is constrained by sharing model parameters. Taking into account that both middle fusion and late fusion allow additional supervision to be added to the network, the JL-DCF model is selected for middle fusion and a DANet† model is constructed for late fusion. To improve the effectiveness and generalization of the RGB-D salient object detection task, JL-DCF takes RGB-D and RGB data as simultaneous inputs for online training and optimization, and the coarse prediction on the RGB data is supervised to optimize the network. In view of the commonality between semantic segmentation and saliency detection, the dual attention network for scene segmentation (DANet) is transferred to the RGB-D salient object detection setting, yielding a model named DANet†. Similar to the joint training of JL-DCF, additional RGB supervision is added to the RGB branch of the two-stream DANet†.
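The parameter-sharing idea behind both variants can be illustrated with a minimal PyTorch sketch. This is our own schematic, not the actual JL-DCF or DANet† implementation; the names `encoder`, `rgb_head`, and `rgbd_head` are hypothetical placeholders. The RGB task and the RGB-D task share one encoder, and an auxiliary loss on the coarse RGB prediction constrains the shared hypothesis space:

```python
import torch
import torch.nn as nn

class JointSODModel(nn.Module):
    """Minimal sketch of multi-task RGB / RGB-D SOD with shared parameters.
    Module names are hypothetical placeholders, not the JL-DCF/DANet code."""

    def __init__(self, encoder: nn.Module, feat_dim: int = 256):
        super().__init__()
        self.encoder = encoder                      # shared by both tasks/modalities
        self.rgb_head = nn.Conv2d(feat_dim, 1, 1)   # coarse RGB saliency head
        self.rgbd_head = nn.Conv2d(feat_dim, 1, 1)  # fused RGB-D saliency head

    def forward(self, rgb: torch.Tensor, depth3: torch.Tensor):
        f_rgb = self.encoder(rgb)       # RGB stream
        f_dep = self.encoder(depth3)    # depth stream (three-channel depth map)
        pred_rgb = self.rgb_head(f_rgb)            # auxiliary RGB prediction
        pred_rgbd = self.rgbd_head(f_rgb + f_dep)  # simple additive fusion
        return pred_rgb, pred_rgbd

bce = nn.BCEWithLogitsLoss()

def joint_loss(pred_rgb, pred_rgbd, gt):
    # Supervising both outputs with the same ground truth is what
    # constrains the hypothesis space of the shared encoder.
    return bce(pred_rgbd, gt) + bce(pred_rgb, gt)
```

Feeding extra RGB-only batches through the RGB head (with the RGB-D output left unsupervised for those batches) is how knowledge from the larger RGB dataset flows into the shared encoder.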
Furthermore, training sample augmentation generates a corresponding depth map for each additional RGB image by means of a depth estimation algorithm, and the RGB images together with the synthesized depth maps are used to train the RGB-D salient object detection task. Each depth map is transformed into a three-channel map by gray-scale mapping. We adopt ResNet-101 as the network backbone; the input image scale is 320×320×3 for the JL-DCF network and is fixed to 480×480×3 for the DANet† network.
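The sample-augmentation pipeline can likewise be sketched in a few lines, assuming an off-the-shelf monocular depth estimator; the `estimate_depth` placeholder below stands in for a MegaDepth- or DPT-style model and is hypothetical, as is our reading of the gray-scale mapping:

```python
import numpy as np

def estimate_depth(rgb: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for a monocular depth estimator
    (e.g., MegaDepth- or DPT-style); returns an H x W depth map."""
    raise NotImplementedError

def depth_to_three_channel(depth: np.ndarray) -> np.ndarray:
    # One plausible gray-scale mapping: normalize depth to [0, 255]
    # and replicate it to three channels to match the RGB input format.
    d = depth.astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8) * 255.0
    return np.repeat(d[..., None], 3, axis=-1).astype(np.uint8)

def augment_sample(rgb: np.ndarray):
    # Turn an extra RGB-only sample into a pseudo RGB-D training pair.
    depth = estimate_depth(rgb)
    return rgb, depth_to_three_channel(depth)
```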
Our training set is composed of data from NJU2K, NLPR, and DUTS, and the test sets are NJU2K, NLPR, STERE, RGBD135, LFSD, SIP, DUT-RGBD, ReDWeb-S, and DUTS (it is worth noting that DUT-RGBD and ReDWeb-S are tested on their complete datasets of 1 200 and 3 179 samples, respectively).
The evaluation metrics are as follows: S-measure ($S_\alpha$), maximum F-measure ($F_\beta^{\max}$), maximum E-measure ($E_\varphi^{\max}$), and mean absolute error, MAE ($M$).
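For reference, two of these metrics are simple enough to sketch in NumPy; this is an illustrative re-implementation under the usual SOD conventions ($\beta^2 = 0.3$, a sweep of 255 thresholds), not the official evaluation toolbox:

```python
import numpy as np

def mae(sal: np.ndarray, gt: np.ndarray) -> float:
    # M: mean absolute error between the predicted saliency map and
    # the ground-truth mask, both scaled to [0, 1].
    return float(np.mean(np.abs(sal - gt)))

def max_f_measure(sal: np.ndarray, gt: np.ndarray, beta2: float = 0.3) -> float:
    # F_beta^max: sweep binarization thresholds and keep the best score.
    gt_bin = gt > 0.5
    best = 0.0
    for t in np.linspace(0.0, 1.0, 255):
        pred = sal >= t
        tp = np.logical_and(pred, gt_bin).sum()
        precision = tp / (pred.sum() + 1e-8)
        recall = tp / (gt_bin.sum() + 1e-8)
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
        best = max(best, float(f))
    return best
```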
Our experiments are based on the PyTorch framework. The momentum parameter is 0.99, the learning rate is 0.000 05, and the weight decay is set to 0.000 5. Stochastic gradient descent is used for optimization, accelerated on an NVIDIA RTX 2080S GPU. 1) Model hypothesis space optimization: training for 50 epochs takes about 20 hours. 2) Training sample augmentation: training for 50 epochs takes about 100 hours, and a weighting coefficient $\alpha = 2\,200/10\,553 \approx 0.21$ is used to keep learning from the original RGB-D samples and the augmented samples roughly balanced.
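One way to read the coefficient, expressed as a short sketch; the placement of $\alpha$ here (down-weighting the loss on the more numerous augmented samples) is our assumption, not a statement of the released code:

```python
# alpha balances the two training sources by size: roughly 2 200 original
# RGB-D samples versus 10 553 augmented DUTS samples (our interpretation).
alpha = 2200 / 10553  # ~0.21

def balanced_loss(loss_rgbd, loss_augmented):
    # Down-weight the augmented-sample loss so that both sources
    # contribute comparably over an epoch (assumption, see text).
    return loss_rgbd + alpha * loss_augmented
```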
Result
Comparative experiments on nine datasets show that introducing few-shot learning methods can effectively improve the performance of RGB-D salient object detection. In addition, we compare different few-shot learning methods under different RGB-D salient object detection models, including a typical middle-fusion model and a typical late-fusion model, and provide relevant analysis and discussion. Visualized saliency maps further show the potential of our few-shot RGB-D salient object detection method.
Conclusion
We introduce few-shot learning into RGB-D salient object detection, developing two different few-shot learning methods that transfer knowledge from additional RGB images. Extensive experiments verify the feasibility and effectiveness of introducing few-shot learning to improve RGB-D salient object detection, and our study also offers insights for the subsequent introduction of few-shot learning into other multi-modal detection tasks.
Borji A, Cheng M M, Jiang H Z and Li J. 2015. Salient object detection: a benchmark. IEEE Transactions on Image Processing, 24(12): 5706-5722 [DOI: 10.1109/TIP.2015.2487833]
Caelles S, Maninis K K, Pont-Tuset J, Leal-Taixé L, Cremers D and Van Gool L. 2017. One-shot video object segmentation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 5320-5329 [DOI: 10.1109/CVPR.2017.565]
Chen H, Li Y F and Su D. 2019. Multi-modal fusion network with multi-scale multi-path and cross-modal interactions for RGB-D salient object detection. Pattern Recognition, 86: 376-385 [DOI: 10.1016/j.patcog.2018.08.007]
Cheng Y P, Fu H Z, Wei X X, Xiao J J and Cao X C. 2014. Depth enhanced saliency detection method//Proceedings of 2014 International Conference on Internet Multimedia Computing and Service. Xiamen, China: ACM: 23-27 [DOI: 10.1145/2632856.2632866]
Fan D P, Cheng M M, Liu Y, Li T and Borji A. 2017. Structure-measure: a new way to evaluate foreground maps//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 4558-4567 [DOI: 10.1109/ICCV.2017.487]
Fan D P, Ji G P, Qin X B and Cheng M M. 2021. Cognitive vision inspired object segmentation metric and loss function. SCIENTIA SINICA Informationis, 51(9): 1475-1489 [DOI: 10.1360/ssi-2020-0370]
Fan D P, Lin Z, Zhang Z, Zhu M L and Cheng M M. 2021. Rethinking RGB-D salient object detection: models, data sets, and large-scale benchmarks. IEEE Transactions on Neural Networks and Learning Systems, 32(5): 2075-2089 [DOI: 10.1109/TNNLS.2020.2996406]
Finn C, Abbeel P and Levine S. 2017. Model-agnostic meta-learning for fast adaptation of deep networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org: 1126-1135
Fu J, Liu J, Tian H J, Li Y, Bao Y J, Fang Z W and Lu H Q. 2019. Dual attention network for scene segmentation//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3141-3149 [DOI: 10.1109/CVPR.2019.00326]
Fu K R, Fan D P, Ji G P and Zhao Q J. 2020. JL-DCF: joint learning and densely-cooperative fusion framework for RGB-D salient object detection//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 3049-3059 [DOI: 10.1109/CVPR42600.2020.00312]
Fu K R, Fan D P, Ji G P, Zhao Q J, Shen J B and Zhu C. 2021. Siamese network for RGB-D salient object detection and beyond. IEEE Transactions on Pattern Analysis and Machine Intelligence: #3073689 [DOI: 10.1109/TPAMI.2021.3073689]
Gui L Y, Wang Y X, Ramanan D and Moura J M F. 2018. Few-shot human motion prediction via meta-learning//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 441-459 [DOI: 10.1007/978-3-030-01237-3_27]
Han J W, Chen H, Liu N, Yan C G and Li X L. 2018. CNNs-based RGB-D saliency detection via cross-view transfer and multiview fusion. IEEE Transactions on Cybernetics, 48(11): 3171-3183 [DOI: 10.1109/TCYB.2017.2761775]
Ju R, Ge L, Geng W J, Ren T W and Wu G S. 2014. Depth saliency based on anisotropic center-surround difference//Proceedings of 2014 International Conference on Image Processing. Paris, France: IEEE: 1115-1119 [DOI: 10.1109/ICIP.2014.7025222]
Li B, Yang Y and Liu Q. 2021. RGB-D video saliency detection via superpixel-level conditional random field. Journal of Image and Graphics, 26(4): 872-882 [DOI: 10.11834/jig.200122]
Li N Y, Ye J W, Ji Y, Ling H B and Yu J Y. 2017. Saliency detection on light field. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(8): 1605-1616 [DOI: 10.1109/TPAMI.2016.2610425]
Li Z Q and Snavely N. 2018. MegaDepth: learning single-view depth prediction from internet photos//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 2041-2050 [DOI: 10.1109/CVPR.2018.00218]
Liu N, Zhang N, Shao L and Han J W. 2021. Learning selective mutual attention and contrast for RGB-D saliency detection. IEEE Transactions on Pattern Analysis and Machine Intelligence: #3122139 [DOI: 10.1109/TPAMI.2021.3122139]
Munkhdalai T and Yu H. 2017. Meta networks//Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: JMLR.org: 2554-2563
Niu Y Z, Geng Y J, Li X Q and Liu F. 2012. Leveraging stereopsis for saliency analysis//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 454-461 [DOI: 10.1109/CVPR.2012.6247708]
Peng H W, Li B, Xiong W H, Hu W M and Ji R R. 2014. RGBD salient object detection: a benchmark and algorithms//Proceedings of the 13th European Conference on Computer Vision. Zurich, Switzerland: Springer: 92-109 [DOI: 10.1007/978-3-319-10578-9_7]
Perazzi F, Krähenbühl P, Pritch Y and Hornung A. 2012. Saliency filters: contrast based filtering for salient region detection//Proceedings of 2012 IEEE Conference on Computer Vision and Pattern Recognition. Providence, USA: IEEE: 733-740 [DOI: 10.1109/CVPR.2012.6247743]
Piao Y R, Ji W, Li J J, Zhang M and Lu H C. 2019. Depth-induced multi-scale recurrent attention network for saliency detection//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul, Korea (South): IEEE: 7253-7262 [DOI: 10.1109/ICCV.2019.00735]
Qu L Q, He S F, Zhang J W, Tian J D, Tang Y D and Yang Q X. 2017. RGBD salient object detection via deep fusion. IEEE Transactions on Image Processing, 26(5): 2274-2285 [DOI: 10.1109/TIP.2017.2682981]
Ranftl R, Bochkovskiy A and Koltun V. 2021. Vision transformers for dense prediction//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal, Canada: IEEE: #01196 [DOI: 10.1109/ICCV48922.2021.01196]
Ravi S and Larochelle H. 2017. Optimization as a model for few-shot learning//Proceedings of the 5th International Conference on Learning Representations. Toulon, France: OpenReview.net: 1-11
Snell J, Swersky K and Zemel R. 2017. Prototypical networks for few-shot learning//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: Curran Associates Inc.: 4080-4090
Tsai Y H H, Huang L K and Salakhutdinov R. 2017. Learning robust visual-semantic embeddings//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE: 3591-3600 [DOI: 10.1109/ICCV.2017.386]
Wang L J, Lu H C, Wang Y F, Feng M Y, Wang D, Yin B C and Ruan X. 2017. Learning to detect salient objects with image-level supervision//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 3796-3805 [DOI: 10.1109/CVPR.2017.404]
Wang W G, Lai Q X, Fu H Z, Shen J B, Ling H B and Yang R G. 2022. Salient object detection in the deep learning era: an in-depth survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(6): 3239-3259 [DOI: 10.1109/TPAMI.2021.3051099]
Wang Y Q, Yao Q M, Kwok J T and Ni L M. 2021. Generalizing from a few examples: a survey on few-shot learning. ACM Computing Surveys, 53(3): 63 [DOI: 10.1145/3386252]
Wang Y X and Hebert M. 2016. Learning from small sample sets by combining unsupervised meta-training with CNNs//Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: Curran Associates Inc.: 244-252
Wu Y, Lin Y T, Dong X Y, Yan Y, Ouyang W L and Yang Y. 2018. Exploit the unknown gradually: one-shot video-based person re-identification by stepwise learning//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 5177-5186 [DOI: 10.1109/CVPR.2018.00543]
Xu P B, Sang J T and Lu D Y. 2021. Few shot image recognition based on class semantic similarity supervision. Journal of Image and Graphics, 26(7): 1594-1603 [DOI: 10.11834/jig.200504]
Zhang J, Fan D P, Dai Y C, Anwar S, Saleh F S, Zhang T and Barnes N. 2020. UC-Net: uncertainty inspired RGB-D saliency detection via conditional variational autoencoders//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 8579-8588 [DOI: 10.1109/CVPR42600.2020.00861]
Zhao J X, Cao Y, Fan D P, Cheng M M, Li X Y and Zhang L. 2019. Contrast prior and fluid pyramid integration for RGBD salient object detection//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 3922-3931 [DOI: 10.1109/CVPR.2019.00405]
Zhao X Q, Pang Y W, Zhang L H, Lu H C and Ruan X. 2021. Self-supervised pretraining for RGB-D salient object detection [EB/OL]. http://arxiv.org/pdf/2101.12482.pdf
Zhao X Q, Zhang L H, Pang Y W, Lu H C and Zhang L. 2020. A single stream network for robust and real-time RGB-D salient object detection//Proceedings of the 16th European Conference on Computer Vision. Glasgow, UK: Springer: 646-662 [DOI: 10.1007/978-3-030-58542-6_39]
Zhu L C and Yang Y. 2018. Compound memory networks for few-shot video classification//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 782-797 [DOI: 10.1007/978-3-030-01234-2_46]