面向水下图像目标检测的退化特征增强算法
Underwater-relevant image object detection based feature-degraded enhancement method
- 2022年27卷第11期 页码:3185-3198
纸质出版日期: 2022-11-16 ,
录用日期: 2021-08-21
DOI: 10.11834/jig.210415
钱晓琪, 刘伟峰, 张敬, 曹洋. 面向水下图像目标检测的退化特征增强算法[J]. 中国图象图形学报, 2022,27(11):3185-3198.
Xiaoqi Qian, Weifeng Liu, Jing Zhang, Yang Cao. Underwater-relevant image object detection based feature-degraded enhancement method[J]. Journal of Image and Graphics, 2022,27(11):3185-3198.
目的
基于清晰图像训练的深度神经网络检测模型因为成像差异导致的域偏移问题使其难以直接泛化到水下场景。为了有效解决清晰图像和水下图像的特征偏移问题,提出一种即插即用的特征增强模块(feature de-drifting module Unet,FDM-Unet)。
方法
首先提出一种基于成像模型的水下图像合成方法,从真实水下图像中估计色偏颜色和亮度,从清晰图像估计得到场景深度信息,根据改进的光照散射模型将清晰图像合成为具有真实感的水下图像。然后,借鉴U-Net结构,设计了一个轻量的特征增强模块FDM-Unet。在清晰图像和对应的合成水下图像对上,采用常见的清晰图像上预训练的检测器,提取它们对应的浅层特征,将水下图像对应的退化浅层特征输入FDM-Unet进行增强,并将增强之后的特征与清晰图像对应的特征计算均方误差(mean-square error,MSE)损失,从而监督FDM-Unet进行训练。最后,将训练好的FDM-Unet直接插入上述预训练的检测器的浅层位置,不需要对网络进行重新训练或微调,即可以直接处理水下图像目标检测。
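The synthesis step above follows the standard light-scattering imaging model; a minimal sketch is given below. The per-channel attenuation coefficients and ambient-light values here are illustrative assumptions only (the paper estimates color cast and luminance from real underwater images and uses an improved model whose exact form is not reproduced here):

```python
import numpy as np

def synthesize_underwater(clear_rgb, depth,
                          beta=(0.8, 0.4, 0.2),
                          ambient=(0.10, 0.45, 0.55)):
    """Sketch of underwater image synthesis with a simplified scattering model.

    clear_rgb: float array in [0, 1], shape (H, W, 3)
    depth:     scene depth map, shape (H, W), e.g. from a monocular depth network
    beta:      per-channel attenuation coefficients (red attenuates fastest underwater)
    ambient:   background (veiling) light, ideally estimated from real underwater images
    """
    beta = np.asarray(beta, dtype=np.float32)
    ambient = np.asarray(ambient, dtype=np.float32)
    # Transmission t(x) = exp(-beta * d(x)), broadcast over the 3 channels.
    t = np.exp(-beta[None, None, :] * depth[:, :, None])
    # Imaging model: I(x) = J(x) * t(x) + B * (1 - t(x)).
    return clear_rgb * t + ambient[None, None, :] * (1.0 - t)
```

With zero depth the transmission is 1 everywhere and the clear image is returned unchanged; as depth grows, each channel decays toward the ambient light at its own rate, producing the characteristic blue-green cast.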
结果
实验结果表明,FDM-Unet在PASCAL VOC 2007(pattern analysis,statistical modeling and computational learning visual object classes 2007)合成水下图像测试集上,针对YOLO v3(you only look once v3)和SSD(single shot multibox detector)预训练检测器,检测精度mAP(mean average precision)分别提高了8.58%和7.71%;在真实水下数据集URPC19(underwater robot professional contest 19)上,使用不同比例的数据进行微调,相比YOLO v3和SSD,mAP分别提高了4.4%~10.6%和3.9%~10.7%。
结论
本文提出的特征增强模块FDM-Unet以增加极小的参数量和计算量为代价,不仅能直接提升预训练检测器在合成水下图像上的检测精度,也能提升在真实水下图像上微调后的检测精度。
Objective
Underwater object detection aims to localize and recognize objects in underwater scenes, and is essential for widespread applications in oceanography, underwater navigation, and fish farming. Current deep convolutional neural network (DCNN) based object detectors are trained on large-scale datasets of clear images, such as pattern analysis, statistical modeling and computational learning visual object classes 2007 (PASCAL VOC 2007) and Microsoft common objects in context (MS COCO), and ignore image degradation. Nevertheless, two degradation-related issues have to be resolved: 1) underwater detection datasets are scarce, which limits detection accuracy and inevitably leads to overfitting of deep neural network models; 2) underwater images exhibit low contrast, texture distortion, and blur under complicated underwater environment and illumination circumstances, which further limits the accuracy of detection algorithms. In practice, image augmentation can alleviate the data scarcity problem, but it brings limited performance improvement for deep models on small datasets. Another feasible solution is to restore (enhance) the underwater image into a clear one, mainly with deep learning methods, to improve its visibility and contrast and reduce color cast. However, because ground-truth clear images are unavailable for real underwater scenes, such enhancement models are trained on synthetic datasets, and their enhancement effect largely depends on the quality of the synthetic images. Meanwhile, detection models trained on clear images are difficult to generalize directly to underwater scenes because of the domain shift caused by imaging differences, and the scarcity of underwater data makes it difficult to train a high-accuracy detector from scratch. We develop a plug-and-play feature enhancement module that effectively addresses the domain shift between clear images and underwater images by restoring the features of underwater images extracted from the low-level layers of the network, so that a detection network trained on clear images can be applied directly to underwater image object detection.
Method
First, to synthesize realistic underwater versions of clear images, we propose an underwater image synthesis method based on an improved light scattering model for underwater imaging: it estimates the color cast and luminance from real underwater images, estimates the scene depth of the clear image, and integrates them to render the underwater version. Second, borrowing from the U-Net structure, we design a lightweight feature enhancement module named feature de-drifting module Unet (FDM-Unet). Third, we use common detectors pre-trained on clear images (e.g., you only look once v3 (YOLO v3) and single shot multibox detector (SSD)) to extract the shallow features of clear images and of their corresponding synthetic underwater images. The degraded shallow feature of the underwater image is fed into FDM-Unet for de-drifting, and the mean square error (MSE) loss between the enhanced feature and the feature of the paired clear image supervises the training of FDM-Unet. Finally, the trained FDM-Unet is inserted directly after the shallow layers of the pre-trained detector, without any re-training or fine-tuning, so that the detector can process underwater images directly.
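The supervision scheme above can be sketched in PyTorch as follows. The module architecture (channel width, one down/up stage, residual output) and the helper names (`FDMUnet`, `train_step`, `frozen_stem`) are illustrative assumptions, not the paper's exact configuration; the key points are that the pre-trained detector stays frozen and the MSE target is the clear image's shallow feature:

```python
import torch
import torch.nn as nn

class FDMUnet(nn.Module):
    """Minimal sketch of a lightweight feature de-drifting module (FDM-Unet).

    Maps a degraded shallow feature map to an enhanced one of the same shape
    with a tiny U-Net-style encoder-decoder and a residual connection.
    """

    def __init__(self, channels=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1),
            nn.ReLU(inplace=True))
        self.out = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, f_degraded):
        x = self.dec(self.enc(f_degraded))
        return self.out(x) + f_degraded  # residual: predict the feature drift

def train_step(fdm, frozen_stem, clear_img, underwater_img, optimizer):
    """One supervision step: MSE between the enhanced underwater feature and
    the frozen detector's feature of the paired clear image."""
    with torch.no_grad():                      # pre-trained detector stays fixed
        f_clear = frozen_stem(clear_img)
        f_uw = frozen_stem(underwater_img)
    loss = nn.functional.mse_loss(fdm(f_uw), f_clear)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At inference, the trained module is simply interposed after the detector's shallow layers, so no detector weights are touched at any point.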
Result
The experimental results show that, on the PASCAL VOC 2007 synthetic underwater image test set, FDM-Unet improves the detection accuracy of the pre-trained YOLO v3 and SSD detectors by 8.58% and 7.71% mean average precision (mAP), respectively. In addition, on the real underwater dataset underwater robot professional contest 19 (URPC19), using different proportions of data for fine-tuning, FDM-Unet improves the accuracy by 4.4%~10.6% mAP and 3.9%~10.7% mAP over the vanilla YOLO v3 and SSD detectors, respectively.
Conclusion
Our FDM-Unet works as a plug-and-play module at the cost of a very small increase in parameters and computation. It markedly improves the detection accuracy of a pre-trained model on synthetic underwater images without any retraining or fine-tuning of the detection model, and fine-tuning experiments on real underwater images show that it also improves detection performance over the baseline detectors.
卷积神经网络(CNN);目标检测;特征增强;成像模型;图像合成
convolutional neural network (CNN); object detection; feature enhancement; imaging model; image synthesis