Coarse-to-fine multiscale defocus blur detection
2021, Vol. 26, No. 3, pp. 581-593
Received: 2020-04-20; Revised: 2020-06-08; Accepted: 2020-06-13; Published in print: 2021-03-16
DOI: 10.11834/jig.200126
Objective
Defocus blur detection aims to distinguish sharp pixels from blurred pixels in an image; it is widely used in many fields and is an important research direction in computer vision. When the image under examination contains complex scenes, existing defocus blur detection methods suffer from limited accuracy and incomplete detection boundaries. This paper proposes a coarse-to-fine multiscale defocus blur detection network that improves detection accuracy by fusing the multilayer convolutional features of the image at different scales.
Method
First, the image is rescaled to several resolutions, a convolutional neural network extracts multilayer convolutional features from the image at each scale, and convolutional layers fuse the features of corresponding layers across scales. Second, convolutional long short-term memory (Conv-LSTM) layers integrate the blur features of different scales from top to bottom while generating a blur detection map at each scale, so that deep semantic information is gradually propagated to the shallow layers. During this process, deep and shallow features are combined, and the shallow features refine the blur detection result of the deeper layer. Finally, a convolutional layer fuses the multiscale detection results into the final result. During training, a multilayer supervision strategy ensures that every Conv-LSTM layer is well optimized.
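To make the coarse-to-fine flow concrete, the following is a minimal Keras sketch of the pipeline described above, not the authors' released code: plain convolution blocks stand in for the VGG16 extractor, the scale ratios and channel widths are assumptions, and a single Conv-LSTM integrates the per-scale maps from coarse to fine.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Stand-in for one VGG-style stage: two 3x3 convolutions plus pooling."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

def build_sketch(size=320, num_scales=3):
    image = layers.Input((size, size, 3))
    per_scale_maps = []
    for i in range(num_scales):           # i = 0 is the finest scale
        s = size // (2 ** i)              # assumed scale ratios: 1, 1/2, 1/4
        x = layers.Resizing(s, s)(image)
        for filters in (64, 128, 256):    # assumed channel widths
            x = conv_block(x, filters)
        m = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-scale blur map
        per_scale_maps.append(layers.Resizing(size, size)(m))
    # Treat the maps as a coarse-to-fine sequence and integrate them
    # with a Conv-LSTM, coarsest scale first.
    seq = layers.Lambda(lambda t: tf.stack(t[::-1], axis=1))(per_scale_maps)
    x = layers.ConvLSTM2D(16, 3, padding="same")(seq)
    return Model(image, layers.Conv2D(1, 1, activation="sigmoid")(x))

model = build_sketch()
model.summary()
```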
Result
The network is trained and tested on two public blur detection datasets, DUT (Dalian University of Technology) and CUHK (The Chinese University of Hong Kong), and compared with 10 algorithms, including the current best blur detection methods BTBCRL (bottom-top-bottom network with cascaded defocus blur detection map residual learning), DeFusionNet (defocus blur detection network via recurrently fusing and refining multi-scale deep features), and DHDE (multi-scale deep and hand-crafted features for defocus estimation). The experimental results show that, on the DUT dataset, the proposed model reduces the MAE (mean absolute error) by 38.8% and raises the F0.3 value by 5.4% relative to DeFusionNet; on the CUHK dataset, it reduces the MAE by 36.7% and raises the F0.3 value by 9.7% relative to the LBP (local binary pattern) algorithm. These comparisons fully validate the effectiveness of the proposed defocus blur detection model.
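For reference, the two reported metrics can be computed as below. The fixed binarization threshold of 0.5 is an illustrative assumption; published evaluations often use adaptive or swept thresholds.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] blur map and binary ground truth."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F0.3: F-measure with beta^2 = 0.3, weighting precision over recall."""
    p = pred >= thresh
    g = gt > 0
    tp = np.logical_and(p, g).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(g.sum(), 1)
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)
```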
Conclusion
By fusing the features of images at different scales and using convolutional long short-term memory layers to integrate deep semantic information with shallow detail information from top to bottom, the proposed coarse-to-fine multiscale defocus blur detection method obtains more accurate defocus blur detection results across different image scenes.
Objective
Defocus blur detection (DBD) is devoted to distinguishing sharp pixels from blurred pixels in an image; it has wide applications and is an important problem in computer vision. The DBD result can be applied to many computer vision tasks, such as deblurring, blur magnification, and object saliency detection. According to the adopted image features, DBD methods can be generally divided into two categories: traditional methods based on hand-crafted features and methods based on deep features. The former use low-level blur features, such as gradient, frequency, singular value, and local binary pattern, together with simple classifiers to distinguish sharp image regions from blurred ones. These low-level blur features are extracted from image patches, which results in the loss of high-level semantic information. Although traditional DBD methods do not need many training exemplars, they perform unsatisfactorily on images with complex scenes, especially in homogeneous regions and dark regions. More recent DBD methods learn representations of blur and sharpness from a large volume of images to extract task-adaptive features. Blur predictions can be generated by an end-to-end convolutional neural network (CNN), which is more efficient than traditional DBD methods. A CNN can extract multiscale convolutional features, which are useful for many vision problems, owing to its hierarchical nonlinear ensemble of convolutional, rectified linear unit, and pooling layers. Generally, bottom layers extract low-level texture features that can improve the details of the detection results, whereas top layers extract high-level semantic features that are useful for suppressing noise and background clutter. Most existing methods integrate multiscale low-level texture features and high-level semantic features in their networks to generate robust defocus blur results. Although existing deep DBD methods achieve better blur detection results than the hand-crafted feature-based methods, they still suffer from scale ambiguity and incomplete detection boundaries when processing images with complex scenes. In this paper, we propose a novel DBD framework that extracts multiscale convolutional features from images at different scales. Then, we use four branches of multiscale result refinement subnetworks to generate blur results at different feature scales. Lastly, we use a multiscale result fusion layer to generate the final blur results.
Method
The proposed network architecture consists of three parts: a multiscale feature extraction subnetwork (FEN), a multiscale result refinement subnetwork (RRN), and a multiscale result fusion layer (RFL). We use VGG16 (Visual Geometry Group) as our basic feature extractor, removing its fully connected layers and last pooling layer to increase the feature resolution. The FEN consists of three basic feature extractors and a feature integration branch that integrates the convolutional features of the same layers extracted from differently scaled images. The RRN is built from five convolutional long short-term memory (Conv-LSTM) layers that generate multiscale blur estimates from the multiscale convolutional features. The RFL consists of two convolutional layers with filter sizes of 3×3×32 and 1×1×1. We first resize the input image with different ratios and extract multiscale convolutional features from each resized image with the FEN; the FEN also integrates the features of the corresponding layers to exploit the merits of the features extracted from the different images.
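A hedged sketch of the FEN under the stated choices: tf.keras's VGG16 with the fully connected layers removed, stage outputs taken before each pooling layer (so the last pooling layer is unused), three assumed resize ratios, and 1×1 convolutions fusing the corresponding stages. The fusion width of 32 and the common output resolution are our assumptions; input preprocessing is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Outputs of the five VGG16 stages, taken before each pooling layer.
STAGES = ["block1_conv2", "block2_conv2", "block3_conv3",
          "block4_conv3", "block5_conv3"]

def build_fen(size=320, ratios=(1.0, 0.75, 0.5)):
    backbone = VGG16(include_top=False, weights="imagenet",
                     input_shape=(None, None, 3))
    extractor = Model(backbone.input,
                      [backbone.get_layer(n).output for n in STAGES])
    image = layers.Input((size, size, 3))
    # Extract the five-stage features from each rescaled copy of the image.
    per_scale = []
    for r in ratios:
        s = int(size * r)
        per_scale.append(extractor(layers.Resizing(s, s)(image)))
    # Fuse the corresponding stages across scales with 1x1 convolutions.
    fused = []
    for stage_feats in zip(*per_scale):
        common = [layers.Resizing(size, size)(f) for f in stage_feats]
        fused.append(layers.Conv2D(32, 1, activation="relu")(
            layers.Concatenate()(common)))
    return Model(image, fused)
```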
Then, we feed the highest convolutional features of each FEN branch into the RRN to produce coarse blur maps. The blur maps produced from these features are robust to noise and background clutter because the highest layers extract semantic features. However, these blur maps have low resolution; they serve as guidance for fine-scale blur estimation. Thus, we gradually incorporate the higher-resolution features of the lower layers into the Conv-LSTMs to generate more precise blur maps; in each RRN branch, the Conv-LSTMs integrate the multiscale convolutional features from top to bottom. The RFL is responsible for fusing the blur maps generated by the four branches. We concatenate the last prediction map of each RRN branch with the integrated first-layer features from the FEN as the input of the RFL to generate the final blur map, because shallow-layer features contain a large amount of detailed structural information that improves the DBD result.
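The sketch below illustrates one refinement branch under strong simplifications: the incoming features are assumed to be pre-resized to a common resolution, and each Conv-LSTM step is unrolled as a length-1 sequence whose output is concatenated into the next, shallower step. It conveys the top-down refinement idea rather than reproducing the exact five-layer design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def refine_branch(feats, width=32):
    """feats: feature maps ordered deep -> shallow, all resized to (H, W, C)."""
    maps, state = [], None
    for f in feats:
        x = f if state is None else layers.Concatenate()([f, state])
        # One Conv-LSTM step, unrolled as a length-1 sequence for simplicity.
        seq = layers.Lambda(lambda t: tf.expand_dims(t, 1))(x)
        state = layers.ConvLSTM2D(width, 3, padding="same")(seq)
        # Emit a blur map at every step; deep supervision attaches here.
        maps.append(layers.Conv2D(1, 1, activation="sigmoid")(state))
    return maps  # increasingly refined blur maps, coarse to fine
```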
We use a combination of F-measure, precision, recall, mean absolute error (MAE), and cross-entropy as the loss function for network pretraining and training. We add a supervision signal at each prediction layer, which passes the gradient directly to the corresponding layers and makes network optimization easier.
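One way to realize such a combined loss is sketched below; the equal term weights are assumptions, as the abstract does not state them.

```python
import tensorflow as tf

def combined_loss(y_true, y_pred, beta2=0.3, eps=1e-7):
    y_true = tf.cast(y_true, tf.float32)
    tp = tf.reduce_sum(y_true * y_pred)
    precision = tp / (tf.reduce_sum(y_pred) + eps)
    recall = tp / (tf.reduce_sum(y_true) + eps)
    fbeta = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    # Reward F-measure, precision, and recall; penalize MAE and cross-entropy.
    return (1 - fbeta) + (1 - precision) + (1 - recall) + mae + bce
```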
We randomly select 2 000 images from the Berkeley segmentation dataset, the uncompressed color image database, and PASCAL VOC 2008 to synthesize blurred images for pretraining the proposed network. The real training set consists of 1 204 images selected from the DUT (Dalian University of Technology) and CUHK (The Chinese University of Hong Kong) datasets. We augment the real training images by rotation, flipping, and cropping, enlarging the training data by 15 times; this operation greatly improves network performance. Our network is implemented in Keras. We resize the input images and ground truths to 320×320 pixels and use the adaptive moment estimation (Adam) optimizer. We set the learning rate to 1×10^-5 and divide it by 10 every five epochs until it reaches 1×10^-8. We initialize the FEN with VGG16 weights trained on ImageNet and the remaining layers with Xavier uniform initialization. Pretraining and training are conducted on an Nvidia RTX 2080Ti and take approximately one day in total.
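The stated optimizer and learning-rate schedule can be wired up as follows; the callback-based formulation is our assumption.

```python
import tensorflow as tf

def lr_at(epoch, lr=None):
    """Start at 1e-5, divide by 10 every five epochs, floor at 1e-8."""
    return max(1e-5 / (10 ** (epoch // 5)), 1e-8)

schedule = tf.keras.callbacks.LearningRateScheduler(lr_at)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
# model.compile(optimizer=optimizer, loss=combined_loss)
# model.fit(train_images, train_maps, epochs=20, callbacks=[schedule])
```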
Result
We train and test our network on two public blur detection datasets, DUT and CUHK, and compare our method with 10 state-of-the-art DBD methods. On the DUT dataset, our method achieves a 38.8% relative MAE reduction and a 5.4% relative F0.3 improvement over DeFusionNet (a DBD network via recurrently fusing and refining multi-scale deep features); it is the only method on this dataset whose F0.3 exceeds 0.87 and whose MAE is below 0.1. On the CUHK dataset, our method achieves a 36.7% relative MAE reduction and a 9.7% relative F0.3 improvement over the local binary pattern method. The proposed DBD method performs well in several challenging cases, including homogeneous regions and background clutter, and its detection boundaries are more precise. We also conduct several ablation analyses to verify the effectiveness of our model.
Conclusion
We propose a coarse-to-fine multiscale DBD method that extracts multiscale convolutional features from images resized with different ratios and generates multiscale blur estimates with Conv-LSTMs. The Conv-LSTMs integrate the semantic information of the deep layers with the detail information of the shallow layers to refine the blur maps. We produce the final blur map by integrating the blur maps generated from the differently sized images with the fused low-level features. Compared with other DBD methods, our method generates more precise DBD results across different image scenes.
Achanta R, Hemami S, Estrada F and Susstrunk S. 2009. Frequency-tuned salient region detection//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 1597-1604 [DOI: 10.1109/cvpr.2009.5206596]
Arbeláez P, Maire M, Fowlkes C and Malik J. 2011. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5): 898-916 [DOI: 10.1109/tpami.2010.161]
Bae S and Durand F. 2007. Defocus magnification. Computer Graphics Forum, 26(3): 571-579 [DOI: 10.1111/j.1467-8659.2007.01080]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI: 10.1109/TPAMI.2015.2439281]
Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2008. The PASCAL Visual Object Classes Challenge 2008 (VOC2008). http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html
Golestaneh S A and Karam L J. 2017. Spatially-varying blur detection based on multiscale fused and sorted transform coefficients of gradient magnitudes//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 596-605 [DOI: 10.1109/cvpr.2017.71]
Huang R, Feng W, Fan M Y, Wan L and Sun J Z. 2018. Multiscale blur detection by learning discriminative deep features. Neurocomputing, 285: 154-166 [DOI: 10.1016/j.neucom.2018.01.041]
Huang R, Xing Y and Wang Z Z. 2019. RGB-D salient object detection by a CNN with multiple layers fusion. IEEE Signal Processing Letters, 26(4): 552-556 [DOI: 10.1109/lsp.2019.2898508]
Jiang P, Ling H B, Yu J and Peng J L. 2013. Salient region detection by UFO: uniqueness, focusness and objectness//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1976-1983 [DOI: 10.1109/iccv.2013.248]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Kriener F, Binder T and Wille M. 2013. Accelerating defocus blur magnification//Proceedings of SPIE 8667, Multimedia Content and Mobile Devices. Burlingame, USA: SPIE: #86671Q [DOI: 10.1117/12.2004118]
Lee C Y, Xie S N, Gallagher P, Zhang Z Y and Tu Z W. 2015. Deeply-supervised nets//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR: 562-570
Pan J S, Sun D Q, Pfister H and Yang M H. 2016. Blind image deblurring using dark channel prior//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1628-1636 [DOI: 10.1109/cvpr.2016.180]
Pang Y W, Zhu H L, Li X Y and Li X L. 2016. Classifying discriminative features for blur detection. IEEE Transactions on Cybernetics, 46(10): 2220-2227 [DOI: 10.1109/tcyb.2015.2472478]
Park J, Tai Y W, Cho D and Kweon I S. 2017. A unified approach of multi-scale deep and hand-crafted features for defocus estimation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1736-1745 [DOI: 10.1109/cvpr.2017.295]
Schaefer G and Stich M. 2004. UCID: an uncompressed color image database//Proceedings of SPIE, Storage and Retrieval Methods and Applications for Multimedia 2004. San Jose, USA: SPIE: 472-480 [DOI: 10.1117/12.525375]
Shi J P, Xu L and Jia J Y. 2014. Discriminative blur detection features//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2965-2972 [DOI: 10.1109/cvpr.2014.379]
Shi J P, Xu L and Jia J Y. 2015a. Just noticeable defocus blur detection and estimation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 657-665 [DOI: 10.1109/cvpr.2015.7298665]
Shi X J, Chen Z R, Wang H, Yeung D Y, Wong W K and Woo W C. 2015b. Convolutional LSTM network: a machine learning approach for precipitation nowcasting//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, USA: ACM: 802-810
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Su B L, Lu S J and Tan C L. 2011. Blurred image region detection and classification//Proceedings of the 19th ACM International Conference on Multimedia. Scottsdale, USA: ACM: 1397-1400 [DOI: 10.1145/2072298.2072024]
Tang C, Wu J, Hou Y H, Wang P C and Li W Q. 2016. A spectral and spatial approach of coarse-to-fine blurred image region detection. IEEE Signal Processing Letters, 23(11): 1652-1656 [DOI: 10.1109/lsp.2016.2611608]
Tang C, Zhu X Z, Liu X W, Wang L Z and Zomaya A. 2019. DeFusionNET: defocus blur detection via recurrently fusing and refining multi-scale deep features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2700-2709 [DOI: 10.1109/cvpr.2019.00281]
Wang W Q, Shen J B, Dong X P and Borji A. 2018. Salient object detection driven by fixation prediction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1711-1720 [DOI: 10.1109/cvpr.2018.00184]
Wang X W, Zhang S L, Liang X, Zhou H J, Zheng J J and Sun M Z. 2019. Accurate and fast blur detection using a pyramid M-shaped deep neural network. IEEE Access, 7: 86611-86624 [DOI: 10.1109/access.2019.2926747]
Xiao H M, Lu W, Li R P, Zhong N, Yeung Y, Chen J J, Xue F and Sun W. 2019. Defocus blur detection based on multiscale SVD fusion in gradient domain. Journal of Visual Communication and Image Representation, 59: 52-61 [DOI: 10.1016/j.jvcir.2018.12.048]
Yi X and Eramian M. 2016. LBP-based segmentation of defocus blur. IEEE Transactions on Image Processing, 25(4): 1626-1638 [DOI: 10.1109/tip.2016.2528042]
Zeng K, Wang Y N, Mao J X, Liu J Y, Peng W X and Chen N K. 2019. A local metric for defocus blur detection based on CNN feature learning. IEEE Transactions on Image Processing, 28(5): 2107-2115 [DOI: 10.1109/tip.2018.2881830]
Zhao B, Wu X, Feng J S, Peng Q and Yan S C. 2017. Diversified visual attention networks for fine-grained object classification. IEEE Transactions on Multimedia, 19(6): 1245-1256 [DOI: 10.1109/tmm.2017.2648498]
Zhao W D, Zhao F, Wang D and Lu H C. 2020. Defocus blur detection via multi-stream bottom-top-bottom network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 1884-1897 [DOI: 10.1109/tpami.2019.2906588]