Coarse-to-fine multiscale defocus blur detection
2021, Vol. 26, No. 3, pp. 581-593
Received: 2020-04-20; Revised: 2020-06-08; Accepted: 2020-06-13; Published in print: 2021-03-16
DOI: 10.11834/jig.200126
Objective
Defocus blur detection aims to distinguish sharp pixels from blurred pixels in an image; it is widely used in many fields and is an important research direction in computer vision. When the image under examination contains complex scenes, existing defocus blur detection methods suffer from limited accuracy and incomplete detection boundaries. This paper proposes a coarse-to-fine multiscale defocus blur detection network that improves detection accuracy by fusing the multilayer convolutional features of the image at different scales.
Method
First, the image is rescaled to several resolutions, a convolutional neural network extracts multilayer convolutional features from the image at each scale, and convolutional layers fuse the features of corresponding layers across scales. Second, convolutional long short-term memory (Conv-LSTM) layers integrate the blur features of different scales from top to bottom while generating a blur detection map at each scale, so that deep semantic information is gradually propagated to the shallow layers. During this process, deep and shallow features are combined, and the shallow features refine the blur detection result of the deeper layer. Finally, a convolutional layer fuses the multiscale detection results into the final result. During training, a multilayer supervision strategy ensures that every Conv-LSTM layer is well optimized.
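To make the coarse-to-fine flow concrete, the following is a minimal Keras sketch of the pipeline described above, not the authors' released code: plain convolution blocks stand in for the VGG16 extractor, the scale ratios and channel widths are assumptions, and a single Conv-LSTM integrates the per-scale maps from coarse to fine.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    """Stand-in for one VGG-style stage: two 3x3 convolutions plus pooling."""
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.MaxPooling2D()(x)

def build_sketch(size=320, num_scales=3):
    image = layers.Input((size, size, 3))
    per_scale_maps = []
    for i in range(num_scales):           # i = 0 is the finest scale
        s = size // (2 ** i)              # assumed scale ratios: 1, 1/2, 1/4
        x = layers.Resizing(s, s)(image)
        for filters in (64, 128, 256):    # assumed channel widths
            x = conv_block(x, filters)
        m = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-scale blur map
        per_scale_maps.append(layers.Resizing(size, size)(m))
    # Treat the maps as a coarse-to-fine sequence and integrate them
    # with a Conv-LSTM, coarsest scale first.
    seq = layers.Lambda(lambda t: tf.stack(t[::-1], axis=1))(per_scale_maps)
    x = layers.ConvLSTM2D(16, 3, padding="same")(seq)
    return Model(image, layers.Conv2D(1, 1, activation="sigmoid")(x))

model = build_sketch()
model.summary()
```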
Result
The network is trained and tested on two public blur detection datasets, DUT (Dalian University of Technology) and CUHK (The Chinese University of Hong Kong), and compared with 10 algorithms, including the current best blur detection methods BTBCRL (bottom-top-bottom network with cascaded defocus blur detection map residual learning), DeFusionNet (defocus blur detection network via recurrently fusing and refining multi-scale deep features), and DHDE (multi-scale deep and hand-crafted features for defocus estimation). The experimental results show that, on the DUT dataset, the proposed model reduces the MAE (mean absolute error) by 38.8% and raises the F0.3 value by 5.4% relative to DeFusionNet; on the CUHK dataset, it reduces the MAE by 36.7% and raises the F0.3 value by 9.7% relative to the LBP (local binary pattern) algorithm. These comparisons fully validate the effectiveness of the proposed defocus blur detection model.
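For reference, the two reported metrics can be computed as below. The fixed binarization threshold of 0.5 is an illustrative assumption; published evaluations often use adaptive or swept thresholds.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a [0, 1] blur map and binary ground truth."""
    return np.abs(pred.astype(np.float64) - gt.astype(np.float64)).mean()

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F0.3: F-measure with beta^2 = 0.3, weighting precision over recall."""
    p = pred >= thresh
    g = gt > 0
    tp = np.logical_and(p, g).sum()
    precision = tp / max(p.sum(), 1)
    recall = tp / max(g.sum(), 1)
    return (1 + beta2) * precision * recall / max(beta2 * precision + recall, 1e-8)
```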
Conclusion
By fusing the features of images at different scales and using convolutional long short-term memory layers to integrate deep semantic information with shallow detail information from top to bottom, the proposed coarse-to-fine multiscale defocus blur detection method obtains more accurate defocus blur detection results across different image scenes.
Objective
Defocus blur detection (DBD) is devoted to distinguishing sharp pixels from blurred pixels in an image; it has wide applications and is an important problem in computer vision. The DBD result can be applied to many computer vision tasks, such as deblurring, blur magnification, and object saliency detection. According to the adopted image features, DBD methods can be generally divided into two categories: traditional methods based on hand-crafted features and methods based on deep features. The former use low-level blur features, such as gradient, frequency, singular value, and local binary pattern, together with simple classifiers to distinguish sharp image regions from blurred ones. These low-level blur features are extracted from image patches, which results in the loss of high-level semantic information. Although traditional DBD methods do not need many training exemplars, they perform unsatisfactorily on images with complex scenes, especially in homogeneous regions and dark regions. More recent DBD methods learn representations of blur and sharpness from a large volume of images to extract task-adaptive features. Blur predictions can be generated by an end-to-end convolutional neural network (CNN), which is more efficient than traditional DBD methods. A CNN can extract multiscale convolutional features, which are useful for many vision problems, owing to its hierarchical nonlinear ensemble of convolutional, rectified linear unit, and pooling layers. Generally, bottom layers extract low-level texture features that can improve the details of the detection results, whereas top layers extract high-level semantic features that are useful for suppressing noise and background clutter. Most existing methods integrate multiscale low-level texture features and high-level semantic features in their networks to generate robust defocus blur results. Although existing deep DBD methods achieve better blur detection results than the hand-crafted feature-based methods, they still suffer from scale ambiguity and incomplete detection boundaries when processing images with complex scenes. In this paper, we propose a novel DBD framework that extracts multiscale convolutional features from images at different scales. Then, we use four branches of multiscale result refinement subnetworks to generate blur results at different feature scales. Lastly, we use a multiscale result fusion layer to generate the final blur results.
Method
The proposed network architecture consists of three parts: a multiscale feature extraction subnetwork (FEN), a multiscale result refinement subnetwork (RRN), and a multiscale result fusion layer (RFL). We use VGG16 (Visual Geometry Group) as our basic feature extractor, removing its fully connected layers and last pooling layer to increase the feature resolution. The FEN consists of three basic feature extractors and a feature integration branch that integrates the convolutional features of the same layers extracted from differently scaled images. The RRN is built from five convolutional long short-term memory (Conv-LSTM) layers that generate multiscale blur estimates from the multiscale convolutional features. The RFL consists of two convolutional layers with filter sizes of 3×3×32 and 1×1×1. We first resize the input image with different ratios and extract multiscale convolutional features from each resized image with the FEN; the FEN also integrates the features of the corresponding layers to exploit the merits of the features extracted from the different images.
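A hedged sketch of the FEN under the stated choices: tf.keras's VGG16 with the fully connected layers removed, stage outputs taken before each pooling layer (so the last pooling layer is unused), three assumed resize ratios, and 1×1 convolutions fusing the corresponding stages. The fusion width of 32 and the common output resolution are our assumptions; input preprocessing is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Outputs of the five VGG16 stages, taken before each pooling layer.
STAGES = ["block1_conv2", "block2_conv2", "block3_conv3",
          "block4_conv3", "block5_conv3"]

def build_fen(size=320, ratios=(1.0, 0.75, 0.5)):
    backbone = VGG16(include_top=False, weights="imagenet",
                     input_shape=(None, None, 3))
    extractor = Model(backbone.input,
                      [backbone.get_layer(n).output for n in STAGES])
    image = layers.Input((size, size, 3))
    # Extract the five-stage features from each rescaled copy of the image.
    per_scale = []
    for r in ratios:
        s = int(size * r)
        per_scale.append(extractor(layers.Resizing(s, s)(image)))
    # Fuse the corresponding stages across scales with 1x1 convolutions.
    fused = []
    for stage_feats in zip(*per_scale):
        common = [layers.Resizing(size, size)(f) for f in stage_feats]
        fused.append(layers.Conv2D(32, 1, activation="relu")(
            layers.Concatenate()(common)))
    return Model(image, fused)
```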
Then, we feed the highest convolutional features of each FEN branch into the RRN to produce coarse blur maps. The blur maps produced from these features are robust to noise and background clutter because the highest layers extract semantic features. However, these blur maps have low resolution; they serve as guidance for fine-scale blur estimation. Thus, we gradually incorporate the higher-resolution features of the lower layers into the Conv-LSTMs to generate more precise blur maps; in each RRN branch, the Conv-LSTMs integrate the multiscale convolutional features from top to bottom. The RFL is responsible for fusing the blur maps generated by the four branches. We concatenate the last prediction map of each RRN branch with the integrated first-layer features from the FEN as the input of the RFL to generate the final blur map, because shallow-layer features contain a large amount of detailed structural information that improves the DBD result.
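The sketch below illustrates one refinement branch under strong simplifications: the incoming features are assumed to be pre-resized to a common resolution, and each Conv-LSTM step is unrolled as a length-1 sequence whose output is concatenated into the next, shallower step. It conveys the top-down refinement idea rather than reproducing the exact five-layer design.

```python
import tensorflow as tf
from tensorflow.keras import layers

def refine_branch(feats, width=32):
    """feats: feature maps ordered deep -> shallow, all resized to (H, W, C)."""
    maps, state = [], None
    for f in feats:
        x = f if state is None else layers.Concatenate()([f, state])
        # One Conv-LSTM step, unrolled as a length-1 sequence for simplicity.
        seq = layers.Lambda(lambda t: tf.expand_dims(t, 1))(x)
        state = layers.ConvLSTM2D(width, 3, padding="same")(seq)
        # Emit a blur map at every step; deep supervision attaches here.
        maps.append(layers.Conv2D(1, 1, activation="sigmoid")(state))
    return maps  # increasingly refined blur maps, coarse to fine
```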
We use a combination of F-measure, precision, recall, mean absolute error (MAE), and cross-entropy as the loss function for network pretraining and training. We add a supervision signal at each prediction layer, which passes the gradient directly to the corresponding layers and makes network optimization easier.
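One way to realize such a combined loss is sketched below; the equal term weights are assumptions, as the abstract does not state them.

```python
import tensorflow as tf

def combined_loss(y_true, y_pred, beta2=0.3, eps=1e-7):
    y_true = tf.cast(y_true, tf.float32)
    tp = tf.reduce_sum(y_true * y_pred)
    precision = tp / (tf.reduce_sum(y_pred) + eps)
    recall = tp / (tf.reduce_sum(y_true) + eps)
    fbeta = (1 + beta2) * precision * recall / (beta2 * precision + recall + eps)
    mae = tf.reduce_mean(tf.abs(y_true - y_pred))
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    # Reward F-measure, precision, and recall; penalize MAE and cross-entropy.
    return (1 - fbeta) + (1 - precision) + (1 - recall) + mae + bce
```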
We randomly select 2 000 images from the Berkeley segmentation dataset, the uncompressed color image database, and PASCAL VOC 2008 to synthesize blurred images for pretraining the proposed network. The real training set consists of 1 204 images selected from the DUT (Dalian University of Technology) and CUHK (The Chinese University of Hong Kong) datasets. We augment the real training images by rotation, flipping, and cropping, enlarging the training data by 15 times; this operation greatly improves network performance. Our network is implemented in Keras. We resize the input images and ground truths to 320×320 pixels and use the adaptive moment estimation (Adam) optimizer. We set the learning rate to 1×10^-5 and divide it by 10 every five epochs until it reaches 1×10^-8. We initialize the FEN with VGG16 weights trained on ImageNet and the remaining layers with Xavier uniform initialization. Pretraining and training are conducted on an Nvidia RTX 2080Ti and take approximately one day in total.
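The stated optimizer and learning-rate schedule can be wired up as follows; the callback-based formulation is our assumption.

```python
import tensorflow as tf

def lr_at(epoch, lr=None):
    """Start at 1e-5, divide by 10 every five epochs, floor at 1e-8."""
    return max(1e-5 / (10 ** (epoch // 5)), 1e-8)

schedule = tf.keras.callbacks.LearningRateScheduler(lr_at)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-5)
# model.compile(optimizer=optimizer, loss=combined_loss)
# model.fit(train_images, train_maps, epochs=20, callbacks=[schedule])
```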
Result
We train and test our network on two public blur detection datasets, DUT and CUHK, and compare our method with 10 state-of-the-art DBD methods. On the DUT dataset, our method achieves a 38.8% relative MAE reduction and a 5.4% relative F0.3 improvement over DeFusionNet (a DBD network via recurrently fusing and refining multi-scale deep features); it is the only method on this dataset whose F0.3 exceeds 0.87 and whose MAE is below 0.1. On the CUHK dataset, our method achieves a 36.7% relative MAE reduction and a 9.7% relative F0.3 improvement over the local binary pattern method. The proposed DBD method performs well in several challenging cases, including homogeneous regions and background clutter, and its detection boundaries are more precise. We also conduct several ablation analyses to verify the effectiveness of our model.
Conclusion
We propose a coarse-to-fine multiscale DBD method that extracts multiscale convolutional features from images resized with different ratios and generates multiscale blur estimates with Conv-LSTMs. The Conv-LSTMs integrate the semantic information of the deep layers with the detail information of the shallow layers to refine the blur maps. We produce the final blur map by integrating the blur maps generated from the differently sized images with the fused low-level features. Compared with other DBD methods, our method generates more precise DBD results across different image scenes.
Achanta R, Hemami S, Estrada F and Susstrunk S. 2009. Frequency-tuned salient region detection//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 1597-1604 [DOI: 10.1109/cvpr.2009.5206596]
Arbeláez P, Maire M, Fowlkes C and Malik J. 2011. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5): 898-916 [DOI: 10.1109/tpami.2010.161]
Bae S and Durand F. 2007. Defocus magnification. Computer Graphics Forum, 26(3): 571-579 [DOI: 10.1111/j.1467-8659.2007.01080]
Dong C, Loy C C, He K M and Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI: 10.1109/TPAMI.2015.2439281]
Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2008. The PASCAL Visual Object Classes Challenge 2008 (VOC2008). http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html
Golestaneh S A and Karam L J. 2017. Spatially-varying blur detection based on multiscale fused and sorted transform coefficients of gradient magnitudes//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 596-605 [DOI: 10.1109/cvpr.2017.71]
Huang R, Feng W, Fan M Y, Wan L and Sun J Z. 2018. Multiscale blur detection by learning discriminative deep features. Neurocomputing, 285: 154-166 [DOI: 10.1016/j.neucom.2018.01.041]
Huang R, Xing Y and Wang Z Z. 2019. RGB-D salient object detection by a CNN with multiple layers fusion. IEEE Signal Processing Letters, 26(4): 552-556 [DOI: 10.1109/lsp.2019.2898508]
Jiang P, Ling H B, Yu J and Peng J L. 2013. Salient region detection by UFO: uniqueness, focusness and objectness//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1976-1983 [DOI: 10.1109/iccv.2013.248]
Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Kriener F, Binder T and Wille M. 2013. Accelerating defocus blur magnification//Proceedings of SPIE 8667, Multimedia Content and Mobile Devices. Burlingame, USA: SPIE: #86671Q [DOI: 10.1117/12.2004118]
Lee C Y, Xie S N, Gallagher P, Zhang Z Y and Tu Z W. 2015. Deeply-supervised nets//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR: 562-570
Pan J S, Sun D Q, Pfister H and Yang M H. 2016. Blind image deblurring using dark channel prior//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1628-1636 [DOI: 10.1109/cvpr.2016.180]
Pang Y W, Zhu H L, Li X Y and Li X L. 2016. Classifying discriminative features for blur detection. IEEE Transactions on Cybernetics, 46(10): 2220-2227 [DOI: 10.1109/tcyb.2015.2472478]
Park J, Tai Y W, Cho D and Kweon I S. 2017. A unified approach of multi-scale deep and hand-crafted features for defocus estimation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1736-1745 [DOI: 10.1109/cvpr.2017.295]
Schaefer G and Stich M. 2004. UCID: an uncompressed color image database//Proceedings of SPIE, Storage and Retrieval Methods and Applications for Multimedia 2004. San Jose, USA: SPIE: 472-480 [DOI: 10.1117/12.525375]
Shi J P, Xu L and Jia J Y. 2014. Discriminative blur detection features//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2965-2972 [DOI: 10.1109/cvpr.2014.379]
Shi J P, Xu L and Jia J Y. 2015a. Just noticeable defocus blur detection and estimation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 657-665 [DOI: 10.1109/cvpr.2015.7298665]
Shi X J, Chen Z R, Wang H, Yeung D Y, Wong W K and Woo W C. 2015b. Convolutional LSTM network: a machine learning approach for precipitation nowcasting//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, USA: ACM: 802-810
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR
Su B L, Lu S J and Tan C L. 2011. Blurred image region detection and classification//Proceedings of the 19th ACM International Conference on Multimedia. Scottsdale, USA: ACM: 1397-1400 [DOI: 10.1145/2072298.2072024]
Tang C, Wu J, Hou Y H, Wang P C and Li W Q. 2016. A spectral and spatial approach of coarse-to-fine blurred image region detection. IEEE Signal Processing Letters, 23(11): 1652-1656 [DOI: 10.1109/lsp.2016.2611608]
Tang C, Zhu X Z, Liu X W, Wang L Z and Zomaya A. 2019. DeFusionNET: defocus blur detection via recurrently fusing and refining multi-scale deep features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2700-2709 [DOI: 10.1109/cvpr.2019.00281]
Wang W Q, Shen J B, Dong X P and Borji A. 2018. Salient object detection driven by fixation prediction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1711-1720 [DOI: 10.1109/cvpr.2018.00184]
Wang X W, Zhang S L, Liang X, Zhou H J, Zheng J J and Sun M Z. 2019. Accurate and fast blur detection using a pyramid M-shaped deep neural network. IEEE Access, 7: 86611-86624 [DOI: 10.1109/access.2019.2926747]
Xiao H M, Lu W, Li R P, Zhong N, Yeung Y, Chen J J, Xue F and Sun W. 2019. Defocus blur detection based on multiscale SVD fusion in gradient domain. Journal of Visual Communication and Image Representation, 59: 52-61 [DOI: 10.1016/j.jvcir.2018.12.048]
Yi X and Eramian M. 2016. LBP-based segmentation of defocus blur. IEEE Transactions on Image Processing, 25(4): 1626-1638 [DOI: 10.1109/tip.2016.2528042]
Zeng K, Wang Y N, Mao J X, Liu J Y, Peng W X and Chen N K. 2019. A local metric for defocus blur detection based on CNN feature learning. IEEE Transactions on Image Processing, 28(5): 2107-2115 [DOI: 10.1109/tip.2018.2881830]
Zhao B, Wu X, Feng J S, Peng Q and Yan S C. 2017. Diversified visual attention networks for fine-grained object classification. IEEE Transactions on Multimedia, 19(6): 1245-1256 [DOI: 10.1109/tmm.2017.2648498]
Zhao W D, Zhao F, Wang D and Lu H C. 2020. Defocus blur detection via multi-stream bottom-top-bottom network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 1884-1897 [DOI: 10.1109/tpami.2019.2906588]