Published: 2021-03-16
DOI: 10.11834/jig.200126
2021 | Volume 26 | Number 3

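Image Understanding and Computer Vision

Coarse-to-fine multiscale defocus blur detection

Heng Hongjun, Ye Hebin, Zhou Mo, Huang Rui

College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Received: 2020-04-20; Revised: 2020-06-08; Preprint: 2020-06-13

Supported by: Joint Fund of the National Natural Science Foundation of China and the Civil Aviation Administration of China (U1333109)

About the authors: Heng Hongjun, born in 1968, male, associate professor; research interests: intelligent information processing, computer applications, natural language processing, and knowledge graphs. E-mail: henghjcauc@163.com;
Ye Hebin, male, master's student; research interest: computer vision. E-mail: 2018052049@cauc.edu.cn;
Zhou Mo, female, master's student; research interest: computer vision. E-mail: 1648420377@qq.com;
Huang Rui, corresponding author, male, lecturer; research interests: computer vision and machine learning. E-mail: ruihuang@tju.edu.cn

CLC number: TP391

Document code: A

Article ID: 1006-8961(2021)03-0581-13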

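Abstract

Objective Defocus blur detection aims to distinguish sharp pixels from blurred pixels in an image. It is widely used in many fields and is an important research direction in computer vision. When the image contains a complex scene, existing defocus blur detection methods suffer from limited accuracy and incomplete boundaries in their detection results. This paper proposes a coarse-to-fine multiscale defocus blur detection network that improves detection accuracy by fusing multi-layer convolutional features of the image at different scales. Method The image is resized to several scales, a convolutional neural network extracts multi-layer convolutional features from the image at each scale, and convolutional layers fuse the features of corresponding layers across scales. Convolutional long short-term memory (Conv-LSTM) layers integrate the blur features of different scales from top to bottom and generate a blur detection map at each scale, so that deep semantic information is passed progressively to the shallow layers. In this process, deep and shallow features are combined, and the shallow features refine the blur detection result of the next deeper layer. Finally, convolutional layers fuse the multiscale detection results into the final result. A multi-layer supervision strategy is used during training to help every Conv-LSTM layer be well optimized. Result The network is trained and tested on two public blur detection datasets, DUT (Dalian University of Technology) and CUHK (The Chinese University of Hong Kong), and compared with 10 algorithms, including the state-of-the-art blur detection algorithms BTBCRL (bottom-top-bottom network with cascaded defocus blur detection map residual learning), DeFusionNet (defocus blur detection network via recurrently fusing and refining multi-scale deep features), and DHDE (multi-scale deep and hand-crafted features for defocus estimation). On the DUT dataset, compared with DeFusionNet, our model reduces the MAE (mean absolute error) by 38.8% and improves F_0.3 by 5.4%; on the CUHK dataset, compared with the LBP (local binary pattern) algorithm, it reduces MAE by 36.7% and improves F_0.3 by 9.7%. These comparisons verify the effectiveness of the proposed defocus blur detection model. Conclusion By fusing features of images at different scales and using Conv-LSTM layers to integrate deep semantic information and shallow detail information from top to bottom, the proposed coarse-to-fine multiscale defocus blur detection method obtains more accurate defocus blur detection results across different image scenes.

Keywords

defocus blur detection (DBD); multiscale features; convolutional long short-term memory (Conv-LSTM); coarse-to-fine; multi-layer supervision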

Coarse-to-fine multiscale defocus blur detection

Heng Hongjun, Ye Hebin, Zhou Mo, Huang Rui

College of Computer Science and Technology, Civil Aviation University of China, Tianjin 300300, China

Supported by: The United Funds of the Civil Aviation Administration of China and the National Natural Science Foundation of China (U1333109)

Abstract

Objective Defocus blur detection (DBD) aims to distinguish sharp pixels from blurred pixels in an image; it has wide applications and is an important problem in computer vision. DBD results can be applied to many computer vision tasks, such as deblurring, blur magnification, and salient object detection. According to the image features they adopt, DBD methods can be divided into two categories: traditional methods based on hand-crafted features and methods based on deep features. The former use low-level blur features, such as gradient, frequency, singular value, and local binary pattern, together with simple classifiers to distinguish sharp image regions from blurred ones. These low-level features are extracted from image patches, so high-level semantic information is lost. Although traditional DBD methods do not need many training exemplars, they perform unsatisfactorily on images with complex scenes, especially in homogeneous and dark regions. More recent DBD methods learn representations of blurred and sharp regions from a large volume of images, extracting task-adaptive features. Blur predictions can be generated by an end-to-end convolutional neural network (CNN), which is more efficient than traditional DBD methods. Owing to the hierarchical nonlinear combination of convolutional, rectified linear unit, and pooling layers, a CNN extracts multiscale convolutional features, which are useful for many vision problems. Generally, bottom layers extract low-level texture features that improve the details of the detection results, whereas top layers extract high-level semantic features that help suppress noise and background clutter. Most existing methods integrate multiscale low-level texture features and high-level semantic features in their networks to generate robust defocus blur results.
Although existing deep DBD methods achieve better results than hand-crafted feature-based methods, they still suffer from scale ambiguity and incomplete detection boundaries on images with complex scenes. In this paper, we propose a novel DBD framework that extracts multiscale convolutional features from images at different scales. We then use four branches of a multiscale result refinement subnetwork to generate blur results at different feature scales. Lastly, a multiscale result fusion layer generates the final blur map. Method The proposed network consists of three parts: a multiscale feature extraction subnetwork (FEN), a multiscale result refinement subnetwork (RRN), and a multiscale result fusion layer (RFL). We use VGG16 (visual geometry group) as the basic feature extractor, removing the fully connected layers and the last pooling layer to preserve feature resolution. FEN consists of three basic feature extractors and a feature integration branch that integrates the convolutional features of the same layers extracted from the differently scaled images. Each branch of RRN is built from five convolutional long short-term memory (Conv-LSTM) layers and generates multiscale blur estimates from multiscale convolutional features. RFL consists of two convolutional layers with filter sizes of 3×3×32 and 1×1×1. We first resize the input image with different ratios and extract multiscale convolutional features from each resized image with FEN. FEN also integrates the features of corresponding layers to exploit the complementary merits of features extracted from different images. Then, we feed the highest-level convolutional features of each FEN branch into RRN to produce coarse blur maps. Because the highest layers extract semantic features, these coarse maps are robust to noise and background clutter. However, they have low resolution, so they mainly serve as guidance for fine-scale blur estimation.
Thus, we gradually incorporate the higher-resolution features of lower layers into the Conv-LSTMs to generate more precise blur maps. In each branch of RRN, the Conv-LSTMs integrate multiscale convolutional features from top to bottom. RFL fuses the blur maps generated by the four branches: we concatenate the last prediction map of each RRN branch with the integrated first-layer features of FEN as the input of RFL, because shallow-layer features contain abundant detailed structural information that can improve the DBD result. We use a combination of F-measure, precision, recall, mean absolute error (MAE), and cross-entropy as the loss function for both pretraining and training. We add a supervision signal at each prediction layer, which passes gradients directly to the corresponding layers and makes the network easier to optimize. We randomly select 2,000 images from the Berkeley segmentation dataset, the uncompressed color image database, and Pascal 2008 and synthesize blurred images from them to pretrain the network. The real training set consists of 1,204 images selected from the Dalian University of Technology (DUT) and The Chinese University of Hong Kong (CUHK) datasets. We augment the real training images by rotation, flipping, and cropping, enlarging the training data by 15 times, which greatly improves network performance. Our network is implemented in Keras. Input images and ground truths are resized to 320×320 pixels. We use the adaptive moment estimation (Adam) optimizer with an initial learning rate of 1×10^-5, divided by 10 every five epochs until it reaches 1×10^-8. FEN is initialized with VGG16 weights trained on ImageNet, and the remaining layers with Xavier uniform initialization. Pretraining and training are conducted on an NVIDIA RTX 2080Ti, and the whole training takes approximately one day.
Result We train and test our network on two public blur detection datasets, DUT and CUHK, and compare our method with 10 state-of-the-art DBD methods. On DUT, our method achieves a 38.8% relative MAE reduction and a 5.4% relative F_0.3 improvement over DeFusionNet (DBD network via recurrently fusing and refining multi-scale deep features), and it is the only method whose F_0.3 is higher than 0.87 and whose MAE is lower than 0.1. On CUHK, our method achieves a 36.7% relative MAE reduction and a 9.7% relative F_0.3 improvement over LBP (local binary pattern). The proposed method performs well in several challenging cases, including homogeneous regions and background clutter, and its detections are more precise at region boundaries. Ablation studies verify the effectiveness of each component of the model. Conclusion We propose a coarse-to-fine multiscale DBD method that extracts multiscale convolutional features from images with different resize ratios and generates multiscale blur estimates with Conv-LSTMs. The Conv-LSTMs integrate the semantic information of deep layers with the detail information of shallow layers to refine the blur maps. The final blur map is produced by integrating the blur maps generated from differently sized images with the low-level fused features. Compared with other DBD methods, our method generates more precise DBD results across various scenes.

Key words

defocus blur detection (DBD); multiscale features; Conv-LSTM (convolutional long short-term memory); coarse-to-fine; multi-layer supervision

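0 Introduction

Owing to hardware limitations, the photographer's skill, and other factors, blur is a common phenomenon in images. In computer vision, blur detection results can be used for saliency detection (Jiang et al., 2013), blur magnification (Kriener et al., 2013; Bae and Durand, 2007), image deblurring (Pan et al., 2016; Huang et al., 2018), and other problems. Blur detection is an important research direction in computer vision and is divided into motion blur detection and defocus blur detection; this paper focuses on defocus blur detection.

According to the features they use, existing defocus blur detection algorithms fall into two categories: traditional methods based on hand-crafted features and methods based on deep learning. Fig. 1 shows the blur detection results of both: Fig. 1(c) is the result of the traditional method DBDF (discriminative blur detection features) (Shi et al., 2014), and Fig. 1(d) that of the deep-learning method BTBCRL (bottom-top-bottom network with cascaded defocus blur detection map residual learning) (Zhao et al., 2020). When the background contains many blurred light spots, the hand-crafted method DBDF wrongly classifies the spots as sharp, and when the foreground contains sharp, uniform, solid-colored regions, it wrongly detects them as blurred. By comparison, the deep-learning method BTBCRL filters out detection errors on the complex background better, but its foreground detection is still inaccurate and the boundaries of the sharp regions in its result are incomplete.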

Fig. 1 Comparison of our method with other methods on an image containing blurred light spots and homogeneous regions ((a) input; (b) ground truth; (c) DBDF; (d) BTBCRL; (e) ours)

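To address these problems, this paper proposes a coarse-to-fine multiscale defocus blur detection network. Because blurred regions are sensitive to scale, the original image is first resized several times to obtain multiscale inputs, which are fed into the feature extraction subnetwork to obtain multiscale convolutional features. In the feature extraction subnetwork, features from deep convolutional layers contain rich semantic information that helps distinguish blurred from sharp regions correctly, whereas features from shallow layers contain low-level information such as texture, boundaries, and gradients that improves the precision of the results. Integrating deep semantic information with shallow detail features therefore produces better blur detection results. To integrate information of different levels effectively, the deep-layer features are fed into a Conv-LSTM (convolutional long short-term memory) (Shi et al., 2015b) to obtain a blur map carrying semantic information; this map is then concatenated with the features of the next shallower layer and fed into the next Conv-LSTM to obtain a finer blur map. Proceeding from deep to shallow layers in this way yields progressively finer blur maps; finally, two convolutional layers fuse the blur maps of different scales into a more robust result. A multi-layer supervision strategy computes a loss on every output refined by a Conv-LSTM layer, so that each layer is well optimized while deep and shallow features are integrated. The model is trained and tested on two public blur detection datasets, DUT (Dalian University of Technology) and CUHK (Chinese University of Hong Kong), and compared with 10 blur detection algorithms, including the current best-performing ones. On DUT, compared with the DeFusionNet (defocus blur detection network via recurrently fusing and refining multi-scale deep features) model, our model reduces the MAE (mean absolute error) by 38.8% and improves F_0.3 by 5.4%; on CUHK, compared with the LBP (local binary pattern) algorithm, it reduces MAE by 36.7% and improves F_0.3 by 9.7%. The experiments verify the effectiveness of the proposed defocus blur detection model.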

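1 Related work

1.1 Methods based on hand-crafted features

Traditional defocus blur detection methods based on hand-crafted features mostly rely on local image gradients (Pang et al., 2016; Park et al., 2017), frequency (Golestaneh and Karam, 2017; Shi et al., 2014; Tang et al., 2016), singular values (Su et al., 2011; Xiao et al., 2019), or other hand-crafted features (Yi and Eramian, 2016; Shi et al., 2015a). Pang et al. (2016) proposed a kernel-specific feature vector for blur detection, formed by the product of the variance of the filtered blur kernel and the variance of the filtered image-patch gradients. Golestaneh and Karam (2017) proposed a blur detection method based on multiscale fusion of high-frequency components and sort transform, using high-frequency discrete cosine transform coefficients extracted from image patches at multiple resolutions. Shi et al. (2014) used Fourier-domain descriptors, local gradient distributions, and spatial filters as blur features to distinguish sharp pixels from blurred pixels. Shi et al. (2015a) constructed blur features through sparse representation and image decomposition, directly establishing a correspondence between sparse edge representation and blur strength estimation. Yi and Eramian (2016) proposed a sharpness metric based on local binary patterns (LBP) to separate in-focus from out-of-focus regions. Xiao et al. (2019) proposed a blur metric based on multiscale fusion of singular-value features in the gradient domain, which divides the singular values into several sub-bands and infers the blur detection result from these sub-bands.

Although traditional methods based on hand-crafted features do not need large amounts of training data, they require experienced researchers to design blur-related features and tune classifier parameters. They can perform well in specific scenes, but when the image content is complex (as in Fig. 1(c)), they cannot produce reliable blur detection results.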

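1.2 Methods based on deep learning

As deep learning has made considerable progress in computer vision problems such as saliency detection (Wang et al., 2018; Huang et al., 2019), image super-resolution (Dong et al., 2016), and image classification (Zhao et al., 2017), more and more researchers have proposed blur detection networks that greatly improve blur detection accuracy. Park et al. (2017) combined frequency-domain distributions, gradient distributions, singular values, and other features into a defocus blur feature vector and fed it into a fully connected network to determine the degree of blur of image patches. Zeng et al. (2019) trained a neural network on superpixels instead of image patches, then used principal component analysis (PCA) to extract the blur-related convolution kernels of the network and convolved them with the whole image to obtain the blur detection result. Zhao et al. (2020) used a multi-stream bottom-top-bottom fully convolutional end-to-end network (BTBCRL) for defocus blur detection. Tang et al. (2019) proposed a deep fusion convolutional neural network (DeFusionNet) that detects defocus blur by recurrently fusing and refining multiscale deep features. Wang et al. (2019) used a multi-input encoder-decoder network to learn and integrate multiscale features of different layers, and used this network as a subnetwork to build a pyramid network for blur detection.

By updating network parameters with large amounts of data, deep-learning-based blur detection methods learn blur-related features adaptively and greatly improve blur detection. As Fig. 1 shows, although existing deep-learning methods suppress detection errors on the background well, they perform poorly on homogeneous regions, and the boundaries of the detected sharp regions are incomplete. To address these problems, this paper proposes a coarse-to-fine multiscale defocus blur detection network that extracts multi-layer features carrying different information from images at different scales and integrates them from top to bottom to obtain more robust blur detection results.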

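2 Proposed method

2.1 Model architecture

As shown in Fig. 2, the proposed coarse-to-fine multiscale defocus blur detection network consists of three parts: a multiscale feature extraction subnetwork, a multiscale result refinement subnetwork, and a multiscale result fusion layer. First, the feature extraction subnetwork extracts convolutional features from the image at each scale, and convolutional layers fuse the features of corresponding layers across scales. The features of each scale and the fused features are passed to the multiscale result refinement subnetwork, which produces blur detection results at the different scales. Finally, two convolutional layers fuse the multiscale blur detection results into a more robust result.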

Fig. 2 Architecture of the proposed model

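2.1.1 Multiscale feature extraction

Considering the scale sensitivity of blur, the image is downscaled by factors of 0.8 and 0.6, and the original and downscaled images together form the network input. The original image is 320×320 pixels, and the downscaled images are 256×256 and 192×192 pixels. VGG16 (visual geometry group) (Simonyan and Zisserman, 2015) serves as the basic feature extraction network and extracts multiscale convolutional features from the images at each of the 3 scales.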

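The red box in Fig. 2 shows the structure of the multiscale feature extraction subnetwork. The feature extraction network of each scale has 5 convolutional blocks: Conv1, Conv2, Conv3, Conv4, and Conv5. The feature extracted by the last convolutional layer of each block is denoted $\mathit{\boldsymbol{F}}_{{\rm{conv}}l}^s$, where $s$ indexes the scale and conv$l$ denotes the last convolutional layer of the $l$-th block. Features from deep convolutional layers contain rich semantic information that helps locate in-focus objects and distinguish sharp from blurred regions. Features from shallow convolutional layers contain low-level image information such as texture, gradients, and edges, which helps refine the boundaries of blurred regions and improves the precision of the results. The feature extraction network of each scale therefore provides 5 levels of features carrying different information, which the result refinement subnetwork integrates from deep to shallow to produce more accurate results.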
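To make the resolution bookkeeping concrete, the sketch below computes the side length of each $\mathit{\boldsymbol{F}}_{{\rm{conv}}l}^s$ for the three input sizes. It assumes the standard VGG16 layout of 2×2, stride-2 max pooling after blocks Conv1 to Conv4, with the pooling layer after Conv5 removed (as stated in the English abstract); the helper name `vgg16_block_sizes` is hypothetical.

```python
# Hypothetical helper: side length of the last feature map of each VGG16 block
# (F_conv1^s ... F_conv5^s), assuming 2x2/stride-2 max pooling after blocks 1-4
# and the final pooling layer (after block 5) removed.
BASE_SIZE = 320                  # input size of the original-scale branch
RESIZE_RATIOS = (1.0, 0.8, 0.6)  # original image plus the two downscaled copies


def vgg16_block_sizes(side, num_blocks=5):
    """Return the spatial side length at the output of Conv1..Conv5."""
    sizes = []
    for block in range(num_blocks):
        sizes.append(side)
        if block < num_blocks - 1:  # pool1..pool4 halve the resolution
            side //= 2
    return sizes


for ratio in RESIZE_RATIOS:
    side = round(BASE_SIZE * ratio)
    print(f"ratio {ratio}: input {side}x{side}, blocks {vgg16_block_sizes(side)}")
```

Under these assumptions the 320×320 branch yields maps of side 320, 160, 80, 40, and 20, and every level is exactly twice the size of the next deeper one, which is what lets a stride-2 upsampling line the maps up during top-down refinement.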

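The convolutional features of the different scales are independent of one another. To exploit their complementarity, a 3×3×64 convolutional layer fuses the features of corresponding layers across the 3 scales; the fused features are denoted $\mathit{\boldsymbol{F}}_{{\rm{conv}}l}^0$. Specifically, a 1×1×64 convolution first reduces the dimensionality of the features of every scale; the features of the two smaller scales are then upsampled to the original size by bilinear interpolation; the features of corresponding layers of all scales are concatenated; and a 3×3×64 convolution of the concatenated features produces the multiscale fused features. The fused features are also fed into the result refinement subnetwork to produce a corresponding blur detection result.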
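The cross-scale step above (reduce, upsample, concatenate, convolve) can be sketched in plain Python for single-channel maps stored as nested lists. This is only an illustration under stated assumptions: the learned 1×1×64 and 3×3×64 convolutions are omitted, and the bilinear mapping uses the align-corners convention, which may differ from the interpolation mode of the authors' Keras implementation. The function names are hypothetical.

```python
def bilinear_resize(img, out_h, out_w):
    """Bilinearly resize a 2-D list (align-corners convention, an assumption)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            top = img[y0][x0] * (1 - dx) + img[y0][x1] * dx
            bottom = img[y1][x0] * (1 - dx) + img[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


def stack_scales(per_scale_channels, out_h, out_w):
    """Upsample every (already 1x1-reduced) channel of every scale to the
    original resolution and concatenate them along the channel axis; the
    result is what the 3x3x64 fusion convolution would consume."""
    stacked = []
    for channels in per_scale_channels:  # one list of 2-D maps per scale
        for ch in channels:
            stacked.append(bilinear_resize(ch, out_h, out_w))
    return stacked


# Toy example: one channel per scale, side lengths 5, 4 and 3 (ratios 1.0, 0.8, 0.6).
scales = [[[[float(i + j) for j in range(n)] for i in range(n)]] for n in (5, 4, 3)]
fused = stack_scales(scales, 5, 5)
print(len(fused), len(fused[0]), len(fused[0][0]))
```

The original-scale map passes through unchanged, while the two smaller maps are interpolated up to 5×5 before concatenation.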

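2.1.2 Multiscale result refinement

The multiscale result refinement subnetwork takes the per-scale features and the fused features as input, infers defocus blur detection results from them, and refines these results. As shown in the green box in Fig. 2, the refinement network of each scale consists of 5 Conv-LSTM layers, corresponding to the 5 feature outputs of the feature extraction subnetwork; the refinement network for the fused features has the same structure as that for a single scale.

Conv-LSTM is derived from the conventional long short-term memory (LSTM) network: it replaces the matrix multiplications in the input-to-state and state-to-state transitions of the LSTM with convolutions, which enables the network to capture spatial information. Like a conventional LSTM, Conv-LSTM updates its hidden state and cell state through the input gate ${\mathit{\boldsymbol{i}}_t}$, the output gate ${\mathit{\boldsymbol{o}}_t}$, and the forget gate ${\mathit{\boldsymbol{f}}_t}$: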

$ \boldsymbol{i}_{t}=\sigma\left(\boldsymbol{W}_{i}^{x} * \boldsymbol{X}_{t}+\boldsymbol{W}_{i}^{\mathrm{H}} * \boldsymbol{H}_{t-1}+\boldsymbol{b}_{i}\right) $

(1)

$ \boldsymbol{f}_{t}=\sigma\left(\boldsymbol{W}_{f}^{x} * \boldsymbol{X}_{t}+\boldsymbol{W}_{f}^{\mathrm{H}} * \boldsymbol{H}_{t-1}+\boldsymbol{b}_{f}\right) $

(2)

$ \boldsymbol{o}_{t}=\sigma\left(\boldsymbol{W}_{o}^{x} * \boldsymbol{X}_{t}+\boldsymbol{W}_{o}^{\mathrm{H}} * \boldsymbol{H}_{t-1}+\boldsymbol{b}_{o}\right) $

(3)

$ \boldsymbol{C}_{t}=\boldsymbol{f}_{t} \circ \boldsymbol{C}_{t-1}+\boldsymbol{i}_{t} \circ \tanh \left(\boldsymbol{W}_{c}^{x} * \boldsymbol{X}_{t}+\boldsymbol{W}_{c}^{\mathrm{H}} * \boldsymbol{H}_{t-1}+\boldsymbol{b}_{c}\right) $

(4)

$ \boldsymbol{H}_{t}=\boldsymbol{o}_{t} \circ \tanh \left(\boldsymbol{C}_{t}\right) $

(5)

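where $\sigma$ and tanh denote the sigmoid and hyperbolic tangent activation functions, respectively; $*$ denotes convolution and $ \circ $ the Hadamard product; $\mathit{\boldsymbol{W}}$ and $\mathit{\boldsymbol{b}}$ are the learnable weights and biases; the input ${\mathit{\boldsymbol{X}}_t}$, the hidden state ${\mathit{\boldsymbol{H}}_t}$, the cell state ${\mathit{\boldsymbol{C}}_t}$, and the gates ${\mathit{\boldsymbol{i}}_t}$, ${\mathit{\boldsymbol{o}}_t}$, and ${\mathit{\boldsymbol{f}}_t}$ are all 3D tensors, with $1 \le t \le T$. Here the recurrent structure of Conv-LSTM is used to refine the detection result rather than to model a temporal sequence: the feature fed into the Conv-LSTM is copied $T$ times and given to every time step as ${\mathit{\boldsymbol{X}}_t}$, and a 1×1×1 convolution on the hidden state of the $T$-th time step yields the detection result. $T$ is set to 3.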
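A minimal single-channel sketch of Eqs. (1) to (5), together with the "same input repeated $T = 3$ times" refinement scheme, follows. Assumptions not given in the text: single-channel input, hidden, and cell states (the real layers operate on 3D tensors), 3×3 kernels with zero padding, zero initial states, and a sigmoid after the 1×1×1 read-out convolution to keep the blur map in (0, 1). All names are hypothetical; Keras also ships a multi-channel `ConvLSTM2D` layer, which this sketch does not use.

```python
import math
import random


def conv2d_same(x, k):
    """'Same' 2-D cross-correlation with zero padding (the '*' in Eqs. (1)-(4))."""
    h, w, r = len(x), len(x[0]), len(k) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        s += x[ii][jj] * k[di + r][dj + r]
            out[i][j] = s
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def elementwise(fn, *maps):
    """Apply fn pixel-wise across several equally sized 2-D maps."""
    return [[fn(*vals) for vals in zip(*rows)] for rows in zip(*maps)]


def convlstm_step(x, h_prev, c_prev, p):
    """One time step of Eqs. (1)-(5)."""
    def gate(name, act):
        zx = conv2d_same(x, p["Wx_" + name])
        zh = conv2d_same(h_prev, p["Wh_" + name])
        return elementwise(lambda a, b: act(a + b + p["b_" + name]), zx, zh)
    i_t = gate("i", sigmoid)    # input gate, Eq. (1)
    f_t = gate("f", sigmoid)    # forget gate, Eq. (2)
    o_t = gate("o", sigmoid)    # output gate, Eq. (3)
    g_t = gate("c", math.tanh)  # candidate cell content inside Eq. (4)
    c_t = elementwise(lambda f, c, i, g: f * c + i * g, f_t, c_prev, i_t, g_t)
    h_t = elementwise(lambda o, c: o * math.tanh(c), o_t, c_t)  # Eq. (5)
    return h_t, c_t


def refine(x, p, steps=3, w_out=1.0, b_out=0.0):
    """Feed the same feature map x at every time step, then read a blur map
    out of H_T with a 1x1x1 convolution followed by a sigmoid (assumed)."""
    h = [[0.0] * len(x[0]) for _ in x]
    c = [[0.0] * len(x[0]) for _ in x]
    for _ in range(steps):
        h, c = convlstm_step(x, h, c, p)
    return elementwise(lambda v: sigmoid(w_out * v + b_out), h)


def random_params(seed=0, k=3):
    """Hypothetical random weights: one k x k kernel per gate for X and for H."""
    rng = random.Random(seed)

    def kernel():
        return [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)]

    p = {}
    for g in "ifoc":
        p["Wx_" + g], p["Wh_" + g], p["b_" + g] = kernel(), kernel(), 0.0
    return p


feature = [[0.1 * (i + j) for j in range(4)] for i in range(4)]
blur_map = refine(feature, random_params())
```

Because the same feature is presented at every step, the recurrence acts as an iterative refinement of a single map rather than as sequence modelling.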

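The deepest feature of the feature extraction subnetwork, $\mathit{\boldsymbol{F}}_{{\rm{conv5}}}^s$, which contains rich semantic information, is fed into layer ${\rm{Conv - LSTM}}_5^s$ of the result refinement subnetwork to obtain a coarse defocus blur map $\mathit{\boldsymbol{B}}_5^s$ that carries semantic information. Here ${\rm{Conv - LSTM}}_l^s$ denotes the $l$-th Conv-LSTM layer of scale $s$, and $\mathit{\boldsymbol{B}}_l^s$ denotes its output. $\mathit{\boldsymbol{B}}_5^s$ is then upsampled, concatenated with the next shallower feature $\mathit{\boldsymbol{F}}_{{\rm{conv4}}}^s$, and fed into ${\rm{Conv - LSTM}}_4^s$ to obtain a finer defocus blur map $\mathit{\boldsymbol{B}}_4^s$. Shallower image features are integrated top-down in the same way, so that high-level semantic information is passed progressively to the lower layers and the blur map of each upper level is refined step by step. The process is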

$ \boldsymbol{B}_{5}^{s}=\varPsi_{\text{Conv-LSTM}}\left(\boldsymbol{F}_{\text{conv}5}^{s}\right) $

(6)

$ \boldsymbol{B}_{l}^{s}=\varPsi_{\text{Conv-LSTM}}\left({Cat}\left(\sigma\left(\boldsymbol{F}_{\text{conv}l}^{s}\right), \varPhi_{up}\left(\boldsymbol{B}_{l+1}^{s}\right)\right)\right), \quad l \in\{1,2,3,4\} $

(7)

where ${\mathit{\Psi }_{{\rm{Conv}} - {\rm{LSTM}}}}$ denotes the computation inside a Conv-LSTM layer, $Cat$(·) denotes concatenation, and ${\mathit{\Phi }_{up}}$(·) denotes upsampling by a factor of 2.
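The top-down recursion of Eqs. (6) and (7) for one branch can be written as a short loop. In this sketch `psi` is any callable that maps a list of channel maps to one blur map, standing in for a trained Conv-LSTM layer; $\varPhi_{up}$ is implemented as nearest-neighbour ×2 upsampling because the text does not state the interpolation; and `toy_psi` is a hypothetical placeholder, not the authors' layer.

```python
import math


def sigmoid_map(m):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in m]


def upsample2x(m):
    """Stand-in for Phi_up: nearest-neighbour upsampling by a factor of 2."""
    out = []
    for row in m:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def coarse_to_fine(features, psi):
    """features: dict l -> list of 2-D channel maps F_conv{l}^s for l = 1..5.
    psi: callable standing in for a Conv-LSTM layer (channel list -> 2-D map).
    Returns dict l -> B_l^s following Eqs. (6) and (7)."""
    blur = {5: psi(features[5])}                            # Eq. (6)
    for l in (4, 3, 2, 1):
        inputs = [sigmoid_map(ch) for ch in features[l]]   # sigma(F_conv{l}^s)
        inputs.append(upsample2x(blur[l + 1]))             # Phi_up(B_{l+1}^s)
        blur[l] = psi(inputs)                              # Eq. (7)
    return blur


def toy_psi(channels):
    """Hypothetical placeholder for a Conv-LSTM layer: channel mean + sigmoid."""
    n = len(channels)
    mean = [[sum(ch[i][j] for ch in channels) / n for j in range(len(channels[0][0]))]
            for i in range(len(channels[0]))]
    return sigmoid_map(mean)


# Toy branch: two channels per level, side lengths 1, 2, 4, 8, 16 from Conv5 to Conv1.
sides = {5: 1, 4: 2, 3: 4, 2: 8, 1: 16}
feats = {l: [[[0.5] * n for _ in range(n)] for _ in range(2)] for l, n in sides.items()}
maps = coarse_to_fine(feats, toy_psi)
```

Each $\boldsymbol{B}_l^s$ ends up at the resolution of $\boldsymbol{F}_{\text{conv}l}^s$, so the coarse semantic estimate from Conv5 is carried down to full resolution at Conv1.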

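2.1.3 Multiscale result fusion

Once the defocus blur maps of all scales are obtained, they need to be fused. The shallow multiscale fused features of the feature extraction subnetwork contain rich texture and edge information that helps produce more accurate results. Therefore, the finest blur map of each scale, $\mathit{\boldsymbol{B}}_1^s$, is concatenated with the low-level feature $\mathit{\boldsymbol{F}}_{{\rm{conv1}}}^0$ and fed into the multiscale result fusion module, which fuses and refines the multiscale results to produce the final defocus blur map. As shown in the purple box in Fig. 2, the multiscale result fusion layer consists of a 3×3×32 and a 1×1×1 convolutional layer. The final defocus blur map is computed as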

$ \boldsymbol{B}_{0}=f\left({Cat}\left(\sigma\left(\boldsymbol{F}_{\text {conv}1}^{0}\right), \boldsymbol{B}_{1}^{0}, \boldsymbol{B}_{1}^{1}, \boldsymbol{B}_{1}^{2}, \boldsymbol{B}_{1}^{3}\right)\right) $

(8)

where $f$ denotes the fusion operation of the fusion layer and ${\mathit{\boldsymbol{B}}_0}$ denotes the final fused blur detection result.
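As a quick check on the size of the fusion layer, the following arithmetic counts the channels entering Eq. (8) and the parameters of the 3×3×32 and 1×1×1 convolutions. It assumes that $\boldsymbol{F}_{\text{conv}1}^{0}$ has 64 channels (the output of the 3×3×64 cross-scale fusion convolution), that each $\boldsymbol{B}_{1}^{s}$ is a single-channel map, and that both convolutions use biases; these are inferences, not figures reported in the paper.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (plus optional biases) of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


FUSED_CONV1_CHANNELS = 64  # assumed channels of F_conv1^0
NUM_BRANCH_MAPS = 4        # B_1^0 ... B_1^3, one single-channel map per branch

c_in = FUSED_CONV1_CHANNELS + NUM_BRANCH_MAPS   # channels after Cat(...) in Eq. (8)
first = conv_params(3, c_in, 32)                # 3x3x32 layer
second = conv_params(1, 32, 1)                  # 1x1x1 layer
without_shallow = conv_params(3, NUM_BRANCH_MAPS, 32)  # same layer, blur maps only

print(c_in, first, second, first + second, without_shallow)
```

Under these assumptions the fusion layer has about 2×10^4 parameters, tiny next to a VGG16 convolutional backbone (roughly 14.7 million parameters), and most of its first-layer weights act on the shallow features rather than on the four blur maps.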

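2.2 Loss function

The loss function is a weighted sum of the cross-entropy loss (${L_C}$), a precision loss (${L_P}$), a recall loss (${L_R}$), an F-measure loss (${L_{{F_\beta }}}$), and a mean absolute error loss (${L_{{\rm{MAE}}}}$):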

$ \begin{aligned} L_{ \mathrm{Blur}}(\boldsymbol{B}, \boldsymbol{G})=L_{C}(\boldsymbol{B}, \boldsymbol{G})+\alpha_{1} L_{F_{\beta}}(\boldsymbol{B}, \boldsymbol{G})+\\ \alpha_{2} L_{P}(\boldsymbol{B}, \boldsymbol{G})+\alpha_{3} L_{R}(\boldsymbol{B}, \boldsymbol{G})+\alpha_{4} L_{\mathrm{MAE}}(\boldsymbol{B}, \boldsymbol{G}) \end{aligned} $

(9)

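where $\mathit{\boldsymbol{B}}$ denotes the blur detection result, $\mathit{\boldsymbol{G}}$ the ground truth, and ${\alpha _1} = {\alpha _2} = {\alpha _3} = {\alpha _4} = 0.1$. The cross-entropy loss ${L_C}$ is the main component of the loss function and is computed as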

$ L_{C}(\boldsymbol{B}, \boldsymbol{G})=-\frac{1}{W \times H} \sum\limits_{x}\left[g_{x} \cdot \ln \left(b_{x}+\varepsilon\right)+\left(1-g_{x}\right) \cdot \ln \left(1-b_{x}+\varepsilon\right)\right] $

(10)

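where ${b_x}$ and ${g_x}$ are the pixel values of the blur map and the ground truth at position $x$, and $W$ and $H$ are the width and height of the input image.

${L_P}$, ${L_R}$, and ${L_{{F_\beta }}}$ are computed analogously to precision, recall, and F-measure, respectively: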

$ L_{P}(\boldsymbol{B}, \boldsymbol{G})=-\frac{\sum\limits_{x}\left(b_{x} \cdot g_{x}\right)}{\sum\limits_{x}\left(b_{x}+\varepsilon\right)} $

(11)

$ L_{R}(\boldsymbol{B}, \boldsymbol{G})=-\frac{\sum\limits_{x}\left(b_{x} \cdot g_{x}\right)}{\sum\limits_{x}\left(g_{x}+\varepsilon\right)} $

(12)

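$ L_{F_{\beta}}(\boldsymbol{B}, \boldsymbol{G})=-\frac{\left(1+\beta^{2}\right) \cdot P(\boldsymbol{B}, \boldsymbol{G}) \cdot R(\boldsymbol{B}, \boldsymbol{G})}{\beta^{2} \cdot P(\boldsymbol{B}, \boldsymbol{G})+R(\boldsymbol{B}, \boldsymbol{G})+\varepsilon} $

(13)

where $P(\boldsymbol{B}, \boldsymbol{G})=-L_{P}(\boldsymbol{B}, \boldsymbol{G})$ and $R(\boldsymbol{B}, \boldsymbol{G})=-L_{R}(\boldsymbol{B}, \boldsymbol{G})$ are the positive soft precision and recall underlying Eqs. (11) and (12).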

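Following Achanta et al. (2009), ${\beta ^2}$ is set to 0.3; $\varepsilon$ is a small constant that prevents division by zero. Because higher precision, recall, and F-measure indicate better performance, the negatives of these three metrics are used as loss terms, so that minimizing the loss maximizes the metrics.

MAE measures the discrepancy between the blur detection result and the ground truth: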

$ L_{\mathrm{MAE}}(\boldsymbol{B}, \boldsymbol{G})=\frac{1}{W \times H} \sum\limits_{x}\left|b_{x}-g_{x}\right| $

(14)
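The loss terms of Eqs. (9) to (14) can be sketched on flattened maps as follows. The code works with the positive soft precision $P=-L_P$ and recall $R=-L_R$, so that the F-measure term evaluates to $-F_{\beta}$. The text does not give the value of $\varepsilon$; 1e-7 is an assumption, and the function name is hypothetical.

```python
import math

EPS = 1e-7    # assumed value of the constant epsilon
BETA2 = 0.3   # beta^2, following Achanta et al. (2009)


def blur_loss(b, g, alphas=(0.1, 0.1, 0.1, 0.1)):
    """Eq. (9) for a flattened prediction b (values in [0, 1]) and ground truth g."""
    n = len(b)
    # Eq. (10): cross-entropy, negated log-likelihood so that lower is better
    l_c = -sum(y * math.log(x + EPS) + (1 - y) * math.log(1 - x + EPS)
               for x, y in zip(b, g)) / n
    overlap = sum(x * y for x, y in zip(b, g))
    p = overlap / sum(x + EPS for x in b)                # soft precision, P = -L_P (Eq. 11)
    r = overlap / sum(y + EPS for y in g)                # soft recall,    R = -L_R (Eq. 12)
    l_f = -(1 + BETA2) * p * r / (BETA2 * p + r + EPS)   # Eq. (13), equals -F_beta
    l_mae = sum(abs(x - y) for x, y in zip(b, g)) / n    # Eq. (14)
    a1, a2, a3, a4 = alphas
    return l_c + a1 * l_f + a2 * (-p) + a3 * (-r) + a4 * l_mae


gt = [1, 0, 1, 0]
good = [0.99, 0.01, 0.99, 0.01]
poor = [0.5, 0.5, 0.5, 0.5]
print(blur_loss(good, gt), blur_loss(poor, gt))
```

With near-perfect predictions the negated precision, recall, and F-measure terms dominate, so the total loss can become negative; only its relative value matters for optimization.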

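The output of each Conv-LSTM layer is upsampled to the output size before its loss is computed, which avoids the information loss caused by downsampling the ground truth. The overall loss $L$ of the network is the sum of the losses of all Conv-LSTM outputs plus the loss of the final fused result: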

$ L=\sum\limits_{s=0}^{3} \sum\limits_{l=1}^{5} L_{\mathrm{Blur}}\left(\boldsymbol{B}_{l}^{s}, \boldsymbol{G}\right)+L_{\mathrm{Blur}}\left(\boldsymbol{B}_{0}, \boldsymbol{G}\right) $

(15)

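where $\mathit{\boldsymbol{B}}_l^s$ denotes the output of the $l$-th Conv-LSTM layer of scale $s$. With this layer-wise supervision, every Conv-LSTM layer receives gradients directly from the loss, which helps each layer to be well optimized (Lee et al., 2015).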
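Eq. (15) sums 4 × 5 = 20 side-output losses and the loss of the final map. The sketch below shows that bookkeeping, with each side output upsampled to the ground-truth size before its loss is computed. The upsampling mode (nearest neighbour here) is an assumption, `loss_fn` can be any per-map loss such as $L_{\rm Blur}$, and all names are hypothetical.

```python
def upsample_nearest(m, size):
    """Nearest-neighbour resize of a square 2-D list to size x size."""
    n = len(m)
    return [[m[i * n // size][j * n // size] for j in range(size)] for i in range(size)]


def flatten(m):
    return [v for row in m for v in row]


def total_loss(side_outputs, final, gt, loss_fn):
    """Eq. (15). side_outputs maps (s, l) -> B_l^s for s = 0..3 and l = 1..5."""
    size = len(gt)
    g = flatten(gt)
    total = sum(loss_fn(flatten(upsample_nearest(b, size)), g)
                for b in side_outputs.values())
    return total + loss_fn(flatten(final), g)


def mae(b, g):
    return sum(abs(x - y) for x, y in zip(b, g)) / len(b)


# Toy check: every map is all zeros while the ground truth is all ones,
# so each of the 21 terms contributes an MAE of exactly 1.
sides = {1: 16, 2: 8, 3: 4, 4: 2, 5: 1}
side_outputs = {(s, l): [[0.0] * n for _ in range(n)]
                for s in range(4) for l, n in sides.items()}
gt = [[1.0] * 16 for _ in range(16)]
final = [[0.0] * 16 for _ in range(16)]
print(len(side_outputs), total_loss(side_outputs, final, gt, mae))
```

Upsampling the predictions rather than downsampling the ground truth keeps every term comparable at full resolution, as the paragraph above describes.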

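3 Experiments and analysis

3.1 Evaluation metrics

Precision, recall, F-measure, and MAE are used to evaluate model performance. To compute the F-measure, the model output is first resized to 320×320 pixels and then binarized with Otsu's method (maximum between-class variance). The F-measure is computed from the binarized blur map and the ground truth, with ${\beta ^2}$ = 0.3 to emphasize precision; it is denoted F_0.3.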
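The evaluation protocol (Otsu binarization followed by $F_{\beta}$ with $\beta^2 = 0.3$) can be sketched as below. Assumptions: blur maps take values in [0, 1] with larger values meaning "sharp", Otsu's threshold is searched over a 256-bin histogram, and the metric is computed on the sharp class; the authors' evaluation code is not given, and all names are hypothetical.

```python
def otsu_threshold(values, bins=256):
    """Otsu's method: choose the threshold that maximises the between-class
    variance of a histogram of values in [0, 1]."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return (best_t + 1) / bins  # values >= threshold are labelled 1 (sharp)


def f_beta(pred, gt, beta2=0.3):
    """F-measure of binary lists pred and gt (1 = sharp)."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    precision = tp / sum(pred) if sum(pred) else 0.0
    recall = tp / sum(gt) if sum(gt) else 0.0
    if precision + recall == 0:
        return 0.0
    return (1 + beta2) * precision * recall / (beta2 * precision + recall)


blur_map = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
truth = [0, 0, 0, 1, 1, 1]
thr = otsu_threshold(blur_map)
binary = [1 if v >= thr else 0 for v in blur_map]
print(thr, binary, f_beta(binary, truth))
```

In practice a library routine such as `skimage.filters.threshold_otsu` could replace the hand-written threshold search.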

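3.2 Datasets

Two defocus blur detection benchmark datasets are used: DUT (Zhao et al., 2020) and CUHK (Shi et al., 2014). DUT contains 600 training images and 500 test images with corresponding pixel-level labels. CUHK contains 704 defocus-blurred images; in order of image index, the first 604 images are used for training and the last 100 for testing.

To obtain a better model, the network is first pretrained on synthetic data. First, 2,000 images are randomly selected from three datasets: BSD (Berkeley segmentation dataset) (Arbeláez et al., 2011), UCID (uncompressed color image database) (Schaefer and Stich, 2004), and Pascal (pattern analysis, statistical modeling and computational learning) 2008 (Everingham et al., 2008). Then a 7×7 Gaussian blur kernel with variance 2 is applied separately to the top, bottom, left, and right parts of each image, giving 4 simulated blurred images; finally, these 4 images are processed iteratively in the same way 4 more times. In this way, 20 simulated defocus-blurred images are obtained from each original image, giving 40,000 simulated image pairs (including ground truths) in total.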
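One reading of this synthesis procedure is sketched below: each pass blurs one half of the current image with a normalized 7×7 Gaussian kernel, and that half is blurred again in the following passes, so 4 halves × 5 passes give 20 image/mask pairs per original image. Assumptions: "variance 2" is taken as $\sigma^2 = 2$, borders use replicate padding, the mask labels the untouched half as sharp (1), and images are single-channel nested lists; all names are hypothetical.

```python
import math
import random


def gaussian_kernel(size=7, sigma2=2.0):
    """Normalised size x size Gaussian kernel with variance sigma2."""
    r = size // 2
    k = [[math.exp(-(x * x + y * y) / (2 * sigma2)) for x in range(-r, r + 1)]
         for y in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def convolve(img, k):
    """2-D convolution with replicate padding (padding mode is an assumption)."""
    h, w, r = len(img), len(img[0]), len(k) // 2

    def clamp(v, hi):
        return min(max(v, 0), hi)

    return [[sum(img[clamp(i + di, h - 1)][clamp(j + dj, w - 1)] * k[di + r][dj + r]
                 for di in range(-r, r + 1) for dj in range(-r, r + 1))
             for j in range(w)] for i in range(h)]


def blur_half(img, region, k):
    """Blur the 'top', 'bottom', 'left' or 'right' half; return image and mask."""
    h, w = len(img), len(img[0])
    inside = {"top": lambda i, j: i < h // 2, "bottom": lambda i, j: i >= h // 2,
              "left": lambda i, j: j < w // 2, "right": lambda i, j: j >= w // 2}[region]
    blurred = convolve(img, k)
    out = [[blurred[i][j] if inside(i, j) else img[i][j] for j in range(w)] for i in range(h)]
    mask = [[0 if inside(i, j) else 1 for j in range(w)] for i in range(h)]
    return out, mask


def synthesize(img, passes=5):
    """4 regions x `passes` increasingly blurred versions = 20 pairs by default."""
    k = gaussian_kernel()
    pairs = []
    for region in ("top", "bottom", "left", "right"):
        current = img
        for _ in range(passes):
            current, mask = blur_half(current, region, k)
            pairs.append((current, mask))
    return pairs


rng = random.Random(0)
image = [[rng.random() for _ in range(8)] for _ in range(8)]
pairs = synthesize(image)
```

With 2,000 source images this reading gives 2,000 × 20 = 40,000 pairs, matching the count stated above.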

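3.3 Implementation details

Each image is rotated by 11 random angles, with the blank areas filled with zeros; images are flipped horizontally, vertically, and both horizontally and vertically; and images whose aspect ratio is outside [0.95, 1.05] are cropped into two images. After augmentation, the dataset is about 16 times its original size.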
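The flip and split steps can be sketched as follows on nested-list images. Rotation by 11 angles with zero filling needs interpolation and is omitted; splitting an elongated image into two equal halves along its longer side is only one plausible reading of "cropped into two images". The names are hypothetical.

```python
def hflip(img):
    return [row[::-1] for row in img]


def vflip(img):
    return img[::-1]


def flip_variants(img):
    """Original plus horizontal, vertical and horizontal+vertical flips."""
    return [img, hflip(img), vflip(img), hflip(vflip(img))]


def split_if_elongated(img, low=0.95, high=1.05):
    """Keep near-square images; split others into two halves along the longer side."""
    h, w = len(img), len(img[0])
    if low <= w / h <= high:
        return [img]
    if w > h:
        return [[row[: w // 2] for row in img], [row[w // 2:] for row in img]]
    return [img[: h // 2], img[h // 2:]]


wide = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(split_if_elongated(wide))
print(flip_variants([[1, 2], [3, 4]])[3])
```

Splitting elongated images before resizing them to 320×320 avoids the heavy aspect-ratio distortion that a direct resize would cause.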

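The model is implemented in Keras with a TensorFlow backend and trained on an NVIDIA RTX 2080Ti GPU. Training has two stages: 1) pretraining the model on the synthetic blur data; 2) fine-tuning the pretrained model on the benchmark datasets. Before pretraining, the feature extraction network is initialized with VGG16 weights trained on ImageNet, and all other newly added layers are initialized with "Xavier Uniform". The pretraining learning rate is 1×10^-7 and the batch size is 2. The network converges after 3 epochs of pretraining, at which point the weights are saved and pretraining stops. Before fine-tuning, the benchmark datasets are augmented with the data augmentation described above. Starting from the pretrained weights, the model is fine-tuned on the benchmark datasets with an initial learning rate of 1×10^-5, which is multiplied by 0.1 every 5 epochs down to 1×10^-8; the batch size is 2, the optimizer is Adam (adaptive moment estimation) (Kingma and Ba, 2015), and training continues until the model converges.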
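The fine-tuning schedule described above is a step decay. A minimal sketch follows; it assumes 0-based epoch indices and that the rate is held at 1×10^-8 once it reaches that floor (the text only says "down to"). The function name is hypothetical.

```python
def step_lr(epoch, initial=1e-5, factor=0.1, step=5, floor=1e-8):
    """Learning rate for a 0-based epoch: multiply by `factor` every `step`
    epochs and never go below `floor`."""
    return max(initial * factor ** (epoch // step), floor)


for epoch in (0, 4, 5, 10, 15, 30):
    print(epoch, step_lr(epoch))
```

In Keras such a function could be passed as `tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: step_lr(epoch))`; whether the authors used this callback is not stated.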

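3.4 Comparison experiments and analysis

Our method is compared with 10 advanced blur detection methods: 4 based on deep learning and 6 based on hand-crafted features. The deep-learning methods are DHDE (multi-scale deep and hand-crafted features for defocus estimation) (Park et al., 2017), BTBCRL (Zhao et al., 2020), DeFusionNet (Tang et al., 2019), and BDNet-F (blur detection network with results fusion) (Huang et al., 2018). The hand-crafted methods are HiFST (high-frequency multi-scale fusion and sort transform of gradient magnitudes) (Golestaneh and Karam, 2017), SVD (singular value decomposition based blur detection) (Su et al., 2011), DBDF (Shi et al., 2014), SS (spectral and spatial approach) (Tang et al., 2016), LBP (Yi and Eramian, 2016), and JNB (just noticeable defocus blur detection) (Shi et al., 2015a). The results of BTBCRL can be downloaded from the authors' homepage, and the results of 8 other methods were obtained from the defocus blur detection result comparison library established by Tang et al. (2019).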

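3.4.1 Qualitative analysis

The difficulty of defocus blur detection lies in images with complex backgrounds and homogeneous regions. Fig. 3 visualizes the results of each method in such scenes. In the first-row input, the girl's clothing is a homogeneous region; except for DeFusionNet (Tang et al., 2019) and our method, all methods wrongly detect this sharp region as blurred, and SS produces a completely wrong result. The leaves in the third row are also a homogeneous region, and the background is plain. Our method detects the sharp leaves completely, whereas traditional methods such as LBP (Yi and Eramian, 2016), SS (Tang et al., 2016), and SVD (Su et al., 2011) detect only the leaf edges, and the two deep-learning methods BTBCRL (Zhao et al., 2020) and DeFusionNet (Tang et al., 2019) are also less accurate and fail to detect the entire sharp region.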

Fig. 3 Comparison of different blur detection methods on images with homogeneous regions ((a) input; (b) ground truth; (c) BTBCRL; (d) DBDF; (e) DeFusionNet; (f) DHDE; (g) HiFST; (h) JNB; (i) LBP; (j) SS; (k) SVD; (l) ours)

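In Fig. 4, the scene in the second-row input is fairly complex, and apart from LBP and our method, none of the methods accurately detects the sharp region. The background of the third-row input contains many blurred light spots; every method except ours wrongly classifies these spots as sharp, and the traditional method DBDF and the deep-learning method DHDE in particular fail completely on them. The fourth-row input contains many colors and cluttered objects. In this complex scene, only BTBCRL, LBP, and our method detect the sharp objects correctly, and among these, our result has the clearest blur boundaries and is closest to the ground truth.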

Fig. 4 Comparison of different blur detection methods on images with complex backgrounds and light spots ((a) input; (b) ground truth; (c) BTBCRL; (d) DBDF; (e) DeFusionNet; (f) DHDE; (g) HiFST; (h) JNB; (i) LBP; (j) SS; (k) SVD; (l) ours)

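These results show that our method handles homogeneous regions, complex backgrounds, and blurred light spots better than the other blur detection methods.

3.4.2 Quantitative analysis

Table 1 compares the precision, recall, F_0.3, and MAE of our model and the other methods on the DUT (Zhao et al., 2020) and CUHK (Shi et al., 2014) datasets. On DUT, BTBCRL (Zhao et al., 2020), DeFusionNet (Tang et al., 2019), and our method are far better than the other methods in F_0.3 and MAE. Ours is the only method whose F_0.3 exceeds 0.87 and whose MAE is below 0.1. Compared with the second-best method, DeFusionNet, our method improves recall and F_0.3 by 20.4% and 5.4%, respectively, and reduces MAE by 38.8%. Compared with LBP, the best traditional method, our method improves F_0.3 by 16.1% and reduces MAE by 62.7%.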

Table 1 Comparison of precision, recall, F_0.3, and MAE of different methods on the DUT and CUHK datasets


Dataset	Metric	JNB	DBDF	HiFST	LBP	SVD	SS	DHDE	BDNet-F	BTBCRL	DeFusionNet	Ours
DUT	Precision	0.3744	0.4147	0.5073	0.6311	0.5739	0.6168	0.5057	-	0.8013	0.8021	0.8047
	Recall	0.6477	0.8362	0.9244	0.7702	0.5289	0.6810	0.8270	-	0.7246	0.7430	0.8943
	F_0.3	0.5680	0.6252	0.7136	0.7538	0.6831	0.7373	0.6633	-	0.8280	0.8307	0.8753
	MAE	0.4187	0.3775	0.3063	0.1951	0.2994	0.2433	0.4011	-	0.1391	0.1187	0.0727
CUHK	Precision	0.5515	0.5066	0.7044	0.6814	0.7002	0.7053	0.5791	0.6031	-	0.7978	0.7542
	Recall	0.8068	0.9071	0.8001	0.8125	0.5186	0.6872	0.9078	0.8602	-	0.9047	0.9359
	F_0.3	0.6993	0.6897	0.7892	0.7754	0.7307	0.7650	0.7384	0.7460	-	0.8647	0.8503
	MAE	0.3549	0.3425	0.2375	0.2109	0.3013	0.2660	0.3898	0.2487	-	0.1175	0.1335
Note: Bold and underlined values are the best and second-best results in each row; "-" indicates that no result or code implementation is available for that dataset.

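On CUHK, DeFusionNet has the best overall performance, but our recall is 3.4% higher than that of DeFusionNet. Compared with LBP, which ranks third in MAE, our method improves F_0.3 by 9.7% and reduces MAE by 36.7%. On both datasets, our method outperforms all traditional methods in precision, F_0.3, and MAE, and in recall on CUHK; the only exception is recall on DUT, where HiFST (0.9244) is higher than ours (0.8943). The authors of BTBCRL have not released their training code, so its performance on the CUHK test set could not be evaluated.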
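The percentages quoted in this section are relative changes with respect to the compared method, not absolute differences in the metric. The arithmetic can be reproduced from Table 1 as follows; the helper names are hypothetical.

```python
def rel_reduction(ours, baseline):
    """Relative reduction in percent, for lower-is-better metrics such as MAE."""
    return 100 * (baseline - ours) / baseline


def rel_gain(ours, baseline):
    """Relative improvement in percent, for higher-is-better metrics such as F_0.3."""
    return 100 * (ours - baseline) / baseline


# (description, value) pairs computed from Table 1
checks = [
    ("DUT MAE vs DeFusionNet", rel_reduction(0.0727, 0.1187)),
    ("DUT F_0.3 vs DeFusionNet", rel_gain(0.8753, 0.8307)),
    ("DUT recall vs DeFusionNet", rel_gain(0.8943, 0.7430)),
    ("CUHK MAE vs LBP", rel_reduction(0.1335, 0.2109)),
    ("CUHK F_0.3 vs LBP", rel_gain(0.8503, 0.7754)),
]
for name, value in checks:
    print(f"{name}: {value:.1f}%")
```

Reading the figures as relative changes explains why a 0.046 absolute drop in MAE on DUT is reported as a 38.8% reduction.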

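3.5 Ablation experiments and analysis

3.5.1 Effectiveness of multiscale result refinement

The inputs of the multiscale result refinement subnetwork are the multiscale features and the fused features. To verify that the subnetwork uses these features effectively, we feed it only one kind at a time. "Multiscale" denotes a network containing only the refinement branches that receive the per-scale features, and "fusion" denotes a network containing only the refinement branch that receives the fused features. Ablation variants whose results are close to the baseline are run 3 times, and the mean of each metric is reported to reduce random error (the same protocol is used below). As shown in Table 2, the "multiscale + fusion" network achieves the best F_0.3 and MAE, although each single-input variant is slightly higher in one of precision or recall. Compared with the "multiscale" network, it improves F_0.3 by 0.1% and MAE by 4.2%; compared with the "fusion" network, it improves them by 0.9% and 14.1%, respectively. This indicates that both the multiscale and the fused features help distinguish blurred from sharp regions, and that the multiscale features are the more critical for accurate blur detection. In addition, the loss of the "multiscale + fusion" network converges earlier during training, saving about 3 h of training time compared with the network that uses only the multiscale features.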

Table 2 Effectiveness of the multiscale result refinement subnetwork on the DUT dataset


Method	Precision	Recall	F_0.3	MAE
Multiscale + fusion (ours)	0.8047	0.8943	0.8753	0.0727
Multiscale only	0.8015	0.8976	0.8741	0.0759
Fusion only	0.8163	0.8426	0.8673	0.0846
Note: Bold indicates the best value in each column.

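The above analysis shows that the multiscale result refinement subnetwork makes effective use of both the multiscale and the fused features, which complement each other and improve detection accuracy.

3.5.2 Effectiveness of multi-layer supervision

The effectiveness of multi-layer supervision is verified by reducing the number of supervised outputs. The first strategy supervises only the outputs of the Conv-LSTM 1 layers of the refinement subnetwork and the final result of the model (five supervised outputs in total), denoted "five-layer supervision"; the second strategy supervises only the final result, denoted "single-layer supervision". Table 3 shows that as more outputs are supervised, recall and MAE improve steadily, and full multi-layer supervision also gives the best F_0.3. Compared with single-layer supervision, multi-layer supervision improves recall by 2.0% and reduces MAE by 4.1%. Repeated experiments show that multi-layer supervision adds almost no inference time, since the extra loss terms are computed only during training. These results show that multi-layer supervision effectively improves the detection accuracy of the model.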

Table 3 Effectiveness of multi-layer supervision on the DUT dataset


Method	Precision	Recall	F_0.3	MAE
Multi-layer supervision (ours)	0.8047	0.8943	0.8753	0.0727
Five-layer supervision	0.8021	0.8887	0.8727	0.0743
Single-layer supervision	0.8080	0.8768	0.8728	0.0758
Note: Bold indicates the best value in each column.

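3.5.3 Effectiveness of the multiscale result fusion layer

The effectiveness of the result fusion layer is verified by comparing three fusion schemes. The first (the scheme used in our model) uses two convolutional layers (3×3×32 + 1×1×1) and additionally takes the shallow feature $\mathit{\boldsymbol{F}}_{{\rm{conv1}}}^0$ as input. The second removes the shallow-feature input from the first and uses only the two convolutional layers. The third uses a single 1×1×1 convolutional layer and takes no shallow-feature input. As shown in Table 4, our model achieves the best F_0.3 and MAE; compared with using the two convolutional layers alone, MAE is reduced by 39.5%, which shows that the shallow-feature input substantially improves the accuracy of the fusion layer. The very high MAE of the single convolutional layer (0.366) arises because a 1×1×1 convolutional layer lacks the fitting capacity to fuse the results effectively. Its F_0.3 nevertheless stays close to the others, likely because F_0.3 is computed after Otsu binarization and is therefore less sensitive than MAE to the absolute values of the output.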

Table 4 Effectiveness of the result fusion layer on the DUT dataset


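Method	F_0.3	MAE
Two convolutional layers + shallow features (ours)	0.8753	0.0727
Two convolutional layers	0.8710	0.1201
Single convolutional layer	0.8748	0.3660
Note: Bold indicates the best value in each column.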

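The analysis shows that the multiscale result fusion layer with "two convolutional layers + shallow features" fuses the multiscale results effectively.

3.5.4 Advantages of the Conv-LSTM layer

Building the multiscale result refinement subnetwork from Conv-LSTM layers improves the detection accuracy of the model. To verify this, the Conv-LSTM layers in the subnetwork are replaced with convolutional layers of the same kernel size. Table 5 compares the two networks in detail. On DUT, using Conv-LSTM improves precision by 0.9% and reduces MAE by 2.3%; on CUHK, it improves precision by 1.4% and reduces MAE by 4.6%, at the cost of slightly lower recall on both datasets. Inference on a 320×320-pixel input image takes 158.8 ms with the Conv-LSTM model but only 41.8 ms with the convolutional refinement network. To detect defocus blur accurately, we build the model with Conv-LSTM. The experiments show that the Conv-LSTM layers in the multiscale result refinement subnetwork effectively refine and integrate multi-layer features.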

Table 5 Comparison of Conv-LSTM layers and convolutional layers on the DUT and CUHK datasets


Method	DUT				CUHK				Inference time/ms
	Precision	Recall	F_0.3	MAE	Precision	Recall	F_0.3	MAE	
Conv-LSTM	0.8047	0.8943	0.8753	0.0727	0.7542	0.9359	0.8503	0.1335	158.8
Convolutional layer	0.7977	0.9027	0.8739	0.0744	0.7435	0.9431	0.8451	0.1399	41.8
Note: Bold indicates the best value in each column.

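3.5.5 Effectiveness of the multi-component loss function

The weighted sum of the cross-entropy, precision, recall, F-measure, and mean absolute error losses is used to train the model. To verify its effectiveness, we compare it with a model trained with the cross-entropy loss only. Table 6 shows that the multi-component loss reduces MAE by 0.5% and improves recall by 1.1%, with a slight drop in precision, and that its effect on inference time is negligible.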

Table 6 Effectiveness of the multi-component loss function on the DUT dataset


Method	Precision	Recall	F_0.3	MAE
Multi-component loss	0.8047	0.8943	0.8753	0.0727
Cross-entropy loss only	0.8090	0.8844	0.8751	0.0731
Note: Bold indicates the best value in each column.

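3.5.6 Effect of training data

Training data significantly affect the performance of deep neural networks. To analyze their effect on our model, we compare models trained on DUT or CUHK alone with a model trained on both datasets jointly. Table 7 shows that on CUHK, the jointly trained model reduces MAE by 12.2% and improves F_0.3 by 1.4% compared with the model trained on a single dataset. On DUT, the jointly trained model performs slightly worse, with MAE and F_0.3 degraded by 2.3% and 0.4%, respectively.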

Table 7 Effect of different training data on model performance


Dataset	Metric	Joint training on both datasets	Training on a single dataset
DUT	Precision	0.8022	0.8047
	Recall	0.8857	0.8943
	F_0.3	0.8717	0.8753
	MAE	0.0744	0.0727
CUHK	Precision	0.7775	0.7542
	Recall	0.9287	0.9359
	F_0.3	0.8621	0.8503
	MAE	0.1172	0.1335
Note: Bold indicates the best value in each row.

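3.5.7 Limitations of the model

Although our model performs well on both datasets, it still makes errors on some images. Fig. 5 shows examples of detection errors. As marked by the yellow box, the pits on the rock in the first image are not filtered out; as marked by the red box, the blurred flowers in the second image are detected as sharp. Such errors usually occur in images with a low degree of blur and nearly identical foreground and background. Possible reasons are that the model uses only a limited number of feature scales and therefore cannot fully avoid the effects of resizing the input image, or that it does not make full use of the semantic information in the deep features. These shortcomings require further study and analysis.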

Fig. 5 Failure cases of our defocus blur detection model ((a) input; (b) ground truth; (c) ours)

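4 Conclusion

This paper proposes a coarse-to-fine multiscale defocus blur detection network consisting of a multiscale feature extraction subnetwork, a multiscale result refinement subnetwork, and a multiscale result fusion layer. The feature extraction subnetwork fuses the multiscale features of the same layer horizontally, whereas the result refinement subnetwork integrates the multiscale features of different layers vertically. Together, these two ways of exploiting features alleviate the loss of detection accuracy caused by blur scale ambiguity. During feature integration, Conv-LSTM layers integrate the blur information of multiscale features of different layers from top to bottom, passing deep semantic information to the shallow layers to produce fine and complete blur maps, while the recurrent structure of Conv-LSTM refines the blur features. The multiscale blur maps obtained from the features of images at different scales and from the multiscale fused features serve as prior knowledge and, together with the shallow fused features, pass through the multiscale result fusion layer to produce a more robust blur detection result. Compared with state-of-the-art detection algorithms, our method is more robust on defocus-blurred images containing large homogeneous regions and complex backgrounds, and the boundaries of the blurred regions in its results are clear and complete.

Although our method outperforms existing blur detection methods on DUT and is competitive on CUHK, the proposed multiscale model has a large number of parameters, which limits its detection speed. In future work, we will study how to reduce the number of network parameters while further improving detection accuracy, to build a higher-performance blur detection model.

References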

Achanta R, Hemami S, Estrada F and Susstrunk S. 2009. Frequency-tuned salient region detection//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami, USA: IEEE: 1597-1604[DOI: 10.1109/cvpr.2009.5206596]

Arbeláez P, Maire M, Fowlkes C, Malik J. 2011. Contour detection and hierarchical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(5): 898-916 [DOI:10.1109/tpami.2010.161]

Bae S, Durand F. 2007. Defocus magnification. Computer Graphics Forum, 26(3): 571-579 [DOI:10.1111/j.1467-8659.2007.01080]

Dong C, Loy C C, He K M, Tang X O. 2016. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2): 295-307 [DOI:10.1109/TPAMI.2015.2439281]

Everingham M, Van Gool L, Williams C K I, Winn J and Zisserman A. 2008. The PASCAL Visual Object Classes Challenge 2008(VOC2008). http://www.pascal-network.org/challenges/VOC/voc2008/workshop/index.html

Golestaneh S A and Karam L J. 2017. Spatially-varying blur detection based on multiscale fused and sorted transform coefficients of gradient magnitudes//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 596-605[DOI: 10.1109/cvpr.2017.71]

Huang R, Feng W, Fan M Y, Wan L, Sun J Z. 2018. Multiscale blur detection by learning discriminative deep features. Neurocomputing, 285: 154-166 [DOI:10.1016/j.neucom.2018.01.041]

Huang R, Xing Y, Wang Z Z. 2019. RGB-D salient object detection by a CNN with multiple layers fusion. IEEE Signal Processing Letters, 26(4): 552-556 [DOI:10.1109/lsp.2019.2898508]

Jiang P, Ling H B, Yu J and Peng J L. 2013. Salient region detection by UFO: uniqueness, focusness and objectness//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1976-1983[DOI: 10.1109/iccv.2013.248]

Kingma D P and Ba J. 2015. Adam: a method for stochastic optimization//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR: [s.n.]

Kriener F, Binder T and Wille M. 2013. Accelerating defocus blur magnification//Proceedings of SPIE 8667, Multimedia Content and Mobile Devices. Burlingame, USA: SPIE: #86671Q[DOI: 10.1117/12.2004118]

Lee C Y, Xie S N, Gallagher P, Zhang Z Y and Tu Z W. 2015. Deeply-supervised nets//Proceedings of the 18th International Conference on Artificial Intelligence and Statistics. San Diego, USA: JMLR: 562-570

Pan J S, Sun D Q, Pfister H and Yang M H. 2016. Blind image deblurring using dark channel prior//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1628-1636[DOI: 10.1109/cvpr.2016.180]

Pang Y W, Zhu H L, Li X Y, Li X L. 2016. Classifying discriminative features for blur detection. IEEE Transactions on Cybernetics, 46(10): 2220-2227 [DOI:10.1109/tcyb.2015.2472478]

Park J, Tai Y W, Cho D and Kweon I S. 2017. A unified approach of multi-scale deep and hand-crafted features for defocus estimation//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 1736-1745[DOI: 10.1109/cvpr.2017.295]

Schaefer G and Stich M. 2004. UCID: an uncompressed color image database//Proceedings of SPIE, Storage and Retrieval Methods and Applications for Multimedia 2004. San Jose, USA: SPIE: 472-480[DOI: 10.1117/12.525375]

Shi J P, Xu L and Jia J Y. 2014. Discriminative blur detection features//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 2965-2972[DOI: 10.1109/cvpr.2014.379]

Shi J P, Xu L and Jia J Y. 2015a. Just noticeable defocus blur detection and estimation//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 657-665[DOI: 10.1109/cvpr.2015.7298665]

Shi X J, Chen Z R, Wang H, Yeung D Y, Wong W K and Woo W C. 2015b. Convolutional LSTM network: a machine learning approach for precipitation nowcasting//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge, USA: ACM: 802-810.

Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition//Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA: ICLR

Su B L, Lu S J and Tan C L. 2011. Blurred image region detection and classification//Proceedings of the 19th ACM international conference on Multimedia. Scottsdale, USA: ACM: 1397-1400[DOI: 10.1145/2072298.2072024]

Tang C, Wu J, Hou Y H, Wang P C, Li W Q. 2016. A spectral and spatial approach of coarse-to-fine blurred image region detection. IEEE Signal Processing Letters, 23(11): 1652-1656 [DOI:10.1109/lsp.2016.2611608]

Tang C, Zhu X Z, Liu X W, Wang L Z and Zomaya A. 2019. DeFusionNET: defocus blur detection via recurrently fusing and refining multi-scale deep features//Proceedings of 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach, USA: IEEE: 2700-2709[DOI: 10.1109/cvpr.2019.00281]

Wang W Q, Shen J B, Dong X P and Borji A. 2018. Salient object detection driven by fixation prediction//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 1711-1720[DOI: 10.1109/cvpr.2018.00184]

Wang X W, Zhang S L, Liang X, Zhou H J, Zheng J J, Sun M Z. 2019. Accurate and fast blur detection using a pyramid M-shaped deep neural network. IEEE Access, 7: 86611-86624 [DOI:10.1109/access.2019.2926747]

Xiao H M, Lu W, Li R P, Zhong N, Yeung Y, Chen J J, Xue F, Sun W. 2019. Defocus blur detection based on multiscale SVD fusion in gradient domain. Journal of Visual Communication and Image Representation, 59: 52-61 [DOI:10.1016/j.jvcir.2018.12.048]

Yi X, Eramian M. 2016. LBP-based segmentation of defocus blur. IEEE Transactions on Image Processing, 25(4): 1626-1638 [DOI:10.1109/tip.2016.2528042]

Zeng K, Wang Y N, Mao J X, Liu J Y, Peng W X, Chen N K. 2019. A local metric for defocus blur detection based on CNN feature learning. IEEE Transactions on Image Processing, 28(5): 2107-2115 [DOI:10.1109/tip.2018.2881830]

Zhao B, Wu X, Feng J S, Peng Q, Yan S C. 2017. Diversified visual attention networks for fine-grained object classification. IEEE Transactions on Multimedia, 19(6): 1245-1256 [DOI:10.1109/tmm.2017.2648498]

Zhao W D, Zhao F, Wang D, Lu H C. 2020. Defocus blur detection via multi-stream bottom-top-bottom network. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(8): 1884-1897 [DOI:10.1109/tpami.2019.2906588]