Published: 2019-11-16
DOI: 10.11834/jig.190064
2019 | Volume 24 | Number 11

Image Processing and Coding

采用训练策略实现的快速噪声水平估计

徐少平, 林珍玉, 李崇禧, 刘婷云, 杨晓辉

南昌大学信息工程学院, 南昌 330031

Received: 2019-03-14; revised: 2019-05-10; preprint: 2019-05-17

基金项目: 国家自然科学基金项目（61662044，61163023，51765042）；江西省自然科学基金项目（20171BAB202017）

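About the first author: Xu Shaoping, male, born in 1976, professor and doctoral supervisor. His main research interests are graphics and image processing, machine vision, and virtual surgery simulation. E-mail: xushaoping@ncu.edu.cn;
Lin Zhenyu, female, master's student. Her main research interests are graphics and image processing and computer vision. E-mail: 401030918076@email.ncu.edu.cn;
Li Chongxi, male, master's student. His main research interests are graphics and image processing and computer vision. E-mail: 406130917315@email.ncu.edu.cn;
Liu Tingyun, female, master's student. Her main research interests are graphics and image processing and computer vision. E-mail: 416114517210@email.ncu.edu.cn.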

CLC number: TP391

Document code: A

Article ID: 1006-8961(2019)11-1882-11

摘要

目的大多数图像降噪算法都属于非盲降噪算法，其获得良好降噪性能的前提是能够准确地获知图像的噪声水平值。然而，现有的噪声水平估计（NLE）算法在噪声水平感知特征（NLAF）提取和噪声水平值映射两个核心模块中分别存在特征描述能力不足和预测准确性有待提高的问题。为此，提出了一种基于卷积神经网络（CNN）自动提取NLAF特征，并利用增强BP（back propagation）神经网络将其映射为相应噪声水平值的改进算法。方法在训练阶段，首先通过训练卷积神经网络模型并以全连接层中若干与噪声水平值相关系数较高的输出值构成NLAF特征矢量；然后，在AdaBoost技术的支撑下，利用多个映射能力相对较弱的BP神经网络构建一个非线性映射能力更强的增强BP神经网络预测模型，将NLAF特征矢量直接映射为噪声水平值。在预测阶段，首先从给定噪声图像中随机选取若干个图块输入到卷积神经网络模型中，提取每个图块的若干维NLAF特征值后，利用预先训练的BP网络模型将其映射为对应的噪声水平值，然后以估计值的中值作为图像噪声水平值的最终估计结果。结果对于具有不同噪声水平和内容结构的噪声图像，利用所提算法估计出的噪声水平值与真实值之间的估计误差小于0.5，均方根误差小于0.9，表现出良好的预测准确性和稳定性。此外，所提算法具有较高的执行效率，估计一幅512×512像素的图像的噪声水平值仅需约13.9 ms。结论实验数据表明，所提算法在高、中、低各个噪声水平下都具有稳定的预测准确性和较高的执行效率，与现有的主流噪声水平估计算法相比综合性能更佳，可以很好地应用于要求噪声水平作为关键参数的实际应用中。

关键词

噪声水平估计; 基于训练策略; 图块级; 噪声水平值感知特征; 噪声水平值映射; 中值估计

Fast noise level estimation algorithm adopting training strategy

Xu Shaoping, Lin Zhenyu, Li Chongxi, Liu Tingyun, Yang Xiaohui

School of Information Engineering, Nanchang University, Nanchang 330031, China

Supported by: National Natural Science Foundation of China (61662044, 61163023, 51765042); Natural Science Foundation of Jiangxi Province, China (20171BAB202017)

Abstract

Objective Image denoising is a fundamental but challenging problem in low-level vision and image processing. Most existing image-denoising methods can be classified as so-called non-blind approaches, which are assumed to work under the premise of the availability of noise level. Thus, their denoising performance highly depends on the accuracy of the noise level fed into them. In practice, however, noise level is always unknown beforehand. As a result, fast and accurate noise level estimation (NLE) is often necessary for blind image denoising. To date, training-based NLE methods using handcrafted features that reflect the distortion level of a noisy image, i.e., noise level-aware features (NLAFs), still suffer from the weak ability of feature description and the low accuracy of nonlinear mapping in NLAF extraction and noise level-mapping modules, respectively. To this end, an NLE algorithm that automatically extracts NLAFs with a convolutional neural network (CNN) and directly maps the NLAFs to their corresponding noise level using AdaBoost backpropagation (BP) neural network was proposed. Method Substantially clean images were first corrupted with Gaussian noise at different noise levels to form a set of noisy images. The noisy patches extracted from the noisy images and their corresponding noise levels were then fed into a CNN to train a CNN-based NLE model. However, the CNN-based NLE model directly used to obtain the noise level of a noisy image had poor estimation accuracy. The major reasons were as follows: 1) no strong correlation between most of the output values of the fully connected layer of the CNN-based NLE model and the noise levels, and 2) an inadequate nonlinear mapping ability of the regression layer used to predict noise level in the CNN-based NLE model. 
Therefore, the correlation between the output values of the fully connected layer and the ground truths was analyzed and then several outputs that had higher correlation coefficient with the ground-truth noise levels were selected as the NLAFs in the form of feature vector. With the support of the AdaBoost technique, multiple BP neural networks with relative weak mapping ability were combined to build a strong nonlinear mapping prediction model, i.e., enhanced BP network, and the obtained prediction model was used to map the extracted NLAFs to their corresponding noise level directly. In the prediction phase, given a noisy image to be denoised, several patches were first randomly extracted and then fed into the trained CNN-based NLE model. Next, several NLAFs were extracted from the fully connected layer of the CNN-based model. The extracted NLAFs were subsequently approximated to corresponding estimated noise levels via the enhanced BP neural network. Finally, the median value of the patch noise levels was taken as the final estimate of the entire image, which could effectively solve the over and underestimation problems and greatly improve the execution efficiency. Result Comparison experiments were conducted to test the validity of the proposed method from three aspects, namely, estimation accuracy, denoising effect, and execution efficiency. The proposed method was compared with several state-of-the-art NLE methods to demonstrate the estimation accuracy. The CNN-based NLE model used to automatically extract NLAFs in this work was also compared. These competing NLE methods were performed on two test image sets, namely, 1) 10 commonly used images, including Cameraman, House, Pepper, Monarch, Plane, Lena, Barbara, Couple, Man, and Boat; and 2) 50 textured images borrowed from the BSD database (different from training). 
For a fair comparison, all methods were implemented in MATLAB 2017b and run on a machine with an Intel(R) Core(TM) i7-3770 CPU @ 3.4 GHz and 8 GB of RAM. For noisy images with different noise levels and texture structures, the average estimation error between the noise levels estimated by the proposed method and the ground truths was less than 0.5, and the average root mean square error between the estimated noise levels and the ground truths was less than 0.9 across different noise levels (i.e., 5, 15, 35, 55, 75, and 95). These results indicated satisfactory and robust estimation accuracy. In the denoising comparison, Gaussian noise at levels different from those used in the training phase, i.e., 7.5, 17.5, 37.5, 57.5, 77.5, and 97.5, was added to 10 commonly used clean images. The classic benchmark denoising algorithm, block matching and 3D filtering (BM3D), was adopted to restore the noisy images of the test set. The peak signal-to-noise ratio results obtained by the BM3D algorithm fed with the ground truths and with the estimated noise levels were nearly equal. The proposed NLE algorithm also had high execution efficiency and took only 13.9 ms to estimate the noise level of a 512×512 pixel image. Conclusion Experimental results demonstrate that the proposed NLE algorithm competes efficiently with the reference counterparts across different noise levels and image contents in terms of estimation accuracy and computational complexity. Unlike previous training-based NLE algorithms with respect to NLAF extraction, the proposed algorithm is purely data-driven and does not rely on handcrafted features or other types of prior domain knowledge. These advantages make the proposed algorithm a preferable candidate for practical denoising. When the proposed NLE method is used as a preprocessing module, non-blind denoising algorithms that require the noise level as a key parameter can obtain good denoising performance.

Key words

noise level estimation (NLE); training-based approach; patch level; noise level-aware feature (NLAF); noise level mapping; median estimation scheme

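0 Introduction

The performance of an image denoising algorithm depends heavily on an accurate estimate of how severely the image to be denoised is corrupted by noise, i.e., the noise level parameter^[1-3]. Most existing noise level estimation (NLE) algorithms adopt some strategy to extract noise level-aware features (NLAFs) from the noisy image and then map (predict) the NLAF values to a noise level according to some inference rule^[1, 4]. Extracting NLAFs that reflect the noise level plays a fundamental and decisive role in the performance of an NLE algorithm^[5].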

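At present, most NLE algorithms rely on hand-crafted feature extraction to obtain feature values that reflect the noise level of a noisy image^[4-8]. For example, Zoran et al.^[4] found that, after a discrete cosine transform (DCT), the kurtosis of the transform coefficients changes regularly with the noise level. Because kurtosis is scale invariant, the noise level of an image can be estimated by defining an objective function and searching for its optimum. However, the kurtosis extraction in the Zoran algorithm^[4] is computationally inefficient and may fail at high noise levels. Similarly, Dong et al.^[5] jointly considered the scale invariance of kurtosis in band-pass domains and piecewise stationarity in the spatial domain: they used K-means clustering to decompose the image into a series of non-overlapping regions, assumed that each region is associated with a constant, and thereby converted NLE into an iterative optimization problem of fitting a kurtosis model. Liu et al.^[6] and Pyatykh et al.^[7] focused on patch-level estimation. They found that the smallest eigenvalue of the covariance matrix of so-called low-rank patches approaches the noise level. The Pyatykh algorithm^[7] and the Liu algorithm^[6] select low-rank patches under different constraints, but both use principal component analysis (PCA) to extract the smallest eigenvalue of the covariance matrix of these patches as the noise level estimate. Although both algorithms are fairly accurate, they select low-rank patches from the raw patches iteratively, which makes them slow. To address this, Chen et al.^[8] exploited the strong correlation between the first several eigenvalues of the raw-patch covariance matrix and the noise level: they compute all eigenvalues of the covariance matrix of the raw patches directly and select several of them to infer the noise level. Because low-rank patches are no longer selected iteratively, execution efficiency improves substantially. However, this algorithm suffers from over-estimation at low noise levels and under-estimation at high noise levels. To further improve the accuracy and efficiency of the Chen algorithm^[8], Xu et al.^[9] proposed a two-stage, coarse-to-fine noise level prediction based on support vector regression (SVR). This two-stage algorithm uses pre-trained prediction models at coarse and fine granularities to map the first several eigenvalues of the raw-patch covariance matrix to a noise level, achieving a good balance between accuracy and efficiency. Nevertheless, its features are still eigenvalues extracted according to hand-crafted prior knowledge, and the SVR-based mapping model leaves room for improvement.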

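In short, all of the above NLE algorithms use hand-crafted NLAFs, which often fail to fully describe how an image changes under noise and thus degrade the prediction performance of the subsequent noise level mapping module. A feature extraction module that automatically obtains highly descriptive NLAFs would therefore lay a solid foundation for the subsequent noise level mapping module.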

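To date, existing NLE algorithms all extract NLAF values on the basis of hand-crafted domain knowledge; no work has yet used machine learning to extract NLAF values automatically. In recent years, researchers have used convolutional neural networks (CNNs) to learn image features automatically, effectively improving performance in image processing tasks such as image classification^[10], object detection^[11], and image restoration^[12]. For example, Zhu et al.^[11] achieved robust object detection in aerial images using features extracted from the hidden layers of a deep CNN. Their experiments showed that orientation robust object detection with hidden-layer features reaches satisfactory accuracy while substantially reducing computational complexity. Similarly, references [13-14] used CNNs to extract detection features from remote sensing images and X-ray images, respectively, and achieved high detection accuracy, demonstrating the strong feature representation and feature learning abilities of CNNs. All of these works use the outputs of one hidden layer of a CNN as features for an image processing task. This way of extracting features outperforms traditional hand-crafted approaches and avoids the low efficiency and poor descriptive power of hand-crafted features.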

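The success of an NLE algorithm depends mainly on two factors: the quality of NLAF extraction and the accuracy of feature mapping. To obtain better prediction performance, this paper proposes an NLE algorithm based on automatic feature extraction and enhanced back-propagation (BP) network mapping. It uses a CNN, with its strong learning ability, to automatically extract highly descriptive NLAF vectors from patches, and then uses an enhanced BP network prediction model to map the NLAF vectors accurately to noise levels. Compared with existing work, the contributions are: 1) Automatic NLAF extraction. The CNN takes patches as input; once the feature extraction network is trained, more descriptive NLAFs are obtained without domain expertise. 2) Better nonlinear mapping. The enhanced BP neural network achieves better prediction accuracy than traditional mapping methods such as SVR and a single BP network. Experimental data show that the proposed NLE algorithm has a clearer overall advantage in prediction accuracy and execution efficiency and is of greater practical value.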

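1 Existing problems and the improvement approach

1.1 Existing problems

In recent years, CNNs have been widely applied to classification^[15] and regression^[16] problems in computer vision and have shown strong feature learning and nonlinear approximation abilities^[17]. The core components of a CNN are convolutional (Conv) layers, pooling layers, and fully connected (FC) layers, which are stacked alternately to form a deep network that mimics how simple and complex cells in the visual cortex alternately and progressively extract abstract visual attributes of the input^[18]. In principle, an NLE algorithm can be obtained simply by training a CNN regression model on a representative training set, with noisy images as input and noise levels as targets. However, an NLE algorithm implemented entirely within the CNN framework in this way (hereafter CNN-NLE) does not perform particularly well in prediction accuracy. The main reasons are: 1) most outputs of the fully connected layer (i.e., the features learned by the network) are not strongly correlated with the noise level; 2) the regression layer that performs prediction within the CNN framework has only modest nonlinear mapping ability.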

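1.2 Improvement approach

To address these two drawbacks of CNN-NLE, correlation analysis is used to select, from the outputs of the fully connected layer of the CNN model, the several outputs most highly correlated with the noise level as NLAF values. In addition, to strengthen the noise level mapping, the AdaBoost technique combines multiple BP neural network mappers with relatively weak mapping ability into a mapper with stronger nonlinear mapping ability. Once the feature extraction model and the mapping model are trained, they can be used for noise level prediction. For a given noisy image, a small number of patches are first extracted at random; the feature extraction model and the mapping model are then applied to obtain the noise level of each patch, forming a set of patch noise levels; finally, the median of this set is taken as the predicted noise level of the whole image.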

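In terms of implementation, NLAF extraction and noise level prediction are performed at the patch level rather than the image level. The main advantages are: 1) Low implementation complexity. At the image level, the CNN would need more convolution, downsampling, and pooling operations to reduce the high-dimensional input (the image) to the required NLAF vector, which complicates the network and significantly increases training and execution time. 2) High prediction accuracy. Natural images are generally rich in detail, so the stability of an image-level prediction is affected to some extent by image content. Therefore, m patches are randomly selected from the test image, and the median of the noise levels predicted on these patches is taken as the noise level estimate of the whole image. This effectively mitigates over- and under-estimation at the image level, yields estimates closer to the true noise level, and greatly improves execution efficiency.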

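2 Improved algorithm

2.1 CNN model for NLAF extraction

Following the approach described in Section 1.2, noise at different levels is first added to a number of noise-free images to form a set of noisy images; each noisy image is decomposed into patches, which together with the corresponding noise levels form the training set. Then, with the patches as input and their noise levels as output, the CNN model used for NLAF extraction is trained.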

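Specifically, 100 images from the BSD database^[19] are used as training images. Gaussian noise with levels from 5 to 100 in steps of 5 is added to each undistorted image, and 200 noisy patches are randomly extracted from different positions of each noisy image. Together with the corresponding noise levels, they form the training set (100 images × 20 noise levels × 200 patches = 400 000 samples). The model is trained with stochastic gradient descent (SGD) at a learning rate of 0.001; in the regression layer, a squared loss is used to adjust the network parameters so as to minimize the error between the true and predicted patch noise levels. The network structure is shown in Fig. 1. The CNN consists of 3 convolutional layers (kernel sizes 7×7, 6×6, and 6×6), 2 pooling layers, 1 nonlinear activation layer, 1 fully connected layer (with 20 outputs), and 1 regression layer. The CNN model in Fig. 1 (i.e., the CNN-NLE algorithm) can itself be used directly for noise level prediction, but its accuracy is unsatisfactory. Nevertheless, given the strong feature extraction ability of CNNs, only those fully connected outputs that are highly correlated with the noise level are extracted, via correlation analysis, as NLAF values. The CNN model in Fig. 1 is therefore used in this paper as the NLAF extraction model.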
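The construction of the training set described above can be sketched in a few lines. This is only a minimal illustration using the Python standard library: the function names, the representation of an image as a list of rows, and the default patch size of 40 (the value chosen later in Section 3.2) are assumptions, and details such as pixel clipping or the exact sampling scheme are not specified in the paper.

```python
import random

# Noise levels 5, 10, ..., 100 (20 levels), as stated in the text.
NOISE_LEVELS = list(range(5, 101, 5))

def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a 2-D grayscale image (list of rows)."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

def random_patches(image, size, count, rng):
    """Crop `count` patches of size x size at uniformly random positions."""
    height, width = len(image), len(image[0])
    patches = []
    for _ in range(count):
        top = rng.randrange(height - size + 1)
        left = rng.randrange(width - size + 1)
        patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches

def build_training_set(clean_images, patch_size=40, patches_per_image=200, seed=0):
    """Pair every noisy patch with the noise level that produced it."""
    rng = random.Random(seed)
    samples = []
    for image in clean_images:
        for sigma in NOISE_LEVELS:
            noisy = add_gaussian_noise(image, sigma, rng)
            for patch in random_patches(noisy, patch_size, patches_per_image, rng):
                samples.append((patch, sigma))
    return samples
```

With 100 clean images and the defaults above, the loop yields 100 × 20 × 200 = 400 000 (patch, noise level) pairs, which matches the data volume reported in the text.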

图 1 用于自动提取NLAF特征的CNN网络架构

Fig. 1 The CNN architecture for extracting NLAF features

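2.2 Correlation analysis

Because not all outputs of the fully connected layer of the CNN are strongly correlated with the noise level (i.e., sensitive to changes in it), using weakly correlated outputs as NLAF values would hurt the accuracy of the subsequent mapping. Correlation analysis is therefore used to screen the outputs of the FC layer of the CNN-NLE model, and only highly correlated outputs form the NLAF vector. Specifically, $ N$ = 40 000 patches $ \left(P_{1}, P_{2}, \cdots, P_{i}, \cdots, P_{N}\right)$ are drawn from the CNN-NLE training set (covering the content of 100 images and 20 noise levels, with 20 patches per noisy image). All patches are fed to the CNN-NLE model. For the $ j$-th output of the fully connected layer, the values $ \left(o_{P_{1}}^{j}, o_{P_{2}}^{j}, \cdots, o_{P_{i}}^{j}, \cdots, o_{P_{N}}^{j}\right)$ and the true patch noise levels $ \left(\sigma_{P_{1}}, \sigma_{P_{2}}, \cdots, \sigma_{P_{i}}, \cdots, \sigma_{P_{N}}\right)$ are used to compute three correlation coefficients^[20]: the Pearson linear correlation coefficient (PLCC), the Spearman rank-order correlation coefficient (SROCC), and the Kendall rank-order correlation coefficient (KROCC).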

The PLCC measures the linear correlation between the outputs of the fully connected layer and the noise levels. It is defined as

$ {\lambda _{\rm{PLCC}}} = \frac{\sum\limits_{i = 1}^N \left( o_{P_i}^j - u \right)\left( \sigma_{P_i} - v \right)}{\sqrt{\sum\limits_{i = 1}^N \left( o_{P_i}^j - u \right)^2} \sqrt{\sum\limits_{i = 1}^N \left( \sigma_{P_i} - v \right)^2}} $

(1)

where $u $ and $ v$ denote the mean of the $ j$-th output of the fully connected layer and the mean of the noise levels, respectively.

The SROCC measures the rank correlation between two sets of data. It is defined as

$ {\lambda _{\rm{SROCC}}} = \frac{\sum\limits_{i = 1}^N \left( R\left( o_{P_i}^j \right) - \bar r_o \right)\left( R\left( \sigma_{P_i} \right) - \bar r_\sigma \right)}{\sqrt{\sum\limits_{i = 1}^N \left( R\left( o_{P_i}^j \right) - \bar r_o \right)^2 \sum\limits_{i = 1}^N \left( R\left( \sigma_{P_i} \right) - \bar r_\sigma \right)^2}} $

(2)

where $ R\left( o_{P_i}^j \right)$ and $ R\left( \sigma_{P_i} \right)$ denote the ranks of $ o_{P_i}^j$ and $ \sigma_{P_i}$, and $ \bar r_o$ and $ \bar r_\sigma$ are the mean ranks of the FC outputs and of the noise levels, respectively.

The KROCC measures the monotonic relationship between two sets of data. From the data pairs $ \left( o_{P_i}^j, \sigma_{P_i} \right)_{i = 1}^N$ formed by the $j $-th output of the fully connected layer and the noise levels, take any two different pairs $\left(o_{P_{x}}^{j}, \sigma_{P_{x}}\right) $ and $\left(o_{P_{y}}^{j}, \sigma_{P_{y}}\right)(x \neq y) $ to form $\left[\left(o_{P_{x}}^{j}, \sigma_{P_{x}}\right), \left(o_{P_{y}}^{j}, \sigma_{P_{y}}\right)\right] $. If $o_{P_{x}}^{j} < o_{P_{y}}^{j} $ and $ \sigma_{P_x} < \sigma_{P_y}$, the combination is called a concordant pair, and the total number of concordant pairs is denoted $ Q_c$. If $o_{P_{x}}^{j} < o_{P_{y}}^{j} $ and $ \sigma_{P_x} > \sigma_{P_y}$, it is called a discordant pair, and the total number of discordant pairs is denoted $ Q_d$. In addition, the number of combinations with $o_{P_x}^j \ne o_{P_y}^j $ and $ \sigma_{P_x} = \sigma_{P_y}$ is denoted $ A$, and the number of combinations with $ o_{P_x}^j = o_{P_y}^j$ and $ \sigma_{P_x} \ne \sigma_{P_y}$ is denoted $ B$. The KROCC is then

$ {\lambda _{\rm{KROCC}}} = \frac{Q_c - Q_d}{\sqrt{\left( Q_c + Q_d + A \right)\left( Q_c + Q_d + B \right)}} $

(3)

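The correlation analysis uses as many as 40 000 samples, which ensures that the resulting correlation indices are stable. Table 1 shows that, for outputs 15 to 20 of the FC layer, all three correlation coefficients with the noise level are above 0.6, and for outputs 16 to 20 they are all above 0.91. Although the three coefficients measure different kinds of correlation, their absolute values all lie in [0, 1]: the closer to 1, the stronger the correlation, and the closer to 0, the weaker. The data in Table 1 indicate that outputs 15 to 20 have a moderate-to-strong positive correlation with the noise level and reflect changes in the noise level stably. Therefore, outputs 15 to 20 are used to form the NLAF vector.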

表 1 全连接层中各维特征值与真实噪声水平值之间的相关性
Table 1 Correlation coefficient between features extracted from full-connected layer and noise levels


Dimension	PLCC	SROCC	KROCC
1	-0.614 1	-0.274 3	-0.091 0
2	-0.745 2	-0.528 6	-0.345 5
3	-0.865 9	-0.989 7	-0.932 5
4	-0.940 6	-0.993 9	-0.950 6
5	-0.936 2	-0.951 7	-0.847 2
6	-0.911 2	-0.902 0	-0.764 7
7	-0.419 8	-0.423 6	-0.411 4
8	-0.269 5	-0.274 8	-0.291 1
9	-0.247 8	-0.247 8	-0.223 3
10	-0.236 3	-0.224 8	-0.155 9
11	-0.123 1	-0.106 4	-0.043 1
12	-0.114 3	-0.071 2	0.020 0
13	0.070 2	0.097 0	0.154 9
14	0.327 3	0.337 0	0.344 8
15	0.779 6	0.795 4	0.664 2
16	0.978 0	0.982 0	0.914 9
17	0.966 7	0.996 2	0.960 0
18	0.952 1	0.995 4	0.956 3
19	0.941 2	0.994 5	0.952 3
20	0.929 8	0.993 3	0.947 5
Note: Bold font indicates dimensions 15 to 20, the features used in this paper, and their correlation coefficients.
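As a rough illustration of the screening step, the three coefficients in Eqs. (1) to (3) can be computed as follows. This is a sketch under stated assumptions rather than the authors' code: the function names are hypothetical, ties are given average ranks for the SROCC, the KROCC is the quadratic-time tau-b form, and the selection rule "all three coefficients above 0.6" is inferred from the description of Table 1.

```python
import math

def pearson(x, y):
    """PLCC, Eq. (1). Undefined (division by zero) for constant input."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """SROCC, Eq. (2): Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall(outputs, sigmas):
    """KROCC, Eq. (3), by direct pair counting (O(N^2))."""
    qc = qd = tied_sigma_only = tied_output_only = 0
    n = len(outputs)
    for a in range(n):
        for b in range(a + 1, n):
            do, ds = outputs[a] - outputs[b], sigmas[a] - sigmas[b]
            if do == 0 and ds == 0:
                continue                      # tied in both: ignored
            if ds == 0:
                tied_sigma_only += 1          # counted in A
            elif do == 0:
                tied_output_only += 1         # counted in B
            elif do * ds > 0:
                qc += 1                       # concordant pair
            else:
                qd += 1                       # discordant pair
    return (qc - qd) / math.sqrt((qc + qd + tied_sigma_only) *
                                 (qc + qd + tied_output_only))

def select_dimensions(fc_outputs, sigmas, threshold=0.6):
    """fc_outputs[j-1] holds the j-th FC output over all patches."""
    selected = []
    for j, column in enumerate(fc_outputs, start=1):
        scores = (pearson(column, sigmas), spearman(column, sigmas),
                  kendall(column, sigmas))
        if min(scores) > threshold:
            selected.append(j)
    return selected

# Toy data: three FC outputs observed on four patches.
toy_outputs = [[1, 2, 3, 4], [4, 3, 2, 1], [1, 3, 2, 4]]
toy_sigmas = [5, 10, 15, 20]
print(select_dimensions(toy_outputs, toy_sigmas))  # [1, 3]
```

Applied to the signed values in Table 1, the same rule keeps exactly dimensions 15 to 20; the strongly negative dimensions 3 to 6 are excluded because the threshold is applied to signed coefficients, which is consistent with the text's emphasis on positive correlation.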

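2.3 Noise level mapping

Because a single BP neural network has weak mapping ability, the AdaBoost technique^[21] is used to combine multiple BP neural networks for better mapping (prediction) results. Each BP neural network acts as a weak learner in the AdaBoost framework; the weak predictors adapt their weights according to their prediction errors on the training set, and their weighted combination forms a robust and more accurate enhanced BP neural network. To train this model, labeled NLAF vectors $ \boldsymbol{F}_{n}=\left(f_{1}, f_{2}, \cdots, f_{6}\right)$ extracted from a large number of patches are used, where the label is the true noise level $ \sigma $, $n = 1, 2, \cdots , K$, $K $ is the number of training patches, and $ f_1, f_2, \cdots , f_6$ correspond to outputs 15 to 20 extracted by the CNN model. They form the training set $ \left\{ \left( \boldsymbol{F}_1, \sigma_1 \right), \left( \boldsymbol{F}_2, \sigma_2 \right), \cdots , \left( \boldsymbol{F}_K, \sigma_K \right) \right\} \subset {\bf{R}}^6 \times {\bf{R}}$. A mapping function $\hat{\sigma}_{P_n}=\varphi\left(\boldsymbol{F}_{n}\right) $ is trained that maps the NLAF vector $ \boldsymbol{F}_n$ extracted from any given patch $ P_n$ to the noise level estimate $ \hat \sigma_{P_n}$. The training procedure is as follows.

First, the number of weak predictors is set to $ T$ = 10 (experiments show that increasing the number of weak predictors beyond 10 brings little further improvement in prediction accuracy), and the prediction error of each weak predictor $\left\{B P_{i} | 1 \leqslant i \leqslant T, i \in \bf{N}\right\} $ is computed as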

$ {\varepsilon _i} = \sum\limits_{j = 1}^K {{D_{i, j}}} \times l\left( {{\sigma _j} - {{\hat \sigma }_{i, j}}} \right) $

(4)

where $ D_{i, j}$ is the weight of the $ j$-th training sample in the prediction error of the $ i$-th weak predictor, defined as

$ \begin{array}{l} {D_{i, j}} = \\ \left\{ {\begin{array}{*{20}{l}} {1/K}&{i = 1}\\ {{D_{i - 1, j}} \times \left( {1 + C \cdot l\left( {{\sigma _j} - {{\hat \sigma }_{i - 1, j}}} \right)} \right)}&{i = 2, \cdots , T} \end{array}} \right. \end{array} $

(5)

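where $ \sigma $ is the true noise level corresponding to the feature vector, $ \hat \sigma $ is the estimated noise level, and $ C$ is a constant in [0, 1]. The function $ l\left( \cdot \right)$ is a binary function used to adjust the prediction errors of the weak predictors and the corresponding weights: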

$ l(x)=\left\{\begin{array}{ll}{1} & {x>\delta} \\ {0} & {x \leqslant \delta}\end{array}\right. $

(6)

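where $\delta $ is a preset threshold. Based on experimental analysis, the parameters are finally set to $ C$ = 0.1 and $\delta $ = 0.1.

Then, a convex function converts the prediction error of each weak predictor into its weight in the final prediction, so that weak predictors with smaller errors receive larger weights and those with larger errors receive smaller weights. The weight of the $ i$-th weak predictor is defined as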

$ w_{i}=1 / \exp \left(-b\left(\left|\varepsilon_{i}\right|-c\right)\right) $

(7)

Finally, from the weights $ w_i$ of the weak predictors and the noise levels $ \hat \sigma_i$ they predict, the noise level estimate of patch $ P_n$ is computed as

$ \hat{\sigma}_{P_{n}}=\sum\limits_{i=1}^{T} w_{i} \times \hat{\sigma}_{i} $

(8)
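The bookkeeping in Eqs. (4) to (6) and the final combination in Eq. (8) can be sketched as below. This is an illustrative reading of the printed formulas, not the authors' implementation: the function names are hypothetical, the BP networks themselves are not modeled (their estimates are passed in as plain lists), and Eq. (7) is left out because the constants b and c are not given in the text.

```python
def binary_loss(x, delta=0.1):
    """Eq. (6): 1 when x exceeds the threshold delta, otherwise 0."""
    return 1 if x > delta else 0

def initial_weights(k):
    """Eq. (5), case i = 1: uniform weight 1/K per training sample."""
    return [1.0 / k] * k

def next_weights(prev_weights, sigmas, prev_estimates, c_const=0.1, delta=0.1):
    """Eq. (5), case i >= 2: raise the weight of samples flagged by Eq. (6)."""
    return [d * (1.0 + c_const * binary_loss(s - e, delta))
            for d, s, e in zip(prev_weights, sigmas, prev_estimates)]

def predictor_error(weights, sigmas, estimates, delta=0.1):
    """Eq. (4): weighted count of samples flagged by Eq. (6)."""
    return sum(d * binary_loss(s - e, delta)
               for d, s, e in zip(weights, sigmas, estimates))

def combine(predictor_weights, predictor_estimates):
    """Eq. (8): weighted sum of the weak predictors' estimates for one patch."""
    return sum(w * e for w, e in zip(predictor_weights, predictor_estimates))

# Toy data: four training samples and the first weak predictor's estimates.
sigmas = [10.0, 20.0, 30.0, 40.0]
first_estimates = [10.05, 19.5, 30.0, 40.3]

d1 = initial_weights(len(sigmas))
eps1 = predictor_error(d1, sigmas, first_estimates)   # only sample 2 is flagged
d2 = next_weights(d1, sigmas, first_estimates)        # sample 2 weight: 0.25 -> 0.275
```

Note that, taken literally, Eq. (6) is applied to the signed difference, so in this toy run the over-estimate of 0.3 on the last sample is not flagged; an implementation might prefer the absolute difference, and might also normalize the predictor weights before applying Eq. (8), but neither choice is stated in the text.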

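For an image I to be evaluated, $ m$ patches are first extracted at random from I and their feature vectors are computed; the trained enhanced BP neural network then predicts the noise levels $ \left\{ {{{\hat \sigma }_{{P_1}}}, {{\hat \sigma }_{{P_2}}}, \cdots , {{\hat \sigma }_{{P_{m - 1}}}}, {{\hat \sigma }_{{P_m}}}} \right\}$ of these patches. The final noise level estimate of image I is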

$ {{\hat \sigma }_I} = M\left( {{{\hat \sigma }_{{P_1}}}, {{\hat \sigma }_{{P_2}}}, \cdots , {{\hat \sigma }_{{P_{m - 1}}}}, {{\hat \sigma }_{{P_m}}}} \right) $

(9)

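where $M\left( \cdot \right) $ is the classical median function. Although the enhanced BP neural network achieves good prediction accuracy, interference from image content inevitably causes some over- and under-estimation. The median function $ M$ effectively suppresses the influence of outliers in the set of patch estimates on the final estimate, which gives the algorithm better accuracy and robustness.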
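The image-level step in Eq. (9) can be illustrated with a short sketch. The function name `estimate_image_noise_level` and the `patch_predictor` callable (standing in for the CNN feature extractor followed by the enhanced BP network) are hypothetical; the defaults m = 20 and s = 40 are the values selected in Section 3.2.

```python
import random
import statistics

def estimate_image_noise_level(image, patch_predictor, m=20, s=40, seed=0):
    """Median of the noise levels predicted on m random s x s patches (Eq. 9)."""
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    estimates = []
    for _ in range(m):
        top = rng.randrange(height - s + 1)
        left = rng.randrange(width - s + 1)
        patch = [row[left:left + s] for row in image[top:top + s]]
        estimates.append(patch_predictor(patch))
    return statistics.median(estimates)

# Why the median: one outlying patch estimate barely moves the result.
patch_estimates = [14.8, 15.1, 15.3, 14.9, 22.0]
print(statistics.median(patch_estimates))   # 15.1
print(statistics.mean(patch_estimates))     # about 16.42, pulled up by 22.0
```

With an even number of patches such as m = 20, `statistics.median` returns the average of the two middle estimates.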

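3 Experimental results and analysis

3.1 Datasets and experimental setup

To evaluate the proposed NLE algorithm, the representative algorithms Immerkær^[22], Zoran^[4], Yang^[23], Liu^[6], Chen^[8], and Rakhshanfar^[24], together with CNN-NLE, are compared on two test image sets. The first set consists of test images commonly used in the literature, shown in Fig. 2; the second set consists of 50 images randomly selected from the BSD database^[19], shown in Fig. 3. All algorithms run on the same hardware and on the MATLAB 2017b software platform.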

图 2 各类文献中常用的图像集合

Fig. 2 Commonly used images in the references

((a) Cameraman; (b) House; (c) Peppers; (d) Monarch; (e) Plane; (f) Lena; (g) Barbara; (h) Couple; (i) Man; (j) Boat)

图 3 BSD数据库中的部分代表性图像

Fig. 3 Some representative images in the BSD database

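3.2 Determining the parameters s and m

The main idea of the improved algorithm is as follows: a CNN automatically extracts the NLAF values of a noisy patch, which serve as input to the enhanced BP neural network to predict the noise level of that patch, and the median of the predictions over several patches is taken as the final noise level estimate of the whole image. The patch size (s×s) and the number of patches (m) are therefore the key parameters affecting overall performance. To determine the best patch size and number, Gaussian noise at different levels (5 to 100 in steps of 5) is first added to all undistorted images in the training set; NLAF extraction CNN models are then trained under different values of s and m, giving 6×7 = 42 configurations (see Table 2); finally, 100 images from the BSD database^[19] (different from the training images) are used as test images, and the features extracted by the trained models are fed to the enhanced BP neural network to predict the image noise levels. For each configuration, the root mean square error (RMSE) of the predicted noise levels is computed at each noise level, and the mean over noise levels is used as the quality measure for that configuration. The results are listed in Table 2. As Table 2 shows, the RMSE tends to decrease as the patch size and the number of patches increase, but the execution time also grows rapidly. Balancing prediction accuracy against execution efficiency, the parameters are finally set to s = 40 and m = 20. In all comparison experiments, the results of the proposed algorithm are obtained with 20 patches of 40×40 pixels extracted at random from each image.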

表 2 不同参数配置下预测结果的均方根误差对比
Table 2 RMSE between estimated noise levels and ground truths under different parameter s and m settings


s \ m	10	15	20	25	30	40	50
20	1.30	1.19	1.14	1.12	1.11	1.07	1.07
25	1.12	1.07	1.03	1.02	0.99	0.98	0.96
28	1.08	1.02	0.99	0.96	0.96	0.93	0.93
30	1.07	1.01	0.97	0.95	0.95	0.94	0.93
35	1.23	1.20	1.17	1.16	1.16	1.14	1.13
40	0.93	0.89	0.86	0.84	0.84	0.83	0.82
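The quality measure behind each cell of Table 2, an RMSE per noise level averaged over noise levels, can be written out as a small sketch. The function names and the toy numbers are illustrative assumptions; they are not data from the paper.

```python
import math

def rmse(estimates, true_level):
    """Root mean square error of several estimates of one true noise level."""
    return math.sqrt(sum((e - true_level) ** 2 for e in estimates) / len(estimates))

def mean_rmse(estimates_by_level):
    """Average the per-level RMSE over all noise levels.

    estimates_by_level maps a true noise level to the estimates obtained
    on the test images corrupted at that level.
    """
    values = [rmse(est, level) for level, est in estimates_by_level.items()]
    return sum(values) / len(values)

# Toy data: two test images at each of three noise levels.
toy = {5: [4.0, 6.0], 15: [15.0, 15.0], 35: [33.0, 37.0]}
print(mean_rmse(toy))  # 1.0  (per-level RMSE values are 1, 0, and 2)
```

Repeating this computation for every (s, m) pair would give a grid like Table 2, from which a configuration can be chosen by trading the RMSE against execution time.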

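3.3 Prediction accuracy

To verify the prediction accuracy of the proposed algorithm, Gaussian noise at levels 5, 15, 35, 55, 75, and 95 is added to the images in Fig. 2. The predictions of all compared algorithms on the noisy images are listed in Tables 3 and 4 (for brevity, Table 3 reports only the results on the Lena image). Tables 3 and 4 show that the Immerkær and Zoran algorithms are comparatively inaccurate; the Yang, Liu, and Rakhshanfar algorithms are not stable across noise levels; the Chen algorithm performs well at high noise levels but is less accurate at low noise levels; and CNN-NLE is of middling accuracy among the compared algorithms. The proposed algorithm shows satisfactory accuracy at all noise levels, on both the single image and the 10-image set, and its average RMSE in Table 4 is the lowest of all algorithms, indicating the best prediction stability.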

表 3 各算法在Lena图像上的预测结果
Table 3 Estimation results obtained by different algorithms on Lena image at different noise levels


Algorithm	σ=5	σ=15	σ=35	σ=55	σ=75	σ=95	Mean error
Immerkær^[22]	5.86	15.45	34.74	51.97	65.91	77.36	5.22
Zoran^[4]	5.15	14.18	33.29	53.11	72.86	93.37	1.39
Yang^[23]	5.48	15.06	34.47	54.45	73.97	93.34	0.72
Liu^[6]	5.39	14.99	34.46	54.57	74.26	93.90	0.54
Chen^[8]	5.79	15.37	34.96	55.27	75.37	95.18	0.34
Rakhshanfar^[24]	5.08	14.90	34.66	54.10	74.24	95.25	0.41
CNN-NLE	5.77	16.10	35.69	56.02	76.35	96.01	0.99
Proposed	5.28	14.77	34.57	54.73	74.14	95.34	0.40
Note: Bold font indicates the best result and underlining the second-best result.
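The last column of Table 3 can be reproduced from the other columns if "mean error" is read as the mean absolute difference between the estimate and the true noise level. That reading is an inference from the numbers, not a definition given in the text; the sketch below checks it for three rows.

```python
TRUE_LEVELS = [5, 15, 35, 55, 75, 95]

# Estimates copied from Table 3 (Lena image).
TABLE3 = {
    "Chen":     [5.79, 15.37, 34.96, 55.27, 75.37, 95.18],
    "CNN-NLE":  [5.77, 16.10, 35.69, 56.02, 76.35, 96.01],
    "Proposed": [5.28, 14.77, 34.57, 54.73, 74.14, 95.34],
}

def mean_abs_error(estimates, truths=TRUE_LEVELS):
    """Mean absolute difference between estimated and true noise levels."""
    return sum(abs(e - t) for e, t in zip(estimates, truths)) / len(truths)

for name, estimates in TABLE3.items():
    print(name, round(mean_abs_error(estimates), 2))
```

The three values agree with the tabulated 0.34, 0.99, and 0.40 after rounding to two decimals.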

表 4 10幅常用图像上各算法预测值与真值之间的均方根误差
Table 4 RMSE between estimated noise levels and ground truths on ten commonly used images


Algorithm	σ=5	σ=15	σ=35	σ=55	σ=75	σ=95	Mean
Immerkær^[22]	1.79	1.00	0.44	3.39	9.50	18.08	5.70
Zoran^[4]	0.42	1.09	1.74	2.30	2.29	2.02	1.64
Yang^[23]	0.74	0.39	0.73	0.85	1.82	1.38	0.99
Liu^[6]	0.24	0.14	0.46	0.84	1.39	1.67	0.79
Chen^[8]	1.97	1.07	0.69	0.40	0.44	0.71	0.88
Rakhshanfar^[24]	0.43	0.70	1.55	2.69	4.03	4.01	2.24
CNN-NLE	1.01	1.33	1.55	1.73	1.48	1.89	1.50
Proposed	0.82	0.54	0.60	0.74	0.83	0.99	0.75
Note: Bold font indicates the best result and underlining the second-best result.

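To further verify the generalization ability of the improved algorithm, Gaussian noise at levels not used during model training (7.5, 17.5, 37.5, 57.5, 77.5, and 97.5) is added to the images of the second test set. The RMSE between the predictions of the proposed algorithm and the true noise levels is given in Table 5. Compared with Table 4, the accuracy is lower at some noise levels and the average RMSE is slightly higher. The main reasons are: 1) the tested noise levels do not appear in the training set; 2) the images in the BSD database are mainly textured images with very rich detail, which makes prediction harder. Overall, however, the proposed algorithm still shows good robustness.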

表 5 本文算法在50幅BSD图像上预测值与真值之间的均方根误差
Table 5 RMSE between estimated noise levels and ground truths on 50 images from BSD database


Noise level	7.5	17.5	37.5	57.5	77.5	97.5
RMSE	1.03	0.54	0.40	1.25	1.05	0.98

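3.4 Practical application

To further verify the practical value of the proposed algorithm, Gaussian noise at levels 7.5, 17.5, 37.5, 57.5, 77.5, and 97.5 (different from the levels used in training) is added to the 10 commonly used images in Fig. 2 to form a test set. The classic benchmark denoising algorithm BM3D (block matching and 3D filtering)^[2] is used to denoise the test images (other mainstream denoising algorithms give similar results). The true noise level and the noise level predicted by the proposed algorithm are each supplied as the input parameter of BM3D, and the peak signal-to-noise ratio (PSNR) values of the two denoised results are plotted against each other as a scatter plot in Fig. 4. As Fig. 4 shows, essentially all points lie on or near the 45° diagonal, which means that denoising with the predicted noise level differs little from denoising with the true value and demonstrates the practicality of the proposed NLE algorithm.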
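For reference, the PSNR used in this comparison can be computed as in the sketch below. It assumes 8-bit grayscale images stored as lists of rows with a peak value of 255; the function name is hypothetical and BM3D itself is not implemented here.

```python
import math

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, res_row in zip(reference, restored):
        for r, d in zip(ref_row, res_row):
            squared_error += (r - d) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy 2 x 2 example: two pixels are off by 10, so the MSE is 50.
clean = [[100, 100], [100, 100]]
denoised = [[110, 90], [100, 100]]
print(psnr(clean, denoised))
```

In the experiment of Fig. 4, one such value would be computed for the BM3D output obtained with the true noise level and another for the output obtained with the estimated level, giving one point per image and noise level.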

图 4 分别使用噪声真实值和预测值作为BM3D算法输入参数的降噪效果对比

Fig. 4 Performance difference of BM3D algorithm by using estimated noise levels and ground truths

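3.5 Execution time

To compare the execution time of the NLE algorithms, the average execution time of every algorithm on the 10 commonly used images in Fig. 2 (512×512 pixels) is measured; the results are listed in Table 6. As Table 6 shows, the proposed algorithm runs in under 14 ms, ranking third among all compared algorithms. However, the data in Tables 3 and 4 show that the two faster algorithms, Immerkær and CNN-NLE, are considerably less accurate. Taking both prediction accuracy and execution efficiency into account, the improved algorithm therefore has the advantage over the other algorithms.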

表 6 各算法执行时间的比较
Table 6 Average execution time of different NLE methods


Algorithm	Execution time/ms
Immerkær^[22]	2.3
Zoran^[4]	1 141.6
Yang^[23]	39.5
Liu^[6]	791.3
Chen^[8]	79.4
Rakhshanfar^[24]	35.8
CNN-NLE	12.6
Proposed	13.9

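4 Conclusion

This paper improves existing NLE algorithms in two respects, NLAF extraction and noise level mapping, and obtains a new algorithm that performs well in both prediction accuracy and execution efficiency. The algorithm uses a CNN model to extract NLAF values automatically, which avoids the limitations of designing NLAF extraction by hand from domain knowledge. Then, with the AdaBoost technique, multiple BP neural network predictors are combined into an enhanced BP network prediction model with strong nonlinear mapping ability, which maps the NLAF vector to a noise level more accurately and secures the prediction accuracy of the algorithm. In addition, the improved algorithm works at the patch level and is therefore efficient. Experimental data show that, compared with existing mainstream NLE algorithms, the proposed algorithm achieves a good balance between prediction accuracy and execution efficiency. Future work will explore deep neural networks with stronger mapping ability to further improve the prediction accuracy and execution efficiency of noise level estimation.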

References

[1] Xu S P, Zhang X Q, Jiang Y N, et al. Noise level estimation based on local means and its application to the blind BM3D denoising algorithm[J]. Journal of Image and Graphics, 2017, 22(4): 422–434. [徐少平, 张兴强, 姜尹楠, 等. 局部均值噪声估计的盲3维滤波降噪算法[J]. 中国图象图形学报, 2017, 22(4): 422–434. ] [DOI:10.11834/jig.20170402]

[2] Dabov K, Foi A, Katkovnik V, et al. Image denoising by sparse 3-D transform-domain collaborative filtering[J]. IEEE Transactions on Image Processing, 2007, 16(8): 2080–2095. [DOI:10.1109/TIP.2007.901238]

[3] Xie Y, Gu S H, Liu Y, et al. Weighted Schatten p-norm minimization for image denoising and background subtraction[J]. IEEE Transactions on Image Processing, 2016, 25(10): 4842–4857. [DOI:10.1109/TIP.2016.2599290]

[4] Zoran D, Weiss Y. Scale invariance and noise in natural images[C]//Proceedings of the 12th IEEE International Conference on Computer Vision. Kyoto, Japan: IEEE, 2009: 2209-2216.[DOI: 10.1109/ICCV.2009.5459476]

[5] Dong L, Zhou J T. Noise level estimation for natural images based on scale-invariant kurtosis and piecewise stationarity[J]. IEEE Transactions on Image Processing, 2017, 26(2): 1017–1030. [DOI:10.1109/TIP.2016.2639447]

[6] Liu X H, Tanaka M, Okutomi M. Single-image noise level estimation for blind denoising[J]. IEEE Transactions on Image Processing, 2013, 22(12): 5226–5237. [DOI:10.1109/TIP.2013.2283400]

[7] Pyatykh S, Hesser J, Zheng L. Image noise level estimation by principal component analysis[J]. IEEE Transactions on Image Processing, 2013, 22(2): 687–699. [DOI:10.1109/TIP.2012.2221728]

[8] Chen G Y, Zhu F Y, Heng P A. An efficient statistical method for image noise level estimation[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 477-485.[DOI: 10.1109/ICCV.2015.62]

[9] Xu S P, Zeng X X, Tang Y L. Fast noise level estimation algorithm based on two-stage support vector regression[J]. Journal of Computer-Aided Design & Computer Graphics, 2018, 30(3): 447–458. [徐少平, 曾小霞, 唐祎玲. 基于两阶段支持向量回归的快速噪声水平估计算法[J]. 计算机辅助设计与图形学学报, 2018, 30(3): 447–458. ] [DOI:10.3724/SP.J.1089.2018.16422]

[10] Hu F, Xia G S, Wang Z F, et al. Unsupervised feature learning via spectral clustering of multidimensional patches for remotely sensed scene classification[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2015, 8(5): 2015–2030. [DOI:10.1109/JSTARS.2015.2444405]

[11] Zhu H G, Chen X G, Dai W Q, et al. Orientation robust object detection in aerial images using deep convolutional neural network[C]//Proceedings of 2015 IEEE International Conference on Image Processing. Quebec City, Canada: IEEE, 2015: 3735-3739.[DOI: 10.1109/ICIP.2015.7351502]

[12] Ren J J, Fang X Y, Chen S W, et al. Image deblurring based on fast convolutional neural networks[J]. Journal of Computer-Aided Design & Computer Graphics, 2017, 29(8): 1444–1456. [任静静, 方贤勇, 陈尚文, 等. 基于快速卷积神经网络的图像去模糊[J]. 计算机辅助设计与图形学学报, 2017, 29(8): 1444–1456. ] [DOI:10.3969/j.issn.1003-9775.2017.08.007]

[13] Long Y, Gong Y P, Xiao Z F, et al. Accurate object localization in remote sensing images based on convolutional neural networks[J]. IEEE Transactions on Geoscience and Remote Sensing, 2017, 55(5): 2486–2498. [DOI:10.1109/TGRS.2016.2645610]

[14] Akcay S, Kundegorski M E, Willcocks C G, et al. Using deep convolutional neural network architectures for object classification and detection within X-ray baggage security imagery[J]. IEEE Transactions on Information Forensics and Security, 2018, 13(9): 2203–2215. [DOI:10.1109/TIFS.2018.2812196]

[15] Zhang M M, Li W, Du Q. Diverse region-based CNN for hyperspectral image classification[J]. IEEE Transactions on Image Processing, 2018, 27(6): 2623–2634. [DOI:10.1109/TIP.2018.2809606]

[16] Liu F Y, Shen C H, Lin G S. Deep convolutional neural fields for depth estimation from a single image[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 5162-5170.[DOI: 10.1109/CVPR.2015.7299152]

[17] Zhou F Y, Jin L P, Dong J. Review of convolutional neural network[J]. Chinese Journal of Computers, 2017, 40(6): 1229–1251. [周飞燕, 金林鹏, 董军. 卷积神经网络研究综述[J]. 计算机学报, 2017, 40(6): 1229–1251. ] [DOI:10.11897/SP.J.1016.2017.01229]

[18] Chang L, Deng X M, Zhou M Q, et al. Convolutional neural networks in image understanding[J]. Acta Automatica Sinica, 2016, 42(9): 1300–1312. [常亮, 邓小明, 周明全, 等. 图像理解中的卷积神经网络[J]. 自动化学报, 2016, 42(9): 1300–1312. ] [DOI:10.16383/j.aas.2016.c150800]

[19] Arbelaez P, Maire M, Fowlkes C, et al. Contour detection and hierarchical image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33(5): 898–916. [DOI:10.1109/TPAMI.2010.161]

[20] Zhang L, Zhang L, Mou X Q, et al. A comprehensive evaluation of full reference image quality assessment algorithms[C]//Proceedings of the 19th IEEE International Conference on Image Processing. Orlando, FL, USA: IEEE, 2012: 1477-1480.[DOI: 10.1109/ICIP.2012.6467150]

[21] Baig M M, Awais M M, El-Alfy E S M. AdaBoost-based artificial neural network learning[J]. Neurocomputing, 2017, 248: 120–126. [DOI:10.1016/j.neucom.2017.02.077]

[22] Immerkær J. Fast noise variance estimation[J]. Computer Vision and Image Understanding, 1996, 64(2): 300–302. [DOI:10.1006/cviu.1996.0060]

[23] Yang S M, Tai S C. Fast and reliable image-noise estimation using a hybrid approach[J]. Journal of Electronic Imaging, 2010, 19(3): #033007. [DOI:10.1117/1.3476329]

[24] Rakhshanfar M, Amer M A. Estimation of Gaussian, Poissonian-Gaussian, and processed visual noise and its level function[J]. IEEE Transactions on Image Processing, 2016, 25(9): 4172–4185. [DOI:10.1109/TIP.2016.2588320]