Saliency object detection method based on complex domains
Cui Liqun, Zhao Yue, Hu Zhiyi, Zhao Yukang
School of Software, Liaoning Technical University, Huludao 125105, China
Supported by: National Natural Science Foundation of China (61172144)

Objective With growing understanding of the human visual attention mechanism, salient object detection has been widely studied by computer vision researchers. Visual saliency is an important mechanism of the human visual system. Salient object detection simulates this attention mechanism: it quickly and accurately extracts the most interesting regions of a scene and ignores redundant information. It has been widely used in image compression, segmentation, retargeting, video coding, object detection, recognition, and many other tasks. Although numerous salient object detection methods are available, problems remain. Detection results are good when the background is simple, but when the background is complex the results become uncertain because of environmental conditions, clutter in the area around the target, or the choice of detection method. Cluttered backgrounds and inaccurate detection regions often appear in the saliency maps that these methods generate. To solve these problems, a salient object detection method based on the complex domain is proposed. The complex domain combines the frequency, spatial, and wavelet domains; it exploits the advantages of all three domains and suppresses the background to obtain an accurate and clear salient target region. Method Environmental conditions are one of the key factors that influence salient object detection; for example, weak light or fog can produce unclear images and lead to poor detection results. Multi-scale retinex (MSR) is an image enhancement algorithm based on color theory. With the multi-scale retinex algorithm, the image is restored by linear weighting while the picture is dynamically scaled.
First, the multi-scale retinex enhancement algorithm is used to preprocess the original image in the spatial domain and remove environmental effects. After processing, the brightness is closer to that of the real scene, and the contrast between foreground and background is significantly improved. Besides environmental effects, non-salient background regions often occupy most of the image in salient object detection. These background regions increase false detections and reduce accuracy. Experiments show that most background regions are sky, trees, grassland, and buildings, which are beyond the scope of this study. Because such background regions are repetitive, they can be suppressed by the hyper-complex Fourier transform (HFT). Next, an undirected graph is established and node features of the image are preliminarily extracted. The hyper-complex Fourier transform in the frequency domain is reconstructed to obtain the smoothed amplitude spectrum, the phase spectrum, and the Euler spectrum. Background suppression maps are then obtained by smoothing with multi-scale Gaussian kernels. Meanwhile, the multi-level nature of the wavelet transform is used in the wavelet domain to extract multiple image features, and a multi-feature saliency map is computed. This saliency map effectively preserves image details because of the localization properties of the wavelet domain. Finally, the proposed adaptive threshold selection method fuses the background suppression map with the multi-feature saliency map to obtain the final saliency map, which suppresses the background while preserving image details.
Result To make the experiments persuasive, salient object detection experiments are conducted on the standard test datasets MSRA10K and THUR15K. MSRA10K consists of 10 000 images with hand-annotated, pixel-accurate salient object labels, including images of natural scenery, biology, architecture, and transportation. THUR15K consists of 15 000 web images collected with five keywords, namely, butterflies, airplanes, giraffes, cups, and dogs, with salient objects annotated at the same pixel-level precision. Both datasets are public standard image databases and are widely used in salient object detection and image segmentation. A total of 300 images with complex backgrounds are selected from each dataset and, under the same experimental conditions, the proposed method is compared with six popular salient object detection methods. Results show that the proposed method addresses the problems described above. Even in complex environments, the precision and recall of the algorithm are higher than those of the state-of-the-art algorithms used for comparison. On MSRA10K, the mean absolute error (MAE) is 0.106; on THUR15K, the MAE is reduced to 0.068, and the average structure measure (S-measure) is 0.844 9. The MAE results reflect the overall performance advantage of the complex-domain method, and the S-measure indicates that the detected target is structurally very similar to the target in the ground truth map. Conclusion Salient object detection is a promising preprocessing operation in image processing and analysis. In this study, a new salient object detection method based on the complex domain is proposed. The multi-scale retinex algorithm in the spatial domain preprocesses the images; it enhances contrast and reduces the influence of environmental factors.
The hyper-complex Fourier transform in the frequency domain suppresses complex repetitive background regions, and salient object detection in the wavelet domain describes the details of the target thoroughly. Moreover, the proposed algorithm integrates the advantages of multiple domains and improves accuracy while suppressing background clutter. Thus, the proposed algorithm is suitable for detecting salient objects in images of natural scenery, biology, architecture, and transportation. To improve speed, our next research project aims to reduce the complexity of the algorithm by examining how the choice of wavelet transform function affects time complexity.

Keywords: salient object detection; multi-scale Retinex enhancement algorithm; hyper-complex Fourier transform; wavelet transform; adaptive threshold selection method

# 1.1 Spatial-domain MSR enhancement of the target representation

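MSR is an image enhancement algorithm built on Retinex. Retinex is an image enhancement algorithm based on color theory: the image seen by the human eye is formed when natural light strikes an object and is reflected by the object into the eye. Mathematically,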

 $\mathit{\boldsymbol{I}} = \mathit{\boldsymbol{L}} * \mathit{\boldsymbol{R}}$ (1)

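where $\mathit{\boldsymbol{I}}$ denotes the image observed by the human eye, $\mathit{\boldsymbol{L}}$ denotes the illumination component of the environment, and $\mathit{\boldsymbol{R}}$ denotes the reflectance component of the objects in the scene.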

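Building on Retinex, the MSR algorithm introduces the concept of multiple scales and restores the image by linear weighting while the picture is dynamically scaled. Illumination blurs the boundary between the salient target and the background in a scene and weakens the contrast. To overcome the influence of illumination and enhance image detail in dark regions, the MSR algorithm is used to restore image color. The reflectance component of an object represents the object in its natural state, unaffected by the environment, and is expressed as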

 $\begin{array}{*{20}{c}} {{{\log }_2}\left[ {R\left( {x,y} \right)} \right] = {{\log }_2}\left[ {R\left( {x,y} \right)} \right] + }\\ {w\left( i \right) \times \left( {{{\log }_2}\left[ {{I_i}\left( {x,y} \right)} \right] - {{\log }_2}\left[ {{L_i}\left( {x,y} \right)} \right]} \right)} \end{array}$ (2)

where $w\left( i \right)$ denotes the weight at each scale; the weights are generally taken to be equal and sum to 1.

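The MSR algorithm is clearly effective for image dehazing, old photographs, and pictures taken under non-uniform illumination. Figure 2 shows the effect of the MSR algorithm on a picture taken under non-uniform illumination. As Figure 2 shows, after processing the brightness is closer to that of the real scene, and the contrast between foreground and background is noticeably improved.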
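The accumulation in Eq. (2) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single-channel image, estimates each illumination component $L_i$ with a Gaussian blur (the usual choice in the Retinex literature, but not specified in this section), and uses the same input image at every scale. The function names, the `sigmas` values, and the `eps` offset are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian blur with edge-replicated borders.

    Assumption: this blur stands in for the illumination estimate L_i.
    """
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    kernel /= kernel.sum()  # normalise so flat regions keep their value
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; 'valid' removes the padding again.
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def multi_scale_retinex(img, sigmas=(2.0, 4.0, 8.0), eps=1.0):
    """Accumulate w(i) * (log2 I - log2 L_i) over all scales, as in Eq. (2)."""
    img = np.asarray(img, dtype=np.float64) + eps  # offset avoids log2(0)
    weights = np.full(len(sigmas), 1.0 / len(sigmas))  # equal weights, sum to 1
    log_r = np.zeros_like(img)
    for w, sigma in zip(weights, sigmas):
        illumination = gaussian_blur(img, sigma)
        log_r += w * (np.log2(img) - np.log2(illumination))
    return log_r
```

The result is the log-domain reflectance; for display it would still need to be rescaled to the image range, a step this section does not describe.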

# 1.2 Frequency-domain HFT for background suppression

 $\begin{array}{*{20}{c}} {\mathit{\boldsymbol{f}}\left( {n,m} \right) = {\varepsilon _1}{\mathit{\boldsymbol{F}}_1} + {\varepsilon _2}{\mathit{\boldsymbol{F}}_2}a + }\\ {{\varepsilon _3}{\mathit{\boldsymbol{F}}_3}b + {\varepsilon _4}{\mathit{\boldsymbol{F}}_4}c} \end{array}$ (3)

 ${\mathit{\boldsymbol{F}}_H}\left[ {u,v} \right] = \frac{1}{{\sqrt {MN} }}\sum\limits_{m = 0}^{M - 1} {\sum\limits_{n = 0}^{N - 1} {{{\rm{e}}^{ - \mu 2\pi \left( {\left( {\frac{{mv}}{M}} \right) + \left( {\frac{{nu}}{N}} \right)} \right)}}\mathit{\boldsymbol{f}}\left( {n,m} \right)} }$ (5)

 ${\mathit{\boldsymbol{F}}_H}\left[ {u,v} \right] = \left\| {{\mathit{\boldsymbol{F}}_H}\left[ {u,v} \right]} \right\|{{\rm{e}}^{\mu \varphi \left( {u,v} \right)}}$ (6)

 $\left\{ \begin{array}{l} \mathit{\boldsymbol{A}}\left( {u,v} \right) = \left\| {{\mathit{\boldsymbol{F}}_H}\left( {u,v} \right)} \right\|\\ \mathit{\boldsymbol{P}}\left( {u,v} \right) = \varphi \left( {u,v} \right) = {\tan ^{ - 1}}\frac{{\left\| {V\left( {{\mathit{\boldsymbol{F}}_H}\left( {u,v} \right)} \right)} \right\|}}{{S\left( {{\mathit{\boldsymbol{F}}_H}\left( {u,v} \right)} \right)}}\\ \mathit{\boldsymbol{X}}\left( {u,v} \right) = \mu \left( {u,v} \right) = \frac{{V\left( {{\mathit{\boldsymbol{F}}_H}\left( {u,v} \right)} \right)}}{{\left\| {V\left( {{\mathit{\boldsymbol{F}}_H}\left( {u,v} \right)} \right)} \right\|}} \end{array} \right.$ (7)
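The polar decomposition in Eqs. (6) and (7) can be checked numerically. The sketch below assumes that $S(\cdot)$ and $V(\cdot)$ denote the scalar and vector parts of a quaternion, and that each spectrum value is stored as four real numbers (scalar, i, j, k); the array layout and function names are hypothetical. It uses `arctan2` in place of $\tan^{-1}$ so that a zero or negative scalar part is handled, which is a design choice of this sketch rather than something stated in the text. Computing the hyper-complex Fourier transform itself is out of scope here.

```python
import numpy as np

def quaternion_polar(q):
    """Split quaternions of shape (..., 4) into amplitude, phase, and axis."""
    q = np.asarray(q, dtype=np.float64)
    s, v = q[..., 0], q[..., 1:]             # scalar part S, vector part V
    v_norm = np.linalg.norm(v, axis=-1)
    amplitude = np.linalg.norm(q, axis=-1)   # A = ||F_H||
    phase = np.arctan2(v_norm, s)            # P = atan(||V|| / S)
    safe = np.where(v_norm > 0, v_norm, 1.0) # avoid 0/0 for purely real values
    axis = v / safe[..., None]               # X = V / ||V||
    return amplitude, phase, axis

def quaternion_from_polar(amplitude, phase, axis):
    """Rebuild A * exp(X * P) = A * (cos P + X sin P), as in Eq. (6)."""
    amplitude = np.asarray(amplitude, dtype=np.float64)
    phase = np.asarray(phase, dtype=np.float64)
    s = amplitude * np.cos(phase)
    v = (amplitude * np.sin(phase))[..., None] * np.asarray(axis)
    return np.concatenate([s[..., None], v], axis=-1)
```

Round-tripping a spectrum through these two functions should return the original values, which is a quick way to confirm that the three spectra together carry all the information in $\mathit{\boldsymbol{F}}_H$.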

 $\mathit{\boldsymbol{H}}_i^c\left( x \right) = \mathit{\boldsymbol{W}}\left( {\mathit{\boldsymbol{I}}_i^c\left( x \right)} \right)$ (10)

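where $i$ indexes the features, $c$ denotes the three LAB channels, $W$ denotes the Daubechies wavelet function, and $\mathit{\boldsymbol{H}}\left( x \right)$ denotes the multiple features obtained by wavelet decomposition.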

 ${\mathit{\boldsymbol{f}}_i} = {\mathit{\boldsymbol{g}}_{m \times n}} * {\mathit{\boldsymbol{W}}^{ - 1}}\left[ {\mathit{\boldsymbol{H}}_i^c\left( x \right)} \right]$ (11)

 ${\mathit{\boldsymbol{f}}_{\rm{w}}} = \sum\limits_{i = 1}^n {{\omega _i} \times {\mathit{\boldsymbol{f}}_i} - \tau }$ (12)

where ${\omega _i}$ denotes the weight of the corresponding feature and $\tau$ denotes an adjustment coefficient, typically between 0.2 and 0.5 [19].

# 1.4 Adaptive threshold correction

 ${\mathit{\boldsymbol{f}}_{{\rm{fin}}}} = {\mathit{\boldsymbol{f}}_{\rm{w}}} * T\left( t \right)$ (13)

# 2.2 Evaluation metrics

Table 1 Status values

| Detection result | Target region | Background region |
|---|---|---|
| Salient region | TP | FP |
| Non-salient region | FN | TN |

 $\begin{array}{l} P = \frac{{TP}}{{TP + FP}}\\ R = \frac{{TP}}{{TP + FN}}\\ MAE = \frac{{FP + FN}}{M} \end{array}$ (14)
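As a worked illustration of Eq. (14), the sketch below counts TP, FP, FN, and TN from a binarised saliency map and a ground-truth mask and then evaluates $P$, $R$, and $MAE$. It assumes that $M$ is the total number of pixels, which this section does not state explicitly; the function names and the zero fallback for empty denominators are hypothetical choices.

```python
import numpy as np

def confusion_counts(pred_mask, gt_mask):
    """Count TP, FP, FN, TN as laid out in Table 1."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = int(np.sum(pred & gt))     # salient and inside the target
    fp = int(np.sum(pred & ~gt))    # salient but in the background
    fn = int(np.sum(~pred & gt))    # missed target pixels
    tn = int(np.sum(~pred & ~gt))   # background correctly rejected
    return tp, fp, fn, tn

def precision_recall_mae(pred_mask, gt_mask):
    """Evaluate P, R, and MAE following Eq. (14)."""
    tp, fp, fn, tn = confusion_counts(pred_mask, gt_mask)
    m = tp + fp + fn + tn           # assumption: M is the pixel count
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    mae = (fp + fn) / m
    return precision, recall, mae
```

For a 2 × 2 prediction `[[1, 1], [0, 0]]` against the ground truth `[[1, 0], [1, 0]]`, each of the four counts is 1, so $P$, $R$, and $MAE$ all equal 0.5.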

# 2.4.2 Objective comparison and analysis

 $\begin{array}{l} S = \sigma \times {S_o} + \left( {1 - \sigma } \right) \times {S_r}\\ {S_r} = \sum\limits_{k = 1}^K {{w_k} \times s\left( k \right)} \\ {S_o} = \partial \times {O_{{\rm{FG}}}} + \left( {1 - \partial } \right) \times {O_{{\rm{BG}}}} \end{array}$ (15)

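where ${S_r}$ denotes the similarity of object parts, ${w_k}$ denotes the weight of each block, and $s\left( k \right)$ denotes the structural similarity; ${S_o}$ denotes the object-aware structural measure, ${O_{{\rm{FG}}}}$ denotes the object-level similarity between the saliency map and the annotated map, ${O_{{\rm{BG}}}}$ denotes the background similarity, and $\partial$ denotes the ratio of the foreground area in the annotated map to the image area (width × height); $S$ is the structural similarity index, and $\sigma$ lies between 0 and 1 and is set to 0.5 in this experiment. The mean S-measure of every method is computed over the 300 pictures selected from each dataset, and the resulting S-measure line charts are shown in Figure 9(a)(b).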
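The combination in Eq. (15) is simple arithmetic once the component similarities are known. The sketch below only combines them; it takes $O_{\rm FG}$, $O_{\rm BG}$, and the per-block scores $s(k)$ as given inputs, because their definitions are not part of this section. The function and parameter names are hypothetical.

```python
def s_measure(o_fg, o_bg, block_scores, block_weights, fg_ratio, sigma=0.5):
    """Combine object-aware and region-aware similarity as in Eq. (15)."""
    # S_o: foreground and background similarity mixed by the foreground ratio.
    s_o = fg_ratio * o_fg + (1.0 - fg_ratio) * o_bg
    # S_r: weighted sum of the structural similarity of each block.
    s_r = sum(w * s for w, s in zip(block_weights, block_scores))
    return sigma * s_o + (1.0 - sigma) * s_r
```

For example, with $O_{\rm FG} = 0.8$, $O_{\rm BG} = 0.6$, a foreground ratio of 0.25, and four equally weighted blocks scoring 0.9, 0.7, 0.8, and 0.6, the result is $S_o = 0.65$, $S_r = 0.75$, and $S = 0.7$.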

