Image compression method based on multi-scale saliency region detection
2020, Vol. 25, No. 1: 31-42
Received: 2019-05-07; Revised: 2019-06-26; Accepted: 2019-07-03; Published in print: 2020-01-16
DOI: 10.11834/jig.190168
Objective
Existing saliency-region-based image compression methods cannot effectively perceive image content containing multiple objects, which degrades the quality of the reconstructed image. To address this problem, an image compression method based on multi-scale deep-feature saliency region detection is proposed.
Method
An improved convolutional neural network (CNN) is used to detect multi-scale deep image features and obtain saliency regions at different scales. The size of the saliency region map is then adaptively adjusted according to the size of the input image, and a Gaussian function is introduced to filter the saliency regions, yielding a multi-scale fused saliency region map. Finally, combined with coding techniques, the salient regions are compressed near-losslessly while the non-salient regions are compressed with lossy coding, completing image compression and reconstruction.
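The near-lossless/lossy split described above can be sketched as follows. This is a minimal illustration, not the paper's exact scheme: the binarizing threshold and the per-pixel merge of the two decoded images are simplifying assumptions.

```python
import numpy as np

def region_based_reconstruction(near_lossless, lossy, saliency, threshold=0.5):
    """Merge two decoded versions of the same image: pixels inside salient
    regions come from the near-lossless codec, the rest from the lossy
    codec. The threshold that binarizes the saliency map is an assumed
    detail, not taken from the paper."""
    mask = saliency >= threshold
    if near_lossless.ndim == 3:          # broadcast the 2-D mask over channels
        mask = mask[..., None]
    return np.where(mask, near_lossless, lossy)
```

In practice the two inputs would be the decoded outputs of, for example, a near-lossless coder and a JPEG/SPIHT coder run on the same image.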
Result
At a bit rate of about 0.39 bit/pixel, the proposed method improves peak signal-to-noise ratio (PSNR) by 2.23 dB and structural similarity (SSIM) by 0.024 over JPEG on the Kodak PhotoCD dataset; on the Pascal VOC dataset, PSNR and SSIM improve by 1.63 dB and 0.039, respectively. When the proposed multi-scale-feature saliency region method is combined with set partitioning in hierarchical trees (SPIHT) and run-length encoding (RLE), PSNR on the Kodak dataset improves by 1.85 dB and 1.98 dB, and SSIM by 0.006 and 0.023, respectively.
Conclusion
The proposed image compression method based on multi-scale deep features achieves better results than traditional coding techniques. By effectively perceiving image content, it reduces content loss during compression and thus improves the quality of the reconstructed image.
Objective
Image compression, which aims to remove redundant information in an image, is a popular issue in image processing and computer vision. In recent years, image compression based on deep learning has attracted much attention from scholars in the field of image processing. Image compression using convolutional neural networks (CNNs) can be roughly divided into two categories. One is image compression based on an end-to-end convolutional network. The other combines CNNs with traditional image compression methods: CNNs are used to deeply perceive the image content and obtain salient regions, high-quality coding is then applied to the salient regions, and lower-quality coding is used for non-salient regions to improve the visual quality of the compressed reconstructed images. However, in the latter category, the quality of the reconstructed image is often considerably degraded because the image content is not effectively perceived. Regarding the effectiveness of image content perception, several conventional salient region detection methods disregard the influence of scale on image content detection. Furthermore, the difference in size between the input image and the output saliency map is not considered, which limits the model's perception of the image. Consequently, several salient objects in the original image cannot be effectively perceived, which degrades the reconstructed image's quality during subsequent compression. A novel image compression method based on multi-scale depth feature salient region (MS-DFSR) detection is proposed in this study to address this problem.
Method
Improved CNNs are used to detect the depth features of multi-scale images. With the help of the scale-space concept, multiple saliency maps are generated by feeding an image into the MS-DFSR model through a pyramid structure, completing the detection of multi-scale saliency regions. In scale selection, an excessively large scale causes the resulting salient area to become too divergent and lose its salient meaning. Therefore, two scales are used in this work: the first is the standard output scale of the network, and the second is a larger scale adopted to effectively detect multiple salient objects in an image and thus perceive the image content effectively. For salient region detection on depth features, we replace the fully connected layer and the fourth max pooling layer with a global average pooling layer and an average pooling layer, respectively, to retain as much spatial location information on multiple salient objects as possible. The salient areas of different scales detected by MS-DFSR are then obtained. To enlarge the perceived domain of an image and perceive the image content effectively, the size of the salient region map is adaptively adjusted according to the size of the input image, accounting for the difference between the input and output salient image sizes. Meanwhile, a Gaussian function is introduced to filter the salient region, retain the original image content information, and obtain a multi-scale fused saliency region map. Finally, we complete image compression and reconstruction by combining the obtained multi-scale saliency region map with image coding methods. To protect the image's salient content and improve the reconstructed image's quality, the salient regions are compressed with a near-lossless method, while lossy compression methods, such as the Joint Photographic Experts Group (JPEG) codec and set partitioning in hierarchical trees (SPIHT), are applied to the non-salient regions.
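The multi-scale detection and fusion steps above can be sketched as follows. This is a minimal illustration under assumed details: `detect_saliency` is a hypothetical stand-in for the modified CNN, and the scale factors and Gaussian width are placeholders, not the paper's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def fuse_multiscale_saliency(image, detect_saliency, scales=(1.0, 2.0), sigma=3.0):
    """Run a saliency detector at several pyramid scales, adaptively resize
    each saliency map back to the input image size, smooth it with a
    Gaussian filter, and average the results into one fused map.
    `detect_saliency` is a hypothetical stand-in for the modified CNN."""
    h, w = image.shape[:2]
    fused = np.zeros((h, w), dtype=np.float64)
    for s in scales:
        # One level of the image pyramid at scale s (channels untouched).
        scaled = zoom(image, (s, s) + (1,) * (image.ndim - 2), order=1)
        sal = detect_saliency(scaled)       # saliency map of arbitrary size
        # Adaptively resize the saliency map to the input image size.
        sal = zoom(sal, (h / sal.shape[0], w / sal.shape[1]), order=1)
        # Gaussian filtering smooths the salient region boundaries.
        fused += gaussian_filter(sal, sigma=sigma)
    fused /= len(scales)
    return fused / (fused.max() + 1e-12)    # normalize to [0, 1]
```

The fused map can then drive the region-based coding: near-lossless coding where the map is high, lossy coding elsewhere.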
Result
We compare our model with three traditional compression methods, namely, JPEG, SPIHT, and run-length encoding (RLE). The experiments use two public datasets, Kodak PhotoCD and Pascal VOC. The quantitative evaluation metrics (higher is better) are the peak signal-to-noise ratio (PSNR), the structural similarity index measure (SSIM), and a modified PSNR metric based on the human visual system (PSNR-HVS). Experimental results show that our model outperforms all the traditional methods on the Kodak PhotoCD and Pascal VOC datasets. The saliency maps show that our model produces results covering multiple salient objects, improving the effective perception of image content. We also compare the image compression method based on MS-DFSR detection with one based on single-scale depth feature salient region (SS-DFSR) detection, verifying the validity of the MS-DFSR detection model. Comparative experiments demonstrate that the proposed method improves image compression quality: the image reconstructed by the proposed method has higher quality than that produced by JPEG compression. When the code rate is approximately 0.39 bpp on the Kodak PhotoCD dataset, PSNR is improved by 2.23 dB, SSIM by 0.024, and PSNR-HVS by 2.07. On the Pascal VOC dataset, PSNR, SSIM, and PSNR-HVS increase by 1.63 dB, 0.039, and 1.57, respectively. When MS-DFSR is combined with SPIHT and RLE compression on the Kodak PhotoCD dataset, PSNR increases by 1.85 dB and 1.98 dB, SSIM by 0.006 and 0.023, and PSNR-HVS by 1.90 and 1.88, respectively.
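For reference, the PSNR figures quoted above follow the standard definition below; this is the textbook formula, not code from the paper.

```python
import numpy as np

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference image and its
    compressed reconstruction; higher values mean less distortion."""
    mse = np.mean((reference.astype(np.float64) -
                   reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")                 # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

A gain of 2.23 dB, as reported on Kodak PhotoCD, thus corresponds to a reduction of roughly 40% in mean squared error at the same bit rate.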
Conclusion
The proposed image compression method using multi-scale depth features outperforms traditional image compression methods because it effectively reduces image content loss by improving the effectiveness of image content perception during compression. Consequently, the quality of the reconstructed image is improved significantly.