Indoor scene segmentation based on fully convolutional neural networks
2019, Vol. 24, No. 1, Pages 64-72
Received: 2018-06-05; Revised: 2018-08-10; Published in print: 2019-01-16
DOI: 10.11834/jig.180364
Objective
Visual prostheses produce phosphenes by implanting electrodes into the body of a blind person to stimulate the optic nerve; the wearer perceives only the rough outline of objects, so the object recognition rate is low. Considering the characteristics of indoor scenes in visual prosthesis applications, a fast convolutional neural network image segmentation method is proposed for indoor scene images, so that segmentation can display the approximate position and outline of objects and thereby assist recognition by the blind.
Method
An FFCN (fast fully convolutional network) for indoor scene image segmentation is constructed, in which inter-layer fusion avoids the loss of image feature information caused by successive convolutions. To verify the effectiveness of the network, a dataset of basic household items in indoor environments (hereafter the XAUT dataset) was created: the category of each item is marked by a gray level on the original image, and an attached color table maps the gray-level image to a pseudo-color image that serves as the semantic label. The XAUT dataset is used to train the FFCN under the Caffe (convolutional architecture for fast feature embedding) framework, yielding an indoor scene segmentation model suited to visual prostheses for the blind. For comparison, the traditional multi-scale fusion networks FCN-8s, FCN-16s, and FCN-32s were structurally fine-tuned and trained on the same dataset to obtain corresponding indoor scene segmentation models.
Results
The pixel accuracy of every network exceeds 85%, and the mean intersection over union (mean IU) exceeds 60%. FCN-8s at-once achieves the highest mean IU, 70.4%, but its segmentation speed is only one fifth that of the FFCN. With the remaining indicators differing only slightly, the FFCN reaches an average segmentation speed of 40 frames/s.
Conclusion
The proposed FFCN can effectively use multi-layer convolutions to extract image information while avoiding the influence of low-level cues such as brightness, color, and texture. Its scale fusion largely prevents the loss of image feature information during convolution and pooling. Compared with the other FCN networks it is faster, which benefits real-time image preprocessing.
Objective
Vision is one of the most important ways by which humans obtain information. A visual prosthesis works by implanting electrodes into the body of a blind person to stimulate the optic nerve so that the wearer perceives phosphenes. Because of the low resolution and poor linearity of this stimulation, the objects perceived by the blind are only rough outlines, and in some cases the wearer can hardly distinguish them at all. Therefore, before the electrodes are stimulated, image segmentation is applied to display the general position and outline of objects and help blind people recognize familiar objects. A fast image segmentation method based on a convolutional neural network is proposed for segmenting indoor scenes, in line with the application characteristics of visual prostheses.
Method
In view of the real-time image-processing requirement of visual prostheses, the fast fully convolutional network (FFCN) proposed in this paper is derived from the AlexNet classification network. AlexNet reduced the top-5 error rate on the ImageNet dataset to 16.4%, well ahead of the 26.2% achieved by the runner-up. It uses convolution layers to extract deep feature information, adds overlapping pooling layers to reduce the number of parameters that must be learned, and adopts the ReLU activation function to overcome the gradient vanishing that the sigmoid function suffers from in deeper networks; compared with other networks it is lightweight and fast to train. First, the FFCN for indoor scene image segmentation was constructed. It is composed of five convolution layers and one deconvolution layer, and scale fusion between layers is used to avoid the loss of image feature information caused by successive convolutions; a rough sketch of this kind of structure is given below.
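The abstract does not give the exact layer hyperparameters, so the following pycaffe sketch is only a hypothetical illustration of a five-convolution / one-deconvolution network with an element-wise fusion of intermediate features; the kernel sizes, channel counts, class count, and input resolution are assumptions, not the paper's actual configuration.

```python
# Hypothetical FFCN-style skeleton (NOT the paper's actual configuration):
# five convolution layers, an element-wise fusion of two intermediate
# feature maps, and a single deconvolution layer that upsamples the
# class scores back to the input resolution.
import caffe
from caffe import layers as L, params as P

def ffcn_sketch(num_classes=10):          # 9 item classes + background (assumed)
    n = caffe.NetSpec()
    n.data  = L.Input(shape=[dict(dim=[1, 3, 480, 640])])      # assumed input size
    n.conv1 = L.Convolution(n.data, num_output=96, kernel_size=11, stride=4, pad=5)
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    n.pool1 = L.Pooling(n.relu1, pool=P.Pooling.MAX, kernel_size=3, stride=2)
    n.conv2 = L.Convolution(n.pool1, num_output=256, kernel_size=5, pad=2)
    n.relu2 = L.ReLU(n.conv2, in_place=True)
    n.pool2 = L.Pooling(n.relu2, pool=P.Pooling.MAX, kernel_size=3, stride=2)
    n.conv3 = L.Convolution(n.pool2, num_output=256, kernel_size=3, pad=1)
    n.relu3 = L.ReLU(n.conv3, in_place=True)
    n.conv4 = L.Convolution(n.relu3, num_output=256, kernel_size=3, pad=1)
    n.relu4 = L.ReLU(n.conv4, in_place=True)
    n.conv5 = L.Convolution(n.relu4, num_output=256, kernel_size=3, pad=1)
    n.relu5 = L.ReLU(n.conv5, in_place=True)
    # inter-layer (scale) fusion: sum features from conv3 and conv5,
    # which share the same spatial size and channel count here
    n.fuse  = L.Eltwise(n.conv3, n.conv5, operation=P.Eltwise.SUM)
    n.score = L.Convolution(n.fuse, num_output=num_classes, kernel_size=1)
    # one deconvolution layer restores the 16x-downsampled score map
    n.upscore = L.Deconvolution(n.score,
        convolution_param=dict(num_output=num_classes, kernel_size=16,
                               stride=16, bias_term=False))
    return n.to_proto()

if __name__ == '__main__':
    with open('ffcn_sketch.prototxt', 'w') as f:
        f.write(str(ffcn_sketch()))
```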
To verify the effectiveness of the network, a dataset of the basic items that can be touched by the blind in an indoor environment was created. The dataset covers nine categories and 664 items, such as beds, seats, lamps, televisions, cupboards, cups, and people (the XAUT dataset). The category of each item was marked by a gray level in the original image, and a color table was attached to map the gray-level image into a pseudo-color map that serves as the semantic label.
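As a concrete illustration of this label format (a single-channel class-index image plus an attached color table), the Python/PIL sketch below builds such a pseudo-color label; the palette colors and the class-to-index assignment are placeholders, not the ones actually used for the XAUT dataset.

```python
# Sketch: store a per-pixel class-index label as a palette ("P" mode) PNG,
# so the gray indices stay intact while viewers render a pseudo-color map.
# Palette colors and class indices below are illustrative placeholders.
import numpy as np
from PIL import Image

label = np.zeros((480, 640), dtype=np.uint8)   # 0 = background (assumed)
label[100:200, 150:300] = 3                    # e.g. a "lamp" region (made up)

palette = [
    0,   0,   0,    # 0: background
    128, 0,   0,    # 1: bed
    0,   128, 0,    # 2: seat
    128, 128, 0,    # 3: lamp
    0,   0,   128,  # 4: television
    128, 0,   128,  # 5: cupboard
    0,   128, 128,  # 6: cup
    128, 128, 128,  # 7: (other category)
    64,  0,   0,    # 8: (other category)
    192, 0,   0,    # 9: person
]

img = Image.fromarray(label, mode='P')
img.putpalette(palette + [0] * (768 - len(palette)))  # pad to 256 RGB entries
img.save('label_0001.png')
```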
The XAUT dataset was then used to train the FFCN under the Caffe framework, with the deep features and scale fusion of the convolutional network used to extract image features, yielding an indoor scene segmentation model adapted to visual prostheses for the blind. To assess the validity of this model, the traditional FCN-8s, FCN-16s, FCN-32s, and FCN-8s at-once networks were fine-tuned and trained on the same dataset to obtain corresponding indoor scene segmentation models.
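For orientation, training and fine-tuning in Caffe are usually driven by a small script of the following shape; the solver and weight file names are hypothetical, and only the iteration counts echo the numbers reported in the Results.

```python
# Sketch of driving Caffe training/fine-tuning from Python.
# 'ffcn_solver.prototxt' and 'pretrained.caffemodel' are hypothetical file
# names; the API calls (set_mode_gpu, SGDSolver, copy_from, step) are
# standard pycaffe.
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('ffcn_solver.prototxt')

# The FCN-8s/16s/32s baselines start from published weights and are
# fine-tuned; a freshly designed net could instead be trained from scratch.
solver.net.copy_from('pretrained.caffemodel')

# The Results mention snapshots every 4 000 iterations up to 80 000;
# snapshotting is normally configured in the solver file, but stepping
# in 4 000-iteration chunks makes the schedule explicit here.
for _ in range(20):
    solver.step(4000)
```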
Results
A comparative experiment was conducted on an AMAX server running Ubuntu 16.04. Model training lasted 13 h, and a snapshot of the model was saved every 4 000 iterations; tests were carried out at 4 000, 12 000, 36 000, and 80 000 iterations. The pixel accuracy of all the networks exceeded 85%, and the mean IU was above 60%. The FCN-8s at-once network obtained the highest mean IU (70.4%), but its segmentation speed was only one fifth that of the FFCN. With the other indicators differing only slightly, the average segmentation speed of the FFCN reached 40 frames/s.
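Pixel accuracy and mean IU as used here are the standard FCN-style segmentation metrics; a small numpy sketch of how they are typically computed from an accumulated confusion matrix (illustrative only, not the paper's evaluation code) follows.

```python
# Standard segmentation metrics, computed from a class-by-class confusion
# matrix accumulated over the test set (FCN-style definitions).
import numpy as np

def confusion_hist(gt, pred, num_classes):
    """num_classes x num_classes histogram of (ground truth, prediction) pairs."""
    valid = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[valid].astype(int) + pred[valid].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def pixel_accuracy(hist):
    return np.diag(hist).sum() / hist.sum()

def mean_iu(hist):
    iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
    return np.nanmean(iu)

# usage: accumulate over all test images, then report both numbers
# hist = sum(confusion_hist(gt, pred, 10) for gt, pred in test_pairs)
```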
Conclusion
The FFCN can effectively use multi-layer convolutions to extract image information while avoiding the influence of low-level cues such as brightness, color, and texture. Through scale fusion, it also avoids the loss of image feature information in the convolution and pooling stages of the network. Compared with the other FCN networks, the FFCN is faster and can improve the real-time performance of image preprocessing.