严毅1, 邓超1, 李琳2, 朱凌坤3, 叶彪2(1.武汉科技大学汽车与交通工程学院, 武汉 430063;2.武汉科技大学计算机科学与技术学院, 武汉 430063;3.武汉理工大学交通与物流工程学院, 武汉 430063)
语义分割任务是很多计算机视觉任务的前提与基础，在虚拟现实、无人驾驶等领域具有重要的应用价值。随着深度学习技术的快速发展，尤其是卷积神经网络（convolutional neural network，CNN）的出现，使得图像语义分割取得了长足的进步。首先，本文介绍了语义分割概念、相关背景和语义分割基本处理流程。然后，总结开源的2D、2.5D、3D数据集和其相适应的分割方法，详细描述了不同网络的分割特点、优缺点及分割精确度，得出监督学习是有效的训练方式。同时，介绍了权威的算法性能评价指标，根据不同方法的侧重点，对各个分割方法的相关实验进行了对比分析，指出了目前实验方面整体存在的问题，其中，DeepLab-V3+网络在分割精确度和速度方面都具有良好的性能，应用价值较高。在此基础上，本文针对国内外的研究现状，提出了当前面临的几点挑战和未来可能的研究方向。通过总结与分析，能够为相关研究人员进行图像语义分割相关研究提供参考。
Survey of image semantic segmentation methods in the deep learning era
Yan Yi1, Deng Chao1, Li Lin2, Zhu Lingkun3, Ye Biao2(1.School of Automobile and Traffic Engineering, Wuhan University of Science and Technology, Wuhan 430063, China;2.School of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan 430063, China;3.School of Transportation and Logistics Engineering, Wuhan University of Technology, Wuhan 430063, China)
Introduced by Ohta in 1980，image semantic segmentation assigns each pixel in an image with a pre-defined label that represents its semantic category. Aiming to understand the different scenes of images，image semantic segmentation has received much research attention in the field of computer vision. In recent years，many research laboratories around the world have carried out research work on image semantic segmentation based on deep learning. Academic conferences in the fields of automation，artificial intelligence，and pattern recognition also reported research results on semantic segmentation. At the same time，semantic segmentation serves as the premise and basis of many computer vision tasks and has important application value in virtual reality，such as automatic driving and human-computer interaction. With the rapid development of deep learning technology，especially the emergence of convolutional neural networks，image semantic segmentation technology has made great progress and has far outperformed traditional methods in terms of accuracy and efficiency. First，this paper introduces the concept of semantic segmentation along with its background and basic process. In general，image semantic segmentation based on deep learning goes through three processing modules，namely，the feature extraction，semantic segmentation，and refinement processing modules. Second，this paper summarizes the open source 2D，RGB-D，and 3D datasets that have been used in recent years and their corresponding segmentation methods. The semantic segmentation methods for 2D data are divided into method based on candidate region，method based on fully supervised learning，and method based on weakly supervised learning. As RGB-D and 3D date，only a few semantic segmentation methods need to be classified，thus no further classification is performed. This paper describes in detail the network structure of several classical algorithms，the segmentation characteristics，advantages，and disadvantages of different networks，and their segmentation accuracy. Through this summary，this study reveals that most segmentation methods are based on fully supervised learning，which is an effective training method. Third，this paper introduces several authoritative performance evaluation indexes of algorithms，such as mean average precision（mAP）and mean intersection over union （mIoU），and tests the segmentation accuracy and computing performance of the semantic segmentation method when applied in 2D-data-related experiments. The Experimental section shows that the DeepLab-V3+ network has good segmentation accuracy and speed，which attest to its high application value. The semantic segmentation performance for 2. 5D and 3D data is also compared. The following key problems are highlighted in this section：some algorithms are not tested on authoritative datasets；some algorithms are not open source；and some experiments do not describe the relevant experimental parameters in detail. Therefore，considering the current situation of research at home and abroad，this paper highlights several challenges and proposes some new directions for future research. First，segmentation algorithms tend to prioritize either accuracy or real time while ignoring the other. Second，a segmented network usually needs large amounts of memory to realize reasoning and training，hence making it unsuitable for some devices. Third，the design of the segmentation algorithm adapted to 3D data is a current research focus，but high-quality 3D datasets are generally lacking，and the existing 3D datasets are patchwork datasets. Fourth，only a few segmentation algorithms are available for RGB-D and 3D data（particularly for 3D data），and open-source algorithms generally have low accuracy. Fifth，sequence data have temporal consistency. Sixth，some methods solve the problem of video or sequence segmentation，while others do not use time series information to improve accuracy or segmentation efficiency. Seventh，some papers have proposed that face detection can be realized without training deep neural network and examined whether semantic segmentation can be realized without a training network. Through summary and analysis，this paper hopes to provide some valuable reference for future research on image semantic segmentation.