Current Issue Cover
基于深度学习的实时语义分割综述

王卓, 瞿绍军(湖南师范大学)

摘 要
语义分割作为计算机视觉领域的重要研究方向之一,应用十分广泛,其目的是根据预先定义好的类别对输入图像进行像素级别的分类,实时语义分割则在一般语义分割的基础上又增加了对速度的要求,被广泛应用于如无人驾驶、医学图像分析、视频监控与航拍图像等领域。其要求分割方法不仅要取得较高的分割精度,且分割速度也要快。随着深度学习和神经网络的快速发展,实时语义分割也取得了一定的研究成果。本文在前人已有工作的基础上对基于深度学习的实时语义分割算法进行系统地归纳总结,特别是最新的基于transformer和剪枝的方法,全面介绍实时语义分割方法在各领域中的应用。本文首先介绍实时语义分割的概念,再根据标签的数量和质量,将现有的基于深度学习的实时语义分割方法分为强监督学习、弱监督学习和无监督学习三个类别;在分类的基础上,结合各个类别中最具有代表性的方法,对其优缺点展开分析,并从多个角度进行比较。随后介绍目前实时语义分割常用的数据集和评价指标,并对比分析各算法在各数据集上的实验效果。阐述现阶段实时语义分割的应用场景。最后,讨论了基于深度学习的实时语义分割存在的挑战,并对实时语义分割未来值得研究的方向进行展望,为研究者们解决存在的问题提供便利。
关键词
A review of real-time semantic segmentation based on deep learning

wangzhuo, Changsha(hunan normal university)

Abstract
Semantic segmentation is widely used as one of the important research directions in the field of computer vision, and its purpose is to classify the input image at the pixel level according to predefined categories. Real-time semantic segmentation, as a subfield of semantic segmentation, adds speed requirements to segmentation methods on the basis of general semantic segmentation, and is widely used in fields such as unmanned driving, medical image analysis, video surveillance and aerial images. It requires that the segmentation method should not only achieve high segmentation accuracy, but also fast segmentation speed (specifically, the speed of processing images per unit time reaches 30 frames). With the rapid development of deep learning technology and neural networks, real-time semantic segmentation has also achieved certain research results. Most of the previous researchers have discussed semantic segmentation, and there are fewer review papers on real-time semantic segmentation methods. In this paper, we systematically summarize the real-time semantic segmentation algorithms based on deep learning on the basis of the existing work of the previous researchers, and we will firstly introduce the concept of real-time semantic segmentation, and then, according to the number and quality of the participating training labels, The existing real-time semantic segmentation methods based on deep learning are categorized into three classes: strongly supervised learning, weakly supervised learning and unsupervised learning. For strongly supervised learning methods, this paper categorizes them from three perspectives: improving accuracy, improving speed and other methods. The accuracy improvement methods are further divided into subcategories according to the network structure and feature fusion methods. According to the network structure, the real-time semantic segmentation methods can be categorized into: encoder-decoder structure, two-branc structure, and multi-branch structure; the representative networks in the encoder-decoder section are FCN and UNet; the networks with two-branch structure are BiSeNet series; and the multi-branch structure has ICNet and DFANet. According to the different ways of feature fusion, real-time semantic segmentation methods can be categorized into multi-scale feature fusion and attention mechanism. According to the different ways of feature sampling in the process of multiscale feature fusion, this paper divides multiscale feature fusion into atrous spatial pyramid pooling and ordinary pyramid pooling; the attention mechanism can be further divided into: self-attention mechanism, channel attention, and spatial attention according to the computation method of the attention vector. The methods to improve the speed are analyzed and discussed from the perspectives of improving convolutional blocks and lightweight networks; the methods to improve convolutional blocks can be divided into separable convolution (separable convolution can be divided into depth separable convolution and spatial separable convolution), grouped convolution, and atrous convolution. Among other methods of strongly supervised learning, we also specifically add methods of knowledge distillation, transformer-based methods and pruning, which are less mentioned in other literatures. Due to the large number of methods for real-time semantic segmentation based on strongly supervised learning, we also perform a comparative analysis of the strengths and weaknesses of all the mentioned methods. In real-time semantic segmentation based on weakly supervised learning, this paper classifies them into methods based on image-level labeling, methods based on point labeling, methods based on object box labeling, and methods based on object underlining labeling. For unsupervised learning, this paper introduces its concept and describes the commonly used unsupervised semantic segmentation methods at the present stage, including the method with the introduction of the generalized domain adaptation problem and the method with the introduction of unsupervised pre-adaptation task. Subsequently, the data sets and evaluation indexes commonly used in real-time semantic segmentation are introduced, in addition to the street scene data set commonly used in unmanned counting, this paper also supplements the medical image data set; in the evaluation indexes, this paper gives a detailed introduction to the accuracy measure and speed measure, and then compares the experimental effects of the algorithms on the data sets so far through the table, so as to obtain the latest research progress in the field. The application scenarios of real-time semantic segmentation are further elaborated in detail. Real-time semantic segmentation can be applied to automatic driving, which can segment road scene images in a short time to help identify roads, traffic signs, pedestrians, vehicles and other objects; by segmenting medical images at the pixel level, real-time semantic segmentation can also help doctors identify and localize lesion areas more accurately; in the field of natural disaster monitoring and emergency rescue, real-time semantic segmentation can quickly identify airplanes and aircrafts, and can also help doctors identify and locate lesion areas more accurately. Real-time semantic segmentation can quickly recognize disaster areas in aerial images; by real-time segmentation of scenes and objects in surveillance videos, it can provide more accurate and intelligent data for surveillance systems. Then, according to the specific application scenarios of real-time semantic segmentation and the problems encountered at this stage, this paper considers that the challenges faced by real-time semantic segmentation include: (1) The mobile segmentation problem which is difficult to develop large-scale computation on low storage devices. (2) The problem of how to get away from the dependence of efficient networks on hardware devices (3) The problem that the experimental accuracy of the current real-time semantic segmentation model is still difficult to reach the standard of automatic driving. (4) The problem of lack of scene data for medical image and 3D point cloud design. Finally, this paper gives an outlook on the future directions of real-time semantic segmentation that are worth researching, which are: occlusion segmentation, real-time semantic segmentation of small targets, adaptive learning model, cross-modal joint learning, data-centered real-time semantic segmentation, and small-sample real-time semantic segmentation.
Keywords

订阅号|日报