
Li Yang, Wu Xiaoqun (School of Computer and Information Engineering, Beijing Technology and Business University)

Abstract
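RGB-D images contain rich multi-level features, such as low-level line and planar features and high-level semantic features. Multi-level features extracted from RGB-D images can serve as prior knowledge to improve the output quality of many tasks, such as indoor scene reconstruction and SLAM (Simultaneous Localization and Mapping), and their extraction is an active research topic in computer graphics. Traditional multi-level feature extraction algorithms generally rely on the rich color and texture information in RGB images and the geometric information in depth images. Such algorithms depend on the quality of the input RGB-D images, yet environmental and human factors during acquisition make high-quality RGB-D images difficult to obtain. With the rapid development of deep learning, deep learning-based multi-level feature extraction algorithms have overcome this limitation, and a body of high-quality research has emerged. This paper reviews multi-level feature extraction algorithms for RGB-D images. First, it summarizes the RGB-D datasets commonly used for multi-level feature extraction and the metrics used to evaluate related algorithms. Then, following the level at which features reside, it reviews algorithms for line, planar, and semantic features in turn. In addition, it compares the strengths and weaknesses of each algorithm and analyzes them quantitatively using common quality evaluation criteria. Finally, it discusses open problems in current multi-level feature extraction algorithms and outlines future trends.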
A survey of multi-level feature extraction methods for RGB-D images

Li Yang, Wu Xiaoqun (School of Computer and Information Engineering, Beijing Technology and Business University; Beijing Key Laboratory of Big Data Technology for Food Safety, Beijing 100048)

RGB-D images contain rich multi-level features, such as low-level line and planar features and high-level semantic features. These features provide valuable information for many computer vision tasks. By leveraging them, algorithms can extract meaningful information from RGB-D images and improve the performance of tasks such as object detection, tracking, and indoor scene reconstruction. Line features in a single RGB-D image are described by terms such as feature lines and contour lines. They convey spatial relationships and boundaries in the input image and aid in its understanding and interpretation. Planar features, described by terms such as plane and surface, are flat or nearly flat regions in the RGB-D image. Objects are described by instance labels and semantic labels. An instance label uniquely identifies an individual occurrence of an object in an image, whereas a semantic label denotes the broader class or category to which the object belongs. Semantic labels therefore give a higher-level understanding of the image, grouping objects into meaningful categories that indicate their general type. Traditional line feature extraction methods often use the color and texture information of the RGB image and the geometric information of the depth image to extract feature lines and contour lines. Planar feature extraction typically clusters points with similar properties into sets and then extracts planar features from those sets. Semantic feature extraction aims to assign a specific semantic category to each pixel of the RGB-D input, and most methods for this task are based on deep learning.
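A concrete planar case helps here. The sketch below extracts the dominant plane from a set of 3D points with RANSAC, a classic plane-fitting technique chosen for illustration. It is not necessarily the clustering scheme used by the methods surveyed. The function names `plane_from_points` and `ransac_plane`, the 0.01 distance threshold, and the 200-iteration budget are all hypothetical choices.

```python
import math
import random


def plane_from_points(p, q, r):
    """Plane through three 3D points as (unit normal n, offset d), with n.x + d = 0.

    Returns None when the points are (nearly) collinear.
    """
    u = [q[i] - p[i] for i in range(3)]
    v = [r[i] - p[i] for i in range(3)]
    # The cross product u x v is perpendicular to the plane.
    n = [u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-9:
        return None
    n = [c / norm for c in n]
    d = -sum(n[i] * p[i] for i in range(3))
    return n, d


def ransac_plane(points, threshold=0.01, iterations=200, seed=0):
    """Return the plane supported by the most points and the indices of its inliers."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = plane_from_points(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate (collinear) sample
        n, d = model
        # A point is an inlier if its distance to the plane is below the threshold.
        inliers = [i for i, x in enumerate(points)
                   if abs(sum(n[k] * x[k] for k in range(3)) + d) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


# Synthetic data: an 80-point planar patch at z = 1 plus 20 clutter points off the plane.
patch = [(0.1 * x, 0.1 * y, 1.0) for x in range(10) for y in range(8)]
clutter = [(0.05 * i, 0.3, 1.5 + 0.075 * i) for i in range(20)]
model, inliers = ransac_plane(patch + clutter)
```

To find further planes, you can rerun the search on the remaining outliers. A clustering-based pipeline, as described above, would instead first group points with similar properties and then fit a plane to each group.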
Multi-level features extracted from RGB-D images can serve as prior knowledge for tasks such as indoor scene reconstruction, scene understanding, and object recognition, improving the quality of network output. Multi-level feature extraction for RGB-D images is also an active topic in computer graphics. As commercial depth cameras have developed and spread, acquiring RGB-D data has become increasingly convenient. However, environmental and human factors during acquisition often compromise the quality of the captured data. This leads to problems such as noise and missing depth values, which in turn degrade the quality of multi-level feature extraction results. Traditional methods are particularly sensitive to these problems, whereas deep learning approaches have partly overcome them. With the rapid development of deep learning, many high-quality deep learning-based results have emerged for multi-level feature extraction tasks. In this paper, we summarize the RGB-D datasets commonly used for multi-level feature extraction, such as NYU v2 and SUN RGB-D. These datasets contain diverse scenes and consist of RGB images paired with corresponding depth images. NYU v2, for example, includes 1 449 RGB-D images drawn from 464 distinct indoor scenes across 26 scene classes. After introducing the datasets, we summarize the criteria commonly used to evaluate line, planar, and semantic features and explain how each criterion is computed. For line feature extraction, we review both traditional and deep learning-based approaches.
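The pairing of RGB and depth images, and the missing-depth problem mentioned above, can be illustrated by back-projecting a depth map into a 3D point cloud with a pinhole camera model. The sketch below makes several assumptions. Depth is stored in millimetres, and 0 marks a missing reading, a common convention for Kinect-style sensors that is assumed here rather than taken from the datasets above. The function name `back_project` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters.

```python
def back_project(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Convert a depth map (2D list, row v, column u) into camera-frame 3D points.

    Assumes raw values in millimetres (depth_scale=1000 converts to metres)
    and that 0 encodes a missing depth reading. Returns the valid points and
    the fraction of pixels whose depth is missing.
    """
    points, missing, total = [], 0, 0
    for v, row in enumerate(depth):
        for u, raw in enumerate(row):
            total += 1
            if raw == 0:
                missing += 1  # hole in the depth map: no 3D point is produced
                continue
            z = raw / depth_scale
            # Pinhole model: pixel (u, v) at depth z maps to (x, y, z).
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points, (missing / total if total else 0.0)


# Toy 2x2 depth map with one missing pixel and unit intrinsics (illustrative only).
pts, missing_ratio = back_project([[1000, 0], [2000, 1000]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```

Real sensors supply calibrated intrinsics. The unit values here only keep the arithmetic easy to check.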
We explain the principles, advantages, and limitations of the different methods in detail and quantitatively compare the extraction results of several of them. For planar feature extraction, we give a comprehensive overview from two perspectives: traditional methods and deep learning-based methods. We collect the relevant research, compare the quality of the planar feature extraction methods, and explain the advantages and limitations of each. For semantic feature extraction, we review deep learning-based methods from two aspects: fully supervised and semi-supervised learning. To compare semantic feature extraction methods, we use pixel accuracy (PA), mean pixel accuracy (MPA), and mean intersection over union (mIoU) to measure extraction quality. The quantitative comparisons show that semantic feature extraction methods designed for RGB-D data achieve better results than methods that use RGB data alone. Incorporating depth information enables more accurate and robust extraction of semantic features, which improves performance in tasks such as scene understanding and object recognition. Data annotation remains a challenge for deep learning-based feature extraction, because annotating large-scale datasets requires substantial time and human effort.
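All three semantic metrics named above can be derived from a per-class confusion matrix. The sketch below uses the usual definitions:

- PA is the fraction of correctly labelled pixels.
- MPA averages the per-class pixel accuracy.
- mIoU averages TP / (TP + FP + FN) over the classes.

The helper names, the `ignore_index` value of 255 for unlabelled pixels, and the choice to skip classes absent from the ground truth are all assumptions. Evaluation protocols differ in these details.

```python
def confusion_matrix(gt, pred, num_classes, ignore_index=255):
    """conf[i][j] counts pixels with ground-truth class i predicted as class j."""
    conf = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # unlabelled pixel: excluded from evaluation
        conf[g][p] += 1
    return conf


def segmentation_metrics(conf):
    """Return (PA, MPA, mIoU) computed from a square confusion matrix."""
    k = len(conf)
    total = sum(sum(row) for row in conf)
    pa = sum(conf[i][i] for i in range(k)) / total
    class_acc, ious = [], []
    for i in range(k):
        tp = conf[i][i]
        gt_i = sum(conf[i])                          # TP + FN
        pred_i = sum(conf[j][i] for j in range(k))   # TP + FP
        if gt_i == 0:
            continue  # class absent from ground truth: skipped (assumed convention)
        class_acc.append(tp / gt_i)
        ious.append(tp / (gt_i + pred_i - tp))       # TP / (TP + FP + FN)
    return pa, sum(class_acc) / len(class_acc), sum(ious) / len(ious)


# Six labelled pixels plus one unlabelled pixel (255), three classes.
gt = [0, 0, 1, 1, 2, 2, 255]
pred = [0, 1, 1, 1, 2, 0, 0]
pa, mpa, miou = segmentation_metrics(confusion_matrix(gt, pred, num_classes=3))
```

In this toy case, four of the six labelled pixels are correct, so PA is 4/6. The per-class IoUs are 1/3, 2/3, and 1/2.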
To reduce the annotation workload and make effective use of existing annotated data, researchers have widely studied unsupervised learning, semi-supervised learning, and transfer learning, which leverage unlabeled or sparsely labeled data for feature extraction. Finally, we discuss the open problems of current multi-level feature extraction algorithms and outline future development trends.
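Confidence-thresholded pseudo-labelling is one widely used way to exploit unlabeled data in semi-supervised segmentation. A model's confident predictions on unlabeled pixels are reused as training targets, and the remaining pixels are ignored. The sketch below is illustrative only and does not reproduce the approach of any specific surveyed method. The name `pseudo_labels`, the 0.9 threshold, and the 255 ignore value are assumptions.

```python
def pseudo_labels(probs, threshold=0.9, ignore_index=255):
    """Turn per-pixel class probabilities into hard pseudo-labels.

    probs: one list of class probabilities per pixel (e.g. softmax outputs of a
    model trained on the labelled subset). Pixels whose top probability is below
    `threshold` receive `ignore_index`, so a masked loss would skip them.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=p.__getitem__)  # most probable class
        labels.append(best if p[best] >= threshold else ignore_index)
    return labels


probs = [[0.95, 0.05], [0.60, 0.40], [0.02, 0.98]]
labels = pseudo_labels(probs)
```

Raising the threshold trades coverage for label quality. Fewer pixels receive targets, but those targets are more likely to be correct.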