Invariant feature extraction and recognition of shapes

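Xu Haoran1, Yang Jianyu1, Huang Weiguo1, Shang Li2 (1. School of Urban Rail Transportation, Soochow University, Suzhou 215000; 2. Suzhou Vocational University, Suzhou 215000)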

Abstract
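Objective Shape is an important cue in tasks such as image retrieval and object recognition and has long been a key research topic in computer vision. In practical applications, shape recognition is often disturbed by factors such as viewpoint changes and nonlinear deformations, which lowers recognition accuracy. To address this, a multi-scale invariant shape description is proposed. Method The method first processes the shape contour at multiple scales and extracts five types of invariant features to build an effective description of the shape. It then matches the shape descriptions with the dynamic time warping (DTW) algorithm and computes the similarity between shapes to complete shape matching and recognition. Result The multi-scale invariant shape description is highly robust to rotation, scaling, partial occlusion, articulated deformation, intra-class variation, and noise. The method was also used to recognize shapes in the MPEG-7, Kimia99, Kimia216, and articulated shape databases, achieving high recognition accuracies of 91.79%, 95.27%, 91.33%, and 89.75%, respectively. In addition, the average time for shape recognition on the MPEG-7 database is 65 ms, which is better than most comparable methods. Conclusion A multi-scale invariant shape description method is proposed. The method extracts multiple invariant features of a shape at different scales to describe it effectively, improving both the robustness of the description to geometric transformations and nonlinear deformations and the accuracy of shape matching and recognition. It is suitable for object recognition tasks in most application scenarios, and in particular it maintains a high recognition rate under rotation, scaling, intra-class variation, partial occlusion, and articulated deformation.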
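
The DTW matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each contour is assumed to be a sequence of per-point feature vectors (for example, the five normalized invariants at one or more scales), the per-point cost is assumed to be the Euclidean distance, and the function name `dtw_distance` is hypothetical. The abstract does not say how the starting point on a closed contour is aligned, so the sketch ignores that question.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping cost between two sequences.

    seq_a, seq_b: sequences of equal-dimension feature vectors, one per
    contour point (assumed layout; the lengths may differ).
    Returns the accumulated cost of the cheapest monotonic alignment.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i points of seq_a
    # with the first j points of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: Euclidean distance between descriptors.
            d = math.dist(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # seq_b point matched to several seq_a points
                cost[i][j - 1],      # seq_a point matched to several seq_b points
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]
```

A lower value means more similar descriptors. The two sequences may have different lengths, which is why a warping-based alignment suits contours sampled with different numbers of points.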
Keywords
Invariant feature extraction and recognition for shapes

Xu Haoran1, Yang Jianyu1, Huang Weiguo1, Shang Li2 (1. School of Urban Rail Transportation, Soochow University, Suzhou 215000, China; 2. Suzhou Vocational University, Suzhou 215000, China)

Abstract
Objective The shape of an object's contour is an important cue for image retrieval and object recognition; it is usually represented by a binary image. Although binary images of objects lack features such as color and texture, humans can still recognize them by shape alone. By contrast, computers cannot recognize the shapes of objects directly. In recent years, shape retrieval and recognition have been fundamental topics in computer vision and have been widely studied for various applications, such as character recognition, biomedical image analysis, hand gesture recognition, robot navigation, and human gait recognition. To extract salient features for the representative characterization of a shape, many shape descriptors have been proposed, with promising reported results. However, viewpoint variations and nonlinear deformations, such as significant intra-class differences, geometric transformations, and partial occlusions, remain challenging problems that decrease the accuracy of shape matching and recognition. Most traditional shape descriptors use either local or global shape information and therefore cannot handle shape deformations and intra-class variations simultaneously. Local descriptors represent local shape features effectively but do not consider the global shape structure. By contrast, global descriptors are robust to local noise and deformations but ignore detailed local shape features and cannot deal with occlusion. A novel invariant multi-scale descriptor with different types of invariant features is proposed to capture the local and semi-global features of shapes.

Method The invariant multi-scale descriptor is defined with five types of invariants, which capture shape features in five forms: area, rate of change of area, arc length, rate of change of arc length, and central distance. These five types of invariants are normalized between 0 and 1 to adaptively capture inconsistent variations within one shape and to remove the effect of scale transformation. The proposed multi-scale descriptor calculates the invariants at multiple scales to combine the advantages of local and global descriptors. The method uses small scales to capture shape details and large scales to represent semi-global features, thereby obtaining rich characterizations of shapes. Because the two contours being matched usually have different numbers of sample points, the dynamic time warping (DTW) algorithm is employed to determine the best correspondence between the two sequences of contour points and to provide a similarity measure for two different shapes based on their invariant multi-scale descriptors.

Result The invariance and robustness of the proposed invariant multi-scale descriptor are evaluated through multiple comparative experiments. In these experiments, the five types of invariants of shapes under different disturbances are plotted, and their Euclidean distances are calculated to show the similarity between different shapes from the same class. The experimental results validate that the proposed descriptor is robust to rotation, scale transformation, partial occlusion, intra-class variations, articulated variations, and noise. Moreover, the effectiveness of the proposed method in shape matching is evaluated in shape retrieval experiments on several benchmark datasets. The bull's eye score is used as the evaluation criterion in the experiments. In comparison with other methods, the proposed method has the highest accuracy on all four shape datasets, that is, 91.79% on the MPEG-7 dataset, 89.75% on the articulated dataset, 95.27% on Kimia's 99 dataset, and 91.33% on Kimia's 216 dataset. In addition, the average time consumed by shape recognition on the MPEG-7 dataset with the proposed method is 65 ms, which is lower than that of the other recognition methods. These state-of-the-art results demonstrate that the proposed method is effective for shape recognition and retrieval tasks.

Conclusion A novel invariant multi-scale descriptor is proposed for shape representation, matching, and recognition. In the proposed descriptor, five types of invariants are used to capture shape features from different aspects. These invariants are calculated at several scales, ensuring that the local and global information of shapes can be represented simultaneously. The DTW algorithm is employed to determine the best correspondence between two sequences of contour points based on their invariant multi-scale descriptors, thereby providing an appropriate similarity measure for different shapes. The experimental results validate that the proposed descriptor is invariant to rotation, scaling, partial occlusion, intra-class variations, and articulated deformations. The plots of the different invariant functions show that both local and semi-global features are captured by the invariants at different scales. The DTW-based matching can appropriately measure the similarity between different shapes, regardless of the number of their contour points. The retrieval experiments on the benchmark datasets verify that the proposed method has an advantage in retrieval accuracy and efficiency over other popular shape recognition methods. The proposed method is suitable for shape recognition and retrieval tasks in complex environments. However, the method cannot exploit prior knowledge from large datasets to accelerate computation or to improve the accuracy of shape retrieval and recognition. Therefore, in future work, metric learning will be introduced into shape matching to further improve the performance of the proposed method.
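
The bull's eye score used as the evaluation criterion can be sketched as below. On MPEG-7 (70 classes of 20 shapes), the conventional definition retrieves the 40 most similar shapes for each query, counts how many of them belong to the query's class, and divides the total count by the maximum possible. The sketch generalizes this to twice the class size. The function name `bulls_eye_score` and the pairwise distance-matrix input are assumptions for illustration, not details taken from the paper.

```python
def bulls_eye_score(dist, labels):
    """Bull's eye retrieval score from a pairwise distance matrix.

    dist:   square matrix (list of lists), dist[q][k] = dissimilarity
            between shape q and shape k (assumed input format).
    labels: class label of each shape, same order as the matrix rows.
    Returns the fraction of same-class shapes found among the
    2 * class_size nearest neighbours, accumulated over all queries.
    """
    n = len(labels)
    hits = 0
    possible = 0
    for q in range(n):
        class_size = sum(1 for lab in labels if lab == labels[q])
        # Rank every shape (the query included) by distance to the query.
        ranked = sorted(range(n), key=lambda k: dist[q][k])
        top = ranked[: 2 * class_size]
        hits += sum(1 for k in top if labels[k] == labels[q])
        possible += class_size
    return hits / possible
```

With a matrix of DTW costs between descriptors as `dist`, a score of 1.0 means every query found all of its class members within the retrieval window.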
Keywords
