Published: 2018-08-16  DOI: 10.11834/jig.180016  2018 | Volume 23 | Number 8  Image Understanding and Computer Vision

1. Guangdong Industry Technical College, Guangzhou 510300, China;
2. Guangzhou Radio Group, Guangzhou 510656, China;
3. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
Received: 2018-01-08; revised: 2018-03-08. Supported by: National Natural Science Foundation of China (61628105). First author: Wu Shaogen, born 1968, male, associate professor; received his M.S. in computer application technology from Dalian University of Technology in 1993; his research interests are computer vision, pattern recognition, and image processing. E-mail: bill3000@126.com. Nie Weiqing, male, Ph.D.; his research interests are biometrics and its applications. E-mail: wqnie0902@163.com. Lu Lijun, male, associate professor, Ph.D.; his research interest is image reconstruction. E-mail: lulijun@smu.edu.cn. CLC number: TP391. Document code: A. Article ID: 1006-8961(2018)08-1242-12


Comparative study of classic region-based shape descriptors
Wu Shaogen1, Nie Weiqing2, Lu Lijun3, Liu Yaqin3
1. Guangdong Industry Technical College, Guangzhou 510300, China;
2. Guangzhou Radio Group, Guangzhou 510656, China;
3. School of Biomedical Engineering, Southern Medical University, Guangzhou 510515, China
Supported by: National Natural Science Foundation of China(61628105)

# Abstract

Objective Shape representation and shape matching are basic tasks in computer vision and pattern recognition. Among the region-based methods, several classic approaches are available, including Hu moment invariants (Hu method), the angular radial transform (ART method), the generic Fourier descriptor (GFD method), the histogram of the Radon transform (HRT method), and the multi-scale integral invariant (MSII method). Given the long time span over which these methods were proposed and the fact that past studies always compared only one factor (i.e., retrieval accuracy), a comprehensive comparative study of all these methods is needed to support engineering applications and future research. Method To compare the different aspects of the five shape descriptors, we use three shape databases. The first database, a group of simple geometric shapes partly modified by us, is built from ten seed shapes. From each seed shape, we construct three more shapes through non-rigid deformation, with the deformation increasing from the first to the third, giving 40 basic shapes in total. For each of these 40 basic shapes, we then obtain three more similar shapes by random scaling, random rotation, and random translation. Consequently, the first database contains 160 shapes. For the retrieval test on this database, we define a new scoring rule that counts not only the retrieval hits but also the order of the retrieval results; this rule therefore probes the intrinsic representation and retrieval ability of a shape descriptor. The second database used in our comparative experiments is MPEG-7, the standard database for shape descriptors and shape retrieval; it consists of 70 shape categories, each containing 20 shapes of the same category modulo various rigid and non-rigid deformations.
The experiments are performed on these 1400 shape images. The test score for the MPEG-7 database is the bullseye score, which counts the number of shapes of the same category (20 shapes in this case) among the 40 best-matching shapes. The third database used in our experiments is a collection of vehicle trademark shapes. We collect 100 common vehicle logos, such as those of Bentley, BMW, and Toyota. For each of these 100 logo shapes, we construct three additional shapes by random scaling, random rotation, and random translation, and another three using random perspective parameters, obtaining 700 vehicle logo shapes in total. The test score for this third database is also a bullseye score, which counts the number of shapes of the same category (seven shapes in this case) among the 14 best-matching shapes. In all retrieval experiments on the three databases, we compute not only the test scores but also the retrieval stability, measured by the standard deviation of the retrieval scores. We also analyze and verify the computational complexity of the compared shape descriptors. After obtaining the test scores, retrieval stability, and computational complexity, a weighted formula that considers all three factors is defined to measure comprehensive performance. Result In the retrieval experiments on the first database of simple geometric shapes, the HRT method achieves the best test score and the lowest standard deviation, followed by the GFD method, which indicates that HRT is less sensitive to noise than the other methods. On the second database, the ART and GFD methods obtain almost the same retrieval scores. On the third database, the GFD, ART, and HRT methods achieve almost the same retrieval score.
In all retrieval experiments, the five compared methods are invariant to scaling, rotation, and translation, which are fundamental requirements for a shape descriptor. We analyze and verify the computational complexity of the five methods and find that in the feature-creation stage the Hu method has the lowest complexity, whereas in the feature-matching stage all methods except HRT have low matching complexity. The comprehensive performance of the five methods is measured by a weighted formula; the GFD method performs best, with ART next. The HRT method can degrade when the number of shapes is large because its matching phase has higher complexity than that of the other methods. The performance of the Hu and MSII methods is unsatisfactory in all our experiments. The visual features of a shape can be captured effectively by projecting the shape onto a basis of orthogonal functions, and we conjecture that the visual features of an image can be captured effectively by the same projection approach. Conclusion Among all the evaluated region-based methods, the GFD and ART methods have the best performance, and we suggest that they can be employed in engineering applications. Finding new orthogonal function bases may be a fruitful research direction for shape visual feature extraction, as well as for image visual feature extraction, because projecting a shape onto orthogonal basis functions captures its intrinsic visual features.

# Key words

computer vision; pattern recognition; shape projection; shape descriptor; shape matching

# 1.1 ART method

The ART method is a moment-based shape representation and is the region-based shape representation recommended by MPEG-7.
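The ART basis functions are separable in polar coordinates, $V_{nm}(\rho,\theta)=A_m(\theta)R_n(\rho)$ with $A_m(\theta)=\frac{1}{2\pi}\mathrm{e}^{\mathrm{j}m\theta}$ and $R_n(\rho)=1$ for $n=0$, $2\cos(\pi n\rho)$ otherwise, and the descriptor keeps the magnitudes $|F_{nm}|$ of the projections of the shape onto these bases. Below is a minimal NumPy sketch using the usual MPEG-7 orders (3 radial × 12 angular, 35 coefficients after dropping $F_{00}$, which serves as the scale normalizer); the discretization details here are our own simplification:

```python
import numpy as np

def art_descriptor(img, n_max=3, m_max=12):
    """Sketch of ART coefficient magnitudes |F_nm| for a binary shape image.

    The shape is assumed centered in the image; coordinates are
    normalized so the inscribed circle maps to the unit disk.
    """
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    rho = np.sqrt((x - cx) ** 2 + (y - cy) ** 2) / (min(h, w) / 2.0)
    theta = np.arctan2(y - cy, x - cx)
    f = img.astype(float) * (rho <= 1.0)   # keep only the unit disk

    feats = []
    for n in range(n_max):
        Rn = np.ones_like(rho) if n == 0 else 2.0 * np.cos(np.pi * n * rho)
        for m in range(m_max):
            if n == 0 and m == 0:
                continue                    # F_00 is the normalizer
            Am = np.exp(-1j * m * theta) / (2.0 * np.pi)
            feats.append(abs(np.sum(f * Am * Rn)))
    F00 = abs(np.sum(f / (2.0 * np.pi)))   # -> scale invariance
    return np.array(feats) / (F00 + 1e-12)
```

Because only magnitudes are kept and every coefficient is divided by $F_{00}$, the vector is invariant to rotation and scale; translation invariance comes from centering the shape beforehand.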

# 1.3 GFD method

The GFD method first transforms the shape into polar coordinates and then performs a Fourier transform on the polar-represented shape in 2-D Cartesian coordinates; this process is called the MPFT (modified polar Fourier transform). For any given shape image function $f(x, y)$, the MPFT is defined as

 $PF(\rho, \phi) = \sum\limits_{r}\sum\limits_{i} f(r, \theta_i)\exp\left[\mathrm{j}2\pi\left(\frac{r}{R}\rho + \frac{2\pi i}{T}\phi\right)\right]$
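A minimal NumPy sketch of this MPFT pipeline: resample the image to a polar grid $(r, \theta_i)$ about the shape centroid, apply a 2-D FFT, and keep the magnitudes of the low-order coefficients normalized by $|PF(0,0)|$. The grid sizes `R`, `T` and the number of retained coefficients are illustrative choices, and the nearest-neighbour sampling is a simplification:

```python
import numpy as np

def gfd(img, R=32, T=64, m=4, n=9):
    """Sketch of the generic Fourier descriptor via the MPFT."""
    ys, xs = np.nonzero(img)
    cy, cx = ys.mean(), xs.mean()                       # shape centroid
    rmax = np.sqrt(((ys - cy) ** 2 + (xs - cx) ** 2).max())
    polar = np.zeros((R, T))
    for ri in range(R):                                 # polar resampling
        for ti in range(T):
            r = rmax * ri / R
            th = 2.0 * np.pi * ti / T
            y = int(round(float(cy + r * np.sin(th))))
            x = int(round(float(cx + r * np.cos(th))))
            if 0 <= y < img.shape[0] and 0 <= x < img.shape[1]:
                polar[ri, ti] = img[y, x]
    pf = np.abs(np.fft.fft2(polar))                     # |PF(rho, phi)|
    feats = pf[:m, :n].flatten()                        # low-order terms
    return feats / (feats[0] + 1e-12)                   # scale-normalize
```

Centering on the centroid gives translation invariance, taking FFT magnitudes gives rotation invariance (rotation becomes a circular shift along the angular axis), and dividing by $|PF(0,0)|$ gives scale invariance.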

# 1.5 MSII method

The MSII method defines a shape as a region $D$ within a given domain $\Omega$ and describes the shape by its characteristic function, i.e.

 $\chi_D(x) = \begin{cases} 1 & x \in D\\ 0 & x \in \Omega \setminus D \end{cases}$

 $\mathcal{F}_D^\sigma = \chi_D(x) \cdot \left(1 - G_\sigma * \chi_D(x)\right)$

 $G_\sigma(x) = \frac{1}{2\pi\sigma^2}\mathrm{e}^{-\frac{\left\|x\right\|^2}{2\sigma^2}}$

 $S_D(\sigma) = \frac{\int_\Omega \mathcal{F}_D^\sigma(x)\,\mathrm{d}x}{\int_D \mathrm{d}x} = \frac{\int_\Omega \chi_D(x) \cdot \left(1 - G_\sigma * \chi_D(x)\right)\mathrm{d}x}{\int_\Omega \chi_D(x)\,\mathrm{d}x}$

 $\rho = \sigma\sqrt{\int_\Omega \chi_D(x)\,\mathrm{d}x}$

Taking $\sigma$ over a series of different values yields the shape's signature at different scales; all these signatures are collected as the elements of the shape's feature vector, and the Wasserstein distance between two such vectors is then used as the similarity error between the two shapes. The Wasserstein distance between two vectors is defined as

 $d(D_1, D_2) = \int_0^\infty \left|S_{D_1}(\sigma) - S_{D_2}(\sigma)\right|\mathrm{d}\sigma$
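The pipeline above can be sketched in NumPy as follows; an FFT-based circular convolution stands in for $G_\sigma *$, the integrals become pixel sums, the distance integral is approximated by a crude Riemann sum over the sampled scales, and the scale normalization via $\rho$ is omitted for brevity (all discretization choices are ours):

```python
import numpy as np

def _gaussian_blur(chi, sigma):
    """G_sigma * chi via FFT convolution (periodic boundary; a sketch)."""
    h, w = chi.shape
    y = np.fft.fftfreq(h) * h                  # kernel centered at index 0
    x = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(y, x, indexing="ij")
    g = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                               # normalized Gaussian kernel
    return np.real(np.fft.ifft2(np.fft.fft2(chi) * np.fft.fft2(g)))

def msii_signature(chi, sigmas):
    """S_D(sigma) = sum(chi * (1 - G_sigma * chi)) / sum(chi) per scale."""
    area = chi.sum()
    return np.array([(chi * (1.0 - _gaussian_blur(chi, s))).sum() / area
                     for s in sigmas])

def msii_distance(chi1, chi2, sigmas):
    """Riemann-sum analogue of d(D1, D2) over the sampled scales."""
    s1 = msii_signature(chi1, sigmas)
    s2 = msii_signature(chi2, sigmas)
    dsigma = np.diff(sigmas, prepend=sigmas[0])
    return float(np.sum(np.abs(s1 - s2) * dsigma))
```

The signature lies in $[0, 1]$: it measures, at each scale, how much of the region's mass sits near the boundary, which is why it captures multi-scale boundary detail.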

# 4 Conclusions

 $\rho = 0.7 \times \sum\limits_{i=1}^{3} \frac{Q_i}{3} + 0.1 \times \left(100 - \sum\limits_{i=1}^{3} \frac{S_i}{3}\right) + 0.2 \times \left(100 - 100 \times M\right)$

where $Q_i$ is the test score on the $i$-th shape database, $S_i$ is the corresponding standard deviation of the retrieval scores, and $M$ is the normalized matching complexity.
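Reading $Q_i$ as the test score on the $i$-th database, $S_i$ as its standard deviation, and $M$ as a normalized matching-complexity indicator in $[0, 1]$ (our interpretation of the symbols), the weighted score is a one-liner:

```python
def comprehensive_score(Q, S, M):
    """Weighted overall performance: 0.7 * mean retrieval score
    + 0.1 * (100 - mean std-dev) + 0.2 * (100 - 100 * M)."""
    return (0.7 * sum(Q) / 3.0
            + 0.1 * (100.0 - sum(S) / 3.0)
            + 0.2 * (100.0 - 100.0 * M))
```

The weights reward retrieval accuracy most heavily (0.7), then low matching complexity (0.2), then stability (0.1), matching the three factors compared in the study.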
