Skeleton characterization of object topology toward explainability
2020, Vol. 25, No. 12, Pages 2587-2602
Received: 2019-12-19
Revised: 2020-04-30
Accepted: 2020-05-07
Published in print: 2020-12-16
DOI: 10.11834/jig.190661

Objective
In pattern recognition, uncertainty is usually handled by training a classifier on large amounts of labeled data with effective machine learning algorithms. However, this process lacks knowledge representation and explainability. Research in cognitive and experimental psychology shows that humans rarely resort to such costly mechanisms; instead, they cope with the uncertainty in object recognition, and provide explanations, through means similar to those of symbolic artificial intelligence, such as representation, induction, reasoning, interpretation, and constraint propagation. Starting from traditional symbolic computing, this paper therefore uses a skeleton-based representation of topological structure to offer a route to explainability.
Method
A skeleton tree is used as the basic means of forming a formal representation of an object's topological and geometric features. Knowledge is then extracted from a small number of representations of the same category within a generalization framework, yielding a generalized, explicit representation of the object category.
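To make the skeleton-based ingredients concrete, the sketch below extracts a medial-axis skeleton and its boundary-distance radius from a binary shape mask. This is a minimal illustration, not the paper's implementation: the scikit-image and SciPy calls, the neighbor-count junction test, and the toy rectangle are assumptions standing in for the paper's skeleton-graph construction.

    import numpy as np
    from scipy.ndimage import convolve
    from skimage.morphology import medial_axis

    def skeleton_and_radius(mask):
        # Medial-axis skeleton plus the distance-to-boundary (radius) at each skeleton pixel.
        skel, dist = medial_axis(mask, return_distance=True)
        return skel, skel * dist

    def junction_points(skel):
        # A skeleton pixel with three or more skeleton neighbors is treated as a junction;
        # cutting the skeleton at junctions decomposes it into branches.
        neighbor_count = convolve(skel.astype(int), np.ones((3, 3), int), mode='constant') - skel
        return skel & (neighbor_count >= 3)

    mask = np.zeros((64, 64), bool)
    mask[20:44, 8:56] = True               # a toy rectangular shape
    skel, radius = skeleton_and_radius(mask)
    area_ratio = skel.sum() / mask.sum()   # one simple branch-level statistic

Each branch, together with statistics of this kind, would then become a node of the skeleton tree described above.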
Result
In the experiments on forming generalized category representations, path reconstruction intuitively displays the geometric and physical meaning of the most general representation obtained from objects of the same category. In the explainability verification experiment, applying the topological representation across datasets reveals the specific differences between new test samples and the generalized representation, indicating that the representation is well explainable. Finally, in the uncertainty reasoning experiment on shape completion, the method not only reaches a recognition conclusion but also clearly displays the judgment basis behind that recognition, further verifying the explainability of the representation.
Conclusion
Experiments show that the generalized formal representation can cope with uncertainties in size, color, and shape. The proposed method avoids the uncertainty introduced by texture features, applies to any primitive-based representation, and offers better robustness, universality, and explainability at a lower computational cost.
Objective
Understanding the shape and structure of objects is extremely important in object recognition. The most commonly utilized pattern recognition method is machine learning, which often requires a large amount of training data. However, this kind of learning method lacks a priori knowledge, consumes large amounts of training data and complex computation, and is unable to extract explicit knowledge after learning (i.e., "knowing how without knowing why"). Great uncertainties are encountered in object recognition tasks due to changes in size, color, illumination, position, and environmental background. To deal with such uncertainties, a large number of samples must be trained and powerful machine learning algorithms must be used to generate a classifier. Despite achieving favorable recognition accuracy on some standard datasets, these models lack explainability, and recent studies have shown that such purely data-driven models are vulnerable. These models also often ignore knowledge representation and even consider this aspect redundant. However, cognitive and experimental psychology research suggests that humans do not adopt such a mechanism; instead, means similar to those of symbolic artificial intelligence, such as representation, induction, reasoning, interpretation, and constraint propagation, are used to deal with the uncertainties in object recognition. In vision tasks, improving explainability is considered more important than improving accuracy; such is the goal of interpretable artificial intelligence. Accordingly, this paper aims to provide an interpretable way of thinking rooted in traditional symbolic computing and adopts the skeleton-based representation of topological structure.
Method
Psychological research reveals that humans show strong topological and geometric preferences in visual tasks. To explicitly characterize geometric and topological features, the proposed method adopts skeleton descriptors with excellent topological and geometric characteristics. First, an object was decomposed into several connected components based on a skeleton graph, and each component was represented by a skeleton branch. Second, the statistical parameters of these skeleton branches were obtained, including their area ratio, path length, and skeletal mass ratio distribution. Third, the skeleton radius path was used to describe the contour of the object. Fourth, to form a robust spatial topology constraint, the spine-like axis (SPA) was used to describe the spatial distribution of shape components. Finally, a skeleton tree was used to form the representation of topological structure (RTS) of objects. A similarity measure based on RTS was also proposed, and the optimal subsequence bijection (OSB) was used for the elastic matching of object shapes. A multi-level generalization framework was then built to extract knowledge from a small number of similar representations and to subsequently form a generalized explicit representation (GRTS) of the object categories. The uncertainty reasoning and explainability were verified based on certainty factor theory.
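As an illustration of the elastic matching step, the following is a simplified OSB-style dynamic program over two scalar branch-feature sequences. It is a sketch for exposition, not the authors' implementation: the absolute-difference distance, the brute-force search over predecessor pairs, and the jump_cost penalty for skipped elements are assumptions; the published OSB algorithm is more refined.

    import numpy as np

    def osb_distance(a, b, jump_cost=1.0):
        # Match subsequences of a and b monotonically; every skipped element
        # in either sequence pays jump_cost (a simplified OSB-style scheme).
        a, b = np.asarray(a, float), np.asarray(b, float)
        m, n = len(a), len(b)
        d = np.abs(np.subtract.outer(a, b))      # pairwise feature distances
        D = np.full((m, n), np.inf)
        for i in range(m):
            for j in range(n):
                best = jump_cost * (i + j)       # start at (i, j), skipping all earlier elements
                for pi in range(i):
                    for pj in range(j):
                        skipped = (i - pi - 1) + (j - pj - 1)
                        best = min(best, D[pi, pj] + jump_cost * skipped)
                D[i, j] = d[i, j] + best
        # also pay for the unmatched tails of both sequences
        tail = jump_cost * ((m - 1 - np.arange(m))[:, None] + (n - 1 - np.arange(n))[None, :])
        return float((D + tail).min())

    print(osb_distance([0.9, 0.1, 0.5], [0.9, 0.5], jump_cost=0.2))  # skips the 0.1 element at bounded cost

The point of allowing skips is that a noisy or occluded skeleton branch can be left unmatched at a bounded cost instead of distorting the whole correspondence.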
Result
The proposed model illustrates the process of generating GRTS on the Kimia99 and Tari56 datasets and presents the physical meaning of the most general representations obtained from homogeneous objects. The skeletal paths of objects of the same category were used for reconstruction to clearly describe the object meaning of each part of the GRTS. In the explainability verification experiment, the GRTS of several categories obtained from the Tari56 dataset was used to apply the topological characterization to samples of the closest Kimia99 categories, so as to discover the specific differences of new test samples relative to the GRTS. The results show that the representation has good explainability. Meanwhile, in the shape completion experiments, the RTS was first extracted from incomplete shapes to gather evidence, and the uncertainty reasoning was validated with the rule set established from the GRTS of Tari56. The proposed model not only provided a recognition conclusion but also showed the specific judgment basis, thereby further verifying the explainability of the representation.
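The certainty-factor reasoning used in this experiment can be made concrete with the standard MYCIN-style combination rule. The sketch below and its evidence values are illustrative assumptions, not numbers from the paper.

    def combine_cf(cf1, cf2):
        # Standard certainty-factor combination for two pieces of evidence
        # bearing on the same hypothesis (each cf in [-1, 1]).
        if cf1 >= 0 and cf2 >= 0:
            return cf1 + cf2 * (1 - cf1)
        if cf1 <= 0 and cf2 <= 0:
            return cf1 + cf2 * (1 + cf1)
        return (cf1 + cf2) / (1 - min(abs(cf1), abs(cf2)))

    # Hypothetical example: two recovered skeleton branches each partially
    # support the hypothesis "this incomplete shape belongs to category X".
    print(combine_cf(0.6, 0.5))   # -> 0.8: belief grows but stays below certainty

Because the combined factor is reported together with the branch evidence that produced it, the conclusion carries its own justification, which is the sense in which the reasoning is explainable.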
Conclusion
A skeleton tree was used as the basic means of generating a formal representation of the topological and geometric features of an object. Based on the generalization framework, the knowledge extracted from a small number of similar representations was used to form a generalized explicit representation of the knowledge about an object category. The knowledge representation results were then used to conduct uncertainty reasoning experiments. This work presents a new perspective toward explainability and helps build trust-based relationships between models and people. Experimental results show that the generalized formal representation can cope with uncertainties in size, color, and shape. This representation also has strong robustness and universality, can prevent uncertainties arising from texture features, and is suitable for any primitive-based representation. The proposed approach significantly outperforms mainstream machine learning methods in terms of explainability and computational cost.