Hybrid enhanced visual cognition framework and its key technologies
2021, Vol. 26, No. 11, pp. 2619-2629
Received: 2020-08-12
Revised: 2020-12-18
Accepted: 2020-12-25
Published in print: 2021-11-16
DOI: 10.11834/jig.200446
Intelligent vision systems have certain advantages in processing large-scale information, such as feature detection, extraction, and matching, but their deep-level cognition remains uncertain and fragile. How to efficiently mine and understand the meaning of visual information and make cognitive decisions is an engaging research topic in computer vision. Especially for visual cognition tasks built on visual perception, the related mathematical logic and image processing methods have not yet achieved a qualitative breakthrough, owing to limitations of the Western philosophical system. As a result, the development of intelligent algorithms for computer vision has entered a bottleneck period, and it remains difficult for such algorithms to replace humans in relatively complex operations such as understanding, reasoning, decision making, and learning. To promote the further development of intelligent visual perception and cognition technology, this paper summarizes the application status of hybrid enhanced intelligence in the field of visual cognition, presents the basic framework of hybrid enhanced visual cognition, and reviews the application fields and key technologies that can be included in the framework. First, on the basis of analyzing the connotation and basic scope of intelligent visual perception, human visual perception and psychological cognition are integrated; the definition, scope, and deepening process of hybrid enhanced visual cognition are discussed; the different stages of visual information processing are compared; and the basic framework of hybrid enhanced visual cognition is then constructed from an analysis of the development status of relevant cognitive models. The framework can rely on intelligent algorithms for rapid detection, recognition, understanding, and other processing to maximize the computational potential of the "machine", and it can use timely and appropriate human reasoning, prediction, and decision making to effectively enhance the accuracy and reliability of system cognition, giving full play to human cognitive advantages. Second, representative applications that can be included in the framework, and their existing problems, are discussed in four fields, namely hybrid enhanced visual monitoring, visual driving, visual decision making, and visual sharing. The hybrid enhanced visual cognition framework is identified as a practical way, under existing technical conditions, to make better use of computer efficiency and reduce the pressure on people to process information. Finally, based on the high-, medium-, and low-level system of computer vision processing technologies, the macro and micro relationships among several medium- and high-level visual processing technologies in the framework are analyzed, with a focus on key technologies such as visual analytics, visual enhancement, visual attention, visual understanding, visual reasoning, interactive learning, and cognitive evaluation. The framework will help break through the "weak artificial intelligence" bottleneck in current visual information cognition and promote the development of intelligent vision systems toward deep human-computer integration. In the next step, more in-depth research is needed on purely fundamental innovation, efficient human-computer interaction, and flexible connection paths.
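The division of labor described in the abstract, with fast machine processing plus timely human judgment, can be illustrated with a minimal sketch. This is not the authors' architecture: the confidence-threshold rule, the function name `hybrid_decide`, the `Decision` type, and the default `threshold` of 0.8 are all hypothetical choices made only to show one simple way a system might defer uncertain cases to a person.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Decision:
    label: str
    source: str  # "machine" or "human"


def hybrid_decide(
    items: List[str],
    machine: Callable[[str], Tuple[str, float]],
    human: Callable[[str], str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> List[Decision]:
    """Accept confident machine outputs; defer the rest to a human reviewer."""
    decisions = []
    for item in items:
        # The machine callback returns a label and a confidence in [0, 1].
        label, confidence = machine(item)
        if confidence >= threshold:
            decisions.append(Decision(label, "machine"))
        else:
            # Low confidence: ask the human callback for the final label.
            decisions.append(Decision(human(item), "human"))
    return decisions
```

In such a sketch, `machine` would wrap a detector or classifier and `human` would wrap an annotation or review interface; lowering `threshold` shifts more work to the machine, while raising it sends more cases to the human.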