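Zheng Yajing (郑雅菁)1, Yu Zhaofei (余肇飞)1,2, Huang Tiejun (黄铁军)1,2 (1. National Engineering Research Center of Visual Technology, School of Computer Science, Peking University, Beijing 100871, China; 2. Institute for Artificial Intelligence, Peking University, Beijing 100871, China)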
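Research on the biological visual system has long been an important source of inspiration for computer vision algorithms. Many computer vision algorithms correspond to findings in biological vision to varying degrees, ranging from purely functional inspiration to physical models intended to explain biological observations. The classic idea that visual neuroscience has conveyed to the computer vision community is the hierarchical processing structure of the visual cortex, and this hierarchical structure is precisely what inspired the design of artificial neural networks. Deep neural networks now dominate fields such as computer vision and machine learning, and many neuroscientists have begun to apply them to computational modeling of the biological visual system. The multi-layer structure of deep neural networks, combined with training by error backpropagation, allows them to fit the vast majority of functions. As a result, deep neural networks not only learn the mapping between visual stimuli and neuronal responses, yielding the best-performing models to date, but their internal units even learn representations that resemble those of subunits of the biological visual system. This paper reviews neural network-based encoding models of the visual system, covering the early visual system (e.g., the retina) and higher visual cortex (e.g., visual area 4 (V4) and the inferior temporal (IT) cortex). The main contents are: 1) concepts and definitions related to visual system models; 2) neural network prediction models of the early visual system; and 3) task-driven encoding models of higher visual cortex. Finally, the paper introduces recent neural encoding models based on unsupervised learning and discusses the technical challenges and possible future directions of neural network-based encoding models of the visual system.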
A literature review for neural networks-based encoding models of biological visual system
Zheng Yajing1, Yu Zhaofei1,2, Huang Tiejun1,2(1.School of Computer Science, National Engineering Center of Visual Technology, Peking University, Beijing 100871, China;2.Institute for Artificial Intelligence, Peking University, Beijing 100871, China)
The biological visual system, an important part of the brain's nervous system, has evolved over hundreds of millions of years. About 70% of the information that humans obtain from the outside world comes from vision. Its complex functions depend on the visual pathways and the visual cortex and on the mechanisms by which they operate. In perception and energy efficiency, human vision still outperforms machine vision systems in areas such as real-time sensor data processing, perception tasks, and motion control. Learning effectively from the ingenious design of the biological visual system therefore remains a challenge on the way to a more advanced machine vision paradigm. Research on the biological visual system has long been a key source of inspiration for computer vision algorithms. The classic idea passed from visual neuroscience to computer vision is the hierarchical processing structure of the visual cortex, and the design of artificial neural networks (ANNs) was in turn inspired by this hierarchical structure. The visual system is mainly composed of the eyes (retina), the lateral geniculate nucleus, and the visual cortex (including the primary visual cortex and the extrastriate cortex). In humans, the visual cortex and related areas account for about one third of the cerebral cortex. The visual system extracts, processes, and integrates visual information, and it supports higher brain functions such as learning, memory, decision-making, and emotion. In object recognition, for example, the human brain can effectively identify thousands of objects, whereas the same task remains challenging for machines. In recent years, deep neural networks (DNNs) have come to dominate computer vision and machine learning. The multi-layer structure of DNNs, combined with training by error backpropagation, allows them to fit a very wide range of functions.
Modeling the biological visual system can be viewed as learning the mapping between external visual information and internal neuronal representations. In addition, the multi-layer design of neural networks is itself derived from the biological visual system. DNNs are currently the most accurate models for learning the mapping between visual stimuli and neuronal responses, and the internal units of these networks can even learn representations resembling those of subunits of the visual system. Hierarchical DNNs can also predict neural responses in areas of the visual cortex (e.g., V1, V2, and the inferior temporal cortex). Furthermore, recent unsupervised learning methods have been applied to modeling the visual cortex. The development of ANNs and the exploration of brain function and structure can benefit each other on the way toward artificial general intelligence (AGI). This review focuses on neural network-based encoding models of the early visual system (e.g., the retina) and of higher visual cortex (e.g., V4 and the IT area). It covers 1) the concepts and definitions of visual system models, 2) neural network prediction models of the early visual system, and 3) goal-driven encoding models of higher visual cortex. Recent research on unsupervised learning is also reviewed and summarized, and the technical challenges and future directions of neural network-based encoding models are discussed.
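As a rough illustration of what learning the mapping between visual stimuli and neuronal responses means in its simplest form, the sketch below fits a linear encoding model with ridge regression to simulated data. It is not a model taken from the reviewed literature: the simulated neuron (a random linear filter followed by rectification), the data sizes, and all function and variable names are assumptions made for this example. DNN-based encoding models replace the single linear map with a multi-layer network, but they are evaluated in the same way, by correlating predicted and recorded responses on held-out stimuli.

```python
import numpy as np


def fit_ridge_encoder(stimuli, responses, alpha=1.0):
    """Fit a linear encoding model with closed-form ridge regression.

    Returns (weights, bias) such that stimuli @ weights + bias
    approximates the responses.
    """
    stim_mean = stimuli.mean(axis=0)
    resp_mean = responses.mean()
    centered = stimuli - stim_mean
    gram = centered.T @ centered + alpha * np.eye(stimuli.shape[1])
    weights = np.linalg.solve(gram, centered.T @ (responses - resp_mean))
    bias = resp_mean - stim_mean @ weights
    return weights, bias


def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
n_train, n_test, n_pixels = 2000, 500, 64

# Hypothetical ground-truth receptive field of the simulated neuron.
true_filter = rng.normal(size=n_pixels)


def simulate_neuron(stimuli):
    """Rectified linear drive plus small Gaussian noise (an assumption)."""
    drive = stimuli @ true_filter
    return np.maximum(drive, 0.0) + 0.1 * rng.normal(size=drive.shape)


# White-noise "images", flattened to vectors of pixel intensities.
train_stimuli = rng.normal(size=(n_train, n_pixels))
test_stimuli = rng.normal(size=(n_test, n_pixels))
train_responses = simulate_neuron(train_stimuli)
test_responses = simulate_neuron(test_stimuli)

weights, bias = fit_ridge_encoder(train_stimuli, train_responses, alpha=10.0)
predicted = test_stimuli @ weights + bias

# Held-out prediction accuracy and similarity of the learned filter
# to the simulated receptive field.
r_test = pearson_r(predicted, test_responses)
filter_similarity = pearson_r(weights, true_filter)
print(f"held-out correlation: {r_test:.2f}")
print(f"filter similarity:    {filter_similarity:.2f}")
```

Because the simulated neuron is rectified, the linear model recovers the shape of the filter well but cannot reach a perfect held-out correlation. That gap is the kind of nonlinearity that multi-layer encoding models are meant to capture.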