Review of visual neural encoding and decoding methods in fMRI

Changde Du; Qiongyi Zhou; Che Liu; Huiguang He

doi:10.11834/jig.220525

Brain-inspired vision

2023, Vol. 28, No. 2, pp. 372-384
Print publication date: 2023-02-16
Accepted: 2022-07-24

Changde Du, Qiongyi Zhou, Che Liu, Huiguang He. Review of visual neural encoding and decoding methods in fMRI[J]. Journal of Image and Graphics, 2023, 28(2): 372-384. DOI: 10.11834/jig.220525.

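Abstract

Visual neural encoding and decoding aims to study the relationship between visual stimuli and brain neural activity using neuroimaging data such as functional magnetic resonance imaging (fMRI). Encoding research models and predicts patterns of neural activity, which benefits brain science and brain-inspired intelligence; decoding research interprets a person's visual perceptual state, which can advance the field of brain-computer interfaces. fMRI-based visual neural encoding and decoding methods therefore have important scientific significance and engineering value. This paper summarizes the key techniques and research progress of fMRI-based visual neural encoding and decoding and, on that basis, analyzes the limitations of existing methods. For visual neural encoding, it describes in detail the development of methods based on population receptive field estimation. For visual neural decoding, it first divides the work by task type into three parts (semantic classification, image identification and image reconstruction) and discusses the representative studies and methods of each part in depth. In particular, the image reconstruction part focuses on techniques based on deep generative models (mainly variational autoencoders and generative adversarial networks) for reconstructing simple images, face images and complex natural images. Second, it compiles 10 open-source datasets commonly used in the field and summarizes in detail their sample size, number of subjects, stimulus type, research purpose and download address. Finally, it describes in detail the evaluation metrics commonly used for visual neural encoding and decoding models, analyzes the shortcomings of current encoding and decoding methods, proposes feasible improvements, and discusses future directions.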
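The population receptive field (pRF) estimation mentioned above can be illustrated with a minimal sketch. It assumes an isotropic 2D Gaussian receptive field, omits the hemodynamic response function, and runs a coarse grid search on synthetic bar stimuli. All function names, grid sizes and parameter values are hypothetical and are not taken from the reviewed methods.

```python
import numpy as np

def gaussian_rf(x0, y0, sigma, xs, ys):
    """Isotropic 2D Gaussian receptive field evaluated on the visual-field grid."""
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

def predict_response(stimulus, rf):
    """Overlap of each stimulus frame with the RF; stimulus is (frames, H, W), rf is (H, W)."""
    return np.tensordot(stimulus, rf, axes=([1, 2], [0, 1]))

def fit_prf(stimulus, measured, xs, ys, centers, sigmas):
    """Grid search for the (x0, y0, sigma) whose prediction correlates best with the data."""
    best, best_r = None, -np.inf
    for x0 in centers:
        for y0 in centers:
            for sigma in sigmas:
                pred = predict_response(stimulus, gaussian_rf(x0, y0, sigma, xs, ys))
                if pred.std() == 0:
                    continue  # a flat prediction has no defined correlation
                r = np.corrcoef(pred, measured)[0, 1]
                if r > best_r:
                    best, best_r = (x0, y0, sigma), r
    return best, best_r

# Synthetic visual field from -5 to 5 (arbitrary units) sampled every 0.5.
coords = np.arange(-5.0, 5.5, 0.5)
xs, ys = np.meshgrid(coords, coords)

# Bar apertures: a two-sample-wide bar sweeps horizontally, then vertically.
frames = []
for start in range(20):
    vertical = np.zeros_like(xs)
    vertical[:, start:start + 2] = 1.0
    frames.append(vertical)
for start in range(20):
    horizontal = np.zeros_like(xs)
    horizontal[start:start + 2, :] = 1.0
    frames.append(horizontal)
stimulus = np.stack(frames)

# Simulated voxel time course: assumed ground-truth pRF plus a little Gaussian noise.
rng = np.random.default_rng(0)
true_params = (2.0, -1.0, 1.0)
measured = predict_response(stimulus, gaussian_rf(*true_params, xs, ys))
measured = measured + rng.normal(scale=0.1, size=measured.shape)

best, best_r = fit_prf(stimulus, measured, xs, ys,
                       centers=np.arange(-4.0, 5.0, 1.0), sigmas=[0.5, 1.0, 2.0])
print("estimated (x0, y0, sigma):", best, "correlation:", round(float(best_r), 3))
```

In practice the prediction would also be convolved with a hemodynamic response function and refined with a continuous optimizer; the grid search here only shows the structure of the fit.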

Extended abstract

The relationship between human visual experience and evoked neural activity is central to computational neuroscience. Visual neural encoding and decoding studies the relationship between visual stimuli and the neural activity they evoke, using neuroimaging data such as functional magnetic resonance imaging (fMRI). Neural encoding research attempts to predict brain activity from the presented external stimuli, which contributes to the development of brain science and brain-like artificial intelligence. Neural decoding research attempts to predict information about external stimuli by analyzing the observed brain activity, which makes it possible to interpret the state of human visual perception and promotes the development of brain-computer interfaces (BCIs). Therefore, fMRI-based visual neural encoding and decoding research has important scientific significance and engineering value.

Typically, encoding models are based on the specific computations that are thought to underlie the observed brain responses to specific visual stimuli. Early studies of visual neural encoding relied heavily on Gabor wavelet features because these features model brain responses in the primary visual cortex well. Recently, given the success of deep neural networks (DNNs) in classifying objects in natural images, the representations within these networks have been used to build encoding models of cortical responses to complex visual stimuli. Most existing decoding studies are based on multi-voxel pattern analysis (MVPA), but the brain connectivity pattern is also a key feature of the brain state and can be used for brain decoding. Although recent studies have demonstrated the feasibility of decoding the identity of binary contrast patterns, handwritten characters, human facial images, natural picture/video stimuli and dreams from the corresponding brain activation patterns, the accurate reconstruction of visual stimuli from fMRI has not yet been adequately examined and still requires considerable effort to improve.

On the basis of summarizing the key technologies and research progress of fMRI-based visual neural encoding and decoding, this paper further analyzes the limitations of existing methods. For visual neural encoding, the development of the population receptive field (pRF) estimation method is introduced in detail. Visual neural decoding is divided by task type into semantic classification, image identification and image reconstruction, and the representative research and methods of each part are described in detail. From the perspective of machine learning, semantic classification is a single-label or multi-label classification problem. Simple visual stimuli contain only a single object, while natural visual stimuli often carry multiple semantic labels. For example, an image may contain flowers, water, trees, cars, etc. Predicting one or more semantic labels of the visual stimulus from the brain signal is called semantic decoding. Image retrieval based on brain signals is also a common visual decoding task, in which a model "decodes" neural activity by retrieving a picture of what a person has just seen or imagined. In particular, the image reconstruction part introduces techniques based on deep generative models, including variational auto-encoders (VAEs) and generative adversarial networks (GANs), for reconstructing simple images, face images and complex natural images. Secondly, 10 open-source datasets commonly used in this field are collated, and their sample size, number of subjects, types of stimuli, research purposes and download links are summarized in detail. These datasets have made important contributions to the development of this field.

Finally, we introduce in detail the metrics commonly used to evaluate visual neural encoding and decoding models, analyze the shortcomings of current methods, propose feasible suggestions for improvement and outline future development directions. Specifically, for neural encoding, the existing methods still have the following shortcomings: 1) the computational models are mostly based on existing neural network architectures, which cannot reflect the real biological flow of visual information; 2) because of each person's selective attention during visual perception and the inevitable noise in fMRI data collection, individual differences are significant; 3) the sample size of existing fMRI datasets is insufficient; 4) most researchers construct the feature spaces of neural encoding models from fixed types of pre-trained neural networks (such as AlexNet), causing problems such as insufficient diversity of visual features. On the other hand, although existing visual neural decoding methods perform well in semantic classification and image identification tasks, it is still very difficult to establish an accurate mapping between visual stimuli and visual neural signals, and the results of image reconstruction are often blurry and lack clear semantics. Moreover, most existing visual neural decoding methods are based on linear transformations or deep network transformations of visual images and lack exploration of new visual features. The factors that hinder researchers from effectively decoding visual information and reconstructing images or videos mainly include the high dimensionality of fMRI data, the small sample size and serious noise.

In the future, more advanced artificial intelligence technology should be used to develop more effective neural encoding and decoding methods and to translate brain signals into images, video, voice, text and other multimedia content, so as to enable more BCI applications. Significant research directions include 1) multi-modal neural encoding and decoding based on the union of image and text; 2) brain-guided training and enhancement of computer vision models; 3) visual neural encoding and decoding based on the highly efficient features of large-scale pre-trained models. In addition, since brain signals are characterized by complexity, high dimensionality, large individual diversity, highly dynamic nature and small sample size, future research needs to combine computational neuroscience and artificial intelligence theories to develop visual neural encoding and decoding methods with higher robustness, adaptability and interpretability.
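A feature-based encoding model, the image identification task and a correlation-based metric can be sketched together on synthetic data. This is a minimal illustration only: the random feature matrix stands in for Gabor or DNN features, ridge regression is one common choice of linear mapping rather than the method of any specific study, and all sizes, the noise level and the regularization strength `alpha` are assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 20, 30, 50

# Hypothetical stimulus features and simulated voxel responses (linear map plus noise).
W_true = rng.normal(size=(n_feat, n_vox))
X_train = rng.normal(size=(n_train, n_feat))
X_test = rng.normal(size=(n_test, n_feat))
Y_train = X_train @ W_true + rng.normal(scale=2.0, size=(n_train, n_vox))
Y_test = X_test @ W_true + rng.normal(scale=2.0, size=(n_test, n_vox))

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / np.sqrt((A ** 2).sum(axis=0) * (B ** 2).sum(axis=0))

# Encoding: predict voxel responses to held-out stimuli from their features.
W_hat = fit_ridge(X_train, Y_train, alpha=10.0)
Y_pred = X_test @ W_hat

# Encoding metric: per-voxel correlation between predicted and measured responses.
voxel_r = columnwise_pearson(Y_pred, Y_test)

# Identification: for each measured pattern, choose the candidate stimulus whose
# predicted pattern has the highest correlation with it.
corr = np.corrcoef(Y_test, Y_pred)[:n_test, n_test:]  # rows: measured, cols: predicted
identified = corr.argmax(axis=1)
accuracy = float((identified == np.arange(n_test)).mean())

print("mean voxel-wise correlation:", round(float(voxel_r.mean()), 3))
print("identification accuracy:", accuracy, "chance level:", 1.0 / n_test)
```

The same predicted-versus-measured comparison serves both purposes: averaged per voxel it scores the encoding model, and compared across candidate stimuli it performs identification.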

Keywords

neural encoding; neural decoding; image reconstruction; visual cognitive computing; deep learning; brain-computer interface (BCI)

