多视角神经网络非接触式脉搏信号提取
Noncontact pulse signal extraction based on multiview neural network
2020, Vol. 25, No. 11, Pages 2428-2438
Print publication date: 2020-11-16
Accepted: 2020-09-27
DOI: 10.11834/jig.200415
Changchen Zhao, Feng Ju, Yuanjing Feng. Noncontact pulse signal extraction based on multiview neural network[J]. Journal of Image and Graphics, 2020,25(11):2428-2438.
Objective
Remote photoplethysmography (rPPG) is a video-based noncontact heart rate measurement technique that has attracted wide attention from researchers. Extracting pulse signals from video data requires considering temporal and spatial information simultaneously; however, existing methods often separate spatial processing from temporal processing, which leads to inaccurate modeling and low measurement accuracy. This paper proposes a neural network model based on multiview 2D convolution that models intraframe and interframe correlation to improve measurement accuracy.
Method
The proposed network consists of normal 2D convolutional blocks and multiview convolutional blocks. The normal 2D convolutional blocks perform a preliminary abstraction of the input data in the spatial dimension. The multiview convolutional block contains three pathways that perform 2D convolution on the input data from three views, namely height-width, height-time, and width-time, and then fuses the complementary spatiotemporal features of the three views to obtain the final pulse signal. The proposed multiview 2D convolution extends the traditional single-view 2D convolutional network along the temporal dimension. The method preserves the original structure of the video and mines complementary spatiotemporal features through convolution in the three views, thereby improving pulse measurement accuracy.
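The three-view convolution described above can be sketched as a hypothetical PyTorch module; the layer widths, kernel sizes, and the simple additive fusion are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class MultiViewConv(nn.Module):
    """Sketch of multiview 2D convolution: the same video tensor is convolved
    in the H-W, H-T, and W-T views by permuting one axis into the slice
    position before an ordinary Conv2d. Sizes are illustrative assumptions."""
    def __init__(self, in_ch=3, out_ch=16):
        super().__init__()
        self.conv_hw = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # H-W view
        self.conv_ht = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # H-T view
        self.conv_wt = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # W-T view

    def forward(self, x):
        # x: (B, C, T, H, W) video clip
        b, c, t, h, w = x.shape
        # H-W view: treat each frame as an image -> (B*T, C, H, W)
        hw = x.permute(0, 2, 1, 3, 4).reshape(b * t, c, h, w)
        hw = self.conv_hw(hw).reshape(b, t, -1, h, w).permute(0, 2, 1, 3, 4)
        # H-T view: one slice per column index -> (B*W, C, H, T)
        ht = x.permute(0, 4, 1, 3, 2).reshape(b * w, c, h, t)
        ht = self.conv_ht(ht).reshape(b, w, -1, h, t).permute(0, 2, 4, 3, 1)
        # W-T view: one slice per row index -> (B*H, C, W, T)
        wt = x.permute(0, 3, 1, 4, 2).reshape(b * h, c, w, t)
        wt = self.conv_wt(wt).reshape(b, h, -1, w, t).permute(0, 2, 4, 1, 3)
        # Fuse the complementary spatiotemporal features (here: a plain sum)
        return hw + ht + wt
```

All three outputs are permuted back to the common (B, C, T, H, W) layout before fusion, so any elementwise or concatenation-based fusion could be substituted for the sum used here.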
Result
Experimental results on the public dataset PURE (pulse rate detection dataset) and the self-built dataset Self-rPPG (self-built rPPG dataset) show that, compared with traditional methods, the signal-to-noise ratio (SNR) of the pulse signals extracted by the proposed network increases by 3.92 dB and 1.92 dB on the two datasets, respectively, while the mean absolute error (MAE) decreases by 3.81 bpm and 2.91 bpm; compared with the single-view network, the SNR increases by 2.93 dB and 3.20 dB, and the MAE decreases by 2.20 bpm and 3.61 bpm.
Conclusion
The proposed network can estimate the pulse signal of a subject with high accuracy in complex environments, demonstrating the effectiveness of multiview 2D convolution in rPPG pulse extraction. Compared with rPPG algorithms based on single-view 2D neural networks, the pulse signals extracted by the proposed method contain less noise and fewer low-frequency components, and the method generalizes better.
Objective
Remote photoplethysmography (rPPG) has recently attracted considerable research attention due to its capability of measuring blood volume pulse from video recordings using computer vision techniques without any physical contact with the subject. Extracting pulse signals from video data requires the simultaneous consideration of both spatial and temporal information. However, existing methods commonly process the spatial and temporal information separately, which can result in inaccurate modeling and low measurement accuracy. A multiview 2D convolutional neural network for pulse extraction from video is proposed to model the intra- and interframe correlation of video data from three points of view. This study aims to investigate an effective spatiotemporal modeling method for rPPG and improve the pulse measurement accuracy.
Method
The proposed network contains three pathways. The network performs 2D convolution operations on a given video segment from three views of the input data, namely height-width (H-W), height-time (H-T), and width-time (W-T), and then integrates the complementary spatiotemporal features of the three views to obtain the final pulse signal. The network is called multiview heart rate network (MVHRNet). MVHRNet consists of two normal (H-W) convolutional blocks and three multiview (H-W, H-T, and W-T) 2D convolutional blocks. Each convolutional block (except the last) includes dropout, convolutional, pooling, and batch normalization layers. The input and output of the network are a video clip and a predicted pulse signal, respectively. Multiview 2D convolution is a natural generalization of single-view 2D convolution to all three viewpoints of volumetric data. Take the normal 2D convolution in the H-W view as an example: H-W filters sweep one image from left to right and top to bottom, move to the next frame (slice), and repeat the process. In this manner, the filters learn the spatial correlation within each slice in the H-W view. Similarly, the same process is performed on each slice in the H-T view so that the filters learn the correlation within the H-T view, which carries part of the temporal information of the video clip; the convolution in the W-T view likewise learns the temporal information within the W-T view. Compared with existing rPPG methods, the proposed method simultaneously models the spatiotemporal information, preserves the original structure of the video, and exploits complementary spatiotemporal features by performing a three-view 2D convolution.
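A rough sketch of one such block (dropout, convolution, pooling, and batch normalization) in PyTorch might look like the following; the kernel size, pooling factor, dropout rate, and layer order are assumptions for illustration, not the paper's exact values.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, p_drop=0.25):
    """One MVHRNet-style convolutional block as described above.
    All hyperparameters here are illustrative assumptions."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),        # halves both dimensions of the current view
        nn.Dropout(p_drop),
    )
```

The same block structure can serve any of the three views, since each view is processed as an ordinary 2D feature map once the axes are permuted.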
Result
Extensive experiments are conducted on two datasets (the public pulse rate detection dataset (PURE) and the self-built rPPG dataset (Self-rPPG)), including an ablation study, comparison experiments, and cross-dataset testing. Experimental results show that the signal-to-noise ratio (SNR) of the signal extracted by the proposed network is 3.92 dB and 1.92 dB higher than that of traditional methods on the two datasets, respectively, and 2.93 dB and 3.20 dB higher than that of the single-view network. We also evaluate the impact of the window length of the input video clip on the quality of the extracted signal: the SNR increases and the mean absolute error (MAE) decreases as the window length increases, and both tend to saturate when T is greater than 120. The experimental results also show that the training times of the multiview and single-view networks are of the same order of magnitude.
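The SNR metric reported above is commonly defined in rPPG work as the spectral power around the heart-rate frequency and its first harmonic relative to the remaining power in the pulse band. A minimal NumPy sketch, with the band widths chosen as assumptions:

```python
import numpy as np

def rppg_snr(signal, fs, hr_hz, band=0.1):
    """SNR (dB) of a pulse signal: power within +/- `band` Hz of the
    heart-rate frequency and its first harmonic, over the remaining power
    in the 0.5-4 Hz pulse band. Window widths are illustrative assumptions."""
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    power = np.abs(np.fft.rfft(signal)) ** 2
    near_hr = ((np.abs(freqs - hr_hz) < band) |
               (np.abs(freqs - 2 * hr_hz) < band))
    pulse_band = (freqs >= 0.5) & (freqs <= 4.0)
    signal_p = power[near_hr & pulse_band].sum()
    noise_p = power[~near_hr & pulse_band].sum()
    return 10 * np.log10(signal_p / noise_p)
```

For example, a clean 1.2 Hz (72 bpm) sinusoid with mild additive noise yields a strongly positive SNR, while a signal dominated by motion artifacts would score near or below 0 dB.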
Conclusion
Spatiotemporal correlation in videos can be effectively modeled using multiview 2D convolution. Compared with traditional rPPG methods (the plane-orthogonal-to-skin (POS) and chrominance-based (CHROM) methods), the SNR of the pulse signals extracted by the proposed method on the two datasets increases by 52.9% and 42.3%. Compared with the rPPG algorithm based on a single-view 2D convolutional neural network (CNN), the proposed network extracts pulse signals with less noise, fewer low-frequency components, stronger generalization ability, and nearly equal computational cost. This study demonstrates the effectiveness of multiview 2D CNNs in rPPG pulse extraction. Hence, the proposed network outperforms existing methods in extracting pulse signals of subjects in complex environments.
Keywords: heart rate measurement; neural network; remote photoplethysmography (rPPG); multiview convolution; spatiotemporal feature
Bertinetto L, Valmadre J, Golodetz S, Miksik O and Torr P H S. 2016. Staple: complementary learners for real-time tracking//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 1401-1409[DOI: 10.1109/CVPR.2016.156]
Bousefsaf F, Pruski A and Maaoui C. 2019. 3D convolutional neural networks for remote pulse rate measurement and mapping from facial video. Applied Sciences, 9(20):#4364[DOI:10.3390/app9204364]
Chaichulee S, Villarroel M, Jorge J, Arteta C, Green G, McCormick K, Zisserman A and Tarassenko L. 2017. Multi-task convolutional neural network for patient detection and skin segmentation in continuous non-contact vital sign monitoring//Proceedings of the 12th IEEE International Conference on Automatic Face and Gesture Recognition. Washington DC, USA: IEEE: 266-272[DOI: 10.1109/FG.2017.41]
Chen W X and McDuff D. 2018. DeepPhys: video-based physiological measurement using convolutional attention networks//Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: Springer: 356-373[DOI: 10.1007/978-3-030-01216-8_22]
de Haan G and Jeanne V. 2013. Robust pulse rate from chrominance-based rPPG. IEEE Transactions on Biomedical Engineering, 60(10):2878-2886[DOI:10.1109/TBME.2013.2266196]
Klaessens J H G M, van den Born M, van der Veen A, Sikkens-van de Kraats J, van den Dungen F A M and Verdaasdonk R M. 2014. Development of a baby friendly non-contact method for measuring vital signs: first results of clinical measurements in an open incubator at a neonatal intensive care unit//Proceedings of SPIE 8935, Advanced Biomedical and Clinical Diagnostic Systems XII. San Francisco, USA: SPIE: #89351P[DOI: 10.1117/12.2038353]
Lam A and Kuno Y. 2015. Robust heart rate measurement from video using select random patches//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago, Chile: IEEE: 3640-3648[DOI: 10.1109/ICCV.2015.415]
Liu Y J, Jourabloo A and Liu X M. 2018. Learning deep models for face anti-spoofing: binary or auxiliary supervision//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City, USA: IEEE: 389-398[DOI: 10.1109/CVPR.2018.00048]
Niu X S, Han H, Shan S G and Chen X L. 2018. VIPL-HR: a multi-modal database for pulse estimation from less-constrained face video//Proceedings of the 14th Asian Conference on Computer Vision. Perth, Australia: Springer: 562-576[DOI: 10.1007/978-3-030-20873-8_36]
Niu X S, Zhao X Y, Han H, Das A, Dantcheva A, Shan S G and Chen X L. 2019. Robust remote heart rate estimation from face utilizing spatial-temporal attention//Proceedings of the 14th IEEE International Conference on Automatic Face and Gesture Recognition. Lille, France: IEEE: 1-8[DOI: 10.1109/FG.2019.8756554]
Qi G, Yang X Z, Wu X and Huo L. 2017. Heart rate detection for non-cooperative shaking face. Journal of Image and Graphics, 22(1):126-136[DOI:10.11834/jig.20170114]
Qiu Y, Liu Y, Arteaga-Falconi J, Dong H W and El Saddik A. 2019. EVM-CNN:real-time contactless heart rate estimation from facial video. IEEE Transactions on Multimedia, 21(7):1778-1787[DOI:10.1109/TMM.2018.2883866]
Shi J B and Tomasi C. 1994. Good features to track//Proceedings of 1994 IEEE Conference on Computer Vision and Pattern Recognition. Seattle, USA: IEEE: 593-600[DOI: 10.1109/CVPR.1994.323794]
Špetlík R. 2018. Robust Visual Heart Rate Estimation. Prague: Czech Technical University in Prague
Stricker R, Müller S and Gross H M. 2014. Non-contact video-based pulse rate measurement on a mobile service robot//Proceedings of the 23rd IEEE International Symposium on Robot and Human Interactive Communication. Edinburgh, UK: IEEE: 1056-1062[DOI: 10.1109/ROMAN.2014.6926392]
Wang W J, den Brinker A C, Stuijk S and de Haan G. 2017a. Robust heart rate from fitness videos. Physiological Measurement, 38(6):1023-1044[DOI:10.1088/1361-6579/aa6d02]
Wang W J, den Brinker A C, Stuijk S and de Haan G. 2017b. Algorithmic principles of remote PPG. IEEE Transactions on Biomedical Engineering, 64(7):1479-1491[DOI:10.1109/TBME.2016.2609282]
Wu H Y, Rubinstein M, Shih E, Guttag J, Durand F and Freeman W. 2012. Eulerian video magnification for revealing subtle changes in the world. ACM Transactions on Graphics, 31(4):#65[DOI:10.1145/2185520.2185561]
Xiong Q J, Peng L, Wei W and Gao B. 2014. The application of photoplethysmograph in peripheral vascular resistance monitoring. Laser Journal, 35(2):76-78[DOI:10.3969/j.issn.0253-2743.2014.02.034]
Yang Z, Yang X Z, Huo L, Liu X N and Li J S. 2018. Heart rate estimation from face videos against motion interference. Journal of Electronics and Information Technology, 40(6):1345-1352[DOI:10.11999/JEIT170824]
Zhang Q, Xu G Q, Wang M, Zhou Y M and Feng W. 2014. Webcam based non-contact real-time monitoring for the physiological parameters of drivers//Proceedings of the 4th Annual IEEE International Conference on Cyber Technology in Automation, Control and Intelligent. Hong Kong, China: IEEE: 648-652[DOI: 10.1109/CYBER.2014.6917541]
Zhao C C, Mei P Y, Xu S S, Li Y Q and Feng Y J. 2019. Performance evaluation of visual object detection and tracking algorithms used in remote photoplethysmography//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Seoul, South Korea: IEEE: 1646-1655[DOI: 10.1109/ICCVW.2019.00204]
Zhao Q, Meng D Y, Xu Z B, Zuo W M and Zhang L. 2014. Robust principal component analysis with complex noise//Proceedings of the 31st International Conference on Machine Learning. Beijing, China: JMLR: 55-63