Heart rate detection for non-cooperative shaking face
Qi Gang, Yang Xuezhi, Wu Xiu, Huo Liang
School of Computer and Information, Hefei University of Technology, Hefei 230009, China
Supported by: National Natural Science Foundation of China (61175033, 61503111); National Natural Science Foundation of Anhui Province, China (1508085SMF222)

# Abstract

Objective Heart rate is an important indicator that directly reflects human health. Heart rate detection is used widely in medicine, for example in physical examination, major surgery, and postoperative treatment. Heart rate detection based on face video processing has recently made non-contact measurement possible, without complex operations or a sense of restraint. However, existing methods perform poorly in complex realistic scenes, such as those with a shaking target. When the face shakes during face detection, the facial region of interest is selected inaccurately. These methods also disregard spatial scale features, which are important for extracting the blood volume pulse (BVP) signal, so their results are inadequate. To address this, a new non-contact heart rate detection method based on face video processing is proposed to reduce the influence of face shaking and improve precision.

Method Our method consists of three major steps. First, the video is processed by a robust face detection and tracking model to obtain a refined face video in which facial shaking is eliminated. Because the widely used Viola-Jones face detection model produces an incorrect face area when the face is tilted across consecutive frames, discriminative response map fitting (DRMF) is used to detect important feature points for tracking the correct face area. In the first frame, we mark 66 landmark points on the facial features (eyes, nose, mouth, and face contour) and the four vertices of the facial rectangle. These feature points are then passed to the Kanade-Lucas-Tomasi tracking model to calculate the facial rectangle in subsequent frames. Each face image is rotated to an upright position according to the tilt angle of its facial rectangle. Second, the corrected face video is processed by a spatio-temporal algorithm that amplifies the video color variations, separating the spatial scale characteristics of the video and isolating the frequency range of blood volume changes. We average the chrominance of the skin under the eyes to obtain a clean BVP signal. Finally, because the BVP signal is a small sample, frequency domain analysis and iterative Fourier coefficient interpolation are combined to estimate heart rate. Iteration is performed 1 000 times for improved accuracy.

Result The proposed method is tested on two face video libraries, one of still faces and one of shaking faces. Each library contains 60 ten-second videos from 20 participants (12 men and 8 women). We quantitatively compare the typical method of Poh, the more recent method of Liu, and our method. The overall accuracies of our method on still and shaking face videos are 97.84% and 97.30%, respectively. Accuracy increases by more than 1% on still face videos and by more than 7% on shaking face videos.

Conclusion Video-based heart rate detection in complex realistic scenes is affected by facial shaking, which significantly reduces accuracy. Neglecting spatial scale characteristics and the small sample size also degrades detection performance. This study therefore proposes a heart rate detection method for complex realistic scenes. We detect and track important facial feature points to analyze the state of facial shaking and correct the facial tilt. After spatio-temporal processing to select a proper spatial scale, a clean BVP signal is extracted and heart rate is calculated iteratively. Experimental results indicate that our method has high accuracy and adapts well to cases involving facial shaking.

# Key words

blood volume pulse (BVP); discriminative response map fitting (DRMF); skew correction; video color magnification; heart rate estimation

# 1.1.2 Tracking and skew correction

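1) Take the $j$-th frame ($j$ = 1, 2, 3, …) as level 0 and compute the number of decomposition levels L.

2) Apply Gaussian filtering to the previous level and downsample it so that the image shrinks to 1/4 of its previous size; store the result as the next level and increase the level number by 1.

3) Iterate step 2) L-1 times to obtain the level-L subband image.

4) Set $j = j + 1$ and repeat the steps above until the last frame has been processed, then continue to the next step.

5) Combine the level-L subband images of all frames in temporal order into a subband sequence. Spatial decomposition is then complete.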
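The spatial decomposition steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes single-channel frames, a 5-tap binomial kernel as the Gaussian filter, and downsampling by 2 along each axis (so the area shrinks to 1/4). The function names and the choice to pass the number of levels in as a parameter are hypothetical, since the text does not say how L is computed.

```python
import numpy as np

# 5-tap binomial approximation of a Gaussian kernel (an assumption;
# the source does not specify the filter).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def gaussian_blur(img):
    """Separable blur of a 2-D image with reflected borders."""
    h, w = img.shape
    padded = np.pad(img, 2, mode="reflect")
    # Filter along the horizontal axis, then along the vertical axis.
    rows = sum(KERNEL[i] * padded[:, i:i + w] for i in range(5))
    return sum(KERNEL[i] * rows[i:i + h, :] for i in range(5))


def pyramid_level(frame, levels):
    """Blur and downsample `levels` times; returns the top-level image."""
    img = np.asarray(frame, dtype=float)
    for _ in range(levels):
        # Keeping every second row and column reduces the area to 1/4.
        img = gaussian_blur(img)[::2, ::2]
    return img


def spatial_decomposition(frames, levels):
    """Stack the top-level image of every frame in temporal order."""
    return np.stack([pyramid_level(f, levels) for f in frames])
```

For a clip stored as an array of shape (frames, height, width), `spatial_decomposition(clip, 3)` returns a subband sequence whose spatial size is 1/8 of the input along each axis.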

# 1.2.2 Color magnification and extraction

$\boldsymbol{B} = \left\{ b_s \mid s = 1, 2, 3, \ldots \right\} = \left\{ \frac{1}{M}\sum\limits_{z = 1}^{M} x_s(z) \mid s = 1, 2, 3, \ldots \right\}$ (5)
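Equation (5) can be read as a per-frame spatial average. The sketch below assumes that $x_s(z)$ is the chrominance of the $z$-th pixel of the region of interest in frame $s$ and that $M$ is the number of such pixels, which matches the abstract's description of averaging the skin under the eyes. The function name and the boolean-mask representation of the region are hypothetical.

```python
import numpy as np


def bvp_from_roi(frames, roi_mask):
    """Return b_s, the mean ROI value of each frame, as a 1-D array.

    frames:   array of shape (S, H, W), one chrominance image per frame
    roi_mask: boolean array of shape (H, W) selecting the M ROI pixels
    """
    frames = np.asarray(frames, dtype=float)
    mask = np.asarray(roi_mask, dtype=bool)
    # frames[:, mask] has shape (S, M); averaging over axis 1 gives b_s.
    return frames[:, mask].mean(axis=1)
```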

# 1.3 Iterative heart rate estimation

1) Compute the power spectrum ${P_{{\rm{bvp}}}}$ of the BVP signal B, which has N = 300 samples, and the position T of its maximum power, that is,

$\begin{array}{c} P_{\rm bvp}(t) = \left| {\rm FFT}(\boldsymbol{B}) \right|^2 \\ T = \arg\max\limits_{t} \left\{ P_{\rm bvp}(t) \right\} \\ t = 0, 1, \ldots, N-1 \end{array}$ (6)

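2) Apply the iterative Fourier coefficient interpolation algorithm (based on reference [16]) to the power spectrum to estimate the heart rate. The iterative interpolation proceeds as follows:

(1) Initialize the starting Fourier coefficient offset ${e_0}$ = 0 and the number of iterations Q = 50. Empirically, the Fourier coefficient offset remains essentially stable after 50 iterations.

(2) Iterate: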

$\begin{array}{l} {\rm for}\ (k = 1;\ k \le Q;\ k{+}{+})\ \{ \\ \quad S_d = \sum\limits_{n = 0}^{N-1} B(n)\, e^{-\frac{j2\pi n\left( T + e_{k-1} + d \right)}{N}} \\ \quad e_k = e_{k-1} + r\left( e_{k-1} \right) \\ \} \end{array}$ (7)

(3) Compute the final heart rate value

$HR_{\rm bvp} = 60\,\frac{T + e_Q}{N}\,f_{\rm s}$ (8)
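Equations (6) to (8) can be sketched as follows. Several details are assumptions, because the text does not define them: the offsets are taken as $d = \pm 0.5$ and the correction as $r = \frac{1}{2}{\rm Re}\{(S_{0.5} + S_{-0.5})/(S_{0.5} - S_{-0.5})\}$, which is the well-known Aboutanios-Mulgrew interpolator and may or may not be exactly what reference [16] specifies. The sketch also removes the signal mean and searches only bins 1 to N/2 - 1, so that the DC term and the mirrored half of a real signal's spectrum are not picked as the peak. The function name is hypothetical.

```python
import numpy as np


def estimate_heart_rate(bvp, fs, n_iter=50):
    """Estimate heart rate in beats per minute from a BVP signal.

    bvp:    1-D sequence of N samples (B in the text)
    fs:     sampling rate in Hz (f_s in equation (8))
    n_iter: number of interpolation iterations (Q in the text)
    """
    b = np.asarray(bvp, dtype=float)
    b = b - b.mean()  # assumption: drop the DC component first
    n = b.size

    # Equation (6): power spectrum and the position T of its maximum,
    # restricted here to the positive-frequency bins.
    power = np.abs(np.fft.fft(b)) ** 2
    T = 1 + int(np.argmax(power[1:n // 2]))

    # Equation (7): refine the fractional offset e around bin T.
    idx = np.arange(n)
    e = 0.0
    for _ in range(n_iter):
        s_pos = np.sum(b * np.exp(-2j * np.pi * idx * (T + e + 0.5) / n))
        s_neg = np.sum(b * np.exp(-2j * np.pi * idx * (T + e - 0.5) / n))
        e += 0.5 * np.real((s_pos + s_neg) / (s_pos - s_neg))

    # Equation (8): convert the refined bin position to beats per minute.
    return 60.0 * (T + e) / n * fs
```

With N = 300 samples at 30 frames per second, one FFT bin spans 0.1 Hz, or 6 beats per minute, which is why a sub-bin refinement of the peak position matters for a ten-second clip.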

# 2.1 Experimental equipment and data

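Twenty adults, 12 men and 8 women, took part in the accuracy test. In each test, every participant went through multiple rounds of video capture and heart rate detection. Participants wore a Mio (迈欧) contact heart rate watch, which measured the true heart rate during video recording as the reference value.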

# 2.2 Experimental results

$HR_{\rm ac} = 1 - \frac{1}{N}\sum\limits_{i = 1}^{N} \frac{\left| HR_{\rm remote} - HR_{\rm true} \right|}{HR_{\rm true}}$ (9)
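As a small worked illustration of equation (9), the sketch below computes the accuracy over a set of videos. It assumes that the relative error is taken in absolute value, so that over- and under-estimates do not cancel, and that N here is the number of test videos. The function name and the sample values are hypothetical.

```python
import numpy as np


def hr_accuracy(hr_remote, hr_true):
    """Accuracy as one minus the mean absolute relative error."""
    remote = np.asarray(hr_remote, dtype=float)
    true = np.asarray(hr_true, dtype=float)
    return 1.0 - np.mean(np.abs(remote - true) / true)


# Two made-up videos: estimates of 72 and 66 against references of 75 and 60.
# The relative errors are 3/75 = 0.04 and 6/60 = 0.10, so the mean is 0.07.
accuracy = hr_accuracy([72, 66], [75, 60])
```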

# 2.2.1 Comparison of heart rate detection performance for still and shaking faces

Table 1 The performances of heart rate detection in still face videos

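| Method | Me/(beats/min) | SDe | RMSE | $HR_{\rm ac}$/% | $r$ |
|---|---|---|---|---|---|
| Poh [7] | 1.6796 | 3.4360 | 3.3717 | 96.10 | 0.9613 |
| Liu [9] | 1.6352 | 3.1374 | 3.0824 | 96.23 | 0.9637 |
| Proposed method (uncorrected) | 1.1360 | 2.0083 | 1.9220 | 97.62 | 0.9869 |
| Proposed method (corrected) | 0.6886 | 1.6063 | 1.4872 | 97.84 | 0.9876 |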

Table 2 The performances of heart rate detection in shaking face videos

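| Method | Me/(beats/min) | SDe | RMSE | $HR_{\rm ac}$/% | $r$ |
|---|---|---|---|---|---|
| Poh [7] | 6.8463 | 13.9652 | 13.7041 | 89.81 | 0.6859 |
| Liu [9] | 7.2907 | 13.5872 | 13.3332 | 90.18 | 0.7615 |
| Proposed method (uncorrected) | 4.0077 | 6.7476 | 6.6359 | 92.43 | 0.8869 |
| Proposed method (corrected) | 1.2063 | 2.3217 | 2.1768 | 97.30 | 0.9708 |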

# 2.2.2 Comparison of algorithm stability

Table 3 The average running time of our method

| Video duration/s | 5 | 10 | 20 | 40 | 60 |
|---|---|---|---|---|---|
| Processing time/s | 10.80 | 19.70 | 43.60 | 91.80 | 134.50 |

