Published: 2019-06-16
DOI: 10.11834/jig.180404
2019 | Volume 24 | Number 6

Image Analysis and Recognition

Gaze tracking method using geometric features of the human eye

苏海明, 侯振杰, 梁久祯, 许艳, 李兴

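School of Information Science and Engineering, Changzhou University, Changzhou 213164, China

Received: 2018-07-04; Revised: 2018-12-29

Funding: National Natural Science Foundation of China (61063021); Jiangsu Province Industry-University-Research Prospective Joint Research Project (BY2015027-12); Open Project of the Jiangsu Key Laboratory of Internet of Things and Mobile Internet Technology Engineering (JSWLW-2017-013)

First author: Su Haiming, born in 1992, male, master's student; main research interests: machine vision and human-computer interaction. E-mail:1772540221@qq.com;
Hou Zhenjie, male, professor; main research interest: machine learning. E-mail:houzj@cczu.edu.cn;
Liang Jiuzhen, male, Ph.D., professor; main research interest: machine vision. E-mail:jzliang@cczu.edu.cn;
Xu Yan, female, master's degree; main research interests: action recognition and machine learning. E-mail:97681516@qq.com;
Li Xing, male, master's student; main research interest: machine vision. E-mail:340299042@qq.com.

CLC number: TP391

Document code: A

Article ID: 1006-8961(2019)06-0914-10

Abstract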

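Objective Gaze tracking serves as an auxiliary system for human-computer interaction. To address the high misjudgment rate and long running time of traditional iris localization methods, this paper proposes a gaze tracking method based on the geometric features of the human eye, with the aim of improving gaze tracking accuracy in a 2D environment. Method First, a face localization algorithm locates the face, the eye corner points are located from the points returned by facial feature point detection, and the eye position is computed from the eye corner points. Applying an iris center localization algorithm directly is time-consuming. To speed up iris center localization, an iris template is first built from iris images, the template is then used to detect the position of the iris region, and an iris center fine-localization algorithm locates the iris center. Finally, the eye corner points and iris center points are extracted, and the angle and distance information contained in these points is combined into an eye movement vector feature. A neural network model is used for classification, which establishes the gaze point mapping and realizes gaze tracking. Image preprocessing enhances the images before the relative iris center is extracted. The required feature points are then extracted, and relatively stable geometric features are built to represent the eye movement. Result Under ordinary laboratory lighting with a fixed head pose, the highest recognition rate reaches 98.9% and the average recognition rate reaches 95.74%. When the head pose varies within the restricted region, the recognition rate remains high, with an average above 90%. Experimental analysis shows that the proposed method is robust within the restricted region of head variation. Conclusion This paper combines template matching with fine iris localization to locate the iris center quickly, and uses a neural network to map the gaze landing point and compute the gaze landing region. Experiments show that the proposed method achieves high accuracy.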

Keywords

geometric features; iris template; iris center; eye movement vector feature; gaze point

Gaze tracking method for human eye geometric features

Su Haiming, Hou Zhenjie, Liang Jiuzhen, Xu Yan, Li Xing

Department of Information Science & Engineering, Changzhou University, Changzhou 213164, China

Supported by: National Natural Science Foundation of China (61063021)

Abstract

Objective Eye gaze is an input modality with the potential to serve as an efficient computer interface, and eye movement has long been a research hotspot. Gaze tracking provides valuable information about a person's point of attention. Current methods are mainly model-based or regression-based. Model-based methods extract facial features and calculate the 3D gaze direction from the geometric relationships among them. However, to obtain good accuracy, they require per-user calibration, which is inconvenient and degrades the user experience. Regression-based methods instead use machine learning to learn a mapping from eye appearance characteristics to the gaze direction. Compared with model-based methods, they avoid modeling the complicated eyeball structure and only need a large amount of collected data. Regression-based approaches can be further divided into feature-based and appearance-based methods. Feature-based regression learns a mapping function from eye features to the gaze direction, whereas appearance-based regression learns a mapping function from eye appearance to the gaze direction. The learning algorithms range from traditional support vector regression and random forests to recent deep learning techniques. However, regression-based methods require one or more data sets, which complicates the model; they commonly use additional data to compensate for head movements, and substantial data are needed to learn a good mapping function. To improve gaze tracking accuracy in a 2D environment, a new method based on the geometric features of the human eye is proposed to address the high error rate and long running time of traditional iris localization methods. Method First, the position of the face is located by a face localization algorithm.
The eye corner points are located from the points returned by facial feature point detection, and the eye region is calculated from the corner points. A traditional iris localization method may take a long time to locate the iris center. To speed up iris center localization, an iris template is built from iris images and used to detect the iris region, which gives a rough location of the iris center. Second, the iris center is located precisely by an iris center fine-localization algorithm. Through facial feature point localization and iris center localization, the eye corners and the iris centers are obtained and used as the basic information for describing eye movement vectors. Because this basic information comprises only the eye corner and iris center points, angles derived from the positional relations among the points and the distances between the points are added to form the final eye movement vector. In this study, a neural network model is used to determine the gaze point, with the eye movement vector as its input feature. The mapping to the gaze point is then established to realize gaze tracking. Result A camera is used to record videos as the neural network training dataset. In the feature extraction stage, the original data are preprocessed to enhance image quality, which makes the iris center extraction more accurate. Results are obtained via feature extraction, training, and testing. In an ordinary experimental lighting environment with a fixed head pose, the highest recognition rate reaches 98.9% and the average recognition rate reaches 95.74%. When the head pose changes, the recognition rate of the algorithm changes to some extent, but it can remain stable if stable eye movement features are extracted.
When the head pose changes within the restricted region, the recognition rate remains high, with an average above 90%. Experimental results show that the proposed method is robust to head variation within the restricted region. Conclusion In this study, a neural network is used to map eye images to the gaze point. Hence, the system needs neither multiple cameras nor infrared light sources, and requires no camera calibration. A single-camera system with no light source locates the iris center through a combination of template matching and fine iris localization. Compared with other methods, the system has a simpler structure, since it uses only a single webcam. The neural network maps the gaze landing point and calculates the gaze landing region. Relatively stable features are extracted under ordinary lighting. Experiments show that the method performs well within a certain range of head pose changes, provided the camera captures a complete head image.

Key words

geometric feature; iris template; iris center; eye movement vector feature; gaze point

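0 Introduction

Iris detection and tracking are two important tasks in biometric applications. Recent research and applications show that iris motion carries a large amount of information and can be applied in many fields. It not only conveys information about human movement but also plays an important role in psychology, robotics, security, and especially neurological research.

Gaze tracking methods can be divided into model-based and regression-based methods. Model-based methods^[1-3] construct a 3D eye model from the anatomy of the human eye and face. The 3D gaze direction can be computed from the geometric relationships among facial and eye features (facial landmarks, cornea, pupil, etc.). Methods based on 3D models are well known for their accuracy and their ability to handle head motion, and are widely used in professional eye trackers. Because model-based methods require prior knowledge of the human eye and related parameters, per-user calibration is needed to obtain good accuracy. However, calibration requires the user's explicit cooperation, which makes the eye tracking system inconvenient to use and degrades the user experience. Regression-based methods use machine learning to learn a mapping from eye appearance features to the gaze direction. Compared with model-based methods, this approach avoids modeling the complicated eyeball structure and only requires collecting a large amount of data. Regression-based methods can be further divided into feature-based and appearance-based methods. Feature-based regression methods^[4-6] learn a mapping function from eye features to the gaze direction. Typical eye features include pupil features and pupil-eye-corner vectors. Appearance-based regression methods^[7-10] learn a mapping function from eye appearance to the gaze direction. The learning algorithms range from traditional support vector regression and random forests to recent deep learning techniques. However, regression-based methods usually rely on additional data to compensate for head motion, and they also need a large amount of data to learn a good mapping function.

Given the above, this paper selects eye movement features that are relatively stable under head motion, adopts a single-camera system architecture with no light source, and uses a 2D gaze tracking method based on an artificial neural network, achieving good accuracy. A neural network maps the eye image to the gaze point, so the system needs neither multiple cameras nor infrared light sources and requires no camera calibration, which lowers the hardware requirements, reduces hardware cost, and improves usability.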

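1 Gaze tracking process

The gaze tracking process is shown in Fig. 1. First, a face detection algorithm detects the face in the video frames captured by the camera; the eye region is then located within the detected face, and the iris center is accurately located within the eye region^[11]. The feature parameters recorded while the eye looks at different screen positions first serve as training samples for an artificial neural network. After training, the eye feature parameters are fed to the trained network as the input feature vector, and the network^[12] predicts the gaze direction.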

Fig. 1 Block diagram of gaze tracking

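1.1 Eye detection and eye corner localization

Under constrained conditions, particularly in wearable gaze tracking, the camera captures the eye directly and the face position need not be tracked. In unconstrained gaze tracking, however, changes in face position change the eye position, so face detection is required. This paper uses the supervised descent method (SDM), the face alignment algorithm proposed by Xiong et al.^[13], to detect facial feature points. Locating the facial feature points first requires a face bounding box, which is obtained here with the classical Adaboost cascade^[14-15]. The SDM algorithm is then applied within the face bounding box to find the facial feature points.

SDM is used to locate and track 49 facial feature points. SDM is in essence an optimization method for minimizing a nonlinear least squares function: it learns from training data the descent directions that minimize the function, which avoids computing the Hessian and Jacobian matrices.

The SDM training principle for facial features is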

$ f\left( {{x_0} + \Delta x} \right) = \parallel h\left( {d\left( {{x_0} + \Delta x} \right)} \right) - \mathit{\Phi} \parallel _2^2 $

(1)

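where $d\left( x \right) $ denotes the pixels at coordinates $x $ in the face image, which has $m $ pixels; $ h\left( {d\left( x \right)} \right)$ denotes the scale-invariant feature transform (SIFT) features extracted from the face image; and $\mathit{\Phi} $ denotes the SIFT features at the manually labeled feature points. To avoid being trapped in a local minimum, SDM uses multiple iterations, and the final training objective is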

$ \arg \min \sum\limits_{{\mathit{\boldsymbol{d}}^i}} {\sum\limits_{x_k^i} {\parallel \Delta {x^{ki}} - {R_k}{{\mathit{\Phi} }}_k^i - {b_k}{\parallel ^2}} } $

(2)

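where $ k$ is the iteration index, ${\mathit{\boldsymbol{d}}^i} $ denotes the $ i$th image, $ x_k^i$ denotes the feature point positions for the $ i$th image at iteration $ k$, and $ {\Delta {x^{ki}}}$ denotes their error at the $ k$th iteration, measured against the manually labeled points. The facial feature points are finally obtained from the learned parameters ${{R_k}} $ and ${{b_k}} $.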
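Equation (2) fits, for each stage $k$, a linear map from features to the remaining feature point error, so applying a trained model amounts to repeating the update $x_{k+1} = x_k + R_k \mathit{\Phi}_k + b_k$. The following is a minimal sketch of that cascade under stated assumptions: `sdm_step`, `sdm_align`, and `toy_features` are hypothetical names introduced for illustration, the toy features stand in for the SIFT descriptors used in the paper, and the stage parameters are made up rather than learned.

```python
def sdm_step(x, phi, R, b):
    """One descent update x + R*phi + b, with R given as a list of rows."""
    return [xi + sum(r * p for r, p in zip(row, phi)) + bi
            for xi, row, bi in zip(x, R, b)]


def sdm_align(x0, extract_features, stages):
    """Apply the (R_k, b_k) pairs in order, starting from the initial shape x0."""
    x = list(x0)
    for R, b in stages:
        x = sdm_step(x, extract_features(x), R, b)
    return x


# Toy stand-in for SIFT (an assumption): the feature is the offset to a known target.
target = [4.0, 2.0]


def toy_features(x):
    return [t - xi for t, xi in zip(target, x)]


# Hypothetical stage parameters: each stage halves the remaining error.
stage = ([[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0])
aligned = sdm_align([0.0, 0.0], toy_features, [stage] * 3)
print(aligned)
```

With real data, each `(R_k, b_k)` pair would come from solving the least squares problem in Eq. (2) on the training images.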

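The face bounding box is first detected with the Adaboost cascade, and the facial feature points are then obtained with SDM. The eye region image is computed from the eye corner and eyelid feature points and is then used for iris center detection. Detection results on the BioID face database^[16] are shown in Fig. 2.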

Fig. 2 Detection effect of Adaboost and SDM

((a)Adaboost test results; (b)SDM test results)

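1.2 Iris center localization

In general the inner and outer boundaries of the iris are not concentric. In practice, however, limited image resolution prevents the pupil boundary from being extracted accurately, so the center of the outer iris boundary is taken as an approximation of the pupil center. Captured images are affected by lighting, resolution, and other factors, so they always contain noise, which can lead to misjudgment. The iris localization in this paper therefore follows a coarse-to-fine principle: a template first coarsely locates the iris region, and the integro-differential operator then finely locates the iris.

Observation of eye region images shows that the iris region has more distinctive features than the rest of the eye region: a better-defined circular contour and a lower mean gray value. For these reasons, template matching is used for coarse localization of the iris region.

1.2.1 Building the iris template

$M $ iris images are taken from the camera images to form the iris dataset $ \mathit{\boldsymbol{U}}$. Each image is 50×50 pixels and can be converted into a 2500-dimensional vector $\mathit{\boldsymbol{ \boldsymbol{\varGamma} }} $, and the $M $ vectors are placed in the set $ \mathit{\boldsymbol{U}}$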

$ \mathit{\boldsymbol{U}} = \left\{ {{\mathit{\boldsymbol{ \boldsymbol{\varGamma} }}_1}, {\mathit{\boldsymbol{ \boldsymbol{\varGamma} }}_2}, {\mathit{\boldsymbol{ \boldsymbol{\varGamma} }}_3}, \cdots , {\mathit{\boldsymbol{ \boldsymbol{\varGamma} }}_M}} \right\} $

(3)

After the set of iris vectors $ \mathit{\boldsymbol{U}}$ is obtained, the mean image is computed as

$ \mathit{\boldsymbol{T}} = \frac{1}{M}\sum\limits_{m = 1}^M {{\mathit{\boldsymbol{ \boldsymbol{\varGamma} }}_m}} $

(4)

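That is, the values of the vectors $\mathit{\boldsymbol{ \boldsymbol{\varGamma} }} $ at each corresponding point are summed and averaged. The resulting $ \mathit{\boldsymbol{T}}$ is a characteristic (mean) iris image, which is the required template.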
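Equation (4) is a pixel-wise mean. A minimal sketch, assuming each iris image is already a grayscale image of identical size stored as a list of rows (`mean_template` is a hypothetical helper name, and the 2×2 images are toy data):

```python
def mean_template(images):
    """Pixel-wise mean of equally sized grayscale images (each a list of rows)."""
    count = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / count for c in range(cols)]
            for r in range(rows)]


# Two toy 2x2 "iris" images; real inputs would be the 50x50 crops.
samples = [
    [[0, 2], [4, 6]],
    [[2, 4], [6, 8]],
]
template = mean_template(samples)
print(template)
```

Averaging the flattened vectors and averaging the 2D images give the same values; the 2D layout is kept here only because the template is later slid over an image.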

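1.2.2 Template matching algorithm

The template matching algorithm used in this paper is the correlation method. The template $\mathit{\boldsymbol{T}} $, of size $ M \times N$ with pixel indices $m $ and $n $, is laid over the searched image $\mathit{\boldsymbol{S}} $, of size $ W \times H$, and translated across it. The region covered by the template is the sub-image ${\mathit{\boldsymbol{S}}_{i, j}} $, where $i $ and $j $ are the coordinates of the sub-image in the searched image $\mathit{\boldsymbol{S}} $, and the search range is $ 1 \le i \le W - N, 1 \le j \le H - M$. The similarity between $\mathit{\boldsymbol{T}} $ and ${\mathit{\boldsymbol{S}}_{i, j}} $ is measured by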

$ D(i, j) = \sum\limits_{m = 1}^M {\sum\limits_{n = 1}^N {{{\left[ {{S_{i, j}}(m, n) - T(m, n)} \right]}^2}} } $

(5)

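where $ D$ is the difference between the template $\mathit{\boldsymbol{T}} $ and ${\mathit{\boldsymbol{S}}_{i, j}} $. Expanding the square in $ D$ yields the energy of the sub-image, the energy of the template, and a cross term; normalizing the cross term by the two energies gives the correlation coefficient $ R$ of the template match: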

$ R(i, j) = \frac{{{E_{i, j}}}}{{{E_{{s_{i, j}}}} \times {E_T}}} $

(6)

where

$ {E_{i, j}} = \sum\limits_{m = 1}^M {\sum\limits_{n = 1}^N {{S_{i, j}}(m, n) \times T(m, n)} } $

(7)

$ {E_{{s_{i, j}}}} = \sqrt {\sum\limits_{m = 1}^M {\sum\limits_{n = 1}^N {{{\left[ {{S_{i, j}}(m, n)} \right]}^2}} } } $

(8)

$ {E_T} = \sqrt {\sum\limits_{m = 1}^M {\sum\limits_{n = 1}^N {{{\left[ {T(m, n)} \right]}^2}} } } $

(9)

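When the template is matched against a sub-image, the correlation coefficient $R(i, j)$ lies between 0 and 1. After the whole searched image $ \mathit{\boldsymbol{S}}$ has been scanned, the correlation coefficients at all positions form the correlation matrix $ \mathit{\boldsymbol{R}}$. The maximum ${R_{\max }}\left( {{i_m}, {j_m}} \right) $ of $ \mathit{\boldsymbol{R}}$ is then found, and the corresponding sub-image $ {\mathit{\boldsymbol{S}}_{{i_m}, {j_m}}}$ is the matching target.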
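Equations (6) to (9) can be sketched as a brute-force search. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: images are lists of rows, indices are 0-based (so the loops run to `W - N` and `H - M` inclusive), and `ncc` and `match_template` are hypothetical names.

```python
import math


def ncc(sub, tmpl):
    """Correlation coefficient R of Eq. (6) for one sub-image."""
    e_st = sum(s * t for rs, rt in zip(sub, tmpl) for s, t in zip(rs, rt))
    e_s = math.sqrt(sum(s * s for rs in sub for s in rs))
    e_t = math.sqrt(sum(t * t for rt in tmpl for t in rt))
    if e_s == 0 or e_t == 0:
        return 0.0  # an all-zero patch carries no information
    return e_st / (e_s * e_t)


def match_template(image, tmpl):
    """Slide tmpl over image; return (R_max, i, j) with i the column, j the row."""
    height, width = len(image), len(image[0])
    m_rows, n_cols = len(tmpl), len(tmpl[0])
    best = (-1.0, 0, 0)
    for j in range(height - m_rows + 1):
        for i in range(width - n_cols + 1):
            sub = [row[i:i + n_cols] for row in image[j:j + m_rows]]
            r = ncc(sub, tmpl)
            if r > best[0]:
                best = (r, i, j)
    return best


# Toy data: the template appears exactly once, at column 1, row 2.
tmpl = [[1, 2], [3, 4]]
image = [
    [5, 5, 5, 5],
    [5, 5, 5, 5],
    [5, 1, 2, 5],
    [5, 3, 4, 5],
]
r_max, i_m, j_m = match_template(image, tmpl)
print(r_max, i_m, j_m)
```

For non-negative gray values, $R$ equals 1 only where the sub-image is proportional to the template, which is why the exact copy wins in this toy case.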

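1.2.3 Fine localization of the iris center

The integro-differential operator^[17-21] is an iris localization algorithm proposed by Daugman. It computes the gray-level difference along circular rings in the iris image and takes the maximum over all difference results as the final localization result. It is computed as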

$ \mathop {\max }\limits_{\left( {{x_0}, {y_0}, r} \right)} \left| {\frac{\partial }{{\partial r}}\oint\limits_{{x_0}, {y_0}, r} {} \frac{{\mathit{\boldsymbol{I}}(x, y)}}{{2\pi r}}{\rm{d}}s} \right| $

(10)

where $ {\mathit{\boldsymbol{I}}(x, y)}$ is the image matrix, $ {({x_0}, {y_0})}$ is the circle center, and $r $ is the radius.
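A discrete version of Eq. (10) replaces the contour integral with the mean gray value sampled on a circle and the derivative with a finite difference between neighboring radii. The sketch below is illustrative only: `circle_mean` and `locate_circle` are hypothetical names, nearest-pixel sampling is an assumption, the candidate centers and radii are supplied by the caller, and no radial smoothing is applied, in line with Eq. (10) as written.

```python
import math


def circle_mean(img, x0, y0, r, samples=64):
    """Mean gray value on the circle (x0, y0, r): the contour integral over 2*pi*r."""
    height, width = len(img), len(img[0])
    total, count = 0.0, 0
    for k in range(samples):
        a = 2.0 * math.pi * k / samples
        x = int(round(x0 + r * math.cos(a)))
        y = int(round(y0 + r * math.sin(a)))
        if 0 <= x < width and 0 <= y < height:
            total += img[y][x]
            count += 1
    return total / count if count else 0.0


def locate_circle(img, centers, radii):
    """Return (score, x0, y0, r) maximizing |d(mean)/dr| over the candidates."""
    best = (-1.0, None, None, None)
    for x0, y0 in centers:
        means = [circle_mean(img, x0, y0, r) for r in radii]
        for k in range(1, len(radii)):
            score = abs(means[k] - means[k - 1]) / (radii[k] - radii[k - 1])
            if score > best[0]:
                best = (score, x0, y0, radii[k])
    return best


# Synthetic eye patch (an assumption): dark disc of radius 8 on a bright background.
size, cx, cy, true_r = 41, 20, 20, 8
img = [[20 if (x - cx) ** 2 + (y - cy) ** 2 <= true_r ** 2 else 200
        for x in range(size)] for y in range(size)]

score, x0, y0, r = locate_circle(img, [(12, 12), (20, 20), (28, 26)], range(4, 13))
print(x0, y0, r)
```

In the coarse-to-fine scheme described above, the candidate centers would be restricted to the small region returned by template matching, which is what keeps this search cheap.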

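1.3 Selection of eye movement vector features

The eye corner points and iris center coordinates are normalized by establishing an image coordinate system; pixel positions are converted to the image coordinate system by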

$\left\{ {\begin{array}{*{20}{l}} {x = \frac{{u - {u_0}}}{{{f_x}}}}\\ {y = \frac{{v - {v_0}}}{{{f_y}}}} \end{array}} \right. $

(11)

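where $u, v $ denote the actual pixel position; ${u_0}, {v_0} $ denote the coordinates of the image center; ${f_x}, {f_y} $ denote the focal lengths of the camera; and ${x, y} $ denote the position of the pixel in the image coordinate system.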
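Equation (11) is a shift followed by a per-axis scale. A small worked example, with `pixel_to_image` a hypothetical helper and the image center and focal lengths made-up values rather than the authors' camera parameters:

```python
def pixel_to_image(u, v, u0, v0, fx, fy):
    """Eq. (11): shift by the image center, then divide by the focal lengths."""
    return (u - u0) / fx, (v - v0) / fy


# Assumed values: a 640x480 image with center (320, 240) and fx = fy = 800 pixels.
x, y = pixel_to_image(400, 300, 320, 240, 800, 800)
print(x, y)
```

The image center maps to the origin, so features computed from these coordinates no longer depend on where the pixel grid starts.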

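The selection of eye movement vector features is the key to gaze tracking with a neural network in this paper. The feature vectors fed to the neural network need to be sufficiently distinct from one another; otherwise the predicted output of the network will be inaccurate.

The eye corner points and iris centers have been obtained, but these 6 points alone carry too little information. To extract more information and increase the differences among features, the feature vector is extended with the distances and angles between points.

1) When the face position stays fixed, the quantity that varies is the iris center position, and a change in the iris center position changes the distances from the iris center to the two corner points of the eye. The distances between the eye corner points and the iris center are therefore added to the eye movement vector, but this information alone cannot make the eye movement vector unique.

2) To increase the differences among feature vectors further, angle features are added. Initially the line connecting the eye corner points was taken as the reference, and the angle between the eye corner point and the pupil center was selected. However, such an eye movement vector only captures the features for a fixed head pose; when the head pose changes, it cannot describe the vector information. Therefore, coordinate axes are established with the center of the original image as the reference point and the $ x$-axis direction as the reference line. Lines parallel to the $ x$ axis are drawn through the two eye corners, and the angle $ \theta $ is the angle between the pupil center and the eye corner point, as shown in Fig. 3.

Fig. 3 Eye movement vector diagram

3) The eye movement vector above can, to some extent, represent the eye movement features when the head stays essentially still or rotates slightly. To account for left-right rotation of the head, the inner corner of the right eye is taken as the reference point and the angle between the eye corner points and the $ x$ axis is used.

A 17-dimensional eye movement feature vector is finally determined, as shown in Fig. 3. It contains the position information $ p$ of the pupil centers $ o$ and the eye corner points; the remaining information comprises the distances $ d$ between the pupil centers and the eye corner points, the cosine of the angle $ {\theta _2}$ between the line from the pupil center to the eye corner point and the line connecting the two eye corner points, and the cosines of the tilt angles $ {\theta _1}, {\theta _3}$ of the eyes relative to the image coordinate system.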
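The text does not list the exact layout of the 17 components, so the following sketch only shows how the distance and cosine terms for a single eye could be computed from three points. The function names, the dictionary keys, and the choice of the inner corner as the vertex of the angle are assumptions made for illustration.

```python
import math


def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])


def cos_angle(a, b):
    """Cosine of the angle between 2D vectors a and b (0.0 if either is zero)."""
    na, nb = math.hypot(a[0], a[1]), math.hypot(b[0], b[1])
    if na == 0 or nb == 0:
        return 0.0
    return (a[0] * b[0] + a[1] * b[1]) / (na * nb)


def eye_features(inner, outer, iris):
    """Distance and cosine terms for one eye, in image coordinates."""
    corner_line = (outer[0] - inner[0], outer[1] - inner[1])
    to_iris = (iris[0] - inner[0], iris[1] - inner[1])
    return {
        "d_inner": distance(iris, inner),           # iris center to inner corner
        "d_outer": distance(iris, outer),           # iris center to outer corner
        "cos_theta2": cos_angle(to_iris, corner_line),
        "cos_tilt": cos_angle(corner_line, (1.0, 0.0)),  # corner line vs x axis
    }


# Toy coordinates (assumed): a level eye with the iris above the corner line.
features = eye_features(inner=(0.0, 0.0), outer=(10.0, 0.0), iris=(3.0, 4.0))
print(features)
```

Distances change as the iris moves between the corners, while the tilt cosine changes only when the corner line itself rotates, which is the motivation given above for combining both kinds of terms.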

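1.4 Error analysis

Facial feature points are detected with SDM, which runs fast but has some sources of instability: when the face rotates by more than 30°, some facial contour points disappear and the detection results become unstable. The iris center is detected with the Daugman algorithm, which is strongly affected by gray-level variations, and the gray values of the iris region are prone to misjudgment because of eyelids, eyelashes, lighting, and other factors. Template matching restricts the region of interest (ROI) to the iris region, which reduces the effect of these adverse factors, but it cannot eliminate the effect of pupil deformation. The iris center can therefore only be estimated, which introduces errors into the extracted eye movement vector.

2 Experimental results

The experiments were run on a Dell 14R-5420 with an i5-2310M CPU @ 2.50 GHz and 6.00 GB of RAM, under Windows 7 and MATLAB R2016b.

2.1 Building the datasets

Head pose changes include the poses shown in Fig. 4: up-down rotation (Pitch), left-right tilt (Roll), and left-right rotation (Yaw). Rotation angles are described as positive (+) for upward or leftward and negative (－) for rightward or downward.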

Fig. 4 Head pose

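Two datasets were constructed. For the first dataset, a 14-inch screen was divided into the 12 regions shown in Fig. 5, with the head about 45 cm from the screen. When the eyes gaze at a region, the eye feature vector at that moment is recorded as the input to the neural network, which has a 4-layer structure with two hidden layers. The initial pose is the fixed head pose with Pitch, Roll, and Yaw all at 0°, and the chin is taken as the origin of head rotation. The other poses are up-down rotation (Pitch) of +15°, +30°, and －15°; left-right tilt (Roll) of +15°, +30°, +45°, －15°, －30°, and －45°; and left-right rotation (Yaw) of +15°, +30°, +45°, －15°, －30°, and －45°. With the head at each of these angles and looking at the screen, 200 frames were extracted for each position, giving 38 400 frames in total. For the second dataset, the head moved freely within the restricted region of the first dataset while the eyes looked at the screen regions; 500 frames were obtained for each region, giving 6 000 frames in total. The data were extracted from video: with the head holding its pose and looking at a fixed position on the screen, about 1 min of video was recorded at about 30 frames/s, and 1 frame was randomly selected from about every 9 frames as a sample.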

Fig. 5 Regional distribution of experimental data acquisition screen
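The frame counts quoted for the two datasets follow from simple products of the numbers of poses, screen regions, and frames. As a quick check (the variable names are introduced here for illustration only):

```python
# Poses in dataset 1: the initial 0° pose plus 3 Pitch, 6 Roll, and 6 Yaw angles.
poses = 1 + 3 + 6 + 6
regions = 12                       # screen regions shown in Fig. 5

dataset1_frames = poses * regions * 200
dataset2_frames = regions * 500

# About 1 min of video at about 30 frames/s, keeping 1 frame out of every 9.
frames_per_clip = (30 * 60) // 9

print(poses, dataset1_frames, dataset2_frames, frames_per_clip)
```

The 200 frames kept per clip match the 200 frames per position stated for the first dataset.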

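2.2 Image preprocessing

Images are preprocessed for two reasons: to improve image quality, and to increase image contrast so that the iris fitting result is more accurate. Images captured in natural environments are strongly affected by lighting. The image in Fig. 6(a) was taken in a dark environment, and the contour of the iris region is indistinct. Fig. 6(b) shows the image after processing with the reference-white illumination compensation algorithm^[22], which increases the light-dark contrast; in its gray image the iris region shows clear contrast. After Fig. 6(b) is projected into HSV space, the iris region is segmented further and its contour is distinct.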

Fig. 6 Eye image preprocessing ((a) original map; (b)illumination compensation of (a); (c)gray image of (b); (d)gray image of (a); (e) lightness space map of HSV of (a); (f) lightness space map of HSV of (b))

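2.3 Determining the iris center

Determining the iris center is a key step in gaze tracking. Because of the experimental environment, the eye region contains a large amount of noise, and the upper and lower eyelids are also curved; both affect iris matching.

To reduce the probability of misjudgment, a template-based integro-differential operator is used. The template first restricts the iris region to a small area, which amounts to coarse localization of the iris center, and the integro-differential operator then performs fine localization. Template matching suits coarse localization because the iris region has a more stable shape and lower gray values than other regions.

Template matching first requires a template image. Manually cropped iris images are used; Fig. 7 shows some of the images used to build the iris template. During construction, the iris images are first normalized to 50×50 pixels, and the iris template is then built. The final template image is shown in Fig. 8.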

Fig. 7 Part of data for iris template

Fig. 8 Iris template

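Determining the template size is a multi-scale template matching problem. The distance between the eye and the camera determines the size of the iris in the image, so the template should be scaled according to the size of the eye. Based on experiments, a dynamically sized template is used: the distance between the inner and outer eye corner points detected by SDM serves as the reference, and the side length of the template is 0.45 times that distance. Fig. 9 shows some template matching results.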

Fig. 9 Part of template matching results
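The dynamic template size can be sketched as follows. The 0.45 ratio comes from the text; everything else is an assumption made for illustration, including the helper names, the rounding to whole pixels, and nearest-neighbor resampling, since the text does not state how the 50×50 template is rescaled.

```python
import math


def template_side(inner_corner, outer_corner, ratio=0.45):
    """Template side length in pixels: ratio times the eye corner distance."""
    d = math.hypot(outer_corner[0] - inner_corner[0],
                   outer_corner[1] - inner_corner[1])
    return max(1, round(ratio * d))


def resize_nearest(img, side):
    """Resample a grayscale image (list of rows) to side x side pixels."""
    rows, cols = len(img), len(img[0])
    return [[img[r * rows // side][c * cols // side] for c in range(side)]
            for r in range(side)]


# Toy example: eye corners 40 pixels apart give an 18-pixel template.
side = template_side((100, 120), (140, 120))
scaled = resize_nearest([[1, 2], [3, 4]], 4)
print(side, scaled)
```

A side length of 18 pixels falls inside the 10 to 20 pixel range that the experiments report for the template.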

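Template matching on the iris region gives its coarse location, and iris center localization with the integro-differential operator then extracts the iris center position. Template matching restricts the iris region to a very small area; in the experiments the template side length ranged from 10 to 20 pixels.

Direct use of the integro-differential operator was compared with the template-based method of this paper, on images captured by a camera under natural light. The results are shown in Fig. 10: under good lighting the two algorithms give similar results, but under poor lighting the proposed algorithm performs better.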

Fig. 10 Part of iris center location result

((a) results of Daugman; (b) results of our method)

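40 eye images were captured with the camera, 20 under good lighting and 20 under poor lighting. Under good lighting the two methods differ little in iris center detection; because the template reduces the influence of the eyelid and sclera regions, the proposed method achieves high accuracy in this setting. Under poor lighting, the traditional algorithm is susceptible to external noise and its performance degrades sharply: its misjudgment rate reaches 65%, which does not meet the experimental requirements. The proposed method uses the template to confine the iris to a small region, greatly reducing the influence of eyelids, sclera, and other factors, and the experiments show that it achieves satisfactory results.

500 images with open eyes were selected from the 1 520 images of the BioID dataset for testing. The deviation of the detected iris center position from the ground-truth data is shown in Table 1.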

Table 1 Accuracy of iris detection

Deviation/distance	Percentage/%
0	21
1	32
2	24
3	15
4	5
False detection	3

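The average time consumed by the proposed method and the traditional method is compared in Table 2, which shows that the proposed method takes less time.

Table 2 Time of iris detection

Algorithm	Average time/s	Fastest time/s	Slowest time/s
Original method	0.879	0.685	1.235
Proposed method	0.314	0.235	0.608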

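2.4 Neural network training

To verify the stability of the algorithm under fixed poses, the data extracted for each head pose in dataset 1 were trained separately. For training, the data were randomly split into 3 parts, with 2 parts used as the training set and 1 part as the test set. The results are shown in Table 3.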

Table 3 Recognition rate for each head pose (/%)

Position	0°	Pitch+15°	Pitch+30°	Pitch－15°	Roll+15°	Roll+30°	Roll+45°	Roll－15°	Roll－30°	Roll－45°	Yaw+15°	Yaw+30°	Yaw+45°	Yaw－15°	Yaw－30°	Yaw－45°
1	98.9	94.6	92.1	96.5	95.2	93.3	92.4	93.4	96.2	94.6	94.7	90.4	76.5	95.3	87.5	79.1
2	96.7	94.1	93.2	93.4	94.1	94.1	93.6	92.5	97.2	95.8	95.1	89.7	89.1	93.2	89.1	83.1
3	98.5	95.4	95.2	94.7	92.1	92.3	94.3	95.7	95.5	94.9	96.5	91.2	63.2	94.3	90.3	77.4
4	92.2	93.7	93.4	97.2	94.5	94.5	92.8	94.3	93.2	94.8	93.7	90.3	87.7	95.7	87.6	85.3
5	91.3	94.6	96.8	92.1	96.7	93.2	96.5	92.2	92.7	93.1	94.6	89.4	83.7	96.6	85.9	69.2
6	97.4	92.5	93.2	95.3	97.2	90.4	94.2	94.3	95.3	93.7	95.4	88.7	79.3	91.2	88.7	75.3
7	94.9	97.4	91.7	96.4	94.3	95.3	95.4	95.6	94.6	95.6	93.7	89.5	80.2	93.9	90.2	78.9
8	95.8	92.9	94.6	96.5	92.1	91.6	93.1	93.9	93.7	93.2	93.9	86.7	74.6	92.1	86.7	68.7
9	97.6	91.4	96.7	92.3	89.7	94.5	94.5	96.2	92.1	94.1	94.2	92.1	79.5	91.8	89.2	89.3
10	96.5	93.6	92.1	97.8	94.2	95.4	91.4	92.1	92.5	97.5	93.4	89.4	87.3	93.4	90.3	74.6
11	95.7	92.3	91.2	98.2	94.8	92.3	93.2	94.6	96.2	95.4	95.3	87.9	78.3	92.9	87.9	79.9
12	93.4	90.7	90.6	94.8	95.3	94.8	93.9	95.1	94.2	92.9	94.1	89.6	85.7	93.1	89.7	86.4
Mean	95.74	93.60	93.39	95.43	94.18	93.47	93.78	94.16	94.45	94.63	94.55	89.57	80.43	93.63	88.59	78.93
Note: bold type indicates the best result.
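The two-to-one random split used for the per-pose training can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: `split_two_to_one` and the fixed seed are assumptions.

```python
import random


def split_two_to_one(samples, seed=0):
    """Shuffle the samples, then use 2/3 for training and 1/3 for testing."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 2 // 3
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


# Toy stand-in for the feature vectors of one head pose.
samples = list(range(12))
train, test = split_two_to_one(samples)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, which helps when recognition rates for different poses are compared.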

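Table 3 shows that the method performs well under each pose. Within the restricted range, performance on Pitch and Roll is good, with average recognition rates all above 93%. Performance on Yaw is worse: at a Yaw of 30° the recognition rate stays at about 90%, but at 45° the average recognition rate reaches only about 80%, and in some regions only 63.2%. This indicates that the method works well when the head does not rotate left or right, and that gaze recognition is robust when the head tilts sideways or looks up or down.

To demonstrate the final recognition performance, dataset 1 was used as the training set and dataset 2 as the test set. The final recognition results are shown in Table 4.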

Table 4 Final recognition rate for each position

Position	1	2	3	4	5	6	7	8	9	10	11	12
Recognition rate/%	91.2	92.6	90.3	91.3	89.7	90.8	89.4	88.2	89.9	91.7	92.9	91.4

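Table 4 shows that the method remains relatively stable when the head moves within the restricted range: the overall average recognition rate reaches 90.78%, slightly lower than the rates in Table 3. The drop arises because dataset 2 contains a large amount of multi-directional data. Even so, the overall recognition rate stays above 90%, which indicates that the method still performs well when the head rotates in multiple directions and that the trained model generalizes to some extent.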
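The overall average can be reproduced from the twelve per-position rates in Table 4. The short check below reads the position-9 entry as 89.9; the variable names are introduced for illustration.

```python
# Recognition rates (%) for positions 1 to 12, as listed in Table 4.
rates = [91.2, 92.6, 90.3, 91.3, 89.7, 90.8, 89.4, 88.2, 89.9, 91.7, 92.9, 91.4]

mean_rate = sum(rates) / len(rates)
print(f"{mean_rate:.2f}")
```

The result agrees with the 90.78% reported in the text.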

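2.5 Comparison with other methods

Most current high-accuracy gaze tracking systems combine a camera with light sources. The pupil center cornea reflection (PCCR) technique used in Ref. [23] achieves an accuracy above 95% in human-computer interaction systems, but it relies on light sources and a CCD camera, and its hardware is complex. The method in Ref. [24] performs 3D gaze tracking directly and requires camera calibration; when the head moves away from the calibrated position, the gaze tracking results are no longer accurate. The proposed algorithm requires no camera calibration, tracks the target accurately as long as the head does not rotate over a large range, meets the accuracy requirements of real-time tracking, and has a simple system structure.

3 Conclusion

This paper combines template matching with fine iris localization to locate the iris center quickly and uses a neural network to map the gaze landing point and compute the gaze landing region. The method uses only an ordinary webcam, requires no camera calibration, allows the head to move within a small range, and achieves high accuracy in experiments. Some problems remain. The current method suits gaze tracking when the head translates, but when the head rotates over a large range the extracted eye movement vector is inaccurate. In addition, the iris cannot be extracted during blinks, so no eye movement vector can be obtained. Solving these problems is the direction of future work.

References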

[1] Beymer D, Flickner M. Eye gaze tracking using an active stereo head[C]//Proceedings of 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings. Madison, WI, USA: IEEE, 2003: Ⅱ-451.[Doi: 10.1109/CVPR.2003.1211502]

[2] Wang K, Ji Q. Real time eye gaze tracking with Kinect[C]//Proceedings of the 23rd International Conference on Pattern Recognition. Cancun, Mexico: IEEE, 2016: 2752-2757.[Doi: 10.1109/ICPR.2016.7900052]

[3] Wang K, Ji Q. Real time eye gaze tracking with 3D deformable eye-face model[C]//Proceedings of 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017: 1003-1011.[Doi: 10.1109/ICCV.2017.114]

[4] Zhu Z W, Ji Q, Bennett K P. Nonlinear eye gaze mapping function estimation via support vector regression[C]//Proceedings of the 18th International Conference on Pattern Recognition. Hong Kong, China: IEEE, 2006: 1132-1135.[Doi: 10.1109/ICPR.2006.864]

[5] Morimoto C H, Mimica M R M. Eye gaze tracking techniques for interactive applications[J]. Computer Vision and Image Understanding, 2005, 98(1): 4–24. [DOI:10.1016/j.cviu.2004.07.010]

[6] Cheng H, Liu Y Q, Fu W H, et al. Gazing point dependent eye gaze estimation[J]. Pattern Recognition, 2017, 71: 36–44. [DOI:10.1016/j.patcog.2017.04.026]

[7] Lu F, Chen X W, Sato Y. Appearance-based gaze estimation via uncalibrated gaze pattern recovery[J]. IEEE Transactions on Image Processing, 2017, 26(4): 1543–1553. [DOI:10.1109/TIP.2017.2657880]

[8] Zhang X X, Sugano Y, Fritz M, et al. Appearance-based gaze estimation in the wild[C]//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, MA, USA: IEEE, 2015: 4511-4520.[Doi: 10.1109/CVPR.2015.7299081]

[9] Maio W, Chen J X, Ji Q. Constraint-based gaze estimation without active calibration[C]//Proceedings of the Face and Gesture 2011. Santa Barbara, CA, USA: IEEE, 2011: 627-631.[Doi: 10.1109/FG.2011.5771469]

[10] Wood E, Baltrušaitis T, Zhang X C, et al. Rendering of eyes for eye-shape registration and gaze estimation[C]//Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: IEEE, 2015: 3756-3764.[Doi: 10.1109/ICCV.2015.428]

[11] Wang Y L, Ma J, Li H Q, et al. Race classification based on deep features and Fisher vectors of iris texture[J]. Journal of Image and Graphics, 2018, 23(1): 28–38. [王雅丽, 马静, 李海青, 等. 基于虹膜纹理深度特征和Fisher向量的人种分类[J]. 中国图象图形学报, 2018, 23(1): 28–38. ] [DOI:10.11834/jig.170219]

[12] Huang S C, Fang X Y, Zhou J, et al. Image local blur measure based on BP neural network[J]. Journal of Image and Graphics, 2015, 20(1): 20–28. [黄善春, 方贤勇, 周健, 等. 基于BP神经网络的图像局部模糊测量[J]. 中国图象图形学报, 2015, 20(1): 20–28. ] [DOI:10.11834/jig.20150103]

[13] Xiong X H, De La Torre F. Supervised descent method and its applications to face alignment[C]//Proceedings of 2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland, OR, USA: IEEE, 2013: 532-539.[Doi: 10.1109/CVPR.2013.75]

[14] Viola P, Jones M. Rapid object detection using a boosted cascade of simple features[C]//Proceedings of 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Kauai, HI, USA: IEEE, 2001: 905-910.[Doi: 10.1109/CVPR.2001.990517]

[15] Lisu L. Research on face detection classifier using an improved adboost algorithm[C]//Proceedings of 2008 International Symposium on Computer Science and Computational Technology. Shanghai: IEEE, 2008: 78-81.[Doi: 10.1109/ISCSCT.2008.352]

[16] Jesorsky O, Kirchberg K J, Frischholz R. Robust face detection using the Hausdorff distance[C]//International Conference on Audio-& Video-based Biometric Person Authentication. 2001.[Doi: 10.1007/3-540-45344-X_14]

[17] Daugman J G. High confidence visual recognition of persons by a test of statistical independence[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993, 15(11): 1148–1161. [DOI:10.1109/34.244676]

[18] Daugman J. How iris recognition works[C]//Proceedings of 2002 International Conference on Image Processing. Rochester, NY, USA: IEEE, 2002: 715-739.[Doi: 10.1109/ICIP.2002.1037952]

[19] Shamsi M, Saad P B, Ibrahim S B, et al. Fast algorithm for iris localization using Daugman circular integro differential operator[C]//Proceedings of 2009 International Conference of Soft Computing and Pattern Recognition. Malacca, Malaysia: IEEE, 2009: 393-398.[Doi: 10.1109/SoCPaR.2009.83]

[20] Zhang X D, Dong X P, Wang Q, et al. Location algorithm of RGB iris image based on integro-differential operators[J]. Journal of Northeastern University:Natural Science, 2011, 32(11): 1550–1553. [张祥德, 董晓鹏, 王琪, 等. 基于微积分算子的彩色虹膜图像定位算法[J]. 东北大学学报:自然科学版, 2011, 32(11): 1550–1553. ] [DOI:10.12068/j.issn.1005-3026.2011.11.008]

[21] Sesma-Sanchez L, Zhang Y X, Bulling A, et al. Gaussian processes as an alternative to polynomial gaze estimation functions[C]//Proceedings of the 9th Biennial ACM Symposium on Eye Tracking Research & Applications. Charleston, South Carolina: ACM, 2016: 229-232.[Doi: 10.1145/2857491.2857509]

[22] Klare B, Jain A K. Heterogeneous face recognition: matching NIR to visible light images[C]//Proceedings of the 20th International Conference on Pattern Recognition. Istanbul, Turkey: IEEE, 2010: 1513-1516.[Doi: 10.1109/ICPR.2010.374]

[23] Zhang C, Chi J N, Zhang Z H, et al. A novel eye gaze tracking technique based on pupil center cornea reflection technique[J]. Chinese Journal of Computers, 2010, 33(7): 1272–1285. [张闯, 迟健男, 张朝晖, 等. 一种新的基于瞳孔-角膜反射技术的视线追踪方法[J]. 计算机学报, 2010, 33(7): 1272–1285. ] [DOI:10.3724/SP.J.1016.2010.01272]

[24] Zhu Z W, Ji Q. Novel eye gaze tracking techniques under natural head movement[J]. IEEE Transactions on Biomedical Engineering, 2007, 54(12): 2246–2260. [DOI:10.1109/TBME.2007.895750]