ESLD: eye segmentation and landmark detection in the wild (a dataset for ordinary cameras under natural light)
2022, Vol. 27, No. 8: 2329-2343
Received: 2021-03-18; revised: 2021-07-12; accepted: 2021-07-19; published in print: 2022-08-16
DOI: 10.11834/jig.210177
Objective
Changes in eye state can reflect a user's true psychological state and emotional changes. Because the eye region is small and the pupil and iris are close in color, capturing changes in pupil size and position with an ordinary camera under natural light is currently a challenging task. At the same time, the lack of eye-structure datasets with fine landmark and segmentation annotations that resemble real application environments also constrains research in this field. To address these problems, this paper collects eye images in ordinary-camera scenarios, captures pupil variation, and builds an eye segmentation and landmark detection dataset (ESLD).
Method
We collect, annotate, and publicly release ESLD, an image dataset containing multiple types of eyes. Images are acquired in three ways: 1) facial images captured while users work at a computer; 2) facial images from published datasets that were captured by ordinary cameras under natural light; and 3) eye images synthesized with the public tool UnityEye. The three acquisition methods yield 1 386, 804, and 1 600 eye images, respectively. After the raw images are obtained, the eye region is cropped from each one, and eye images of different sizes are normalized to 256×128 pixels. Finally, the eye landmarks are annotated manually and the eye structures are segmented.
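The cropping and normalization step can be sketched in a few lines. The snippet below is a minimal, illustrative pipeline, not the authors' exact code: it assumes Dlib's standard 68-point predictor file is available locally, and the padding factor around the landmark box is an arbitrary choice.

```python
# Illustrative sketch: crop one eye region with Dlib landmarks and normalize
# it to the 256×128 size used by ESLD. Padding and file names are assumptions.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_eye(image, eye_indices, pad=0.4):
    faces = detector(image, 1)
    if not faces:
        return None
    shape = predictor(image, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in eye_indices],
                   dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    dx, dy = int(w * pad), int(h * pad)        # loosen the tight landmark box
    crop = image[max(y - dy, 0):y + h + dy, max(x - dx, 0):x + w + dx]
    return cv2.resize(crop, (256, 128))        # normalized ESLD size

# In Dlib's 68-point scheme, indices 36-41 outline the left eye, 42-47 the right.
left_eye = crop_eye(cv2.imread("face.jpg"), range(36, 42))
```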
Result
The ESLD dataset contains multiple types of eye images and can meet researchers' different needs. Because it is difficult to collect real eye images in practice or to obtain them from public datasets, we use UnityEye to generate eye images and alleviate the shortage of training data. Experimental results show that synthesized eye images effectively compensate for the lack of data, reaching an F1 score of 0.551. Deep learning methods are used to provide baselines for the eye landmark detection and eye structure segmentation tasks. With ResNet101 as the feature extraction network, the eye landmark detection error is 5.828, and the mean average precision (mAP) of eye structure segmentation reaches 0.965.
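For orientation, the F1 score here is the standard harmonic mean of precision and recall on the real-versus-synthetic classification, so a value near 0.5 means the classifier barely separates the two classes. A minimal sketch with toy labels, using scikit-learn's f1_score (the labels are placeholders, not ESLD data):

```python
from sklearn.metrics import f1_score

# Toy labels: 1 = real eye image, 0 = synthetic (UnityEye) image.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]   # hypothetical classifier decisions
print(f"F1 = {f1_score(y_true, y_pred):.3f}")  # 0.500: classes barely separable
```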
Conclusion
The ESLD dataset can provide data support for researchers studying users' emotional changes and psychological states through eye images.
Objective
Physiological features of the human eye, which can reflect health, fatigue, and emotion, are difficult to capture. Fatigue can be judged from the state of a patient's eyes; instructors can infer in-class students' emotional, psychological, and cognitive states from their eyes; and targeted consumers can be recognized through their gaze location when shopping. However, an ordinary camera can hardly capture changes in pupil size and orientation in the wild. Meanwhile, there is a lack of eye-behavior datasets with fine landmark and segmentation annotations that resemble real application scenarios. Near-infrared and head-mounted cameras can be used to capture eye images: active illumination distinguishes the iris from the pupil and yields high-quality images. In unconstrained settings, however, head posture, illumination, occlusion, and user-camera distance may degrade image quality, so images collected in a laboratory environment are difficult to apply to the real world.
Method
An eye region segmentation and landmark detection dataset can resolve the mismatch between indoor and outdoor scenarios. Our research focuses on collecting and annotating a new eye region segmentation and landmark detection dataset (ESLD), which contains multiple types of eyes, to address the lack of datasets with fine landmark and eye-region annotations. First, facial images are collected in three ways: facial images of users working at a computer, images from public datasets captured by ordinary cameras, and synthesized eye images. These sources yield 1 386, 804, and 1 600 images, respectively. Second, the eye region is cropped from each original image. For complete face images, Dlib is used to detect facial landmarks, and the eye region is segmented according to those labels; for incomplete face images, the eye region is cropped manually. All eye region images are then normalized to 256×128 pixels and stored in folders according to their acquisition type. Finally, annotators are trained before manual labeling begins. To reduce labeling errors caused by human factors, each annotator first labels four images from each image type; an experienced annotator checks the labeled landmarks, and the remaining images are labeled only after the annotation standard is met. Each landmark location is saved as a JSON file, and labelme is used to segment the eye region from the same JSON file. A total of 2 404 images are obtained. Each image contains 16 landmarks around the eye, 12 around the iris, and 12 around the pupil. The segmentation labels cover the sclera, iris, pupil, and the skin around the eyes.
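Because the landmark locations are stored as JSON files produced with labelme, one annotation can be read back with a few lines. The sketch below assumes the standard labelme JSON layout (a "shapes" list of labelled point and polygon entries); the file name is a placeholder, and the label names in a real ESLD file may differ.

```python
import json

def load_annotation(path):
    """Split a labelme JSON file into point landmarks and polygon regions."""
    with open(path, encoding="utf-8") as f:
        ann = json.load(f)
    landmarks, regions = {}, {}
    for shape in ann["shapes"]:
        if shape["shape_type"] == "point":          # 16 eye + 12 iris + 12 pupil
            landmarks.setdefault(shape["label"], []).append(shape["points"][0])
        elif shape["shape_type"] == "polygon":      # sclera / iris / pupil / skin
            regions[shape["label"]] = shape["points"]
    return landmarks, regions

landmarks, regions = load_annotation("eye_0001.json")
```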
Result
Our dataset is split into training, testing, and validation sets at a ratio of 0.6:0.2:0.2. We evaluate the proposed dataset with deep learning algorithms and provide a baseline for each experiment. First, a model is trained with the synthesized eye images, and an experiment is conducted to recognize whether an eye image is real or synthetic. The results show that the model cannot separate real from synthetic images accurately, which indicates that synthetic eye images can be used as training data. Next, deep learning algorithms are applied to eye region segmentation. Mask region-based convolutional neural network (Mask R-CNN) models with different backbones are trained; backbones with deeper structures obtain higher segmentation accuracy under the same number of training epochs, and the mean average precision (mAP) reaches 0.965. Finally, Mask R-CNN is modified for the landmark detection task. The Euclidean distance is used to evaluate the model, and the error is 5.828. Compared with eye region segmentation, landmark detection is more difficult because of the small area of the eye; a deeper structure effectively increases landmark detection accuracy when the eye region mask is used.
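The landmark metric mentioned above can be written out explicitly as the mean Euclidean distance between predicted and ground-truth points. In the sketch below, the (N, 40, 2) shape assumes the 16 + 12 + 12 points per image described in the Method section, and the random arrays merely stand in for model output and labels:

```python
import numpy as np

def mean_landmark_error(pred, gt):
    """pred, gt: (N, 40, 2) arrays of (x, y) pixel coordinates."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

# Toy usage: ground truth inside a 256×128 image, predictions perturbed by noise.
rng = np.random.default_rng(0)
gt = rng.uniform(0.0, [256.0, 128.0], size=(10, 40, 2))
pred = gt + rng.normal(0.0, 2.0, size=gt.shape)
print(mean_landmark_error(pred, gt))
```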
Conclusion
ESLD focuses on multiple types of eye images in real environments and bridges the gap in fine landmark detection and segmentation for the eye region. To study the relationship between eye state and emotion, deep learning algorithms can be further developed by combining ESLD with other datasets.
Aifanti N, Papachristou C and Delopoulos A. 2010. The MUG facial expression database//Proceedings of the 11th International Workshop on Image Analysis for Multimedia Interactive Services WIAMIS 10. Desenzano del Garda, Italy: IEEE: 1-4
Aran O, Ari I, Guvensan A, Haberdar H, Kurt Z, Turkmen I, Uyar A and Akarun L. 2007. A database of non-manual signs in Turkish sign language//Proceedings of the 15th IEEE Signal Processing and Communications Applications. Eskisehir, Turkey: IEEE: 1-4[DOI: 10.1109/SIU.2007.4298708]
Belhumeur P N, Jacobs D W, Kriegman D J and Kumar N. 2013. Localizing parts of faces using a consensus of exemplars. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(12): 2930-2940[DOI: 10.1109/TPAMI.2013.23]
Burgos-Artizzu X P, Perona P and Dollár P. 2013. Robust face landmark estimation under occlusion//Proceedings of 2013 IEEE International Conference on Computer Vision. Sydney, Australia: IEEE: 1513-1520[DOI: 10.1109/ICCV.2013.191]
Dobeš M, Martinek J, Skoupil D, Dobešová Z and Pospíšil J. 2006. Human eye localization using the modified Hough transform. Optik, 117(10): 468-473[DOI: 10.1016/j.ijleo.2005.11.008]
Fuhl W, Kasneci G and Kasneci E. 2021. TEyeD: over 20 million real-world eye images with Pupil, Eyelid, and Iris 2D and 3D Segmentations, 2D and 3D Landmarks, 3D Eyeball, Gaze Vector, and Eye Movement Types[EB/OL]. [2021-02-19]. https://arxiv.org/pdf/2102.02115v1.pdf
He K M, Zhang X Y, Ren S Q and Sun J. 2016. Deep residual learning for image recognition//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE: 770-778[DOI: 10.1109/CVPR.2016.90]
He Z F, Tan T N, Sun Z N and Qiu X C. 2009. Toward accurate and fast iris segmentation for iris biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(9): 1670-1684[DOI: 10.1109/TPAMI.2008.183]
Huang G, Liu Z, Van Der Maaten L and Weinberger K Q. 2017. Densely connected convolutional networks//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, USA: IEEE: 2261-2269[DOI: 10.1109/CVPR.2017.243]
Jeni L A, Tulyakov S, Yin L J, Sebe N and Cohn J F. 2016. The first 3D face alignment in the wild (3DFAW) challenge//Proceedings of the European Conference on Computer Vision. Amsterdam, the Netherlands: Springer: 511-520[DOI: 10.1007/978-3-319-48881-3_35]
Jesorsky O, Kirchberg K J and Frischholz R W. 2001. Robust face detection using the hausdorff distance//Proceedings of the 3rd International Conference on Audio- and Video-Based Biometric Person Authentication. Berlin, Germany: Springer: 90-95[DOI: 10.1007/3-540-45344-X_14]
Kasiński A, Florek A and Schmidt A. 2008. The PUT face database. Image Processing and Communications, 13(3/4): 59-64
Kawai S, Takano H and Nakamura K. 2013. Pupil diameter variation in positive and negative emotions with visual stimulus//Proceedings of 2013 IEEE International Conference on Systems, Man, and Cybernetics. Manchester, UK: IEEE: 4179-4183[DOI: 10.1109/SMC.2013.712]
Kazemi V and Sullivan J. 2014. One millisecond face alignment with an ensemble of regression trees//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition. Columbus, USA: IEEE: 1867-1874[DOI: 10.1109/CVPR.2014.241]
Köstinger M, Wohlhart P, Roth P M and Bischof H. 2011. Annotated facial landmarks in the wild: a large-scale, real-world database for facial landmark localization//Proceedings of 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops). Barcelona, Spain: IEEE: 2144-2151[DOI: 10.1109/ICCVW.2011.6130513]
Lian D Z, Hu L N, Luo W X, Xu Y Y, Duan L X, Yu J Y and Gao S H. 2019. Multiview multitask gaze estimation with deep convolutional neural networks. IEEE Transactions on Neural Networks and Learning Systems, 30(10): 3010-3023[DOI: 10.1109/TNNLS.2018.2865525]
Ma L, Tan T N, Wang Y H and Zhang D X. 2004. Efficient iris recognition by characterizing key local variations. IEEE Transactions on Image Processing, 13(6): 739-750[DOI: 10.1109/TIP.2004.827237]
Magill J and Roy S. 2010. Chips for everyone: a multifaceted approach in electrical engineering outreach. IEEE Transactions on Education, 53(1): 114-119[DOI: 10.1109/TE.2009.2025267]
Messer K, Matas J, Kittler J, Luettin J and Maître G. 1999. XM2VTSDB: the extended M2VTS database//Proceedings of the 2nd International Conference on Audio- and Video-based Biometric Person Authentication. Washington, USA: [s. n.]: 965-966
Milborrow S, Morkel J and Nicolls F. 2010. The MUCT landmarked face database. Pattern Recognition Association of South Africa[EB/OL]. [2021-03-18]. http://www.milbo.org/muct
Phillips P J, Bowyer K W, Flynn P J, Liu X M and Scruggs W T. 2008. The iris challenge evaluation 2005//Proceedings of 2008 IEEE Second International Conference on Biometrics: Theory, Applications and Systems. Washington, USA: IEEE: 1-8[DOI: 10.1109/BTAS.2008.4699333]
Phillips P J, Scruggs W T, O'Toole A J, Flynn P J, Bowyer K W, Schott C L and Sharpe M. 2010. FRVT 2006 and ICE 2006 large-scale experimental results. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(5): 831-846[DOI: 10.1109/TPAMI.2009.59]
Proença H and Alexandre L A. 2005. UBIRIS: a noisy iris image database//Proceedings of the 13th International Conference on Image Analysis and Processing. Cagliari, Italy: Springer: 970-977[DOI: 10.1007/11553595_119]
Proença H, Filipe S, Santos R, Oliveira J and Alexandre L A. 2010. The UBIRIS.v2: a database of visible wavelength iris images captured on-the-move and at-a-distance. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(8): 1529-1535[DOI: 10.1109/TPAMI.2009.66]
Russell B C, Torralba A, Murphy K P and Freeman W T. 2008. LabelMe: a database and web-based tool for image annotation. International Journal of Computer Vision, 77(1/3): 157-173[DOI: 10.1007/s11263-007-0090-8]
Sagonas C, Antonakos E, Tzimiropoulos G, Zafeiriou S and Pantic M. 2016. 300 faces in-the-wild challenge: database and results. Image and Vision Computing, 47: 3-18[DOI: 10.1016/j.imavis.2016.01.002]
Shah S and Ross A. 2006. Generating synthetic irises by feature agglomeration//Proceedings of 2006 IEEE International Conference on Image Processing (ICIP). Atlanta, USA: IEEE: 317-320[DOI: 10.1109/ICIP.2006.313157]
Simonyan K and Zisserman A. 2015. Very deep convolutional networks for large-scale image recognition[EB/OL]. [2021-01-30]. https://arxiv.org/pdf/1409.1556.pdf
Stegmann M B, Ersboll B K and Larsen R. 2003. FAME: a flexible appearance modeling environment. IEEE Transactions on Medical Imaging, 22(10): 1319-1331[DOI: 10.1109/TMI.2003.817780]
Sun G M, Zhang J J, Zheng K and Fu X H. 2020. Eye tracking and ROI detection within a computer screen using a monocular camera. Journal of Web Engineering, 19(7/8): 1117-1146[DOI: 10.13052/jwe1540-9589.19789]
Sun Z N and Tan T N. 2009. Ordinal measures for iris recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12): 2211-2226[DOI: 10.1109/TPAMI.2008.240]
Szegedy C, Liu W, Jia Y Q, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V and Rabinovich A. 2015. Going deeper with convolutions//Proceedings of 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE: 1-9[DOI: 10.1109/CVPR.2015.7298594]
Tan T N, He Z F and Sun Z N. 2010. Efficient and robust segmentation of noisy iris images for non-cooperative iris recognition. Image and Vision Computing, 28(2): 223-230[DOI: 10.1016/j.imavis.2009.05.008]
Tonsen M, Zhang X C, Sugano Y and Bulling A. 2016. Labelled pupils in the wild: a dataset for studying pupil detection in unconstrained environments//The 9th Biennial ACM Symposium on Eye Tracking Research and Applications. Charleston, USA: ACM: 139-142[DOI: 10.1145/2857491.2857520]
Wong H K, Epps J and Chen S Y. 2019. Automatic pupillary light reflex detection in eyewear computing. IEEE Transactions on Cognitive and Developmental Systems, 11(4): 560-572[DOI: 10.1109/TCDS.2018.2880664]
Wood E, Baltrušaitis T, Morency L P, Robinson P and Bulling A. 2016. Learning an appearance-based gaze estimator from one million synthesised images//The 9th Biennial ACM Symposium on Eye Tracking Research and Applications. Charleston, USA: ACM: 131-138[DOI: 10.1145/2857491.2857492]
Wu W Y, Qian C, Yang S, Wang Q, Cai Y C and Zhou Q. 2018. Look at boundary: a boundary-aware face alignment algorithm//Proceedings of 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Salt Lake City, USA: IEEE: 2129-2138[DOI: 10.1109/CVPR.2018.00227]
Zhang J J, Sun G M and Zheng K. 2020. Review of gaze tracking and its application in intelligent education. Journal of Computer Applications, 40(11): 3346-3356[DOI: 10.11772/j.issn.1001-9081.2020040443]