Automatic facial feature points location based on deep learning: a review

Yali Xu; Junli Zhao; Zhihan Lyu; Zhimei Zhang; Jinhua Li; Zhenkuan Pan

doi:10.11834/jig.200278


Vol. 26, Issue 11, Pages: 2630-2644(2021)
Published: 16 November 2021

Accepted: 25 December 2020

Yali Xu, Junli Zhao, Zhihan Lyu, Zhimei Zhang, Jinhua Li, Zhenkuan Pan. Automatic facial feature points location based on deep learning: a review[J]. Journal of Image and Graphics, 26(11): 2630-2644 (2021). DOI: 10.11834/jig.200278.

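Abstract (translated from Chinese)

Facial feature point location automatically locates, from input face data, the key facial feature points predefined according to the physiological characteristics of the face, such as the eye corners, nose tip, mouth corners, and facial contour. It plays a crucial role in systems for face recognition and analysis. This paper reviews automatic facial feature point location based on deep learning: it explains what automatic facial feature point location means, summarizes the commonly used public face datasets, systematically describes automatic location methods for feature points in 2D and 3D data, summarizes the research status and applications of each method, and analyzes the current state, open problems, and development trends of automatic facial feature point location in deep learning applications. Different methods are compared on public 2D and 3D face datasets. The study shows that deep-learning-based automatic location of 2D facial feature points has been studied relatively thoroughly, whereas 3D facial feature point location still faces challenges in model representation, processing methods, and sample quantity. Deep-learning-based 3D facial feature point location is expected to become a research trend.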

Abstract

Facial feature point location automatically locates predefined key facial feature points, such as the eyes, nose tip, mouth corners, and face contour, according to the physiological characteristics of the human face. It is an important problem in face registration, face recognition, 3D face reconstruction, craniofacial analysis, craniofacial registration, and many other related fields. In recent years, many algorithms for facial feature point localization have emerged, but several problems remain in the calibration of feature points, especially 3D facial feature points, such as the need for manual intervention, too few or inaccurately placed feature points, and long calibration time. Convolutional neural networks have also been widely used in facial feature point detection in recent years. This study focuses on the analysis of automatic feature point location methods based on deep learning for 2D and 3D facial data.

Training data with ground-truth feature point labels are abundant for 2D texture images, and research on deep-learning-based automatic location of 2D facial feature points is relatively extensive and in-depth. The classical methods for 2D data include cascaded convolutional neural network methods, end-to-end regression methods, autoencoder network methods, various pose estimation methods, and other improved convolutional neural network (CNN) methods. In cascaded regression methods, rough detection is performed first, and then the feature points are fine-tuned. End-to-end methods propagate the error between the ground-truth and predicted results until the model converges. Autoencoder methods can select features automatically through encoding and decoding. Head pose estimation is of great importance for facial feature point detection because image-based methods are always affected by illumination and pose; head pose estimation and feature point detection are improved by modifying the network structure and loss function. The disadvantage of cascaded regression methods is that each regressor is learned and updated independently, so the descent directions may cancel each other. The flexibility of end-to-end models is low.

CNNs are applied to 2D training data with ground-truth feature point labels. In the 3D case, however, training data with rich ground-truth feature point labels are lacking. Therefore, compared with 2D facial feature point location, 3D facial feature point location remains a challenge. Several automatic feature point location methods for 3D data are introduced. These methods are mainly based on depth information and the 3D morphable model (3DMM). In recent years, with the development of RGB plus depth (RGBD) technology, depth data have attracted more attention. Feature point detection based on depth information has become an important preprocessing step for automatic feature point detection in 3D data. Initialization is crucial for depth data, but information is easily lost. Methods based on the 3DMM use the model to represent 3D face data and locate feature points through deep learning. On the one hand, the relationship between the shape and expression parameters of the 3DMM and the image texture information is highly nonlinear, which makes the mapping from images difficult to estimate. On the other hand, compared with 2D face data, 3D face data lack training samples with substantial variation in face shape, race, and expression. Facial feature point detection therefore still faces great challenges.

In summary, this study explains the meaning of automatic location of facial feature points, summarizes the commonly used public face datasets, introduces various methods for automatic location of feature points in 2D and 3D data, summarizes the research status and applications of each method, both domestic and international, analyzes the problems and development trends of automatic facial feature point location technology in deep learning applications on 2D and 3D datasets, and compares the experimental results of the latest methods. In conclusion, research on deep-learning-based automatic location of 2D facial feature points is relatively in-depth, whereas challenges in processing 3D data remain. The current solution for locating feature points is to project 3D face data onto 2D images through cylindrical coordinates, depth maps, the 3DMM, and other methods; information loss is the main problem of these methods. Locating feature points directly on the 3D model needs further exploration and research, and the accuracy and speed of feature point location also need to be improved. In the future, 3D facial feature point localization methods based on deep learning will gradually become a trend.
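
As a rough illustration of the projection of 3D face data onto a 2D image through cylindrical coordinates, the following Python sketch maps a 3D point set onto a 2D radius map. It is a minimal sketch under assumptions that are not taken from the paper: the function name `cylindrical_projection`, the choice of the y axis as the cylinder axis, the 5 x 5 grid, and the toy points are all hypothetical. The returned pixel indices make it possible to count how many points share a pixel, which is one simple way to see the information loss associated with such projections.

```python
import numpy as np


def cylindrical_projection(points, height=64, width=64):
    """Project an (N, 3) point set onto a (height, width) cylindrical map.

    Assumed convention (not from the paper): the y axis is the cylinder
    axis, theta is the angle around it, and each pixel keeps the largest
    radius among the points that fall into it.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(x, z)   # angle around the vertical axis, in [-pi, pi]
    radius = np.hypot(x, z)    # distance from the vertical axis

    # Map theta to columns and y to rows (top row = largest y).
    col = ((theta + np.pi) / (2 * np.pi) * (width - 1)).round().astype(int)
    y_span = max(y.max() - y.min(), 1e-12)
    row = ((y.max() - y) / y_span * (height - 1)).round().astype(int)

    image = np.zeros((height, width))
    # Unbuffered maximum: points landing in the same pixel are merged.
    np.maximum.at(image, (row, col), radius)
    pixel_index = row * width + col
    return image, pixel_index


if __name__ == "__main__":
    # Toy points; the first two lie on the same ray from the axis.
    pts = np.array([
        [0.0, 0.0, 1.0],
        [0.0, 0.0, 2.0],
        [1.0, 1.0, 0.0],
        [0.0, -1.0, 1.0],
    ])
    image, pixel_index = cylindrical_projection(pts, height=5, width=5)
    occupied = len(np.unique(pixel_index))
    print(f"{len(pts)} points -> {occupied} occupied pixels")
```

In this toy case the two points on the same ray collapse into a single pixel, so the 2D map can no longer distinguish them. A real pipeline would face the same trade-off at much higher resolution and would need an additional step to map detected 2D landmarks back to the 3D surface.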

Keywords

deep learning; 2D facial feature point location; 3D facial feature point location; convolutional neural network (CNN); registration

