Depression recognition based on dynamic facial feature description
Automatic depression estimation using facial appearance
2020, Vol. 25, No. 11, pages 2415-2427
Received: 2020-06-22; Revised: 2020-08-21; Accepted: 2020-08-28; Published in print: 2020-11-16
DOI: 10.11834/jig.200322
Objective
Depression is a serious mental disorder that significantly affects patients' daily life and work. Current clinical assessments of depression rely almost entirely on clinical interviews or questionnaires and lack systematic, effective means of mining the pattern information closely related to depression. To help clinicians diagnose the severity of a patient's depression, a growing number of automatic depression recognition methods have emerged in the affective computing community. To effectively mine and encode the discriminative affective information contained in human faces, this paper proposes an automatic depression recognition framework based on dynamic facial features and sparse coding.
Method
For facial feature extraction, we propose a new dynamic feature descriptor that mines both the macrostructure and the microstructure of the face in depth: median robust local binary patterns from three orthogonal planes (MRELBP-TOP). Because the frame-level MRELBP-TOP features are high-dimensional and partially redundant, random projection (RP) is applied to reduce their dimensionality, removing redundant information while retaining the key information. To further characterize the high-level patterns of the reduced features, sparse coding (SC) is used to derive a compact feature representation. Finally, a support vector machine is used to estimate depression severity, i.e., to predict the Beck depression inventory-II (BDI-II) score.
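The dimensionality-reduction step can be illustrated with a short numpy sketch of Gaussian random projection. The feature dimension and target dimension below are illustrative placeholders, not the paper's settings:

```python
import numpy as np

def random_projection(X, k, seed=0):
    """Project row-vector features X (n x d) down to k dimensions with a
    Gaussian random matrix, scaled by 1/sqrt(k) so pairwise distances are
    preserved in expectation (Johnson-Lindenstrauss style)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))  # d x k projection
    return X @ R

# five hypothetical high-dimensional frame-level feature vectors
X = np.random.default_rng(1).normal(size=(5, 10000))
Z = random_projection(X, k=128)
print(Z.shape)  # (5, 128)
```

Unlike PCA, the projection matrix is data-independent, which is what makes RP cheap: no covariance estimation or eigendecomposition is needed.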
Result
On the test sets of AVEC 2013 (the continuous audiovisual emotion and depression 2013) and AVEC 2014, the root mean square error (RMSE) between the estimated and ground-truth depression scores is 9.70 and 9.22, respectively, improving recognition accuracy by 29% and 15% over the baseline algorithms. The experimental results show that the proposed method outperforms most current video-based depression recognition methods.
Conclusion
This paper builds a depression recognition framework based on facial expressions that effectively estimates depression severity, and proposes the frame-level feature descriptor MRELBP-TOP, which markedly improves depression recognition accuracy.
Objective
Depression is a serious mood disorder that causes noticeable problems in day-to-day activities. Current methods for assessing depression depend almost entirely on clinical interviews or questionnaires and lack systematic, efficient ways of utilizing behavioral observations that are strong indicators of psychological disorder. To help clinicians diagnose depression severity effectively and efficiently, the affective computing community has shown growing interest in developing automated systems that use objective, quantifiable data for depression recognition. Based on these developments, we propose a framework for the automatic diagnosis of depression from facial expressions.
Method
The method consists of the following steps. 1) To extract facial dynamic features, we propose a novel dynamic feature descriptor, median robust local binary patterns from three orthogonal planes (MRELBP-TOP), which captures the microstructure and macrostructure of facial appearance and dynamics. To extend the MRELBP descriptors to the temporal domain, we follow the procedure of the LBP-TOP algorithm, in which an image sequence is regarded as a video volume viewed from three different stacks of planes: the XY, XT, and YT planes. The XY plane provides spatial-domain information, whereas the XT and YT planes provide temporal information. The robust center-intensity-based LBP (RELBP_CI) and robust neighborhood-intensity-based LBP (RELBP_NI) features are extracted independently from the three sets of orthogonal planes, co-occurrence statistics in these three directions are considered, and the features are then stacked into a joint histogram. 2) The proposed MRELBP-TOP descriptors are typically high-dimensional. Standard methods such as principal component analysis (PCA) and linear discriminant analysis (LDA) have been widely used for dimensionality reduction, but both have drawbacks; compared with PCA, random projection (RP) has a lower computational cost and is easier to implement. 3) To obtain a compact feature representation, sparse coding (SC) is used. SC refers to a general class of techniques that automatically select a sparse set of elements from a large pool of candidate bases to encode an input signal. Basically, SC assumes that objects in the world and their relationships are simple and succinct and can be represented by only a small number of prominent elements. 4) Finally, support vector regression (SVR) is adopted to predict Beck depression inventory (BDI) scores over an entire video clip for depression recognition and analysis.
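The three-orthogonal-planes idea in step 1) can be sketched in numpy. For brevity this uses the plain LBP operator and a single representative slice per orientation rather than the paper's RELBP_CI/RELBP_NI operators and full co-occurrence statistics, so it is a simplified illustration of LBP-TOP, not the MRELBP-TOP implementation:

```python
import numpy as np

def lbp_plane(plane, P=8, R=1):
    """Basic LBP codes for one 2-D plane (8 neighbours, radius 1).
    A simplified stand-in for the RELBP operators: each neighbour's raw
    intensity is thresholded against the centre pixel."""
    h, w = plane.shape
    codes = np.zeros((h - 2 * R, w - 2 * R), dtype=np.int32)
    angles = 2 * np.pi * np.arange(P) / P
    for p, a in enumerate(angles):
        dy, dx = int(round(R * np.sin(a))), int(round(R * np.cos(a)))
        neighbour = plane[R + dy:h - R + dy, R + dx:w - R + dx]
        centre = plane[R:h - R, R:w - R]
        codes |= ((neighbour >= centre).astype(np.int32) << p)
    return codes

def lbp_top(volume, P=8):
    """Concatenated LBP histograms from the XY, XT and YT planes of a
    T x H x W grey-level video volume."""
    T, H, W = volume.shape
    # one representative slice per orientation keeps the sketch short;
    # the full descriptor accumulates statistics over all slices
    planes = [volume[T // 2],           # XY plane at the middle frame
              volume[:, H // 2, :],     # XT plane at the middle row
              volume[:, :, W // 2]]     # YT plane at the middle column
    hists = []
    for pl in planes:
        codes = lbp_plane(pl.astype(np.float64))
        hist, _ = np.histogram(codes, bins=2 ** P, range=(0, 2 ** P))
        hists.append(hist / max(hist.sum(), 1))  # normalise per plane
    return np.concatenate(hists)

video = np.random.default_rng(0).integers(0, 256, size=(16, 32, 32))
feat = lbp_top(video)
print(feat.shape)  # (768,) = 3 planes x 256 bins
```

The XY histogram encodes spatial texture while the XT and YT histograms encode how that texture moves over time, which is what makes the joint descriptor dynamic.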
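Step 3) can likewise be sketched with a minimal orthogonal matching pursuit encoder, one common way to compute sparse codes. The random dictionary, its size, and the sparsity level below are illustrative assumptions; the paper's learned overcomplete dictionary is not reproduced here:

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Orthogonal matching pursuit: encode signal x with at most
    n_nonzero atoms (columns of the unit-norm dictionary D)."""
    residual = x.copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # least-squares refit of x on the selected atoms
        sub = D[:, support]
        sol, *_ = np.linalg.lstsq(sub, x, rcond=None)
        coef[:] = 0.0
        coef[support] = sol
        residual = x - sub @ sol
    return coef

rng = np.random.default_rng(0)
D = rng.normal(size=(64, 256))
D /= np.linalg.norm(D, axis=0)                   # unit-norm atoms
x_true = D[:, [3, 40]] @ np.array([2.0, -1.5])   # signal built from 2 atoms
code = omp(D, x_true, n_nonzero=2)
print(np.nonzero(code)[0])  # typically the two generating atoms
```

The resulting sparse codes, rather than the raw projected features, are what a regressor such as SVR would consume to predict the BDI score.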
Result
The root mean square error between the predicted values and the Beck depression inventory-II (BDI-II) scores is 9.70 and 9.01 on the test sets of the continuous audiovisual emotion and depression 2013 challenge (AVEC 2013) and AVEC 2014, respectively.
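The RMSE figures above follow the usual definition, shown here as a small numpy sketch with made-up label/prediction values:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between BDI-II labels and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# toy example: errors of -2, 2, -3 give sqrt((4+4+9)/3)
print(rmse([10, 20, 30], [12, 18, 33]))  # ≈ 2.38
```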
Conclusion
1) We develop an automated framework that effectively captures facial dynamic information for measuring depression severity. 2) We propose a robust dynamic feature descriptor that captures macrostructure, microstructure, and spatiotemporal motion patterns; the proposed descriptor can also be adopted for facial expression recognition tasks in the future. Furthermore, we adopt sparse coding to learn an overcomplete dictionary and organize MRELBP-TOP feature descriptors into compact behavior patterns.