Multimodal human-computer interactive technology for emotion regulation
2020, Vol. 25, No. 11, Pages 2451-2464
Print publication date: 2020-11-16
Accepted: 2020-09-27
DOI: 10.11834/jig.200251
Kaile Zhang, Tingting Liu, Zhen Liu, Yin Zhuang, Yanjie Chai. Multimodal human-computer interactive technology for emotion regulation[J]. Journal of Image and Graphics, 2020, 25(11): 2451-2464.
Objective
People with psychological problems are becoming increasingly common in modern society, and regulating their negative emotions in a timely manner is of practical importance for social harmony and stability. Traditional emotion regulation methods require considerable human effort. We therefore propose a multimodal human-computer interaction method for emotion regulation, which recognizes user emotions and regulates them through text dialogue and somatosensory interaction.
Method
Expression recognition, text dialogue, and gestures are combined to recognize user emotions, and an agent with emotional expressiveness is constructed. User expressions are recognized with a support vector machine (SVM), and text dialogue is implemented with production rules and a Seq2Seq model that incorporates emotional factors. Interactive scenarios such as chatting, celebrating a birthday, and an interactive game (playing basketball) are designed, with hand gestures and body movements assisting the interaction. To better realize emotion regulation, a reinforcement learning algorithm is designed into the interactive game; it automatically adjusts the game difficulty according to the user's emotional feedback, ultimately guiding the user's emotion toward a positive state.
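To make the idea of a "Seq2Seq model incorporating emotional factors" concrete, the following PyTorch sketch concatenates an emotion embedding to every decoder input step. This is a minimal illustrative outline under assumed dimensions and emotion categories, not the paper's actual architecture; `EmotionSeq2Seq` and all hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class EmotionSeq2Seq(nn.Module):
    """Minimal encoder-decoder whose decoder is conditioned on an emotion id."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256, n_emotions=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.emo_embed = nn.Embedding(n_emotions, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim * 2, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, src, tgt, emotion):
        # src, tgt: (batch, seq) token ids; emotion: (batch,) emotion ids
        _, h = self.encoder(self.embed(src))
        # Tile the emotion embedding across the target sequence and
        # concatenate it to every decoder input token.
        emo = self.emo_embed(emotion).unsqueeze(1).expand(-1, tgt.size(1), -1)
        dec_out, _ = self.decoder(torch.cat([self.embed(tgt), emo], dim=-1), h)
        return self.out(dec_out)  # (batch, seq, vocab_size) logits

model = EmotionSeq2Seq(vocab_size=5000)
src = torch.randint(0, 5000, (2, 10))  # two tokenized user utterances
tgt = torch.randint(0, 5000, (2, 8))   # teacher-forced reply prefixes
emo = torch.tensor([1, 4])             # desired reply emotion per sample
print(model(src, tgt, emo).shape)      # torch.Size([2, 8, 5000])
```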
Result
Experiments show that single-modal interaction can hardly perceive the user's background information, so the user's emotion may be misjudged. With multimodal human-computer interaction, the user's background information can be obtained through text dialogue, which makes emotion recognition more reasonable. In multimodal interactive scenarios, users can take part in the situation in a more natural way, and the agent can better play its emotion-regulating role.
Conclusion
This paper proposes an emotion regulation method based on multimodal human-computer interaction. The method requires no expensive hardware, is easy to popularize, and provides a computable scheme for regulating negative emotions.
Objective
Emotion is closely related to human social life. People increasingly encounter psychological problems because of the aging population and the accelerating pace of social life. If people's negative emotions cannot be adjusted in a timely manner, social harmony and stability may be adversely affected. Face-to-face professional counseling can achieve emotional regulation for people with psychological problems. However, the number of treatment centers and counselors able to carry out psychotherapy is insufficient. At the same time, some people are unwilling to confide their psychological problems to others in order to protect their privacy. The development of artificial intelligence, virtual reality, and human-computer interaction (HCI) enables emotion regulation through a suitable human-computer emotional interaction system. Because existing human-computer emotional interaction methods are single-mode and do not consider machine learning algorithms, a multimodal emotional interaction model is proposed in this study to achieve an improved emotional regulation effect. The proposed model integrates several emotional interaction methods, including text dialogue, somatosensory interaction, and expression recognition, and provides a new strategy for negative emotion regulation.
Method
The proposed model uses expression recognition and text dialogue to detect user emotions and designs a three-dimensional realistic agent, which can express itself through facial expression, body posture, voice, and text, to interact with users. The traditional support vector machine (SVM) method is used to recognize user expressions, and a data-driven method based on production rules is utilized to realize the text dialogue.
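As a rough sketch of such an SVM expression classifier with scikit-learn, the random arrays below stand in for real facial features (e.g., HOG descriptors or landmark coordinates), and the expression labels are assumed rather than taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: each row is a flattened facial-feature vector;
# labels index discrete expression classes.
EXPRESSIONS = ["neutral", "happy", "sad", "angry", "surprised", "fearful"]
X_train = np.random.rand(300, 128)                      # placeholder features
y_train = np.random.randint(0, len(EXPRESSIONS), 300)   # placeholder labels

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

x_new = np.random.rand(1, 128)          # features from one new face image
print(EXPRESSIONS[clf.predict(x_new)[0]])
```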
An emotion dictionary, syntactic analysis, and production rules are combined to realize text sentiment analysis for input text containing emotional words. A Seq2Seq model with emotional factors is used to realize text sentiment analysis for input text without emotional words.
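A toy version of the dictionary-plus-rules sentiment path might look as follows; the lexicon, the single negation rule, and the thresholds are illustrative stand-ins for the paper's emotion dictionary, syntactic analysis, and production rules.

```python
# Toy emotion lexicon and scoring rule for sentiment of emotional text.
EMOTION_LEXICON = {"happy": +2, "great": +1, "fine": +1,
                   "sad": -2, "tired": -1, "lonely": -2}
NEGATIONS = {"not", "never", "no"}

def lexicon_sentiment(tokens):
    """Sum lexicon scores, flipping polarity after a negation word."""
    score, negate = 0, False
    for tok in tokens:
        if tok in NEGATIONS:
            negate = True
            continue
        if tok in EMOTION_LEXICON:
            value = EMOTION_LEXICON[tok]
            score += -value if negate else value
            negate = False
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(lexicon_sentiment("i am not happy today".split()))  # -> negative
```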
In addition, multimodal HCI scenarios, including conversations, birthdays, and interactive games (playing basketball), are used to achieve emotional regulation. Hand gestures and body movements are utilized to assist the interaction. Felzenszwalb histogram of oriented gradients (FHOG) features are used to extract and recognize gestures, and the MedianFlow tracking algorithm is applied to track them. The user's body movement can be assessed according to the change in joint point positions.
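A minimal gesture-tracking loop with OpenCV's MedianFlow tracker could look like this, assuming opencv-contrib-python (where the tracker lives in the `legacy` module from OpenCV 4.5 on); the initial bounding box is a hard-coded placeholder for what an FHOG-based hand detector would supply, and webcam index 0 is assumed.

```python
import cv2

tracker = cv2.legacy.TrackerMedianFlow_create()
cap = cv2.VideoCapture(0)           # default webcam
ok, frame = cap.read()
init_box = (200, 150, 120, 120)     # (x, y, w, h) placeholder for a detected hand
tracker.init(frame, init_box)

while ok:
    ok, frame = cap.read()
    if not ok:
        break
    found, box = tracker.update(frame)   # track the hand frame to frame
    if found:
        x, y, w, h = map(int, box)
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:      # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```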
As a companion, the agent can improve the user's experience. The expression, posture, and text information collected from users is combined to comprehensively assess the user's emotional state.
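One simple way to turn "change in joint point positions" into a number is to average per-joint displacement between consecutive skeleton frames; the joint names and coordinates below are hypothetical Kinect readings, not data from the paper.

```python
import numpy as np

def movement_energy(prev_frame, cur_frame):
    """Average Euclidean displacement across joints tracked in both frames."""
    disp = [np.linalg.norm(np.array(cur_frame[j]) - np.array(prev_frame[j]))
            for j in cur_frame if j in prev_frame]
    return float(np.mean(disp)) if disp else 0.0

# Two consecutive skeleton frames: joint name -> 3D position in metres.
prev = {"hand_right": (0.4, 1.0, 2.0), "elbow_right": (0.3, 1.1, 2.0)}
cur  = {"hand_right": (0.6, 1.3, 2.0), "elbow_right": (0.35, 1.15, 2.0)}
print(movement_energy(prev, cur))   # larger values = more vigorous movement
```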
A reinforcement learning algorithm is used to further regulate emotions and improve the user's experience by automatically adjusting the difficulty of the game according to the user's emotional feedback.
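The difficulty-adaptation idea can be sketched with tabular Q-learning: difficulty levels as states, raise/keep/lower as actions, and the user's emotion feedback as reward. The reward function below is a stand-in that simply prefers moderate difficulty; in the paper the reward would come from the recognized user emotion.

```python
import random

LEVELS = [0, 1, 2, 3]       # game difficulty: easy ... hard
ACTIONS = [-1, 0, +1]       # lower / keep / raise difficulty
Q = {(s, a): 0.0 for s in LEVELS for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def emotion_reward(level):
    # Placeholder for real emotion feedback: users here are assumed
    # happiest at moderate difficulty.
    return 1.0 - abs(level - 1.5)

state = 0
for _ in range(1000):
    # Epsilon-greedy action selection over the difficulty adjustments.
    if random.random() < epsilon:
        action = random.choice(ACTIONS)
    else:
        action = max(ACTIONS, key=lambda a: Q[(state, a)])
    next_state = min(max(state + action, LEVELS[0]), LEVELS[-1])
    reward = emotion_reward(next_state)
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print("preferred difficulty:", state)
```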
Accordingly, a prototype multimodal interactive system for emotion regulation is implemented on a computer. An ordinary camera is used for expression and gesture recognition, Kinect is utilized for body motion recognition, and iFLYTEK is applied to convert the user's voice input into text.
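Tying the pieces together, a top-level step might fuse the three channels as below; every function is a stub standing in for the components sketched above, and the two-out-of-three fusion rule is an assumption for illustration, not the paper's method.

```python
def recognize_expression(frame):
    return "sad"           # stub for the SVM expression classifier
def text_sentiment(utterance):
    return "negative"      # stub for the dictionary/Seq2Seq text analysis
def movement_level(skeleton):
    return 0.1             # stub for the Kinect movement assessment

def fuse(expression, sentiment, activity):
    """Assumed two-out-of-three rule: two negative cues -> negative overall."""
    cues = [expression == "sad", sentiment == "negative", activity < 0.2]
    return "negative" if sum(cues) >= 2 else "positive"

frame, skeleton, utterance = None, None, "i feel tired"  # placeholder inputs
state = fuse(recognize_expression(frame), text_sentiment(utterance),
             movement_level(skeleton))
print("assessed user emotion:", state)  # drives the agent's regulation strategy
```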
Result
The regulation effects of single-mode and multimode HCI are compared. The results show that the interaction between the agent and the user is limited in single-mode HCI: the agent can neither fully understand why the user has a given emotion nor take appropriate measures to regulate it, so a user who is not properly regulated may feel disappointed. By contrast, in multimodal interaction the agent can fully understand the user's emotion through multiple channels and provide a reasonable adjustment scheme. The user engages in additional interactions with the agent and participates in the regulation, and emotions can be regulated through both language and exercise. This comprehensive and natural interaction is effective in achieving enhanced emotion regulation.
Conclusion
A multimodal HCI method is proposed, and a prototype system for emotion regulation is implemented in this study. Our method constructs an agent with autonomous emotional expression, which can reasonably identify user emotions through expression, text dialogue, and gesture, and realize regulation according to this information. The method can easily be promoted because expensive hardware is unnecessary. It provides a computable scheme for the regulation of negative emotions and can be useful for monitoring and regulating the emotions of people living at home and socially isolated in the post-epidemic period.
Keywords: human-computer interaction (HCI); emotion regulation; machine learning; affective computing; multimodality