Multimodal human-computer interaction technology for emotion regulation

Zhang Kaile1, Liu Tingting2, Liu Zhen1, Zhuang Yin1, Chai Yanjie1 (1. Faculty of Information Science and Technology, Ningbo University, Ningbo 315211, China; 2. College of Science and Technology, Ningbo University, Cixi 315300, China)

Abstract
Objective The number of people with psychological problems in modern society is growing, and timely regulation of their negative emotions is of real practical importance to social harmony and stability. Traditional emotion regulation methods require substantial human effort. This paper therefore proposes a multimodal human-computer interaction method for emotion regulation that recognizes the user's emotion and regulates it through text dialogue and somatosensory interaction. Method Facial expression recognition, text dialogue, and gestures are combined to recognize user emotion, and an agent with emotional expressiveness is constructed. Facial expressions are recognized with a support vector machine; text dialogue is realized with production rules and a Seq2Seq model incorporating emotional factors. Interactive scenarios such as chatting, celebrating a birthday, and an interactive game (shooting baskets) are designed, with gestures and body movements assisting the interaction. To better realize emotion regulation, a reinforcement learning algorithm is designed for the interactive game: it automatically adjusts the game difficulty according to the user's emotional feedback, ultimately guiding the user toward a positive emotional state. Result Experiments show that single-modal interaction can hardly perceive the user's background information, so the user's emotion may be misjudged. Multimodal human-computer interaction, by contrast, can obtain background information through text dialogue, making emotion recognition more reasonable. In multimodal interactive scenarios, users can interact in a more natural way, and the agent plays a stronger emotion-regulation role. Conclusion This paper proposes an emotion regulation method based on multimodal human-computer interaction. The method requires no expensive hardware, is easy to popularize, and provides a computable scheme for regulating negative emotions.
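The expression-recognition step above relies on a support vector machine. The paper does not give its features or training procedure, so the following is only a minimal illustrative sketch: a linear SVM trained with the Pegasos subgradient method on toy two-dimensional "expression feature" points. All data and hyperparameters here are invented for illustration.

```python
import random

# Toy linear SVM trained with the Pegasos subgradient method.
# Real expression features would be high-dimensional; these 2-D
# points and labels are illustrative stand-ins only.

def pegasos(data, lam=0.01, epochs=200, seed=0):
    """data: list of (x, y) with x a 2-tuple and y in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * (w[0] * x[0] + w[1] * x[1])
            # subgradient step: L2 shrinkage, plus hinge-loss
            # correction when the margin constraint is violated
            w = [wi * (1 - eta * lam) for wi in w]
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    return 1 if w[0] * x[0] + w[1] * x[1] >= 0 else -1

# linearly separable toy "expressions": positive vs. negative valence
train_set = [((1.0, 1.2), 1), ((0.9, 1.0), 1),
             ((-1.1, -0.8), -1), ((-1.0, -1.2), -1)]
w = pegasos(train_set)
```

In practice one would use an off-the-shelf SVM implementation with a kernel; the sketch only shows the hinge-loss objective that distinguishes an SVM from other linear classifiers.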
Keywords
Multimodal human-computer interactive technology for emotion regulation

Zhang Kaile1, Liu Tingting2, Liu Zhen1, Zhuang Yin1, Chai Yanjie1 (1. Faculty of Information Science and Technology, Ningbo University, Ningbo 315211, China; 2. College of Science and Technology, Ningbo University, Cixi 315300, China)

Abstract
Objective Emotion is closely related to human social life. People increasingly encounter psychological problems because of population aging and the accelerating pace of social life. If people's negative emotions cannot be adjusted in a timely manner, social harmony and stability may be adversely affected. Face-to-face professional counseling can achieve emotional regulation for people with psychological problems. However, the number of treatment centers and counselors able to carry out psychotherapy is insufficient. At the same time, some people are unwilling to confide their psychological problems to others in order to protect their privacy. Advances in artificial intelligence, virtual reality, and human-computer interaction (HCI) make emotion regulation possible through a suitable human-computer emotional interaction system. Because existing human-computer emotional interaction methods operate in a single mode and do not exploit machine learning algorithms, a multimodal emotional interaction model is proposed in this study to achieve an improved emotional regulation effect. The proposed model integrates several emotional interaction channels, including text dialogue, somatosensory interaction, and expression recognition, and provides a new strategy for regulating negative emotions. Method The proposed model uses expression recognition and text dialogue to detect user emotions and designs a realistic three-dimensional agent, which can express itself through facial expression, body posture, voice, and text, to interact with users. A traditional support vector machine (SVM) is used to recognize user expressions, and a rule-based method built on production rules is utilized to realize the text dialogue. An emotion dictionary, syntactic analysis, and production rules are combined to realize sentiment analysis for input text containing emotional words.
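The dictionary-plus-rules sentiment step can be sketched as follows. The paper's actual emotion dictionary, syntactic analysis, and rule set are not given, so the dictionary, negation list, and the single negation rule below are illustrative stand-ins.

```python
# Minimal sketch of dictionary-plus-production-rule sentiment analysis.
# EMOTION_DICT and NEGATIONS are invented examples, not the paper's
# actual lexical resources.

EMOTION_DICT = {
    "happy": 1.0, "glad": 0.8, "great": 0.6,
    "sad": -1.0, "angry": -0.9, "tired": -0.4,
}
NEGATIONS = {"not", "never", "no"}

def rule_based_sentiment(text: str) -> float:
    """Score text in [-1, 1]: sum dictionary hits, flipping polarity
    when an emotion word is preceded by a nearby negation word."""
    tokens = text.lower().split()
    score = 0.0
    for i, tok in enumerate(tokens):
        if tok in EMOTION_DICT:
            polarity = EMOTION_DICT[tok]
            # production rule: a negation within the two preceding
            # tokens inverts the emotion word's polarity
            if any(t in NEGATIONS for t in tokens[max(0, i - 2):i]):
                polarity = -polarity
            score += polarity
    # clamp the summed score to [-1, 1]
    return max(-1.0, min(1.0, score))
```

Input without any dictionary hit scores 0 here; in the paper such text is instead routed to the Seq2Seq model described next.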
A Seq2Seq model incorporating emotional factors is used to realize sentiment analysis for input text without emotional words. In addition, multimodal HCI scenarios, including conversations, birthday celebrations, and interactive games (playing basketball), are used to achieve emotional regulation. Hand gestures and body movements are utilized to assist the interaction. Felzenszwalb histogram of oriented gradients (FHOG) features are used to extract and recognize gestures, and the Median Flow tracking algorithm is applied to track them. The user's body movement is assessed according to changes in joint positions. As a companion, the agent can improve the user's experience. The collected expression, posture, and text information is used to comprehensively assess the user's emotional state. A reinforcement learning algorithm further regulates emotions and improves the user's experience by automatically adjusting the difficulty of the game according to the user's emotional feedback. Accordingly, a prototype multimodal interactive system for emotion regulation is implemented. An ordinary camera is used for expression and gesture recognition, Kinect is utilized for body motion recognition, and iFLYTEK speech recognition is applied to convert the user's voice input into text. Result The regulation effects of single-mode and multimodal HCI are compared. Results show that the interaction between the agent and the user is limited in single-mode HCI: the agent can neither fully understand why the user has a particular emotion nor take appropriate measures to regulate it, so a user who is not properly regulated may feel disappointed. By contrast, in multimodal interaction the agent can fully understand the user's emotion through multiple channels and provide a reasonable adjustment scheme. The user has additional ways to interact with the agent and participate in the regulation, and emotions can be regulated through both language and exercise.
This comprehensive and natural interaction is effective in achieving enhanced emotion regulation. Conclusion A multimodal HCI method is proposed and a prototype system for emotion regulation is implemented in this study. An agent with autonomous emotional expression is constructed, which can reasonably identify user emotions through expression, text dialogue, and gesture, and realize regulation according to this information. Our method can easily be deployed because no expensive hardware is required. The proposed method provides a computable scheme for the regulation of negative emotions and can be useful for monitoring and regulating the emotions of people living at home and socially isolated in the post-epidemic period.
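The difficulty-adjustment idea can be illustrated with tabular Q-learning: states are coarse user emotions, actions change the game difficulty, and the reward is movement toward a positive emotional state. The paper does not specify its reinforcement learning formulation, so the state/action sets, the simulated user, and all hyperparameters below are illustrative assumptions.

```python
import random

# Sketch of emotion-driven difficulty adjustment as tabular Q-learning.
# The simulated_user function is a toy stand-in for real emotional
# feedback from expression, posture, and text channels.

EMOTIONS = ["negative", "neutral", "positive"]
ACTIONS = ["easier", "keep", "harder"]

def simulated_user(emotion: str, action: str) -> str:
    """Toy user model: lowering difficulty cheers up a frustrated
    user, while making the game too easy bores a positive one."""
    i = EMOTIONS.index(emotion)
    if emotion == "negative" and action == "easier":
        i += 1
    elif emotion == "neutral" and action in ("easier", "keep"):
        i += 1
    elif emotion == "positive" and action == "easier":
        i -= 1  # too easy becomes boring
    return EMOTIONS[max(0, min(2, i))]

def train(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in EMOTIONS for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(EMOTIONS)
        for _ in range(5):  # one short interaction episode
            a = (rng.choice(ACTIONS) if rng.random() < eps
                 else max(ACTIONS, key=lambda x: q[(s, x)]))
            s2 = simulated_user(s, a)
            # reward = movement toward a positive emotional state
            r = EMOTIONS.index(s2) - EMOTIONS.index(s)
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, x)] for x in ACTIONS)
                                  - q[(s, a)])
            s = s2
    return q

q = train()
best = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in EMOTIONS}
```

Under this toy model the learned policy lowers the difficulty for a frustrated user and avoids doing so for an already positive one, which is the qualitative behavior the paper attributes to its game.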
Keywords
