Human-computer interaction for virtual-real fusion
Tao Jianhua1, Gong Jiangtao2, Gao Nan3, Fu Siwei4, Liang Shan5, Yu Chun3(1.Department of Automation, Tsinghua University, Beijing 100084, China;2.Institute for AI Industry Research, Tsinghua University, Beijing 100084, China;3.Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China;4.Zhejiang Laboratory, Hangzhou 311121, China;5.Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China)
Virtual-real human-computer interaction (VR-HCI) is an interdisciplinary field that studies interactions between humans and computers to address human cognitive and emotional needs. It integrates knowledge from computer science, cognitive psychology, ergonomics, multimedia technology, and virtual reality. With the advancement of big data and artificial intelligence, VR-HCI benefits industries such as education, healthcare, robotics, and entertainment, and is increasingly recognized as a key supporting technology for metaverse-related development. In recent years, machine learning-based analysis of human cognition and emotion has matured, particularly in applications such as robotics and wearable interaction devices. As a result, VR-HCI has focused on the challenging problem of building "intelligent" and "anthropomorphic" interaction systems. This review examines the growth of VR-HCI from four aspects: perceptual computing, human-machine interaction and coordination, human-computer dialogue interaction, and data visualization. Perceptual computing aims to model human daily behavior, cognitive processes, and emotional contexts to enable personalized and efficient human-computer interaction. We discuss perception along three dimensions: pathways, objects, and scenes. Perceptual pathways, which connect virtual and real-world interaction scenarios, fall into three primary types: visual-based, sensor-based, and wireless non-contact. Object-based perception is subdivided into personal and group contexts, while scene-based perception is subdivided into physical behavior and cognitive contexts. Human-machine interaction draws mainly on technical disciplines such as mechanical and electrical engineering, computer and control science, and artificial intelligence, as well as humanistic disciplines such as psychology and design.
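The three-dimensional perception taxonomy above (pathway, object, scene) can be sketched as a simple data structure. This is purely illustrative: the category strings mirror the survey's classification, and the `classify` helper is a hypothetical name, not part of any real system.

```python
# Hypothetical encoding of the survey's perception taxonomy.
# Axis and category names come from the text; the API is invented.
PERCEPTION_TAXONOMY = {
    "pathway": ["visual-based", "sensor-based", "wireless non-contact"],
    "object": ["personal context", "group context"],
    "scene": ["physical behavior context", "cognitive context"],
}

def classify(pathway: str, obj: str, scene: str) -> dict:
    """Validate a perception setting against the three taxonomy axes."""
    setting = {"pathway": pathway, "object": obj, "scene": scene}
    for axis, value in setting.items():
        if value not in PERCEPTION_TAXONOMY[axis]:
            raise ValueError(f"unknown {axis} category: {value}")
    return setting
```

A concrete VR-HCI sensing setup, such as camera-based recognition of one person's physical activity, would then be described as `classify("visual-based", "personal context", "physical behavior context")`.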
Human-robot interaction can be categorized by functional mechanism into 1) collaborative operation robots, 2) service and assistance robots, and 3) social, entertainment, and educational robots. A human-computer dialogue interaction system consists of key modules such as speech recognition, speaker recognition, dialogue management, and speech synthesis. A microphone picks up the speech signal, which the speech recognition module converts into text. The dialogue system processes the text, understands the user's intention, and generates a reply. Finally, the speech synthesis module converts the reply into speech, completing the interaction loop. In recent years, the intelligence of such systems has been further improved by incorporating users' inherent characteristics, such as pronunciation, preferences, and emotions, to optimize the individual modules of the interaction system.
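The dialogue pipeline described above (microphone signal, speech recognition, dialogue management, speech synthesis) can be sketched as a chain of module calls. All functions here are hypothetical stubs standing in for the real modules, not an actual speech API.

```python
# Minimal sketch of one turn of the dialogue pipeline described in the
# text: audio -> speech recognition -> dialogue system -> speech synthesis.
# Every function is a placeholder stub, not a real library call.

def speech_to_text(audio: bytes) -> str:
    """Speech recognition module: convert the audio signal to text (stub)."""
    return "turn on the light"  # placeholder transcription

def dialogue_manager(text: str) -> str:
    """Dialogue system: understand the user's intention, generate a reply (stub)."""
    if "light" in text:
        return "Okay, turning on the light."
    return "Sorry, I did not understand."

def text_to_speech(reply: str) -> bytes:
    """Speech synthesis module: convert the reply text to audio (stub)."""
    return reply.encode("utf-8")  # placeholder waveform

def interact(audio: bytes) -> bytes:
    """One full turn of the human-computer dialogue loop."""
    text = speech_to_text(audio)    # speech recognition
    reply = dialogue_manager(text)  # intention understanding + reply
    return text_to_speech(reply)    # speech synthesis
```

In a deployed system, the user-adaptation described in the text (pronunciation, preferences, emotions) would condition each of these modules, for example by passing a speaker profile into `speech_to_text` and `dialogue_manager`.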
For data transformation and visualization, data cleaning on tabular data is a common benchmark task, and various tools in R and Python can perform these tasks as well. Many software systems provide graphical user interfaces that help users complete data transformation tasks, such as Microsoft Excel, Tableau Prep Builder, and OpenRefine. Recommendation-based interactive systems further help users transform data easily, and researchers have also developed tools for transforming network structures. We analyze this area from four aspects: 1) interactive data transformation, 2) data transformation visualization, 3) visual comparison of data tables, and 4) code visualization in human-computer interaction systems. We identify several future research directions in VR-HCI, namely 1) designing generalized and personalized perceptual computing, 2) building human-machine cooperation on a deep understanding of user behavior, and 3) extending user-adaptive dialogue systems. Perceptual computing still lacks joint perception across multiple devices and attention to individual differences in human behavior. Most perceptual research relies on generalized models that neglect individual differences, which lowers perceptual accuracy and makes deployment in real settings difficult; future perceptual computing research therefore needs to be multimodal, transferable, personalized, and scalable. For human-machine interaction and coordination, a systematic approach to designing human-machine interaction and collaboration is necessary, requiring in-depth research on user understanding, the construction of interaction datasets, and long-term user experience. For human-computer dialogue interaction, current research mostly focuses on open-domain systems that use pre-trained models to improve the modeling of emotions, intentions, and knowledge.
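As a small illustration of the tabular cleaning tasks mentioned above, the kind that GUI tools like OpenRefine expose interactively, here is a sketch using pandas in Python. The table and its column names are invented for the example.

```python
# Example of typical tabular data-cleaning steps with pandas:
# dropping missing rows, de-duplicating, trimming strings, fixing types.
# The data and column names are made up for illustration.
import pandas as pd

raw = pd.DataFrame({
    "name": [" Alice", "Bob", "Bob", None],
    "age": ["23", "31", "31", "45"],
})

clean = (
    raw.dropna(subset=["name"])  # remove rows with a missing name
       .drop_duplicates()        # remove exact duplicate rows
       .assign(
           name=lambda d: d["name"].str.strip(),  # trim stray whitespace
           age=lambda d: d["age"].astype(int),    # repair the column type
       )
       .reset_index(drop=True)
)
```

Recommendation-based systems aim to suggest exactly such step chains automatically from a few user edits, rather than requiring the user to script them by hand.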
Future research should aim at more intelligent human-machine conversation that caters to individual user needs. For data transformation and visualization in HCI, future directions comprise two parts: 1) improving the intelligence of data transformation through interaction for individual data workers, e.g., appropriate algorithms for multiple types of data, recommendations for consistent user behavior, and real-time analysis to support massive data; and 2) integrating data transformation and visualization among multiple users, including designing collaborative mechanisms, resolving conflicts in data operations, visualizing complex data transformation code, evaluating the effectiveness of various visualization methods, and recording and displaying the behaviors of multiple users. In summary, the development of VR-HCI provides new opportunities and challenges for human-computer interaction towards the metaverse, with the potential to seamlessly integrate virtual and real worlds.