Survey of imitation learning: tradition and new advances

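Zhang Chao, Bai Wensong, Du Xin, Liu Weijie, Zhou Chenhao, Qian Hui (College of Computer Science and Technology, Zhejiang University; College of Information Science & Electronic Engineering, Zhejiang University)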

Abstract
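Imitation learning combines reinforcement learning and supervised learning. Its goal is to learn the expert policy by observing expert demonstrations, thereby accelerating reinforcement learning. By introducing additional task-related information, imitation learning can optimize a policy faster than reinforcement learning, offering a way to mitigate the problem of low sample efficiency. In recent years, imitation learning has become a popular framework for solving reinforcement learning problems, and many algorithms and techniques for improving learning performance have emerged. Combined with the latest research results in computer graphics and image processing, imitation learning has played an important role in game AI (artificial intelligence), robot control, autonomous driving, and other fields. Following the development of imitation learning over recent years, this survey provides an in-depth discussion from several perspectives: behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning. It describes the latest practical applications of imitation learning, compares the state of research in China and abroad, and outlines future directions for the field. The report aims to present researchers and practitioners with the latest advances in imitation learning as a reference and aid for their work.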
Survey of imitation learning: tradition and new advances

Zhang Chao, Bai Wensong, Du Xin, Liu Weijie, Zhou Chenhao, Qian Hui (College of Computer Science and Technology, Zhejiang University; College of Information Science & Electronic Engineering, Zhejiang University)

Abstract
Imitation learning combines reinforcement learning and supervised learning; its goal is to accelerate reinforcement learning by observing expert demonstrations and learning the expert policy. By introducing additional task-related information, imitation learning can improve a policy faster than reinforcement learning, which offers a way to alleviate the low sample efficiency problem. In recent years, imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques have emerged to improve learning performance. Combined with the latest research in image processing, imitation learning has played an important role in game AI, robot control, autonomous driving, and other fields. Traditional imitation learning methods mainly include behavioral cloning (BC), inverse reinforcement learning (IRL), and adversarial imitation learning (AIL). These methods follow relatively simple technical routes within fairly uniform frameworks and usually achieve satisfactory performance on simple tasks. With the substantial growth in computing power in recent years and the rapid development of upstream graphics and image tasks (such as object recognition and scene understanding), imitation learning methods that integrate a variety of techniques have emerged and achieved great success on complex tasks. This survey summarizes the new advances in imitation learning as imitation learning from observation (ILfO) and cross-domain imitation learning (CDIL). ILfO relaxes the requirements on expert demonstrations: it learns only from observable information, without needing the expert's specific actions. This setting makes imitation learning algorithms more practical and applicable to real-life scenarios.
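Behavioral cloning is the clearest instance of the supervised-learning side of imitation learning: the expert's (state, action) pairs are treated as a labeled dataset, and a policy is fitted to them by regression. The sketch below is a minimal illustration under assumptions that do not come from the survey: a one-dimensional state, a linear policy a = w * s + b, a squared-error loss solved in closed form, and a hypothetical toy expert. The names `behavioral_cloning` and `make_policy` are illustrative only.

```python
from typing import Callable, List, Tuple


def behavioral_cloning(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fit a linear policy a = w * s + b to expert (state, action) pairs.

    Hypothetical illustration: ordinary least squares in closed form,
    i.e. supervised regression of expert actions on states.
    """
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    cov = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    var = sum((s - mean_s) ** 2 for s, _ in demos)
    w = cov / var  # requires at least two distinct states
    b = mean_a - w * mean_s
    return w, b


def make_policy(w: float, b: float) -> Callable[[float], float]:
    """Wrap the fitted parameters as a callable policy."""
    return lambda s: w * s + b


if __name__ == "__main__":
    # Hypothetical expert: a = -0.5 * s + 0.1, observed at five states.
    demos = [(s, -0.5 * s + 0.1) for s in [-2.0, -1.0, 0.0, 1.0, 2.0]]
    w, b = behavioral_cloning(demos)
    policy = make_policy(w, b)
    print(round(w, 6), round(b, 6), round(policy(3.0), 6))
```

Because the policy is fitted only on states the expert visited, plain behavioral cloning is known to suffer from compounding errors once the learner drifts into unfamiliar states.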
According to whether they model the environment's transition dynamics, ILfO algorithms can be divided into two categories: model-based and model-free. Model-based methods can be further divided, according to how the model is constructed during the agent's interaction with the environment, into forward dynamics model methods and inverse dynamics model methods. Model-free methods mainly include adversarial methods and reward function engineering methods. Cross-domain imitation learning focuses on settings in which the agent and the expert are in different domains, such as different Markov decision processes. Current CDIL research mainly addresses domain differences in three aspects: 1) transition dynamics discrepancy, 2) morphological discrepancy, and 3) viewpoint discrepancy. According to the main technical path an algorithm relies on, solutions to CDIL problems can be divided into 1) direct methods, 2) mapping methods, 3) adversarial methods, and 4) optimal transport methods. Applications of imitation learning are concentrated mainly in game AI, robot control, and autonomous driving. Recent research in image processing, such as object detection, video understanding, video classification, and video recognition, has greatly improved the recognition and perception capabilities of intelligent agents, laying an important foundation for new progress and new applications of imitation learning. This survey follows the development of imitation learning over recent years, covering behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning. Each of these topics is discussed in depth, and the latest research on imitation learning in related application fields is introduced.
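The inverse-dynamics branch of model-based ILfO can be sketched in the spirit of behavioral cloning from observation (BCO): the agent first explores on its own to learn an inverse dynamics model that predicts the action from a state transition, uses that model to label the expert's state-only trajectory with inferred actions, and then runs behavioral cloning on those labels. This is a minimal sketch under assumptions that do not come from the survey: a hypothetical one-dimensional environment `env_step` with dynamics s' = s + 2a, a linear inverse model that depends only on the state change s' - s, a linear policy, and a noise-free toy expert. All function names are illustrative.

```python
import random
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = w * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return w, my - w * mx


def env_step(s: float, a: float) -> float:
    """Hypothetical 1-D dynamics s' = s + 2a; the learner never reads this formula."""
    return s + 2.0 * a


def learn_from_observation(expert_states: List[float],
                           n_explore: int = 50,
                           seed: int = 0) -> Tuple[float, float]:
    """Return (w, b) of a linear policy learned from a state-only expert trajectory."""
    rng = random.Random(seed)

    # 1) Exploration: the agent takes random actions and records (s' - s, a).
    deltas, actions = [], []
    for _ in range(n_explore):
        s, a = rng.uniform(-5.0, 5.0), rng.uniform(-1.0, 1.0)
        deltas.append(env_step(s, a) - s)
        actions.append(a)

    # 2) Inverse dynamics model: a ~ c * (s' - s) + d, fitted on the agent's own data.
    c, d = fit_line(deltas, actions)

    # 3) Infer the missing expert actions from consecutive observed states.
    pairs = list(zip(expert_states[:-1], expert_states[1:]))
    inferred = [c * (s_next - s) + d for s, s_next in pairs]

    # 4) Behavioral cloning on (state, inferred action) pairs.
    return fit_line([s for s, _ in pairs], inferred)


if __name__ == "__main__":
    # Hypothetical expert a = -0.25 * s; only its states are observed: s' = 0.5 * s.
    expert_states = [8.0, 4.0, 2.0, 1.0, 0.5, 0.25]
    w, b = learn_from_observation(expert_states)
    print(f"learned policy: w = {w:.3f}, b = {b:.3f}")
```

In realistic settings the inverse model would typically be a learned network conditioned on both s and s' (or on observation features), and errors in the inferred actions propagate into the cloned policy; the linear toy only shows the data flow.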
This survey also compares the state of research in China and abroad and outlines future development directions for imitation learning. It aims to provide researchers and practitioners with the latest advances in imitation learning as a reference and aid for their work.
