Survey of imitation learning: tradition and new advances
Vol. 28, Issue 6, Pages 1585-1607 (2023)
Published: 16 June 2023
DOI: 10.11834/jig.230028
Zhang Chao, Bai Wensong, Du Xin, Liu Weijie, Zhou Chenhao, Qian Hui. 2023. Survey of imitation learning: tradition and new advances. Journal of Image and Graphics, 28(06):1585-1607
Imitation learning combines reinforcement learning with supervised learning: its goal is to learn the expert policy by observing expert demonstrations and thereby accelerate reinforcement learning. By introducing additional task-related information, imitation learning can optimize policies faster than reinforcement learning alone, offering a way to alleviate the low-sample-efficiency problem. Imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques have emerged to improve learning performance. Combined with the latest research results in graphics and image processing, imitation learning has played an important role in fields such as game artificial intelligence (AI), robot control, and autonomous driving. Centered on the year's developments in imitation learning, this paper provides an in-depth discussion from the perspectives of behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning; introduces the latest practical applications of imitation learning; compares the state of research at home and abroad; and looks ahead to future directions of the field. The aim is to provide researchers and practitioners with the latest advances in imitation learning as a convenient reference for their work.
Imitation learning (IL) integrates reinforcement learning and supervised learning: by observing demonstrations, an agent learns the expert's policy. The additional task-related information that imitation learning introduces allows policies to be optimized more efficiently, offering a way to alleviate the low-sample-efficiency problem. In recent years, imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques have emerged to improve learning performance. Combined with the latest research in image processing, imitation learning has played an important role in domains such as game artificial intelligence (AI), robot control, and autonomous driving. Traditional imitation learning methods mainly comprise behavioral cloning (BC), inverse reinforcement learning (IRL), and adversarial imitation learning (AIL). Thanks to growing computing power and upstream graphics and image tasks (such as object recognition and scene understanding), imitation learning can now integrate a variety of emerging techniques for complex tasks. We further summarize and analyze two newer directions: imitation learning from observation (ILfO) and cross-domain imitation learning (CDIL). ILfO relaxes the requirements on expert demonstrations: the agent learns only from observable state information, without explicit action labels from the expert. This setting makes imitation learning algorithms more practical and applicable to real-world scenarios. According to whether the environment transition dynamics are modeled, ILfO algorithms fall into two categories, model-based and model-free. Model-based methods, depending on how the model is constructed during the agent's interaction with the environment, can be further divided into forward-dynamics and inverse-dynamics approaches; model-free methods mainly comprise adversarial approaches and reward-engineering approaches. Cross-domain imitation learning addresses the setting in which the agent and the expert reside in different domains, i.e., different Markov decision processes. Current CDIL research mainly focuses on three types of domain discrepancy: transition dynamics, morphology, and viewpoint. Technical solutions to CDIL problems can be broadly divided into direct, mapping-based, adversarial, and optimal-transport methods. Imitation learning is applied mainly to game AI, robot control, and autonomous driving, where the recognition and perception capabilities of intelligent agents are further strengthened by image-processing techniques such as object detection, video understanding, video classification, and video recognition. Our analysis covers the annual development of imitation learning from five aspects: behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning.
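To make the behavioral cloning baseline mentioned above concrete, the following is a minimal illustrative sketch rather than code from any of the surveyed papers: a policy network is fit to expert state-action pairs by plain supervised regression. The dimensions and the expert tensors are hypothetical placeholders; in practice the pairs would come from recorded demonstrations.

```python
# Minimal behavioral cloning sketch (illustration only): fit a policy network
# to expert (state, action) pairs with supervised regression.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2

# Hypothetical expert dataset: 1000 (state, action) pairs; real data would be
# recorded expert demonstrations.
expert_states = torch.randn(1000, state_dim)
expert_actions = torch.randn(1000, action_dim)

policy = nn.Sequential(
    nn.Linear(state_dim, 64), nn.Tanh(),
    nn.Linear(64, action_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):
    pred_actions = policy(expert_states)                          # pi(s)
    loss = nn.functional.mse_loss(pred_actions, expert_actions)   # match expert actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Because the policy is trained only on states visited by the expert, small prediction errors can push the agent into unfamiliar states at test time, which is the compounding-error issue that motivates the other method families surveyed here.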
imitation learning (IL); reinforcement learning; imitation learning from observation (ILfO); cross-domain imitation learning (CDIL); application of imitation learning
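The adversarial imitation learning line described in the abstract can likewise be illustrated with a schematic GAIL-style step. The sketch below is an assumption-laden illustration, not a specific published implementation: a discriminator learns to separate expert state-action pairs from agent state-action pairs, and its output is turned into a surrogate reward that an ordinary reinforcement learning update would then maximize (that policy update is omitted). The expert and agent batches are synthetic placeholders.

```python
# Schematic adversarial imitation learning (GAIL-style) step, illustration only.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2
disc = nn.Sequential(
    nn.Linear(state_dim + action_dim, 64), nn.Tanh(),
    nn.Linear(64, 1),
)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(256, state_dim + action_dim)   # placeholder expert (s, a) pairs
agent_sa = torch.randn(256, state_dim + action_dim)    # placeholder agent (s, a) rollouts

# Discriminator step: expert pairs labeled 1, agent pairs labeled 0.
d_loss = (bce(disc(expert_sa), torch.ones(256, 1))
          + bce(disc(agent_sa), torch.zeros(256, 1)))
disc_opt.zero_grad()
d_loss.backward()
disc_opt.step()

# Surrogate reward for the (omitted) RL step: high where the agent looks expert-like.
with torch.no_grad():
    reward = -torch.log(1.0 - torch.sigmoid(disc(agent_sa)) + 1e-8)
```

In a full training loop this discriminator update alternates with a policy-gradient step that treats the surrogate reward as the environment reward, which is how adversarial methods avoid hand-designing a reward function.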