基于双层DQN的多智能体路径规划
Multi-agent path planning based on improved double DQN
- 2023年28卷第7期 页码:2167-2181
纸质出版日期: 2023-07-16
DOI: 10.11834/jig.211239
移动端阅览
浏览全部资源
扫码关注微信
纸质出版日期: 2023-07-16 ,
移动端阅览
张晨, 蒋文英, 陈思源, 周文, 闫丰亭. 2023. 基于双层DQN的多智能体路径规划. 中国图象图形学报, 28(07):2167-2181
Zhang Chen, Jiang Wenying, Chen Siyuan, Zhou Wen, Yan Fengting. 2023. Multi-agent path planning based on improved double DQN. Journal of Image and Graphics, 28(07):2167-2181
目的
2
随着虚拟现实技术的发展,在虚拟场景中,基于多智能体的逃生路径规划已成为关键技术之一。与传统的火灾演习相比,采用基于虚拟现实的方法完成火灾逃生演练具有诸多优势,如成本低、代价小、可靠性高等,但仍有一定的局限性,为此,提出一种改进的双层深度Q网络(deep Q network,DQN)架构的路径规划算法。
方法
2
基于两个结构相同的双Q网络,优化了经验池的生成方法和探索策略,并在奖励中增加火灾这样的环境因素对智能体的影响。同时,为了提高疏散的安全性和效率,提出了一种基于改进的K-medoids算法的多智能体分组策略方法。
结果
2
相关实验表明提出的改进的双层深度Q网络架构收敛速度更快,学习更加稳定,模型性能得到有效提升。综合考虑火灾场景下智能体的疏散效率和疏散安全性,使用指标平均健康疏散值(average health evacuation value, AHEP)评估疏散效果,相较于传统的路径规划方法A-STAR(a star search algorithm)和DIJKSTRA(Dijkstra’s algorithm)分别提高了84%和104%;与基于火灾场景改进的扩展A-STAR和Dijkstra-ACO(Dijkstra and ant colony optimization)混合算法比较,分别提高了30%和21%;与考虑火灾影响的DQN算法相比,提高了20%,疏散效率和安全性都得到提高,规划的路径疏散效果更好。通过比较不同分组模式下的疏散效果,验证了对多智能体合适分组可以提高智能体疏散效率。
结论
2
提出的算法优于目前大多数常用的方法,显著提高了疏散的效率和安全性。
Objective
2
Rescue-oriented evacuation drills like fire escape drills have often been structured to optimize rehearsal training effect and firefighting awareness. To get sufficient evacuation experience, multiple drills are costly for related organizers. The requirement of that is based on evacuation drills, emergency drill venue, the physical condition of participants, and position information in real-time. The emerging virtual reality technology can be used to guide virtual fire escape in relevance to lower cost and risk and higher reliability. Moreover, to simulate its emergency drills in virtual scenarios, multi-agent path planning has been recognized and developed nowadays.
Method
2
We develop an improved double deep Q network (DQN) framework. Specifically, this virtual scenario analysis is developed through collecting enough campus information, including multiple agents, obstacles, exits, fire affected areas, and other related factors. Since all agents are assumed on the same plane, we can convert them into two-dimensional grid diagrams via transformation gridding and coordination. Furthermore, different grids are colored and utilized in two-dimensional grid plane
<math id="M1"><mi mathvariant="bold-italic">m</mi></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760452&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760451&type=
2.45533323
2.28600001
to represent obstacles, fire affected areas, exits and locations of agents. According to the location of the agent in the virtual scene, the grid plane
<math id="M2"><mi mathvariant="bold-italic">m</mi></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760452&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760451&type=
2.45533323
2.28600001
is layered, and the grid plane
<math id="M3"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">1</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760463&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760468&type=
3.47133350
3.21733332
and the grid plane
<math id="M4"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">2</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760472&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760470&type=
3.47133350
3.21733332
can be obtained in terms of the sizes of 64 × 100 and 48 × 100 of each. In the double deep Q network, we use two double Q networks with the same structure, i.e.,
Q
1 and
Q
2, which consists of two category of convolution and full connection layers. Furthermore, input size can be interlinked to the grid planes with the same size as
<math id="M5"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">1</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760476&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760473&type=
3.47133350
3.21733332
and
<math id="M6"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">2</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760466&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760478&type=
3.47133350
3.21733332
after environmental stratification. For the grid planes with the same size as
<math id="M7"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">1</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760483&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760467&type=
3.47133350
3.21733332
and
<math id="M8"><msub><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">2</mn></mrow></msub></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760485&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760484&type=
3.47133350
3.21733332
, trainable grid planes
<math id="M9"><msubsup><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">1</mn><mi mathvariant="bold-italic">t</mi></mrow><mrow><mi mathvariant="bold-italic">'</mi></mrow></msubsup></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760487&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760486&type=
3.89466691
3.47133350
and
<math id="M10"><msubsup><mrow><mi mathvariant="bold-italic">m</mi></mrow><mrow><mn mathvariant="normal">2</mn><mi mathvariant="bold-italic">t</mi></mrow><mrow><mi mathvariant="bold-italic">'</mi></mrow></msubsup></math>
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760489&type=
http://html.publish.founderss.cn/rc-pub/api/common/picture?pictureId=45760488&type=
3.89466691
3.47133350
can be obtained by randomly assigning the same number of black blocks with size of 1×1 to represent the duplicable location of the obstacle, and generating planes corresponding to all different starting positions to represent all status of the agent in the scene, which are used to initialize experience pools
D
1
and
D
2
and train networks
Q
1 and
Q
2. For the actual evacuation drills, the evacuation of the crowd is not completely independent and discrete. Nevertheless, due to the sociality of people, there is a certain social relationship between the people involved in evacuation, and there is often a certain phenomenon of “gathering and following” in crowd evacuation. In addition, to achieve the evacuation process of the crowd better in an actual evacuation drill, the organizer often arrange a certain number of guiders at different locations to assist the participants to complete the process of evacuation. Hence, our framework can add this guide into the virtual scenario and an improved k-medoids algorithm based multi-agent grouping strategy method is implemented. Agent-based location and relationship are involved in and the related grouping of the agents are accomplished as well, i.e., the selection of corresponding guiding agents, and the evacuation-led of other agents in the group, and the improved path planning algorithm of double deep Q network architecture mentioned above. A reliability and efficiency of evacuation are improved further.
Result
2
Extensive experiment is carried out to validate our proposed methods. In the training process, the network
Q
3 of the traditional DQN method converge 24 000 batch sizes, while the
Q
1 and
Q
2 networks converge about 3 000 batch size as well. In detail, it demonstrates that the convergence performance of proposed method is significantly faster than the traditional DQN method and more stable. Additionally, to improve the evacuation efficiency and evacuation safety of the agent in fire scenarios, average health evacuation value (AHEP) is used to evaluate the evacuation effect. In AHEP criterion, it is about 84% and 104% higher than each traditional path planning methods of A-STAR, DIJKSTRA. Compared to the extended A-STAR and Dijkstra-ACO hybrid algorithm based on changeable fire scene, hybrid algorithm can be improved by 30% and 21%; Compared to DQN algorithm, it can be reached 20% higher. What is more, evacuation efficiency and safety are improved more, and evacuation effect of the planned path is much better. Furthermore, to verify the evacuation effect under different groups, we compared the AHEP values under the four groups of 4, 5, 6 and 7. When the group is 6, its value is the highest, which is 17%, 13% and 6% higher than those three cases of 4, 5 and 7. Finally, the results show that the appropriate grouping of multi-agent can improve the evacuation efficiency of agent.
Conclusion
2
The proposed method has its potentials to improve the evacuation efficiency and security to a certain extent.
虚拟现实火灾逃生演练多智能体深度强化学习分组策略
virtual realityfire drillmulti-agentdeep reinforcement learninggrouping strategy
Ai Z H, Hu Y H, Yan F T, Zhang H J, Wang D Q, Qing S L, Zhu H H and Jia J Y. 2019. Key technology of lightweight Web3D online planning of metro fire escape. Scientia Sinica Information, 49(4): 405-421
艾子豪, 胡永豪, 闫丰亭, 张惠娟, 王冬青, 青胜蓝, 朱合华, 贾金原. 2019. 轻量级Web3D地铁火灾逃生在线规划关键技术. 中国科学: 信息科学, 49(4): 405-421 [DOI: 10.1360/N112018-00275http://dx.doi.org/10.1360/N112018-00275]
Aubé F and Shield R. 2004. Modeling the effect of leadership on crowd flow dynamics//Proceedings of the 6th International Conference on Cellular Automata. Amsterdam, the Netherlands: Springer: 601-611 [DOI: 10.1007/978-3-540-30479-1_62http://dx.doi.org/10.1007/978-3-540-30479-1_62]
Cao X H, Li X Y, Wei X G, Li S, Huang M X and Li D L. 2020. Dynamic programming of emergency evacuation path based on Dijkstra-ACO hybrid algorithm. Journal of Electronics and Information Technology, 42(6): 1502-1509
曹祥红, 李欣妍, 魏晓鸽, 李森, 黄梦溪, 李栋禄. 2020. 基于Dijkstra-ACO混合算法的应急疏散路径动态规划. 电子与信息学报, 42(6): 1502-1509 [DOI: 10.11999/JEIT190854http://dx.doi.org/10.11999/JEIT190854]
Cheng P J, Wu N, Meng F K and Li S. 2020. Fire escape path planning based on extended A* algorithm. Communications Technology, 53(12): 3012-3016
程鹏举, 吴楠, 孟凡坤, 李爽. 2020. 扩展A*算法的火灾逃生路径规划研究. 通信技术, 53(12): 3012-3016 [DOI: 10.3969/j.issn.1002-0802.2020.12.021http://dx.doi.org/10.3969/j.issn.1002-0802.2020.12.021]
Fang J S, El-Tawil S and Aguirre B. 2016. Leader–follower model for agent based simulation of social collective behavior during egress. Safety Science, 83: 40-47 [DOI: 10.1016/j.ssci.2015.11.015http://dx.doi.org/10.1016/j.ssci.2015.11.015]
Haghani M and Sarvi M. 2016. Pedestrian crowd tactical‐level decision making during emergency evacuations. Journal of Advanced Transportation, 50(8): 1870-1895 [DOI: 10.1002/atr.1434http://dx.doi.org/10.1002/atr.1434]
Han Y B and Liu H. 2018. Research on route choice model based on evacuation route set and its application in crowd evacuation simulation. Chinese Journal of Computers, 41(12): 2653-2669
韩延彬, 刘弘. 2018. 一种基于疏散路径集合的路径选择模型在人群疏散仿真中的应用研究. 计算机学报, 41(12): 2653-2669 [DOI: 10.11897/SP.J.1016.2018.02653http://dx.doi.org/10.11897/SP.J.1016.2018.02653]
Helbing D, Buzna L, Johansson A and Werner T. 2005. Self-organized pedestrian crowd dynamics: Experiments, simulations, and design solutions. Transportation science, 39(1): 1-24 [DOI: 10.1287/trsc.1040.0108http://dx.doi.org/10.1287/trsc.1040.0108]
Jin H L, Wang Y L, Yuan M and Chen M L. 2019. Research on escape path planning algorithm for high-rise buildings based on A*. Bulletin of Surveying and Mapping, 11: 17-21, 25
靳海亮, 王赢乐, 袁鸣, 陈梦龙. 2019. 改进A*的高层建筑逃生路径规划算法研究. 测绘通报, (11): 17-21, 25 [DOI: 10.13474/j.cnki.11-2246.2019.0344http://dx.doi.org/10.13474/j.cnki.11-2246.2019.0344]
Konar A, Chakraborty I G, Singh S J, Jain L C and Nagar A K. 2013. A deterministic improved Q-learning for path planning of a mobile robot. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 43(5): 1141-1153 [DOI: 10.1109/TSMCA.2012.2227719http://dx.doi.org/10.1109/TSMCA.2012.2227719]
Le V M, Vinh H T and Zucker J D. 2017. Reinforcement learning approach for adapting complex agent-based model of evacuation to fast linear model//Proceedings of the 7th International Conference on Information Science and Technology (ICIST). Da Nang, Vietnam: IEEE: 369-375 [DOI: 10.1109/ICIST.2017.7926787http://dx.doi.org/10.1109/ICIST.2017.7926787]
Li X L, Kuang H and Fan Y H. 2012. Lattice hydrodynamic model of pedestrian flow considering the asymmetric effect. Communications in Nonlinear Science and Numerical Simulation, 17(3): 1258-1263 [DOI: 10.1016/j.cnsns.2011.07.034http://dx.doi.org/10.1016/j.cnsns.2011.07.034]
Li Y, Chen M Y, Dou Z, Zheng X P, Cheng Y and Mebarki A. 2019. A review of cellular automata models for crowd evacuation. Physica A: Statistical Mechanics and its Applications, 526: #120752 [DOI: 10.1016/j.physa.2019.03.117http://dx.doi.org/10.1016/j.physa.2019.03.117]
Li Z W, Huang H, Li N, Chu M L C and Law K. 2020. An agent-based simulator for indoor crowd evacuation considering fire impacts. Automation in Construction, 120: #103395 [DOI: 10.1016/j.autcon.2020.103395http://dx.doi.org/10.1016/j.autcon.2020.103395]
Lissovoi A and Witt C. 2015. Runtime analysis of ant colony optimization on dynamic shortest path problems. Theoretical Computer Science, 561, 73-85 [DOI: 10.1016/j.tcs.2014.06.035http://dx.doi.org/10.1016/j.tcs.2014.06.035]
Liu H, Lu D J, Zhang G J, Hong X and Liu H. 2021. Recurrent emotional contagion for the crowd evacuation of a cyber-physical society. Information Sciences, 575: 155-172 [DOI: 10.1016/j.ins.2021.06.036http://dx.doi.org/10.1016/j.ins.2021.06.036]
Lyu L H, Zhang S J, Ding D R and Wang Y X. 2019. Path planning via an improved DQN-based learning policy. IEEE Access, 7: 67319-67330 [DOI: 10.1109/ACCESS.2019.2918703http://dx.doi.org/10.1109/ACCESS.2019.2918703]
Mac T T, Copot C, Tran D T and De Keyser R. 2017. A hierarchical global path planning approach for mobile robots based on multi-objective particle swarm optimization. Applied Soft Computing, 59: 68-76 [DOI: 10.1016/j.asoc.2017.05.012http://dx.doi.org/10.1016/j.asoc.2017.05.012]
Miyagawa D and Ichinose G. 2020. Cellular automaton model with turning behavior in crowd evacuation. Physica A: Statistical Mechanics and its Applications, 549: #124376 [DOI: 10.1016/j.physa.2020.124376http://dx.doi.org/10.1016/j.physa.2020.124376]
Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, Graves A, Riedmiller M, Fidjeland A K, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S and Hassabis D. 2015. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533 [DOI: 10.1038/nature14236http://dx.doi.org/10.1038/nature14236]
Pelechano N and Badler N I. 2006. Modeling crowd and trained leader behavior during building evacuation. IEEE Computer Graphics and Applications, 26(6): 80-86 [DOI: 10.1109/MCG.2006.133http://dx.doi.org/10.1109/MCG.2006.133]
Spartalis E, Georgoudas I G and Sirakoulis G C. 2014. CA crowd modeling for a retirement house evacuation with guidance//Proceedings of the 11th International Conference on Cellular Automata. Krakow, Poland: Springer: 481-491 [DOI: 10.1007/978-3-319-11520-7_50http://dx.doi.org/10.1007/978-3-319-11520-7_50]
Tan L, Hu M Y and Lin H. 2015. Agent-based simulation of building evacuation: combining human behavior with predictable spatial accessibility in a fire emergency. Information Sciences, 295: 53-66 [DOI: 10.1016/j.ins.2014.09.029http://dx.doi.org/10.1016/j.ins.2014.09.029]
Tian Z N, Zhang G J, Hu C Y, Lu D J and Liu H. 2020. Knowledge and emotion dual-driven method for crowd evacuation. Knowledge-Based Systems, 208: #106451 [DOI: 10.1016/j.knosys.2020.106451http://dx.doi.org/10.1016/j.knosys.2020.106451]
Vihas C, Georgoudas I G and Sirakoulis G C. 2012. Follow-the-leader cellular automata based model directing crowd movement//Proceedings of the 10th International Conference on Cellular Automata. Santorini Island, Greece: Springer: 752-762 [DOI: 10.1007/978-3-642-33350-7_78http://dx.doi.org/10.1007/978-3-642-33350-7_78]
Wagner N and Agrawal V. 2014. An agent-based simulation system for concert venue crowd evacuation modeling in the presence of a fire disaster. Expert Systems with Applications, 41(6): 2807-2815 [DOI: 10.1016/j.eswa.2013.10.013http://dx.doi.org/10.1016/j.eswa.2013.10.013]
Wang S N, Liu H, Gao K Z and Zhang J X. 2019. A multi-species artificial bee colony algorithm and its application for crowd simulation. IEEE Access, 7: 2549-2558 [DOI: 10.1109/ACCESS.2018.2886629http://dx.doi.org/10.1109/ACCESS.2018.2886629]
Wang X L, Zheng X P and Cheng Y. 2012. Evacuation assistants: an extended model for determining effective locations and optimal numbers. Physica A: Statistical Mechanics and Its Applications, 391(6): 2245-2260 [DOI: 10.1016/j.physa.2011.11.051http://dx.doi.org/10.1016/j.physa.2011.11.051]
Xie W, Lee E W M, Li T, Shi M, Cao R F and Zhang Y C. 2021. A study of group effects in pedestrian crowd evacuation: experiments, modelling and simulation. Safety Science, 133: #105029 [DOI: 10.1016/j.ssci.2020.105029http://dx.doi.org/10.1016/j.ssci.2020.105029]
Yan F T, Hu Y H, Jia J Y, Ai Z H, Tang K, Shi Z C and Liu X. 2020. Interactive WebVR visualization for online fire evacuation training. Multimedia Tools and Applications, 79(41/42): 31541-31565 [DOI: 10.1007/s11042-020-08863-0http://dx.doi.org/10.1007/s11042-020-08863-0]
Yan F T, Hu Y H, Jia J Y, Guo Q H, Zhu H H and Pan Z G. 2019b. RFES: a real-time fire evacuation system for Mobile Web3D. Frontiers of Information Technology and Electronic Engineering, 20(8): 1061-1074 [DOI: 10.1631/FITEE.1700548http://dx.doi.org/10.1631/FITEE.1700548]
Yan F T, Jia J Y, Hu Y H, Guo Q H and Zhu H H. 2019a. Smart fire evacuation service based on internet of things computing for Web3D. Journal of Internet Technology, 20(2): 521-532 [DOI: 10.3966/160792642019032002019http://dx.doi.org/10.3966/160792642019032002019]
Yang Y C, Dimarogonas D V and Hu X M. 2013. Optimal leader-follower control for crowd evacuation//Proceedings of the 52nd IEEE Conference on Decision and Control. Firenze, Italy: IEEE: 2769-2774 [DOI: 10.1109/CDC.2013.6760302http://dx.doi.org/10.1109/CDC.2013.6760302]
Yuan T and Liu Y. 2019. Potential energy field based pedestrian behavior model for crowd evacuation simulation in airport terminal. Journal of Physics Conference Series, 1345(4): #042023 [DOI: 10.1088/1742-6596/1345/4/042023http://dx.doi.org/10.1088/1742-6596/1345/4/042023]
Yuan W F and Tan K H. 2009. Cellular automata model for simulation of effect of guiders and visibility range. Current Applied Physics, 9(5): 1014-1023 [DOI: 10.1016/j.cap.2008.10.007http://dx.doi.org/10.1016/j.cap.2008.10.007]
Zhang G J, Lu D J and Liu H. 2020. Strategies to utilize the positive emotional contagion optimally in crowd evacuation. IEEE Transactions on Affective Computing, 11(4): 708-721 [DOI: 10.1109/TAFFC.2018.2836462http://dx.doi.org/10.1109/TAFFC.2018.2836462]
Zhang H, Liu H, Qin X and Liu B X. 2018. Modified two-layer social force model for emergency earthquake evacuation. Physica A: Statistical Mechanics and Its Applications, 492: 1107-1119 [DOI: 10.1016/j.physa.2017.11.041http://dx.doi.org/10.1016/j.physa.2017.11.041]
Zhou M, Dong H R, Zhao Y B, Ioannou P A and Wang F Y. 2019. Optimization of crowd evacuation with leaders in urban rail transit stations. IEEE Transactions on Intelligent Transportation Systems, 20(12): 4476-4487 [DOI: 10.1109/TITS.2018.2886415http://dx.doi.org/10.1109/TITS.2018.2886415]
相关作者
相关机构