Parallel visual perception for intelligent driving: basic concepts, framework, and applications
2021, Vol. 26, No. 1, pp. 67-81
Received: 2020-07-20; Revised: 2020-10-22; Accepted: 2020-10-29; Published in print: 2021-01-16
DOI: 10.11834/jig.200402
Objective
Visual perception is a key technology in intelligent vehicle systems, but how to effectively improve visual performance under complex challenges has become an important research topic in intelligent driving. This paper introduces the ACP method, composed of artificial societies, computational experiments, and parallel execution, into the visual perception field of intelligent driving and proposes parallel visual perception for intelligent driving, which solves the problem of reasonable training and evaluation of vision models and helps intelligent driving move further toward practical application.
Method
Parallel visual perception simulates actual driving scenes through combinations of artificial subsystems, constructing artificial driving scenes that serve as a "computational laboratory" for the visual perception of intelligent vehicles; the two operating modes of computational experiments are used to train and evaluate vision models; finally, parallel execution dynamically optimizes the vision models, ensuring that the perception and understanding of complex challenges in intelligent driving remain effective over the long term.
Result
Experiments show that, in the training stage of object detection, mixed virtual-real data reach a top accuracy of 60.9%, which is 17.9% and 5.3% higher than using only KPC data (KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes), and MS COCO (Microsoft common objects in context)) and only virtual data, respectively. In the evaluation stage, compared with the baseline data, the average precision drops by 11.3% on normal tasks (-30° with vertical shift), by 21.0% on environmental tasks (fog), and by 33.7% on difficult tasks (all challenges).
Conclusion
This paper designs and implements, for intelligent driving, visual computational experiments that are difficult or even impossible to conduct in actual driving scenes, and analyzes and evaluates complex visual challenges, thereby strengthening the ability of intelligent vehicles to perceive and understand their surroundings while driving.
Objective
As a promising solution to traffic congestion and accidents, intelligent vehicles are receiving increasing attention. Efficient visual perception technology can meet the safety, comfort, and convenience requirements of intelligent vehicles. Therefore, visual perception is a key technology in intelligent vehicle systems. Intelligent driving focuses on improving visual performance under complex tasks. However, complex imaging conditions pose significant challenges to visual perception research. As is well known, vision models rely on diverse datasets to ensure performance. Unfortunately, obtaining annotations by hand is cumbersome, labor intensive, and error prone. Moreover, the cost of data collection and annotation is high. Because of the limitations of model design and data diversity, general visual tasks still face problems such as weather and illumination changes and occlusions. A critical question arises naturally: how can we ensure that an intelligent vehicle is able to drive safely in complex and challenging traffic? In this paper, the artificial systems, computational experiments, and parallel execution (ACP) method is introduced into the field of visual perception, and we propose parallel visual perception for intelligent driving. The purpose of this paper is to solve the problem of reasonable training and evaluation of the vision models of intelligent driving, which is helpful for the further application of intelligent vehicles.
Method
Parallel visual perception consists of three parts: artificial driving scenes, computational experiments, and parallel execution. Specifically, an artificial driving scene is a scene defined by software, built with modern 3D modeling software, computer graphics, and virtual reality. Artificial driving scene modeling adopts a combination of artificial subsystems, which helps intelligent driving perceive and understand experiments under complex conditions. In the artificial scene, we use computer graphics to automatically generate accurate ground-truth labels, including semantic/instance segmentation, object bounding boxes, object tracking, optical flow, and depth. According to the imaging conditions, we design 19 challenging tasks divided into normal, environmental, and difficult tasks, as sketched below.
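To make the task design concrete, the following Python sketch shows one possible way to organize the three-level taxonomy as an experiment configuration. Only the three categories and the -30° camera angle, vertical-shift, fog, rain, and all-challenges examples come from this paper; every other field name and value is a hypothetical placeholder, not the paper's actual task list.

```python
# Illustrative layout of the challenge-task taxonomy as a configuration.
# Categories (normal / environmental / difficult) follow the paper; the
# concrete field names and values are hypothetical placeholders.
CHALLENGE_TASKS = {
    "normal": [  # camera-pose perturbations
        {"name": "camera_angle_-30", "camera_pitch_deg": -30},
        {"name": "camera_angle_-30_up_down",
         "camera_pitch_deg": -30, "vertical_shift": True},
    ],
    "environmental": [  # imaging-condition changes
        {"name": "fog", "weather": "fog"},
        {"name": "rain", "weather": "rain"},
    ],
    "difficult": [  # several challenges applied at once
        {"name": "all_challenges",
         "compose": ["camera_angle_-30_up_down", "fog", "rain"]},
    ],
}

def tasks_in(category: str):
    """Return the task configurations belonging to one category."""
    return CHALLENGE_TASKS[category]
```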
The reliability of the vision model requires repeatable computational experiments to obtain the optimal solution. Two modes of computational experiments are used, namely, learning and training, and experiment and evaluation. In the training stage, the artificial driving scene provides a large variety of virtual images, which, combined with real images, can improve the performance of the vision model; the experiment can thus be conducted in an artificial driving scene at low cost and with high efficiency.
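As a minimal sketch of this learning-and-training mode, assuming COCO-style annotations exported from the artificial scene, the snippet below pools a real dataset with a virtual one using standard PyTorch utilities; the file paths, class count, and the Faster R-CNN detector are illustrative stand-ins rather than the paper's exact pipeline.

```python
# Sketch of mixed virtual-real training data, assuming COCO-style labels.
# All paths and the class count are hypothetical placeholders.
import torch
import torchvision
from torch.utils.data import ConcatDataset, DataLoader
from torchvision.datasets import CocoDetection

real = CocoDetection(root="data/real/images",
                     annFile="data/real/annotations.json")
virtual = CocoDetection(root="data/artificial_scene/images",
                        annFile="data/artificial_scene/annotations.json")

# Pool both sources so every shuffled mini-batch can mix virtual and real frames.
mixed = ConcatDataset([real, virtual])
loader = DataLoader(mixed, batch_size=8, shuffle=True,
                    collate_fn=lambda batch: tuple(zip(*batch)))

# A generic detector stands in for the vision model under training.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
```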
In the evaluation stage, complex imaging conditions (weather, illumination, and occlusion) in the artificial driving scene can be used to comprehensively evaluate the performance of the vision model. The vision algorithm can be tested in a targeted manner, which helps improve the visual perception performance of intelligent driving. Parallel execution in artificial and real driving scenes ensures dynamic, long-term training and evaluation of the vision model. Through virtual-real interaction, the experimental results of the vision model in the artificial driving scene can become a possible outcome of the real system.
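The evaluation mode can be read as a simple loop over challenge tasks. In the sketch below, `evaluate_ap` is a placeholder for any standard average-precision evaluator; the function reports the per-task AP drop relative to the baseline set, which is the degradation statistic quoted in the Result section.

```python
# Sketch of the experiment-and-evaluation mode: the same detector is scored
# on each challenge task rendered in the artificial scene, and the drop in
# average precision (AP) relative to the baseline set quantifies robustness.
# `evaluate_ap` stands in for any standard AP evaluator (e.g., COCO-style mAP).

def degradation_report(model, baseline_set, task_sets, evaluate_ap):
    """Return the AP drop (percentage points) of each task w.r.t. the baseline."""
    baseline_ap = evaluate_ap(model, baseline_set)
    report = {}
    for task_name, task_set in task_sets.items():
        task_ap = evaluate_ap(model, task_set)
        report[task_name] = baseline_ap - task_ap  # larger drop = harder task
    return report

# Usage (datasets and evaluator are placeholders):
# drops = degradation_report(model, baseline, {"fog": fog_set}, evaluate_ap)
# For reference, the paper reports drops of 11.3 (normal), 21.0 (fog),
# and 33.7 (all challenges) percentage points.
```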
Result
This paper presents a systematic method to design driving-scene tasks and generate virtual datasets for vehicle intelligence testing research. Currently, the virtual dataset consists of 39 010 frames (virtual training data with 27 970 frames, normal tasks with 5 520 frames, environmental tasks with 2 760 frames, and difficult tasks with 2 760 frames) taken from our constructed artificial scenes. In addition, we conduct a series of comparative experiments on visual object detection. In the training stage, the experimental results show that training data with large scale and diversity can greatly improve the performance of object detection, and this data augmentation method significantly improves the accuracy of the vision models. For instance, the highest accuracy of the mixed training sets is 60.9%, which is 17.9% and 5.3% higher than that obtained with KPC (KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute), PASCAL VOC (pattern analysis, statistical modelling and computational learning visual object classes), and MS COCO (Microsoft common objects in context)) data and pure virtual data, respectively. In the evaluation stage, compared with the baseline model, the average accuracy of normal tasks (-30° and up-down) decreased by 11.3%, environmental tasks (fog) by 21.0%, and difficult tasks (all challenges) by 33.7%. The experimental results suggest that 1) object detectors are only slightly disturbed by a change of camera angle alone but are challenged more strongly when the height and angle of the camera are changed simultaneously; the vision model of an intelligent vehicle is prone to overfitting, which is why object detection can be performed under limited conditions only; 2) the vision model cannot learn the features of different environments from the training data, so bad weather (e.g., fog and rain) causes a stronger degradation of performance than normal tasks; and 3) the performance of object detection is greatly degraded in difficult tasks, which is mainly caused by the poor generalization of the vision model.
Conclusion
In this study, we use computer graphics, virtual reality technology, and machine learning theory to build artificial driving scenes and generate a realistic and challenging virtual driving dataset. On this basis, we conduct visual perception experiments under complex imaging conditions. The vision models of intelligent vehicles are effectively trained and evaluated in artificial and real driving scenes. In the future, we plan to add more visual challenges to the artificial driving scene.