
SCIENCE CHINA Information Sciences, Volume 62, Issue 8: 081101 (2019). https://doi.org/10.1007/s11432-018-9850-9

Autonomous driving: cognitive construction and situation understanding

  • Received: Aug 1, 2018
  • Accepted: Mar 15, 2019
  • Published: Jul 12, 2019

Abstract

An autonomous vehicle is a typical complex artificial intelligence system. Most current research on autonomous driving adopts a basic framework of serial information processing and computation that consists of four modules: perception, planning, decision-making, and control. However, this data-driven computing framework suffers from low computational efficiency, poor environmental understanding, and weak self-learning ability. A long-neglected problem is how to understand and process the environmental perception data from the sensors at the cognitive-psychology level of the human driving process. The key to solving this problem is to construct a computing model for autonomous driving with selective attention and self-learning ability, one that possesses mechanisms for memorizing, inferring, and experiential updating, enabling it to cope with traffic scenarios that are highly noisy, dynamic, and random. In addition, for understanding traffic scenes, an event-related mechanism is more efficient than single-attribute scenario perception data. Therefore, an effective self-driving method should not be confined to the traditional computing framework of `perception, planning, decision-making, and control'. It is necessary to explore a basic computing framework that conforms to the human driver's attention, reasoning, learning, and decision-making mechanisms in traffic scenarios, and to build an autonomous system inspired by biological intelligence.

In this article, we review the basic methods and main progress of current data-driven autonomous driving technologies and analyze in depth the limitations and major problems faced by the related algorithms. Then, combined with the authors' research, we discuss how to implement a basic cognitive computing framework for self-driving with selective attention and an event-driven mechanism from the viewpoint of cognitive science. We further describe how to use multi-sensor and graph data with semantic information (such as traffic maps and the spatial correlation of events) to realize associative representations of objects and drivable areas, as well as an intuitive reasoning method for understanding the situations in different traffic scenarios.

The autonomous driving computing framework based on a selective attention mechanism and intuitive reasoning discussed in this study can adapt to more complex, open, and dynamic traffic environments.


Acknowledgment

This work was partially supported by the National Natural Science Foundation of China (Grant Nos. 61773312, 61790563).



  • Figure 1

    Data-driven computing framework of self-driving cars. The scene perception obtained from the sensors is used for localization; after the planning and decision-making steps, the control signal received by the self-driving control unit is used to control the vehicle. Meanwhile, environmental information such as road conditions is fed back into the perception module for further processing.
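
For illustration, a minimal sketch of this serial computing loop is given below. The module classes and their interfaces (Perception, Planner, DecisionMaker, Controller) are hypothetical stand-ins for the four modules, not the implementation of any particular system.

```python
# Minimal sketch of the serial "perception -> planning -> decision-making ->
# control" loop. All module names, signatures, and placeholder outputs are
# hypothetical; they only illustrate the data flow described in Figure 1.
from dataclasses import dataclass


@dataclass
class VehicleCommand:
    throttle: float
    brake: float
    steer: float


class Perception:
    def process(self, frame):
        # e.g. detected objects, drivable area, and ego pose from the sensors
        return {"objects": [], "drivable_area": None, "pose": frame.get("pose")}


class Planner:
    def plan(self, world):
        # produce a short-horizon reference trajectory (placeholder)
        return [world["pose"]]


class DecisionMaker:
    def decide(self, world, trajectory):
        # choose a behavior such as keep-lane, follow, or stop (placeholder)
        return "keep_lane"


class Controller:
    def control(self, trajectory, behavior):
        # turn the trajectory and behavior into low-level actuation
        return VehicleCommand(throttle=0.1, brake=0.0, steer=0.0)


def driving_loop(sensors, actuators):
    perception, planner, decider, controller = Perception(), Planner(), DecisionMaker(), Controller()
    while True:
        frame = sensors.read()                    # raw multi-sensor frame
        world = perception.process(frame)         # scene perception + localization
        trajectory = planner.plan(world)          # path / trajectory planning
        behavior = decider.decide(world, trajectory)
        command = controller.control(trajectory, behavior)
        actuators.apply(command)                  # road feedback returns via the sensors
```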

  • Figure 2

    Cognition and understanding of traffic situations based on the results of multi-sensor fusion. A variety of sensors are used to obtain data in different feature spaces, different algorithms are used for analysis, and the cognitive result for the situation is finally obtained.

  • Figure 3

    The generation process from sensation to perception. Perception is the cognitive result that the brain forms about external things. It combines two parts of information: the input information from the outside world and memory.

  • Figure 4

    Object detection and image segmentation in traffic scenarios. Three types of traffic scene perception algorithms are listed here, whose inputs are mainly images and LiDAR point clouds. Different types of algorithms produce different outputs for specific tasks, and these outputs also play different roles in the perception of autonomous driving systems.

  • Figure 5

    Unsupervised method for drivable area detection. The scene is analyzed in different feature spaces and the resulting features are fused. Finally, belief propagation is applied to the fused features to obtain the probability of the drivable area.

  • Figure 6

    Illustration of a decision-making finite state machine. The finite state machine consists of several driving behavioral states. Given different input information, it defines the transition relationships between the current and the next behavioral state.
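
A minimal behavioral FSM sketch is given below; the states, input events, and transition table are illustrative assumptions, not the state set used in the paper.

```python
# Minimal behavioral finite state machine. The states, events, and
# transition table below are illustrative assumptions only.
TRANSITIONS = {
    ("lane_keeping", "slow_vehicle_ahead"): "lane_change",
    ("lane_keeping", "red_light"): "stopping",
    ("lane_change", "lane_change_done"): "lane_keeping",
    ("stopping", "green_light"): "lane_keeping",
}


class BehaviorFSM:
    def __init__(self, initial_state="lane_keeping"):
        self.state = initial_state

    def step(self, event):
        # stay in the current state when no transition matches the event
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


fsm = BehaviorFSM()
assert fsm.step("red_light") == "stopping"
assert fsm.step("green_light") == "lane_keeping"
```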

  • Figure 7

    Longitudinal and lateral control computing framework. (a) The longitudinal control computing framework, which consists of a speed sub-controller and a space sub-controller. The speed sub-controller takes speed as input and transforms the command into throttle and brake control, whereas the space sub-controller acts as a basic distance safeguard (autonomous emergency braking). (b) The lateral control computing framework. It uses sliding mode control and nonlinear feedback proportional control with an amplitude-limiting unit to obtain the steering command, and a performance monitor constrains the longitudinal speed.
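
The following simplified sketch covers the longitudinal side only: a proportional speed law stands in for the actual speed sub-controller, and a spacing check stands in for the space sub-controller's emergency braking. All gains and thresholds are made up.

```python
# Simplified longitudinal control sketch: a proportional speed sub-controller
# plus a spacing check that triggers emergency braking when the gap is unsafe.
# The gain, thresholds, and control law are illustrative assumptions.
def longitudinal_control(v, v_des, gap, gap_safe, kp_speed=0.3):
    """Return (throttle, brake), each clipped to [0, 1]."""
    if gap < gap_safe:                   # space sub-controller: emergency braking
        return 0.0, 1.0
    u = kp_speed * (v_des - v)           # speed sub-controller (proportional)
    if u >= 0.0:
        return min(u, 1.0), 0.0          # accelerate toward the desired speed
    return 0.0, min(-u, 1.0)             # decelerate toward the desired speed


# example: ego at 12 m/s, desired 15 m/s, comfortable 40 m gap
print(longitudinal_control(v=12.0, v_des=15.0, gap=40.0, gap_safe=10.0))
```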

  • Figure 8

    Cognitive process of understanding traffic scenes. It consists of pre-processing, feature extraction, and post-processing steps such as classification and regression, and finally produces a description of the scene.

  • Figure 9

    From data-driven scene perception to event-driven situational cognition. This is our assumption for an event-driven model. Its computing framework extracts events through the correlations between the data and describes the scene through those events, in correspondence with the human psychological cognitive process that the cognitive framework describes.

  • Figure 10

    Localization method with perceptual objects. Through similarity analysis of the perceptual objects in the map and in the scene, we obtain the correspondence between the map and the point cloud of the scene, and then obtain the localization information using point cloud registration.
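
As a sketch of the registration step, the snippet below performs a generic nearest-neighbour-plus-SVD (ICP-style) rigid alignment between the perceived point cloud and the map; this is textbook ICP, shown only to illustrate the idea, and not necessarily the registration method used here.

```python
# Generic nearest-neighbour + SVD rigid registration (one ICP iteration per
# call), aligning the perceived point cloud to the map cloud.
import numpy as np
from scipy.spatial import cKDTree


def icp_step(source, target):
    """One ICP iteration; returns rotation R and translation t (source -> target)."""
    tree = cKDTree(target)
    _, idx = tree.query(source)              # nearest map point for each scene point
    matched = target[idx]
    src_c, tgt_c = source.mean(axis=0), matched.mean(axis=0)
    H = (source - src_c).T @ (matched - tgt_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = tgt_c - R @ src_c
    return R, t


def register(scene, map_cloud, iters=20):
    """Iteratively align the scene points to the map points."""
    aligned = scene.copy()
    for _ in range(iters):
        R, t = icp_step(aligned, map_cloud)
        aligned = aligned @ R.T + t
    return aligned
```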

  • Figure 11

    Visualization of the localization results using perceptual objects. The purple point cloud is the map, and the blue point cloud is the scene currently perceived by the vehicle. The two coincide to a high degree, which demonstrates the accuracy of the localization.

  • Figure 12

    Decision-level multi-source fusion-based object detection framework. It consists of single-source processing and information fusion stages: each single-source sensor obtains objects separately in the single-source processing step, followed by the fusion process, which finally produces a complete description of the scene.
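
A minimal sketch of decision-level fusion is shown below, assuming each sensor reports 2D object positions. Detections are associated with the Hungarian algorithm and merged by averaging; the gating threshold and merging rule are chosen purely for illustration.

```python
# Decision-level fusion sketch: camera and radar each report object positions,
# pairs are associated with the Hungarian algorithm, and associated pairs are
# merged by averaging. The gate and merging rule are illustrative assumptions.
import numpy as np
from scipy.optimize import linear_sum_assignment


def fuse_detections(cam_objs, radar_objs, gate=2.0):
    """cam_objs, radar_objs: (N, 2) and (M, 2) arrays of x, y positions."""
    if len(cam_objs) == 0 or len(radar_objs) == 0:
        return [tuple(p) for p in cam_objs] + [tuple(p) for p in radar_objs]
    cost = np.linalg.norm(cam_objs[:, None, :] - radar_objs[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    fused, used_cam, used_radar = [], set(), set()
    for i, j in zip(rows, cols):
        if cost[i, j] < gate:                # associated pair -> merge positions
            fused.append(tuple((cam_objs[i] + radar_objs[j]) / 2.0))
            used_cam.add(i)
            used_radar.add(j)
    # keep detections seen by only one sensor
    fused += [tuple(p) for k, p in enumerate(cam_objs) if k not in used_cam]
    fused += [tuple(p) for k, p in enumerate(radar_objs) if k not in used_radar]
    return fused
```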

  • Figure 13

    Cognitive computing framework of self-driving cars with a selective attention mechanism and intuitive reasoning. The cognitive process is divided into four parts. The first part is feature extraction by a convolutional neural network; the second part combines the features with prior knowledge to form a cognitive map; the third part filters the cognitive map by implementing the attention mechanism through an LSTM; the last part maps the cognitive map to the behavior space through a value iteration model and outputs the final behavior decision.
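
The value iteration stage can be illustrated with a small tabular sketch on a grid-shaped "cognitive map"; the grid, rewards, and discount factor below are arbitrary examples, not the value iteration network used in the framework.

```python
# Tabular value iteration on a small grid "cognitive map": rewards mark the
# goal and an obstacle, and the converged values would induce a greedy action
# at the ego cell. The grid contents and parameters are made up.
import numpy as np


def value_iteration(reward, gamma=0.9, iters=50):
    """reward: (H, W) array; returns the (H, W) state-value map."""
    H, W = reward.shape
    V = np.zeros((H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # up, down, left, right
    for _ in range(iters):
        V_new = V.copy()
        for r in range(H):
            for c in range(W):
                q = [reward[r, c] + gamma * V[min(max(r + dr, 0), H - 1),
                                              min(max(c + dc, 0), W - 1)]
                     for dr, dc in moves]
                V_new[r, c] = max(q)
        V = V_new
    return V


reward = np.full((5, 5), -0.1)
reward[0, 4] = 1.0       # goal cell
reward[2, 2] = -1.0      # obstacle cell
print(np.round(value_iteration(reward), 2))
```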

  • Figure 14

    Reinforcement learning framework for adaptive cruise control. The desired speed and distance and the current speed and distance are input to the trained reinforcement learning model, and the output acceleration is sent to the control module to achieve vehicle following.
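
A tabular Q-learning sketch of the same idea is given below, with a coarsely discretised (speed error, gap error) state and discrete acceleration actions. The dynamics, reward, and hyperparameters are illustrative assumptions, not the trained model described here.

```python
# Tabular Q-learning sketch for adaptive cruise control. The state is a
# coarsely discretised (speed error, gap error) pair and the action is a
# discrete acceleration; dynamics, reward, and hyperparameters are made up.
import random
from collections import defaultdict

ACTIONS = [-2.0, 0.0, 2.0]                   # brake / hold / accelerate (m/s^2)
V_LEAD, GAP_DES, DT = 15.0, 20.0, 0.5        # lead speed, desired gap, time step


def discretise(speed_err, gap_err):
    return (int(max(min(speed_err, 5), -5)), int(max(min(gap_err, 10), -10)))


def simulate(v, gap, a):
    """Very crude car-following dynamics with a tracking reward."""
    v = max(v + a * DT, 0.0)
    gap = gap + (V_LEAD - v) * DT
    reward = -abs(V_LEAD - v) - 0.5 * abs(gap - GAP_DES)
    return v, gap, reward


Q = defaultdict(float)
for episode in range(200):
    v, gap = 10.0, 30.0
    for _ in range(100):
        s = discretise(V_LEAD - v, gap - GAP_DES)
        if random.random() < 0.1:            # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        v, gap, r = simulate(v, gap, a)
        s_next = discretise(V_LEAD - v, gap - GAP_DES)
        best_next = max(Q[(s_next, act)] for act in ACTIONS)
        Q[(s, a)] += 0.1 * (r + 0.95 * best_next - Q[(s, a)])
```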

  • Figure 15

    Generation of an autonomous vehicle's intuition based on “reinforcement-transfer” learning. The autonomous driving model is trained in a simulation environment using deep reinforcement learning, and the model is then transplanted to autonomous vehicles through transfer learning.

  • Table 1   Number of points in the original map, the sampled map, and the perceptual objects

    Type                  Number of points
    Raw map               1,344,843
    Sampled map           22,483
    Perceptual objects    17,756
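
The reduction from the raw map to the sampled map can be illustrated with a generic voxel-grid downsampling sketch (one centroid kept per occupied voxel). The voxel size is an arbitrary example, and this is not necessarily the sampling scheme used to produce Table 1.

```python
# Generic voxel-grid downsampling: keep the centroid of the points falling in
# each occupied voxel.
import numpy as np


def voxel_downsample(points, voxel_size=0.5):
    """points: (N, 3) array; returns one centroid per occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)            # robust to NumPy version differences
    n_voxels = int(inverse.max()) + 1
    sums = np.zeros((n_voxels, 3))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)
    np.add.at(counts, inverse, 1.0)
    return sums / counts[:, None]


cloud = np.random.rand(100000, 3) * 50.0     # synthetic 50 m x 50 m x 50 m cloud
print(cloud.shape[0], "->", voxel_downsample(cloud).shape[0])
```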
