
SCIENTIA SINICA Informationis, Volume 50, Issue 12: 1798 (2020). https://doi.org/10.1360/SSI-2020-0065

A selected review of reinforcement learning-based control for autonomous underwater vehicles

  • Received: Mar 19, 2020
  • Accepted: Apr 23, 2020
  • Published: Oct 20, 2020

Abstract

Recently, reinforcement learning (RL) has been studied extensively and has achieved promising results in a wide range of control tasks. In addition, autonomous underwater vehicles (AUVs) are important tools for executing complex and challenging tasks in marine environments. These advances in RL offer ample opportunities to develop intelligent AUVs. This paper provides a selected review of RL-based control for AUVs, with a focus on applications of RL to low-level control tasks for underwater regulation and tracking. We first present a concise introduction to the RL-based control framework. We then provide an overview of RL methods for AUV control problems, discussing the primary challenges and recent progress. Finally, two representative cases of model-free RL-based controllers for AUVs are presented in detail.
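To make the RL-based control framework surveyed here concrete, the sketch below casts a depth-regulation task as a small Markov decision process and solves it with tabular Q-learning. Everything in it (the discretized depth-error state, the three thrust actions, the one-cell dynamics, and the reward) is an illustrative assumption for exposition, not a model or algorithm taken from the paper.

```python
import random

# Toy 1-D depth-regulation MDP (illustrative assumptions, not from the paper):
#   state  = discretized depth error, an integer in [-5, 5]
#   action = vertical thrust command in {-1, 0, +1}
#   reward = negative absolute depth error after the move
ACTIONS = (-1, 0, 1)
GAMMA, ALPHA, EPS = 0.95, 0.5, 0.1

def step(s, a):
    """Deterministic toy dynamics: thrust shifts the depth error by one cell."""
    s_next = max(-5, min(5, s + a))
    return s_next, -abs(s_next)

def train(episodes=500, horizon=20, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(-5, 6) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randint(-5, 5)  # random initial depth error
        for _ in range(horizon):
            # epsilon-greedy exploration
            if rng.random() < EPS:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])
            s_next, r = step(s, a)
            # Q-learning temporal-difference update
            best_next = max(Q[(s_next, x)] for x in ACTIONS)
            Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
            s = s_next
    return Q

Q = train()
# Greedy policy: the thrust command the learned value function prefers per state
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(-5, 6)}
```

After training, the greedy policy commands positive thrust for negative depth errors, negative thrust for positive errors, and zero thrust at the setpoint, i.e. it behaves like a bang-bang regulator learned purely from interaction. The model-free controllers reviewed in the paper follow the same interaction loop but replace the table with neural-network function approximators to handle continuous states and actions.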


Funded by

National Key R&D Program of China (2016YFC0300801)

National Natural Science Foundation of China (41576101, 41427806)

