Abstract
Inverse reinforcement learning recovers a reward function from reward features and positive demonstrations. For complex learning tasks, the entire state space is typically used to form the set of reward features, but such a large feature set leads to long computation times. Retrieving the important states from the full state space addresses this problem. This study formulates a method that searches the entire state space for critical states by combining negative and positive demonstrations, and extracts the critical features from them to increase learning efficiency. The method uses two types of demonstrations: positive demonstrations, which are given by experts and imitated by agents, and negative demonstrations, which show incorrect motions that agents should avoid. The significant features are extracted by identifying the critical states over the entire state space, which is achieved by comparing the differences between the negative and positive demonstrations. Once these critical states are identified, they form the set of reward features, and a reward function is derived from them that enables agents to learn a policy quickly through reinforcement learning. A speeding-car simulation was used to verify the proposed method. The simulation results demonstrate that the proposed approach allows an agent to find a successful policy and then exhibit intelligent, expert-like behavior.
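For illustration only, the minimal Python sketch below shows one way the comparison described in the abstract could be realized; it is not the authors' implementation. It assumes discrete state indices, counts how often positive and negative demonstrations visit each state, treats the states with the largest visit-frequency gap as critical, and builds indicator reward features over them. The scoring rule, the `top_k` cutoff, and the linear reward form are assumptions made for the sketch.

```python
# Illustrative sketch (not the authors' method): find "critical" states by
# comparing state-visit frequencies of positive vs. negative demonstrations,
# then use indicator features over those states as a reduced reward-feature set.
import numpy as np

def critical_states(pos_demos, neg_demos, n_states, top_k=10):
    """Indices of states whose visit frequency differs most between
    positive (expert) and negative (to-be-avoided) demonstrations.
    top_k is an assumed cutoff for the number of critical states."""
    pos_counts = np.zeros(n_states)
    neg_counts = np.zeros(n_states)
    for traj in pos_demos:
        for s in traj:
            pos_counts[s] += 1
    for traj in neg_demos:
        for s in traj:
            neg_counts[s] += 1
    pos_freq = pos_counts / max(pos_counts.sum(), 1.0)
    neg_freq = neg_counts / max(neg_counts.sum(), 1.0)
    score = np.abs(pos_freq - neg_freq)      # large gap => candidate critical state
    return np.argsort(score)[-top_k:]

def reward_features(state, critical):
    """Indicator reward features: one feature per critical state."""
    return np.array([1.0 if state == c else 0.0 for c in critical])

def reward(state, weights, critical):
    """Linear reward over the reduced feature set, as in feature-based IRL."""
    return float(weights @ reward_features(state, critical))
```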
Ethics declarations
Conflict of interest
This study was funded by the Ministry of Science and Technology, Taiwan, under Grant MOST 109-2221-E-029-022-. No other author has reported a potential conflict of interest relevant to this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Hwang, M., Jiang, WC. & Chen, YJ. A critical state identification approach to inverse reinforcement learning for autonomous systems. Int. J. Mach. Learn. & Cyber. 13, 1409–1423 (2022). https://doi.org/10.1007/s13042-021-01454-x