
A critical state identification approach to inverse reinforcement learning for autonomous systems

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Inverse reinforcement learning recovers a reward function from reward features and positive demonstrations. For complex learning tasks, the entire state space is typically used to form the set of reward features, but such a large feature set leads to long computation times. Retrieving only the important states from the full state space addresses this problem. This study formulates a method that increases learning efficiency by combining negative and positive demonstrations to search the entire state space for critical states and extract the corresponding features. Two types of demonstrations are used: positive demonstrations, which are given by experts and imitated by agents, and negative demonstrations, which show incorrect motions that agents must avoid. The significant features are extracted by identifying critical states over the entire state space, which is achieved by comparing the differences between the negative and positive demonstrations. Once these critical states are identified, they form the set of reward features, and a reward function is derived that enables agents to quickly learn a policy through reinforcement learning. A speeding car simulation was used to verify the proposed method. The simulation results demonstrate that the proposed approach allows an agent to search for a positive strategy, after which the agent displays intelligent, expert-like behavior.
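A minimal sketch of the critical-state idea described in the abstract, assuming a discrete state space and demonstrations given as sequences of state indices; the function names (`state_visit_frequencies`, `identify_critical_states`), the visitation-difference criterion, and the threshold value are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def state_visit_frequencies(demos, n_states):
    """Normalized visitation frequency of each discrete state over a set of demonstrations."""
    counts = np.zeros(n_states)
    for demo in demos:                 # each demo is a sequence of state indices
        for s in demo:
            counts[s] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts

def identify_critical_states(positive_demos, negative_demos, n_states, threshold=0.05):
    """Mark states whose visitation differs strongly between expert (positive)
    and incorrect (negative) demonstrations; these form the reward-feature set."""
    mu_pos = state_visit_frequencies(positive_demos, n_states)
    mu_neg = state_visit_frequencies(negative_demos, n_states)
    diff = mu_pos - mu_neg             # > 0: favored by experts, < 0: typical of failures
    critical = np.where(np.abs(diff) > threshold)[0]
    return critical, diff

def reward_from_critical_states(critical, diff, n_states):
    """Indicator-feature reward: only critical states carry reward weight,
    in proportion to how strongly they separate positive from negative demos."""
    r = np.zeros(n_states)
    r[critical] = diff[critical]
    return r

# Toy 6-state example: experts pass through states 1, 2, 5; failures drift into 3, 4.
positive_demos = [[0, 1, 2, 5], [0, 1, 2, 5]]
negative_demos = [[0, 3, 4], [0, 3, 4, 4]]
critical, diff = identify_critical_states(positive_demos, negative_demos, n_states=6)
reward = reward_from_critical_states(critical, diff, n_states=6)
print("critical states:", critical)
print("reward vector:", np.round(reward, 2))
```

In this sketch, states visited far more often in positive than in negative demonstrations receive positive reward weight, states characteristic of failed runs receive negative weight, and all other states contribute nothing; restricting the reward features to the selected critical states is what reduces the computational burden relative to using the full state space.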





Author information

Corresponding author

Correspondence to Wei-Cheng Jiang.

Ethics declarations

Conflict of interest

Research support received by the authors is as follows: this study was funded by the Ministry of Science and Technology, Taiwan, under Grant MOST 109-2221-E-029-022-. No author has reported any other potential conflict of interest relevant to this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Hwang, M., Jiang, WC. & Chen, YJ. A critical state identification approach to inverse reinforcement learning for autonomous systems. Int. J. Mach. Learn. & Cyber. 13, 1409–1423 (2022). https://doi.org/10.1007/s13042-021-01454-x

