Abstract
Reinforcement Learning (RL) requires sufficient exploration to learn an optimal policy. However, exploratory actions can lead the learning agent into safety hazards, not necessarily in the next state but several steps into the future. It is therefore essential to evaluate each action for safety before executing it; both exploratory actions and the actions proposed by the RL agent's policy can be unsafe, during training as well as in the deployment phase. In this work, we propose the Imperative Action Masking framework, a Graph-Plan-based method that uses a small, finite look-ahead to assess the safety of actions from the current state. This information is used to construct action masks on the fly, filtering out the unsafe actions proposed by the RL agent (including the exploitative ones). The Graph-Plan-based method makes our framework interpretable, while the small, finite look-ahead makes it scalable to larger environments. The finite look-ahead, however, comes at the cost of overlooking hazards beyond its horizon. We conduct a comparative study against the probabilistic safety shield approach in the Pacman and Warehouse environments, where our framework produces better results in terms of both safety and reward.
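As a concrete illustration of the masking idea, the Python sketch below checks each candidate action against a bounded look-ahead and filters out those that lead into a hazard within the horizon. This is a minimal sketch under strong assumptions (a deterministic, known transition model and agent-controlled dynamics); step_model, is_hazard, and the action set are hypothetical stand-ins, and this is not the paper's Graph-Plan-based implementation, which reasons over a planning graph rather than enumerating rollouts.

# Minimal sketch of action masking with a bounded look-ahead.
# Assumptions (not from the paper): deterministic transitions via
# step_model(state, action), a hazard predicate is_hazard(state),
# and a finite action set. depth is the look-ahead horizon.

def is_safe(state, depth, step_model, is_hazard, actions):
    # A state is considered safe if it is not a hazard and, within
    # the remaining horizon, some action sequence stays hazard-free
    # (future steps are re-masked, so one safe continuation suffices).
    if is_hazard(state):
        return False
    if depth == 0:
        return True
    return any(
        is_safe(step_model(state, a), depth - 1, step_model, is_hazard, actions)
        for a in actions
    )

def action_mask(state, depth, step_model, is_hazard, actions):
    # One boolean per action: True means the action's successor can
    # remain safe within the look-ahead, False means it is masked out.
    # Assumes depth >= 1.
    return [
        is_safe(step_model(state, a), depth - 1, step_model, is_hazard, actions)
        for a in actions
    ]

# Toy 1-D corridor: positions 0..5, hazard at position 4.
ACTIONS = (-1, +1)
step = lambda s, a: max(0, min(5, s + a))
hazard = lambda s: s == 4

print(action_mask(3, 2, step, hazard, ACTIONS))  # [True, False]

In the toy run, moving right from position 3 enters the hazard and is masked, while moving left keeps a safe continuation available; the RL agent then samples only from the unmasked actions. Note that, as the abstract points out, hazards more than depth steps away are invisible to such a check.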
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dey, S., Bhat, S., Dasgupta, P., Dey, S. (2023). Imperative Action Masking for Safe Exploration in Reinforcement Learning. In: Calvaresi, D., et al. (eds.) Explainable and Transparent AI and Multi-Agent Systems. EXTRAAMAS 2023. Lecture Notes in Computer Science, vol. 14127. Springer, Cham. https://doi.org/10.1007/978-3-031-40878-6_8
Print ISBN: 978-3-031-40877-9
Online ISBN: 978-3-031-40878-6