Solving Non-Rectangular Reward-Robust MDPs via Frequency Regularization

Gadot, Uri; Derman, Esther; Kumar, Navdeep; Elfatihi, Maxence Mohamed; Levy, Kfir; Mannor, Shie

Computer Science > Machine Learning

arXiv:2309.01107 (cs)

[Submitted on 3 Sep 2023 (v1), last revised 12 Feb 2024 (this version, v2)]



Abstract: In robust Markov decision processes (RMDPs), it is assumed that the reward and the transition dynamics lie in a given uncertainty set. By targeting maximal return under the most adversarial model from that set, RMDPs address performance sensitivity to misspecified environments. Yet, to preserve computational tractability, the uncertainty set is traditionally structured independently for each state. This so-called rectangularity condition is motivated solely by computational concerns. As a result, it lacks a practical justification and may lead to overly conservative behavior. In this work, we study coupled-reward RMDPs where the transition kernel is fixed, but the reward function lies within an $\alpha$-radius of a nominal one. We draw a direct connection between this type of non-rectangular reward RMDP and policy visitation-frequency regularization. We introduce a policy-gradient method and prove its convergence. Numerical experiments illustrate the learned policy's robustness and its less conservative behavior compared to rectangular uncertainty.
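A minimal sketch of the connection stated in the abstract, assuming (our assumption, not necessarily the paper's exact setup) that the uncertainty set is a Euclidean ball $\{r : \|r - r_0\|_2 \le \alpha\}$ over state-action rewards and that the transition kernel is fixed. The return of a policy $\pi$ is linear in the reward, $J(\pi, r) = \langle d^\pi, r \rangle$, where $d^\pi$ is the discounted state-action visitation frequency, so the worst case over the ball is $\langle d^\pi, r_0 \rangle - \alpha \|d^\pi\|_2$: the nominal return penalized by a norm of the visitation frequency (for a general norm, the penalty uses its dual norm). The helper names, the toy two-state MDP, and the per-state "rectangular" comparison set are hypothetical illustrations.

```python
import math

# Hypothetical toy MDP (not from the paper): P[s][a][s2] = Pr(s2 | s, a).
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.0, 1.0]]]
pi = [[0.5, 0.5], [0.3, 0.7]]   # pi[s][a]: stochastic policy
mu = [1.0, 0.0]                 # initial state distribution
gamma = 0.9
r0 = [[1.0, 0.0], [0.5, 2.0]]   # nominal reward r0[s][a]
alpha = 0.5                     # radius of the reward ball (assumed L2)


def occupancy(P, pi, mu, gamma, iters=2000):
    """Discounted visitation frequency d(s, a) = sum_t gamma^t Pr(s_t = s, a_t = a).

    Iterates the state fixed point rho = mu + gamma * P_pi^T rho, which is a
    gamma-contraction, then splits rho(s) across actions with pi(a | s).
    """
    n_s, n_a = len(P), len(P[0])
    rho = list(mu)
    for _ in range(iters):
        new = list(mu)
        for s in range(n_s):
            for a in range(n_a):
                for s2 in range(n_s):
                    new[s2] += gamma * rho[s] * pi[s][a] * P[s][a][s2]
        rho = new
    return [[rho[s] * pi[s][a] for a in range(n_a)] for s in range(n_s)]


def inner(d, r):
    """Return J(pi, r) = <d, r> summed over all state-action pairs."""
    return sum(d[s][a] * r[s][a] for s in range(len(d)) for a in range(len(d[s])))


def l2(rows):
    """Euclidean norm of a table flattened into one vector."""
    return math.sqrt(sum(x * x for row in rows for x in row))


def robust_return(d, r0, alpha):
    """Coupled ball: min over ||r - r0||_2 <= alpha of <d, r> = <d, r0> - alpha * ||d||_2."""
    return inner(d, r0) - alpha * l2(d)


def worst_case_reward(d, r0, alpha):
    """Minimizer of <d, r> on the coupled ball: move r0 by alpha against d."""
    n = l2(d)
    return [[r0[s][a] - alpha * d[s][a] / n for a in range(len(d[s]))]
            for s in range(len(d))]


def rectangular_robust_return(d, r0, alpha):
    """Hypothetical s-rectangular set (one radius-alpha ball per state): each state
    is attacked independently, so the penalty is alpha * sum_s ||d(s, .)||_2."""
    return inner(d, r0) - alpha * sum(l2([row]) for row in d)


d = occupancy(P, pi, mu, gamma)
print("nominal return    :", inner(d, r0))
print("coupled robust    :", robust_return(d, r0, alpha))
print("rectangular robust:", rectangular_robust_return(d, r0, alpha))
```

Because the product of per-state balls of radius $\alpha$ contains the coupled ball, and $\sum_s \|d^\pi(s,\cdot)\|_2 \ge \|d^\pi\|_2$, the rectangular penalty is never smaller than the coupled one in this sketch, which is one way to see why coupled uncertainty can be less conservative.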

Comments:	Accepted at AAAI 2024
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2309.01107 [cs.LG]
	(or arXiv:2309.01107v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2309.01107

Submission history

From: Uri Gadot
[v1] Sun, 3 Sep 2023 07:34:26 UTC (4,970 KB)
[v2] Mon, 12 Feb 2024 11:23:29 UTC (4,310 KB)
