Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis

Hau, Jia Lin; Delage, Erick; Derman, Esther; Ghavamzadeh, Mohammad; Petrik, Marek

arXiv:2410.24128 (cs)

[Submitted on 31 Oct 2024]

Abstract: In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
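
For readers unfamiliar with the quantile objective the abstract refers to, the following is a minimal sketch of Value-at-Risk for a discrete return distribution. It assumes the lower-quantile convention (the smallest outcome whose cumulative probability reaches the level alpha); the paper may use a different convention, and the function name `value_at_risk` and the example numbers are hypothetical illustrations, not taken from the paper. It is not the paper's Q-learning algorithm or DP decomposition.

```python
def value_at_risk(outcomes, probs, alpha):
    """Lower alpha-quantile of a discrete distribution (assumed convention):
    the smallest outcome z such that P[X <= z] >= alpha."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(outcomes) != len(probs):
        raise ValueError("outcomes and probs must have the same length")

    # Walk through outcomes from worst to best, accumulating probability mass.
    pairs = sorted(zip(outcomes, probs))
    cumulative = 0.0
    for z, p in pairs:
        cumulative += p
        # Small tolerance guards against floating-point round-off in the sum.
        if cumulative >= alpha - 1e-12:
            return z
    # Reached only if the probabilities sum to less than alpha.
    return pairs[-1][0]


# Hypothetical return distribution: lose 10 w.p. 0.1, 0 w.p. 0.6, gain 5 w.p. 0.3.
returns = [-10.0, 0.0, 5.0]
weights = [0.1, 0.6, 0.3]
var_05 = value_at_risk(returns, weights, 0.05)
var_50 = value_at_risk(returns, weights, 0.50)
var_90 = value_at_risk(returns, weights, 0.90)
```

Under this convention, a risk-averse agent with a small alpha evaluates a policy by its poor-case return, while alpha = 0.5 corresponds to the median return.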

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2410.24128 [cs.LG]
	(or arXiv:2410.24128v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2410.24128

Submission history

From: Marek Petrik
[v1] Thu, 31 Oct 2024 16:53:20 UTC (18,330 KB)
