Abstract
We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge – namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds.
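To make the construction described above concrete, below is a minimal, hypothetical sketch of the general idea — piecewise-constant return bounds propagated by backwards iteration, with rewards and next states bounded via Lipschitz continuity around sampled transitions. It is not the authors' implementation: the one-dimensional state space [0, 1], deterministic policy-embedded dynamics, the constants L_F and L_R, the sample data, and helper names such as reward_bounds and next_state_interval are all illustrative assumptions.

```python
# Hypothetical sketch of backwards iteration over piecewise-constant return bounds.
# Assumptions (not from the paper): 1-D state space [0, 1], deterministic dynamics
# under a fixed policy, known Lipschitz constants, illustrative sample data.
import numpy as np

L_F, L_R = 1.0, 1.0      # assumed Lipschitz constants of transition and reward
HORIZON = 3

# Illustrative one-step transitions (x, r, x') collected under the policy.
samples = [(0.1, 0.5, 0.3), (0.4, 0.2, 0.6), (0.8, 0.9, 0.1)]

# Piecewise-constant representation: partition [0, 1] into equal-width regions.
edges = np.linspace(0.0, 1.0, 11)            # 10 regions
regions = list(zip(edges[:-1], edges[1:]))

def max_dist(region, x):
    """Largest distance from a sampled state x to any point of the region."""
    lo, hi = region
    return max(abs(lo - x), abs(hi - x))

def reward_bounds(region):
    """Lipschitz bounds on the immediate reward, valid for every state in the region."""
    lows  = [r - L_R * max_dist(region, x) for x, r, _ in samples]
    highs = [r + L_R * max_dist(region, x) for x, r, _ in samples]
    return max(lows), min(highs)             # tightest bounds over all samples

def next_state_interval(region):
    """Interval guaranteed to contain the next state f(x) for every x in the region."""
    lo = max(xn - L_F * max_dist(region, x) for x, _, xn in samples)
    hi = min(xn + L_F * max_dist(region, x) for x, _, xn in samples)
    return max(lo, 0.0), min(hi, 1.0)        # clamp to the assumed state space

# Backwards iteration: return bounds are constant on each region at each step.
lower = np.zeros(len(regions))
upper = np.zeros(len(regions))
for _ in range(HORIZON):
    new_lower, new_upper = np.empty_like(lower), np.empty_like(upper)
    for k, region in enumerate(regions):
        r_lo, r_hi = reward_bounds(region)
        n_lo, n_hi = next_state_interval(region)
        # Regions overlapping the interval of reachable next states.
        reach = [j for j, (a, b) in enumerate(regions) if b >= n_lo and a <= n_hi]
        new_lower[k] = r_lo + (min(lower[j] for j in reach) if reach else 0.0)
        new_upper[k] = r_hi + (max(upper[j] for j in reach) if reach else 0.0)
    lower, upper = new_lower, new_upper

print("per-region return bounds:", list(zip(lower.round(2), upper.round(2))))
```

In this sketch, tighter transition bounds (e.g. from more samples or smaller Lipschitz constants) shrink the reachable next-state interval, which in turn tightens the propagated return bounds; the paper's choice of regions and reward bounds determines how closely this recovers or improves on the bounds of Fonteneau et al.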
References
Brunskill, E., Leffler, B., Li, L., Littman, M., Roy, N.: CORL: A continuous-state offset-dynamics reinforcement learner. In: Proceedings of the International Conference on Uncertainty in Artificial Intelligence, pp. 53–61 (2008)
Delage, E., Mannor, S.: Percentile Optimization for Markov Decision Processes with Parameter Uncertainty. Operations Research 58(1), 203–213 (2009)
Ermon, S., Conrad, J., Gomes, C., Selman, B.: Playing games against nature: optimal policies for renewable resource allocation. In: Proceedings of The 26th Conference on Uncertainty in Artificial Intelligence (2010)
Fonteneau, R., Murphy, S., Wehenkel, L., Ernst, D.: Inferring bounds on the performance of a control policy from a sample of trajectories. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 117–123 (2009)
Fonteneau, R., Murphy, S.A., Wehenkel, L., Ernst, D.: Towards Min Max Generalization in Reinforcement Learning. In: Filipe, J., Fred, A., Sharp, B. (eds.) ICAART 2010. CCIS, vol. 129, pp. 61–77. Springer, Heidelberg (2011)
Kaelbling, L.P.: Learning in embedded systems. MIT Press (1993)
Kakade, S., Kearns, M., Langford, J.: Exploration in Metric State Spaces. In: International Conference on Machine Learning, vol. 20, p. 306 (2003)
Nilim, A., El Ghaoui, L.: Robust Control of Markov Decision Processes with Uncertain Transition Matrices. Operations Research 53(5), 780–798 (2005)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Păduraru, C., Precup, D., Pineau, J. (2012). A Framework for Computing Bounds for the Return of a Policy. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol. 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_21
DOI: https://doi.org/10.1007/978-3-642-29946-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9