Abstract
We present a framework for computing bounds for the return of a policy in finite-horizon, continuous-state Markov Decision Processes with bounded state transitions. The state transition bounds can be based on either prior knowledge alone, or on a combination of prior knowledge and data. Our framework uses a piecewise-constant representation of the return bounds and a backwards iteration process. We instantiate this framework for a previously investigated type of prior knowledge – namely, Lipschitz continuity of the transition function. In this context, we show that the existing bounds of Fonteneau et al. (2009, 2010) can be expressed as a particular instantiation of our framework, by bounding the immediate rewards using Lipschitz continuity and choosing a particular form for the regions in the piecewise-constant representation. We also show how different instantiations of our framework can improve upon their bounds.
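To make the construction described above concrete, below is a minimal, hypothetical sketch of the general idea — piecewise-constant return bounds propagated by backwards iteration, with rewards and next states bounded via Lipschitz continuity around sampled transitions. It is not the authors' implementation: the one-dimensional state space [0, 1], deterministic policy-embedded dynamics, the constants L_F and L_R, the sample data, and helper names such as reward_bounds and next_state_interval are all illustrative assumptions.

```python
# Hypothetical sketch of backwards iteration over piecewise-constant return bounds.
# Assumptions (not from the paper): 1-D state space [0, 1], deterministic dynamics
# under a fixed policy, known Lipschitz constants, illustrative sample data.
import numpy as np

L_F, L_R = 1.0, 1.0      # assumed Lipschitz constants of transition and reward
HORIZON = 3

# Illustrative one-step transitions (x, r, x') collected under the policy.
samples = [(0.1, 0.5, 0.3), (0.4, 0.2, 0.6), (0.8, 0.9, 0.1)]

# Piecewise-constant representation: partition [0, 1] into equal-width regions.
edges = np.linspace(0.0, 1.0, 11)            # 10 regions
regions = list(zip(edges[:-1], edges[1:]))

def max_dist(region, x):
    """Largest distance from a sampled state x to any point of the region."""
    lo, hi = region
    return max(abs(lo - x), abs(hi - x))

def reward_bounds(region):
    """Lipschitz bounds on the immediate reward, valid for every state in the region."""
    lows  = [r - L_R * max_dist(region, x) for x, r, _ in samples]
    highs = [r + L_R * max_dist(region, x) for x, r, _ in samples]
    return max(lows), min(highs)             # tightest bounds over all samples

def next_state_interval(region):
    """Interval guaranteed to contain the next state f(x) for every x in the region."""
    lo = max(xn - L_F * max_dist(region, x) for x, _, xn in samples)
    hi = min(xn + L_F * max_dist(region, x) for x, _, xn in samples)
    return max(lo, 0.0), min(hi, 1.0)        # clamp to the assumed state space

# Backwards iteration: return bounds are constant on each region at each step.
lower = np.zeros(len(regions))
upper = np.zeros(len(regions))
for _ in range(HORIZON):
    new_lower, new_upper = np.empty_like(lower), np.empty_like(upper)
    for k, region in enumerate(regions):
        r_lo, r_hi = reward_bounds(region)
        n_lo, n_hi = next_state_interval(region)
        # Regions overlapping the interval of reachable next states.
        reach = [j for j, (a, b) in enumerate(regions) if b >= n_lo and a <= n_hi]
        new_lower[k] = r_lo + (min(lower[j] for j in reach) if reach else 0.0)
        new_upper[k] = r_hi + (max(upper[j] for j in reach) if reach else 0.0)
    lower, upper = new_lower, new_upper

print("per-region return bounds:", list(zip(lower.round(2), upper.round(2))))
```

In this sketch, tighter transition bounds (e.g. from more samples or smaller Lipschitz constants) shrink the reachable next-state interval, which in turn tightens the propagated return bounds; the paper's choice of regions and reward bounds determines how closely this recovers or improves on the bounds of Fonteneau et al.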
References
Brunskill, E., Leffler, B., Li, L., Littman, M., Roy, N.: CORL: A continuous-state offset-dynamics reinforcement learner. In: Proceedings of the International Conference on Uncertainty in Artificial Intelligence, pp. 53–61 (2008)
Delage, E., Mannor, S.: Percentile Optimization for Markov Decision Processes with Parameter Uncertainty. Operations Research 58(1), 203–213 (2009)
Ermon, S., Conrad, J., Gomes, C., Selman, B.: Playing games against nature: optimal policies for renewable resource allocation. In: Proceedings of The 26th Conference on Uncertainty in Artificial Intelligence (2010)
Fonteneau, R., Murphy, S., Wehenkel, L., Ernst, D.: Inferring bounds on the performance of a control policy from a sample of trajectories. In: IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), pp. 117–123 (2009)
Fonteneau, R., Murphy, S.A., Wehenkel, L., Ernst, D.: Towards Min Max Generalization in Reinforcement Learning. In: Filipe, J., Fred, A., Sharp, B. (eds.) ICAART 2010. CCIS, vol. 129, pp. 61–77. Springer, Heidelberg (2011)
Kaelbling, L.P.: Learning in embedded systems. MIT Press (1993)
Kakade, S., Kearns, M., Langford, J.: Exploration in Metric State Spaces. In: International Conference on Machine Learning, vol. 20, p. 306 (2003)
Nilim, A., El Ghaoui, L.: Robust Control of Markov Decision Processes with Uncertain Transition Matrices. Operations Research 53(5), 780–798 (2005)
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Păduraru, C., Precup, D., Pineau, J. (2012). A Framework for Computing Bounds for the Return of a Policy. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol. 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_21
DOI: https://doi.org/10.1007/978-3-642-29946-9_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9