Abstract
This chapter presents and evaluates an online representation-selection method for factored Markov decision processes (MDPs). The method addresses a special case of the feature selection problem in which only certain subsets of the features, which we call candidate representations, are considered. A motivation for the method is that it can potentially handle problems where other structure-learning algorithms are infeasible due to a large in-degree of the associated dynamic Bayesian network. Our method uses switch actions to select a representation and off-policy updating to improve the policies of the representations that were not selected. We demonstrate the validity of the method by showing, for a contextual bandit task and a regular MDP, that given a feature set containing only a single relevant feature, the switch method finds this feature very efficiently. We also show, for a contextual bandit task, that switching between a set of relevant features and a subset of those features can outperform each individual representation, because the switch method combines the fast initial performance gain of the small representation with the high asymptotic performance of the large representation.
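The core idea of the abstract — a switch action chooses among candidate representations, and every candidate is updated off-policy from the observed experience regardless of which one acted — can be sketched on a toy contextual bandit. This is a minimal illustrative sketch under assumed details (the toy task, the epsilon-greedy switch values, and all names are our own, not the chapter's exact formulation): the context has three binary features, only feature 0 determines the reward, and each single feature is a candidate representation.

```python
import random

random.seed(0)
N_FEATURES, ACTIONS, EPS, ALPHA = 3, (0, 1), 0.1, 0.1

# q[rep][feat_value][action]: action values of each candidate representation
q = [[[0.0, 0.0] for _ in range(2)] for _ in range(N_FEATURES)]
# value of selecting each representation via the switch action
switch_q = [0.0] * N_FEATURES

def step():
    ctx = [random.randint(0, 1) for _ in range(N_FEATURES)]
    # switch action: epsilon-greedy over candidate representations
    rep = (random.randrange(N_FEATURES) if random.random() < EPS
           else max(range(N_FEATURES), key=lambda r: switch_q[r]))
    s = ctx[rep]  # the state as seen through the chosen representation
    a = (random.choice(ACTIONS) if random.random() < EPS
         else max(ACTIONS, key=lambda act: q[rep][s][act]))
    reward = 1.0 if a == ctx[0] else 0.0  # only feature 0 is relevant
    # off-policy updating: every representation, selected or not,
    # learns from the same observed (context, action, reward) sample
    for r in range(N_FEATURES):
        q[r][ctx[r]][a] += ALPHA * (reward - q[r][ctx[r]][a])
    switch_q[rep] += ALPHA * (reward - switch_q[rep])
    return rep

reps = [step() for _ in range(5000)]
print(reps[-200:].count(0))  # how often the relevant representation was chosen late on
```

Because the irrelevant representations can never predict the reward better than chance, their switch values settle near 0.5, while the relevant representation's policy becomes near-optimal and its switch value rises above them, so the switch action concentrates on feature 0.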
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
van Seijen, H., Whiteson, S., Kester, L. (2010). Switching between Representations in Reinforcement Learning. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11687-2
Online ISBN: 978-3-642-11688-9