Abstract
This chapter presents and evaluates an online representation-selection method for factored Markov decision processes (MDPs). The method addresses a special case of the feature selection problem in which only certain subsets of the features, which we call candidate representations, are considered. A motivation for the method is that it can potentially handle problems where other structure-learning algorithms are infeasible due to a large in-degree of the associated dynamic Bayesian network. Our method uses switch actions to select a representation and off-policy updating to improve the policies of the representations that were not selected. We demonstrate the validity of the method by showing, for a contextual bandit task and a regular MDP, that given a feature set containing only a single relevant feature, the switch method finds this feature very efficiently. We also show, for a contextual bandit task, that switching between a set of relevant features and a subset of those features can outperform each individual representation, because the switch method combines the fast initial performance gain of the small representation with the high asymptotic performance of the large representation.
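The core idea of the abstract — a switch action chooses among candidate representations, and every candidate is updated off-policy from the observed experience regardless of which one acted — can be sketched on a toy contextual bandit. This is a minimal illustrative sketch under assumed details (the toy task, the epsilon-greedy switch values, and all names are our own, not the chapter's exact formulation): the context has three binary features, only feature 0 determines the reward, and each single feature is a candidate representation.

```python
import random

random.seed(0)
N_FEATURES, ACTIONS, EPS, ALPHA = 3, (0, 1), 0.1, 0.1

# q[rep][feat_value][action]: action values of each candidate representation
q = [[[0.0, 0.0] for _ in range(2)] for _ in range(N_FEATURES)]
# value of selecting each representation via the switch action
switch_q = [0.0] * N_FEATURES

def step():
    ctx = [random.randint(0, 1) for _ in range(N_FEATURES)]
    # switch action: epsilon-greedy over candidate representations
    rep = (random.randrange(N_FEATURES) if random.random() < EPS
           else max(range(N_FEATURES), key=lambda r: switch_q[r]))
    s = ctx[rep]  # the state as seen through the chosen representation
    a = (random.choice(ACTIONS) if random.random() < EPS
         else max(ACTIONS, key=lambda act: q[rep][s][act]))
    reward = 1.0 if a == ctx[0] else 0.0  # only feature 0 is relevant
    # off-policy updating: every representation, selected or not,
    # learns from the same observed (context, action, reward) sample
    for r in range(N_FEATURES):
        q[r][ctx[r]][a] += ALPHA * (reward - q[r][ctx[r]][a])
    switch_q[rep] += ALPHA * (reward - switch_q[rep])
    return rep

reps = [step() for _ in range(5000)]
print(reps[-200:].count(0))  # how often the relevant representation was chosen late on
```

Because the irrelevant representations can never predict the reward better than chance, their switch values settle near 0.5, while the relevant representation's policy becomes near-optimal and its switch value rises above them, so the switch action concentrates on feature 0.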
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
van Seijen, H., Whiteson, S., Kester, L. (2010). Switching between Representations in Reinforcement Learning. In: Babuška, R., Groen, F.C.A. (eds) Interactive Collaborative Information Systems. Studies in Computational Intelligence, vol 281. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-11688-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-11687-2
Online ISBN: 978-3-642-11688-9