Feature Reinforcement Learning in Practice | SpringerLink
Skip to main content

Feature Reinforcement Learning in Practice

  • Conference paper
Recent Advances in Reinforcement Learning (EWRL 2011)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7188))

Included in the following conference series:

  • 2345 Accesses

Abstract

Following a recent surge in using history-based methods for resolving perceptual aliasing in reinforcement learning, we introduce an algorithm based on the feature reinforcement learning framework called ΦMDP [13]. To create a practical algorithm we devise a stochastic search procedure for a class of context trees based on parallel tempering and a specialized proposal distribution. We provide the first empirical evaluation for ΦMDP. Our proposed algorithm achieves superior performance to the classical U-tree algorithm [20] and the recent active-LZ algorithm [6], and is competitive with MC-AIXI-CTW [29] that maintains a bayesian mixture over all context trees up to a chosen depth. We are encouraged by our ability to compete with this sophisticated method using an algorithm that simply picks one single model, and uses Q-learning on the corresponding MDP. Our ΦMDP algorithm is simpler and consumes less time and memory. These results show promise for our future work on attacking more complex and larger problems.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
¥17,985 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
JPY 3498
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
JPY 5719
Price includes VAT (Japan)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
JPY 7149
Price includes VAT (Japan)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Similar content being viewed by others

References

  1. Akaike, H.: A new look at the statistical model identification. IEEE Transactions on Automatic Control 19, 716–723 (1974)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-Dynamic Programming. Anthena Scientific, Belmont (1996)

    MATH  Google Scholar 

  3. Brafman, R.I., Tennenholz, M.: R-max -a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learing Research 3, 213–231 (2002)

    Google Scholar 

  4. Chrisman, L.: Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In: AAAI, pp. 183–188 (1992)

    Google Scholar 

  5. Cover, T.M., Thomas, J.A.: Elements of Information Theory. John Willey and Sons (1991)

    Google Scholar 

  6. Farias, V., Moallemi, C., Van Roy, B., Weissman, T.: Universal reinforcement learning. IEEE Transactions on Information Theory 56(5), 2441–2454 (2010)

    Article  Google Scholar 

  7. Geyer, C.J.: Markov chain Monte Calro maximum likelihood. In: Computing Science and Statistics: the 23rd Symposium on the Interface, pp. 156–163. Interface Foundation, Fairfax (1991)

    Google Scholar 

  8. Givan, R., Dean, T., Greig, M.: Equivalence notions and model minimization in Markov decision process. Artificial Intelligence 147, 163–223 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  9. Granville, V., Křivánek, M., Rasson, J.P.: Simulated annealing: A proof of convergence. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(6), 652–656 (1994)

    Article  Google Scholar 

  10. Grünwald, P.D.: The Minimum Description Length Principle. The MIT Press (2007)

    Google Scholar 

  11. Hukushima, K., Nemoto, K.: Exchange Monte Carlo method and application to spin glass simulations. Journal of the Physical Socieity of Japan 65(4), 1604–1608 (1996)

    Article  Google Scholar 

  12. Hutter, M.: Universal Articial Intelligence: Sequential Decisions based on Algorithmic Probability. Springer, Berlin (2005)

    Google Scholar 

  13. Hutter, M.: Feature reinforcement learning: Part I. Unstructured MDPs. Journal of General Artificial Intelligence (2009)

    Google Scholar 

  14. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in paritally observable stochastic domains. Artifical Intelligence 101, 99–134 (1998)

    Article  MathSciNet  MATH  Google Scholar 

  15. Kocsis, L., Szepesvári, C.: Bandit Based Monte-Carlo Planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  16. Li, L., Walsh, T.J., Littmans, M.L.: Towards a unified theory of state abstraction for MDPs. In: Proceedings of the 9th International Symposium on Artificial Intelligence and Mathematics (2006)

    Google Scholar 

  17. Liu, J.S.: Monte Carlo Strategies in Scientific Computing. Springer, Heidelberg (2001)

    MATH  Google Scholar 

  18. Madani, O., Handks, S., Condon: On the undecidability of probabilistic planning and related stochastic optimization problems. Artifical Intelligence 147, 5–34 (2003)

    Article  MATH  Google Scholar 

  19. Mahmud, M.M.H.: Constructing states for reinforcement learning. In: Fürnkranz, J., Joachims, T. (eds.) Proceedings of the 27th International Conference on Machine Learning (ICML 2010), Haifa, Israel, pp. 727–734 (June 2010), http://www.icml2010.org/papers/593.pdf

  20. McCallum, A.K.: Reinforcement Learning with Selective Perception and Hidden State. Ph.D. thesis, Department of Computer Science, University of Rochester (1996)

    Google Scholar 

  21. Nguyen, P., Sunehag, P., Hutter, M.: Feature refinrocement learning in practice. Tech. rep., Australian National University (2011)

    Google Scholar 

  22. Poland, J., Hutter, M.: Universal learning of repeated matrix games. In: Proc. 15th Annual Machine Learning Conf. of Belgium and The Netherlands (Benelearn 2006), pp. 7–14. Ghent (2006), http://arxiv.org/abs/cs.LG/0508073

  23. Rissanen, J.: A universal data compression system. IEEE Transactions on Information Theory 29(5), 656–663 (1983)

    Article  MathSciNet  MATH  Google Scholar 

  24. Schneider, J., Kirkpatrick, S.: Stochastic Optimization, 1st edn. Springer, Heidelberg (2006)

    MATH  Google Scholar 

  25. Singh, S.P., James, M.R., Rudary, M.R.: Predictive state representations: A new theory for modeling dynamical systems. In: Proceedings of the 20th Conference in Uncertainty in Artificial Intelligence, Banff, Canada, pp. 512–518 (2004)

    Google Scholar 

  26. Suman, B., Kumar, P.: A survey of simulated annealing as a tool for single and multiobjecctive optimization. Journal of the Operational Research Society 57, 1143–1160 (2006)

    Article  MATH  Google Scholar 

  27. Sunehag, P., Hutter, M.: Consistency of Feature Markov Processes. In: Hutter, M., Stephan, F., Vovk, V., Zeugmann, T. (eds.) ALT 2010. LNCS(LNAI), vol. 6331, pp. 360–374. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  28. Sutton, R., Barto, A.: Reinforcement Learning. The MIT Press (1998)

    Google Scholar 

  29. Veness, J., Ng, K.S., Hutter, M., Uther, W., Silver, D.: A Monte-Carlo AIXI approximation. Journal of Artifiicial Intelligence Research 40(1), 95–142 (2011)

    MathSciNet  MATH  Google Scholar 

  30. Vidal, E., Thollard, F., Higuera, C.D.L., Casacuberta, F., Carrasco, R.C.: Probabilitic finite-state machines. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(7), 1013–1025 (2005)

    Article  Google Scholar 

  31. Wallace, C.S.: Statistical and Inductive Inference by Minimum Message Length. Springer, Berlin (2005)

    MATH  Google Scholar 

  32. Wilems, F.M.J., Shtarkov, Y.M., Tjalkens, T.J.: The context tree weighting method: Basic properties. IEEE Transactions on Information Theory 41, 653–664 (1995)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, P., Sunehag, P., Hutter, M. (2012). Feature Reinforcement Learning in Practice. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science(), vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_10

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-29946-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29945-2

  • Online ISBN: 978-3-642-29946-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics