Abstract
We investigate the parallelization of reinforcement learning algorithms using MapReduce, a popular parallel computing framework. We present parallel versions of several dynamic programming algorithms, including policy evaluation, policy iteration, and off-policy updates. Furthermore, we design parallel reinforcement learning algorithms to deal with large-scale problems using linear function approximation, including model-based projection, least squares policy iteration, temporal difference learning, and recent gradient temporal difference learning algorithms. We give time and space complexity analyses of the proposed algorithms. This study demonstrates how parallelization opens new avenues for solving large-scale reinforcement learning problems.
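To make the idea concrete, here is a minimal sketch, not the paper's implementation, of how one synchronous sweep of iterative policy evaluation can be phrased in MapReduce style: the map phase emits a discounted partial backup for every transition under the fixed policy, keyed by the source state, and the reduce phase sums the contributions per state. The data layout (the transitions, rewards, and values dictionaries) and the discount factor are hypothetical choices for illustration; the paper's algorithms further cover linear function approximation and the other methods listed above.

from collections import defaultdict

GAMMA = 0.9  # discount factor (illustrative value)

def map_phase(transitions, rewards, values):
    """Emit (state, partial backup) pairs, one per transition under the fixed policy.

    transitions: dict mapping s -> list of (s_next, prob) under pi(s)  [hypothetical layout]
    rewards:     dict mapping (s, s_next) -> immediate reward
    values:      dict mapping s -> current value estimate V(s)
    """
    for s, successors in transitions.items():
        for s_next, prob in successors:
            yield s, prob * (rewards[(s, s_next)] + GAMMA * values[s_next])

def reduce_phase(pairs):
    """Sum the partial backups per source state to obtain the updated value function."""
    new_values = defaultdict(float)
    for s, contribution in pairs:
        new_values[s] += contribution
    return dict(new_values)

def policy_evaluation_iteration(transitions, rewards, values):
    """One synchronous Bellman backup sweep, expressed as a map followed by a reduce."""
    return reduce_phase(map_phase(transitions, rewards, values))

# Usage on a tiny two-state chain (hypothetical numbers):
transitions = {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]}
rewards = {(0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
values = {0: 0.0, 1: 0.0}
for _ in range(50):
    values = policy_evaluation_iteration(transitions, rewards, values)
print(values)

In a real MapReduce deployment the map and reduce phases would run over partitioned state data across workers, with the framework handling the grouping by key between the two phases.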
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, Y., Schuurmans, D. (2012). MapReduce for Parallel Reinforcement Learning. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_30
DOI: https://doi.org/10.1007/978-3-642-29946-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9