Abstract
We investigate the parallelization of reinforcement learning algorithms using MapReduce, a popular parallel computing framework. We present parallel versions of several dynamic programming algorithms, including policy evaluation, policy iteration, and off-policy updates. Furthermore, we design parallel reinforcement learning algorithms to deal with large-scale problems using linear function approximation, including model-based projection, least squares policy iteration, temporal difference learning, and recent gradient temporal difference learning algorithms. We give time and space complexity analyses of the proposed algorithms. This study demonstrates how parallelization opens new avenues for solving large-scale reinforcement learning problems.
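To make the idea concrete, here is a minimal sketch, not the paper's implementation, of how one synchronous sweep of iterative policy evaluation can be phrased in MapReduce style: the map phase emits a discounted partial backup for every transition under the fixed policy, keyed by the source state, and the reduce phase sums the contributions per state. The data layout (the transitions, rewards, and values dictionaries) and the discount factor are hypothetical choices for illustration; the paper's algorithms further cover linear function approximation and the other methods listed above.

from collections import defaultdict

GAMMA = 0.9  # discount factor (illustrative value)

def map_phase(transitions, rewards, values):
    """Emit (state, partial backup) pairs, one per transition under the fixed policy.

    transitions: dict mapping s -> list of (s_next, prob) under pi(s)  [hypothetical layout]
    rewards:     dict mapping (s, s_next) -> immediate reward
    values:      dict mapping s -> current value estimate V(s)
    """
    for s, successors in transitions.items():
        for s_next, prob in successors:
            yield s, prob * (rewards[(s, s_next)] + GAMMA * values[s_next])

def reduce_phase(pairs):
    """Sum the partial backups per source state to obtain the updated value function."""
    new_values = defaultdict(float)
    for s, contribution in pairs:
        new_values[s] += contribution
    return dict(new_values)

def policy_evaluation_iteration(transitions, rewards, values):
    """One synchronous Bellman backup sweep, expressed as a map followed by a reduce."""
    return reduce_phase(map_phase(transitions, rewards, values))

# Usage on a tiny two-state chain (hypothetical numbers):
transitions = {0: [(1, 1.0)], 1: [(0, 0.5), (1, 0.5)]}
rewards = {(0, 1): 1.0, (1, 0): 0.0, (1, 1): 2.0}
values = {0: 0.0, 1: 0.0}
for _ in range(50):
    values = policy_evaluation_iteration(transitions, rewards, values)
print(values)

In a real MapReduce deployment the map and reduce phases would run over partitioned state data across workers, with the framework handling the grouping by key between the two phases.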
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
Cite this paper
Li, Y., Schuurmans, D. (2012). MapReduce for Parallel Reinforcement Learning. In: Sanner, S., Hutter, M. (eds) Recent Advances in Reinforcement Learning. EWRL 2011. Lecture Notes in Computer Science, vol 7188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29946-9_30
DOI: https://doi.org/10.1007/978-3-642-29946-9_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29945-2
Online ISBN: 978-3-642-29946-9