Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces

Daniel Hein, Alexander Hentschel, Thomas A. Runkler, Steffen Udluft
Copyright © 2016 | Volume: 7 | Issue: 3 | Pages: 20
ISSN: 1947-9263 | EISSN: 1947-9271 | EISBN13: 9781466691582 | DOI: 10.4018/IJSIR.2016070102
Cite Article

MLA

Hein, Daniel, et al. "Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces." IJSIR, vol. 7, no. 3, 2016, pp. 23-42. https://doi.org/10.4018/IJSIR.2016070102

APA

Hein, D., Hentschel, A., Runkler, T. A., & Udluft, S. (2016). Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces. International Journal of Swarm Intelligence Research (IJSIR), 7(3), 23-42. https://doi.org/10.4018/IJSIR.2016070102

Chicago

Hein, Daniel, Alexander Hentschel, Thomas A. Runkler, and Steffen Udluft. "Reinforcement Learning with Particle Swarm Optimization Policy (PSO-P) in Continuous State and Action Spaces." International Journal of Swarm Intelligence Research (IJSIR) 7, no. 3 (2016): 23-42. https://doi.org/10.4018/IJSIR.2016070102


Abstract

This article introduces a model-based reinforcement learning (RL) approach for continuous state and action spaces. While most RL methods seek closed-form policies, the approach taken here employs numerical online optimization of control action sequences. First, a general method for reformulating RL problems as optimization tasks is provided. Subsequently, Particle Swarm Optimization (PSO) is applied to search for optimal solutions. This Particle Swarm Optimization Policy (PSO-P) is effective for high-dimensional state spaces and does not require a priori assumptions about adequate policy representations. Furthermore, by translating RL problems into optimization tasks, the rich collection of real-world-inspired RL benchmarks becomes available for benchmarking numerical optimization techniques. The effectiveness of PSO-P is demonstrated on two standard benchmarks, mountain car and cart pole.
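The mechanism the abstract describes lends itself to a compact illustration. The Python sketch below shows one way a PSO-P-style controller could work: each particle encodes a candidate action sequence, fitness is the return obtained by rolling that sequence through a transition model from the current state, and only the first action of the best sequence is applied before re-planning. All names (`pso_policy`, `model`, `reward_fn`), the toy dynamics, and the PSO hyperparameters are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pso_policy(state, model, reward_fn, horizon=20, n_particles=30,
               n_iters=50, a_min=-1.0, a_max=1.0, w=0.7, c1=1.4, c2=1.4,
               seed=None):
    """Hypothetical PSO-P sketch: optimize an action sequence with PSO
    and return only its first action (receding-horizon control)."""
    rng = np.random.default_rng(seed)
    # Each particle is a full action sequence (1-D actions for simplicity).
    pos = rng.uniform(a_min, a_max, (n_particles, horizon))
    vel = np.zeros_like(pos)

    def ret(seq):
        # Fitness: accumulated reward of simulating the sequence
        # from the current state with the (learned or known) model.
        s, total = state, 0.0
        for a in seq:
            s = model(s, a)
            total += reward_fn(s, a)
        return total

    pbest = pos.copy()
    pbest_val = np.array([ret(p) for p in pos])
    gbest = pbest[np.argmax(pbest_val)].copy()
    gbest_val = pbest_val.max()

    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard PSO velocity/position update, with actions clipped
        # to the admissible range.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, a_min, a_max)
        vals = np.array([ret(p) for p in pos])
        better = vals > pbest_val
        pbest[better], pbest_val[better] = pos[better], vals[better]
        if vals.max() > gbest_val:
            gbest_val = vals.max()
            gbest = pos[np.argmax(vals)].copy()

    return gbest[0]  # apply only the first action, then re-plan

# Toy usage (assumed, not from the paper): a 1-D double integrator
# that should be driven toward the origin.
def model(s, a):
    p, v = s
    v = v + 0.1 * a
    return (p + 0.1 * v, v)

def reward_fn(s, a):
    return -(s[0] ** 2) - 0.01 * a ** 2

print(pso_policy((1.0, 0.0), model, reward_fn, seed=0))
```

Because the policy is computed by on-the-fly optimization rather than stored in closed form, no policy representation has to be fixed in advance; the trade-off is the per-step cost of the model rollouts inside the PSO loop.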
