Abstract
This paper introduces the Reinforced Genetic Programming (RGP) system, which enhances standard tree-based genetic programming (GP) with reinforcement learning (RL). RGP adds a new element to the GP function set: monitored action-selection points that provide hooks to a reinforcement-learning system. Using strong typing, RGP can restrict these choice points to leaf nodes, thereby turning GP trees into classify-and-act procedures. Then, environmental reinforcements channeled back through the choice points provide the basis for both lifetime learning and general GP fitness assessment. This paves the way for evolutionary acceleration via both Baldwinian and Lamarckian mechanisms. In addition, the hybrid hints at potential improvements to RL by exploiting evolution to design proper abstraction spaces, via the problem-state classifications of the internal tree nodes. This paper details the basic mechanisms of RGP and demonstrates its application to a series of static and dynamic maze-search problems.
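The choice-point mechanism described above can be pictured with a small sketch. The code below is not taken from the paper; it is a minimal illustration, under assumed details (an epsilon-greedy selection rule, a simple Q-value-style update, and hypothetical names such as ChoicePoint, ClassifyNode, and wall_ahead), of how internal tree nodes might classify the problem state while RL-monitored leaf nodes select actions and absorb environmental reinforcement.

```python
# Illustrative sketch only: names, update rule, and parameters are assumptions,
# not the RGP paper's actual implementation.
import random

class ChoicePoint:
    """Leaf node: an RL-monitored action-selection point."""
    def __init__(self, actions, epsilon=0.1, alpha=0.2):
        self.q = {a: 0.0 for a in actions}   # per-leaf action values
        self.epsilon = epsilon               # exploration rate
        self.alpha = alpha                   # learning rate
        self.last_action = None

    def act(self, state):
        # Epsilon-greedy selection over this leaf's local action values.
        if random.random() < self.epsilon:
            self.last_action = random.choice(list(self.q))
        else:
            self.last_action = max(self.q, key=self.q.get)
        return self.last_action, self

    def reinforce(self, reward):
        # Environmental reinforcement channeled back through the choice point.
        if self.last_action is not None:
            self.q[self.last_action] += self.alpha * (reward - self.q[self.last_action])

class ClassifyNode:
    """Internal node: a boolean test on the problem state. Strong typing keeps
    choice points at the leaves, so the tree classifies first, then acts."""
    def __init__(self, test, if_true, if_false):
        self.test, self.if_true, self.if_false = test, if_true, if_false

    def act(self, state):
        branch = self.if_true if self.test(state) else self.if_false
        return branch.act(state)

# Hypothetical usage in a maze-like task: classify "wall ahead?", then act.
tree = ClassifyNode(lambda s: s["wall_ahead"],
                    if_true=ChoicePoint(["turn_left", "turn_right"]),
                    if_false=ChoicePoint(["forward", "turn_left", "turn_right"]))
action, leaf = tree.act({"wall_ahead": True})
leaf.reinforce(reward=1.0)  # e.g. the chosen move brought the agent nearer the goal
```

In a sketch of this kind, the per-leaf action values learned during an individual's lifetime are what a Baldwinian variant would discard (keeping only the improved fitness) and a Lamarckian variant would write back into the inherited program.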
Cite this article
Downing, K.L. Reinforced Genetic Programming. Genetic Programming and Evolvable Machines 2, 259–288 (2001). https://doi.org/10.1023/A:1011953410319