Abstract
Mathematical models of neurobiologically and psychologically inspired learning paradigms are regarded as a key technology for problems that are hard to solve with classical programming. Reinforcement learning is one such paradigm; it is now used quite successfully in practice (among other areas, in robotics) to learn behavior by trial and error. In this article, I take a closer look at the related neurobiological and psychological aspects that serve as the template for a large number of mathematical models. Viewed as a whole, reinforcement learning is not solely responsible for learning in the brains of humans and animals. Instead, there is a remarkable interplay of several paradigms across different brain areas, in which supervised and unsupervised learning also take part.
Notes
In batch training, the error is minimized offline over a set of several input-output patterns rather than online for each individual pattern (see the sketch following these notes).
For all s, a, the prediction error in Eq. (2) must be zero (a worked form is given below).
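To make the batch/online distinction concrete, here is a minimal sketch (my own illustration with made-up toy data, not code from the article) of the two regimes for a linear model trained with squared error:

# Minimal sketch (illustration only): online vs. batch minimization
# of the squared error for a linear model y = w @ x.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 3))        # 32 input patterns with 3 features
w_true = np.array([1.5, -2.0, 0.5])
y = X @ w_true                      # matching output patterns

alpha = 0.05                        # learning rate
epochs = 100

# Online training: one gradient step after every single pattern.
w_online = np.zeros(3)
for _ in range(epochs):
    for x_i, y_i in zip(X, y):
        error = w_online @ x_i - y_i
        w_online -= alpha * error * x_i

# Batch training: the error is accumulated over the whole pattern set
# and the weights are updated once per epoch (offline).
w_batch = np.zeros(3)
for _ in range(epochs):
    grad = X.T @ (X @ w_batch - y) / len(X)   # mean gradient over all patterns
    w_batch -= alpha * grad

print(w_online, w_batch)            # both converge towards w_true

On noiseless data both variants reach the same solution; the difference lies in when the updates are applied, which is exactly the offline/online contrast the note describes.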
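Assuming Eq. (2) denotes the usual temporal-difference error of Q-learning over state-action pairs (the exact form in the article may differ), the condition can be written as

\delta(s,a) = r(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) = 0 \quad \text{for all } s, a,

i.e., Q then satisfies the Bellman optimality equation, and further updates leave the value estimates unchanged.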