
Reinforcement Learning: Psychologische und neurobiologische Aspekte

  • Technical Contribution
  • Published in KI - Künstliche Intelligenz

Abstract

Mathematical models of neurobiologically and psychologically inspired learning paradigms are regarded as a key technology for problems that are hard to solve with classical programming. Reinforcement learning is one such paradigm; it is now applied quite successfully in practice (among other areas, in robotics) to learn behavior through trial and error. In this article I take a closer look at the associated neurobiological and psychological aspects, which serve as the blueprint for a large number of mathematical models. Viewed as a whole, reinforcement learning is not solely responsible for learning in the brains of humans and animals. Instead, several paradigms from different brain areas interact closely, with supervised and unsupervised learning also taking part.
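To make the trial-and-error idea concrete, here is a minimal tabular Q-learning sketch. It is only an illustration under simple assumptions (a tiny chain world, epsilon-greedy exploration, illustrative hyperparameters), not the specific method discussed in this article:

```python
import random

# Illustrative assumptions: a chain world with 5 states and 2 actions
# (0 = left, 1 = right); reward 1 only when the right end is reached.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # step size, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Move left or right on the chain; the episode ends at the right end."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Trial and error: explore with probability EPSILON, else exploit.
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2, r, done = step(s, a)
        # Temporal-difference update toward the bootstrapped target.
        target = r if done else r + GAMMA * max(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

After training, the greedy policy with respect to Q walks the chain to the rewarded end without that behavior ever having been programmed explicitly.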




Notes

  1. In batch training, the error is minimized offline over a set of several input-output patterns rather than online for each individual pattern (see the sketch following these notes).

  2. For all s, a, the prediction error in (2) must be zero (see the worked form following these notes).
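For Note 1, a small sketch of the difference between online and batch error minimization, assuming a linear model with squared error (data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # illustrative input patterns
y = X @ np.array([1.0, -2.0, 0.5])   # illustrative target outputs
lr = 0.01

# Online training: one update after every single input-output pattern.
w_online = np.zeros(3)
for x_i, y_i in zip(X, y):
    err = x_i @ w_online - y_i
    w_online -= lr * err * x_i

# Batch training: accumulate the gradient over the whole pattern set
# offline, then apply a single update (repeat the pass for more epochs).
w_batch = np.zeros(3)
grad = X.T @ (X @ w_batch - y) / len(X)
w_batch -= lr * grad
```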
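Note 2 refers to Eq. (2) of the full article, which is not reproduced on this page. Assuming it denotes the usual temporal-difference (Q-learning) prediction error, the condition would read

$$\delta(s,a) \;=\; r + \gamma \max_{a'} Q(s',a') - Q(s,a) \;=\; 0 \quad \text{for all } s, a,$$

i.e., learning has converged once the value function exactly predicts the bootstrapped target for every state-action pair.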


Author information


Correspondence to Michel Tokic.


About this article

Cite this article

Tokic, M. Reinforcement Learning: Psychologische und neurobiologische Aspekte. Künstl Intell 27, 213–219 (2013). https://doi.org/10.1007/s13218-013-0261-4

