{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,9,5]],"date-time":"2024-09-05T18:55:08Z","timestamp":1725562508832},"reference-count":83,"publisher":"Springer Science and Business Media LLC","issue":"2","license":[{"start":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T00:00:00Z","timestamp":1657929600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T00:00:00Z","timestamp":1657929600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/100000009","name":"Foundation for the National Institutes of Health","doi-asserted-by":"publisher","award":["1R01CA240452-01A1"],"id":[{"id":"10.13039\/100000009","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100003130","name":"Research Foundation Flanders","doi-asserted-by":"crossref","award":["1242021N"],"id":[{"id":"10.13039\/501100003130","id-type":"DOI","asserted-by":"crossref"}]},{"name":"Onderzoeksprogramma Artifici\u00eble Intelligentie (AI) Vlaanderen"},{"DOI":"10.13039\/501100001858","name":"Swedish Governmental Agency for Innovation Systems","doi-asserted-by":"crossref","award":["NFFP7\/2017-04885"],"id":[{"id":"10.13039\/501100001858","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/501100004063","name":"Knut and Alice Wallenberg Foundation","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100004063","id-type":"DOI","asserted-by":"crossref"}]},{"DOI":"10.13039\/100007630","name":"College of Engineering and Informatics, National University of Ireland, Galway","doi-asserted-by":"publisher","id":[{"id":"10.13039\/100007630","id-type":"DOI","asserted-by":"publisher"}]},{"DOI":"10.13039\/501100006643","name":"Federation University 
Australia","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100006643","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Auton Agent Multi-Agent Syst"],"published-print":{"date-parts":[[2022,10]]},"abstract":"<jats:title>Abstract<\/jats:title><jats:p>The recent paper \u201cReward is Enough\u201d by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.<\/jats:p>","DOI":"10.1007\/s10458-022-09575-5","type":"journal-article","created":{"date-parts":[[2022,7,16]],"date-time":"2022-07-16T04:02:58Z","timestamp":1657944178000},"update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":11,"title":["Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)"],"prefix":"10.1007","volume":"36","author":[{"ORCID":"http:\/\/orcid.org\/0000-0002-8687-4424","authenticated-orcid":false,"given":"Peter","family":"Vamplew","sequence":"first","affiliation":[]},{"given":"Benjamin 
J.","family":"Smith","sequence":"additional","affiliation":[]},{"given":"Johan","family":"K\u00e4llstr\u00f6m","sequence":"additional","affiliation":[]},{"given":"Gabriel","family":"Ramos","sequence":"additional","affiliation":[]},{"given":"Roxana","family":"R\u0103dulescu","sequence":"additional","affiliation":[]},{"given":"Diederik M.","family":"Roijers","sequence":"additional","affiliation":[]},{"given":"Conor F.","family":"Hayes","sequence":"additional","affiliation":[]},{"given":"Fredrik","family":"Heintz","sequence":"additional","affiliation":[]},{"given":"Patrick","family":"Mannion","sequence":"additional","affiliation":[]},{"given":"Pieter J. K.","family":"Libin","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Dazeley","sequence":"additional","affiliation":[]},{"given":"Cameron","family":"Foale","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,7,16]]},"reference":[{"key":"9575_CR1","unstructured":"Abdolmaleki, A., Huang, S., Hasenclever, L., Neunert, M., Song, F., Zambelli, M., Martins, M., Heess, N., Hadsell, R., & Riedmiller, M. (2020). A distributional view on multi-objective policy optimization. In International Conference on Machine Learning (pp. 11\u201322). PMLR."},{"key":"9575_CR2","unstructured":"Abdolmaleki, A., Huang, S. H., Vezzani, G., Shahriari, B., Springenberg, J. T., Mishra, S., TB, D., Byravan, A., Bousmalis, K., Gyorgy, A., et\u00a0al. (2021). On multi-objective policy optimization as a tool for reinforcement learning. arXiv preprint arXiv:2106.08199."},{"key":"9575_CR3","unstructured":"Abels, A., Roijers, D., Lenaerts, T., Now\u00e9, A., & Steckelmacher, D. (2019). Dynamic weights in multi-objective deep reinforcement learning. In International Conference on Machine Learning (pp. 11\u201320). PMLR."},{"key":"9575_CR4","unstructured":"Alegre, L. N., Bazzan, A. L., & da\u00a0Silva, B. C. (2022). 
Optimistic linear support and successor features as a basis for optimal policy transfer. arXiv preprint arXiv:2206.11326."},{"issue":"1","key":"9575_CR5","doi-asserted-by":"publisher","first-page":"357","DOI":"10.1146\/annurev.ne.09.030186.002041","volume":"9","author":"GE Alexander","year":"1986","unstructured":"Alexander, G. E., DeLong, M. R., & Strick, P. L. (1986). Parallel organization of functionally segregated circuits linking basal ganglia and cortex. Annual Review of Neuroscience, 9(1), 357\u2013381.","journal-title":"Annual Review of Neuroscience"},{"key":"9575_CR6","doi-asserted-by":"publisher","first-page":"65","DOI":"10.1613\/jair.1.12202","volume":"70","author":"M Alfonseca","year":"2021","unstructured":"Alfonseca, M., Cebrian, M., Anta, A. F., Coviello, L., Abeliuk, A., & Rahwan, I. (2021). Superintelligence cannot be contained: Lessons from computability theory. Journal of Artificial Intelligence Research, 70, 65\u201376.","journal-title":"Journal of Artificial Intelligence Research"},{"key":"9575_CR7","unstructured":"Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Man\u00e9, D. (2016). Concrete problems in ai safety. arXiv preprint arXiv:1606.06565. https:\/\/arxiv.org\/pdf\/1606.06565.pdf."},{"key":"9575_CR8","unstructured":"Barreto, A., Dabney, W., Munos, R., Hunt, J. J., Schaul, T., van Hasselt, H. P., & Silver, D. (2017). Successor features for transfer in reinforcement learning. In Advances in neural information processing systems (pp. 4055\u20134065)."},{"key":"9575_CR9","doi-asserted-by":"crossref","unstructured":"Barto, A. G. (2013). Intrinsic motivation and reinforcement learning. In Intrinsically motivated learning in natural and artificial systems (pp. 17\u201347). Springer.","DOI":"10.1007\/978-3-642-32375-1_2"},{"key":"9575_CR10","unstructured":"Bostrom, N. (2003). Ethical issues in advanced artificial intelligence. 
Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, pp. 12\u201317."},{"key":"9575_CR11","unstructured":"Bostrom, N. (2014). Superintelligence: Paths, dangers, strategies."},{"issue":"3","key":"9575_CR12","doi-asserted-by":"publisher","first-page":"465","DOI":"10.1007\/s11098-013-0259-7","volume":"170","author":"D Bourget","year":"2014","unstructured":"Bourget, D., & Chalmers, D. J. (2014). What do philosophers believe? Philosophical Studies, 170(3), 465\u2013500.","journal-title":"Philosophical Studies"},{"key":"9575_CR13","doi-asserted-by":"publisher","first-page":"48","DOI":"10.1016\/j.neucom.2017.02.096","volume":"263","author":"T Brys","year":"2017","unstructured":"Brys, T., Harutyunyan, A., Vrancx, P., Now\u00e9, A., & Taylor, M. E. (2017). Multi-objectivization and ensembles of shapings in reinforcement learning. Neurocomputing, 263, 48\u201359.","journal-title":"Neurocomputing"},{"key":"9575_CR14","doi-asserted-by":"crossref","unstructured":"Brys, T., Van\u00a0Moffaert, K., Van\u00a0Vaerenbergh, K., & Now\u00e9, A. (2013). On the behaviour of scalarization methods for the engagement of a wet clutch. In 2013 12th International Conference on Machine Learning and Applications (Vol.\u00a01, pp. 258\u2013263). IEEE.","DOI":"10.1109\/ICMLA.2013.52"},{"key":"9575_CR15","unstructured":"Byrnes, S. (2021). Big picture of phasic dopamine. Alignment Forum. https:\/\/www.alignmentforum.org\/posts\/jrewt3rLFiKWrKuyZ\/big-picture-of-phasic-dopamine."},{"issue":"43","key":"9575_CR16","doi-asserted-by":"publisher","first-page":"15368","DOI":"10.1073\/pnas.1414602111","volume":"111","author":"AW Cappelen","year":"2014","unstructured":"Cappelen, A. W., Eichele, T., Hugdahl, K., Specht, K., S\u00f8rensen, E. \u00d8., & Tungodden, B. (2014). Equity theory and fair inequality: A neuroeconomic study. Proceedings of the National Academy of Sciences, 111(43), 15368\u201315372. 
https:\/\/doi.org\/10.1073\/pnas.1414602111.","journal-title":"Proceedings of the National Academy of Sciences"},{"key":"9575_CR17","doi-asserted-by":"crossref","unstructured":"Cheney, D. L., & Seyfarth, R. M. (1990). How Monkeys See The World: Inside the mind of another species. University of Chicago Press.","DOI":"10.7208\/chicago\/9780226218526.001.0001"},{"key":"9575_CR18","unstructured":"Clemen, R. T. (1996). Making hard decisions: an introduction to decision analysis. Brooks\/Cole Publishing Company."},{"key":"9575_CR19","doi-asserted-by":"crossref","unstructured":"Coello, C. A. C., & Lamont, G. B. (2004). Applications of multi-objective evolutionary algorithms (Vol. 1). World Scientific.","DOI":"10.1142\/5712"},{"key":"9575_CR20","unstructured":"Coello, C. A. C., Lamont, G. B., Van Veldhuizen, D. A., et al. (2007). Evolutionary algorithms for solving multi-objective problems (Vol. 5). Springer."},{"issue":"6498","key":"9575_CR21","doi-asserted-by":"publisher","first-page":"1433","DOI":"10.1126\/science.aba9647","volume":"368","author":"D Coyle","year":"2020","unstructured":"Coyle, D., & Weller, A. (2020). \u201cExplaining\u2019\u2019 machine learning reveals policy challenges. Science, 368(6498), 1433\u20131434.","journal-title":"Science"},{"key":"9575_CR22","doi-asserted-by":"crossref","unstructured":"Cruz, F., Dazeley, R., & Vamplew, P. (2019). Memory-based explainable reinforcement learning. In Australasian joint conference on artificial intelligence (pp. 66\u201377). Springer.","DOI":"10.1007\/978-3-030-35288-2_6"},{"key":"9575_CR23","unstructured":"Das, A., Gervet, T., Romoff, J., Batra, D., Parikh, D., Rabbat, M., & Pineau, J. (2019). Tarmac: Targeted multi-agent communication. In: International Conference on machine learning (pp. 1538\u20131546). PMLR."},{"issue":"1","key":"9575_CR24","doi-asserted-by":"publisher","first-page":"63","DOI":"10.1007\/BF01197559","volume":"14","author":"I Das","year":"1997","unstructured":"Das, I., & Dennis, J. E. (1997). 
A closer look at drawbacks of minimizing weighted sums of objectives for pareto set generation in multicriteria optimization problems. Structural optimization, 14(1), 63\u201369.","journal-title":"Structural optimization"},{"key":"9575_CR25","unstructured":"Dazeley, R., Vamplew, P., & Cruz, F. (2021). Explainable reinforcement learning for broad-xai: a conceptual framework and survey. arXiv preprint arXiv:2108.09003."},{"key":"9575_CR26","doi-asserted-by":"publisher","first-page":"103525","DOI":"10.1016\/j.artint.2021.103525","volume":"299","author":"R Dazeley","year":"2021","unstructured":"Dazeley, R., Vamplew, P., Foale, C., Young, C., Aryal, S., & Cruz, F. (2021). Levels of explainable artificial intelligence for human-aligned conversational explanations. Artificial Intelligence, 299, 103525.","journal-title":"Artificial Intelligence"},{"key":"9575_CR27","doi-asserted-by":"crossref","unstructured":"Deb, K. (2014). Multi-objective optimization. In Search methodologies (pp. 403\u2013449). Springer.","DOI":"10.1007\/978-1-4614-6940-7_15"},{"key":"9575_CR28","doi-asserted-by":"crossref","unstructured":"Debreu, G. (1997) On the preferences characterization of additively separable utility. In Constructing Scalar-Valued Objective Functions (pp. 25\u201338). Springer.","DOI":"10.1007\/978-3-642-48773-6_3"},{"key":"9575_CR29","doi-asserted-by":"crossref","unstructured":"Delgado, M., & Rigney, A. (2009). Reward systems: Human. Encyclopedia of Neuroscience, 8, 345\u2013352.","DOI":"10.1016\/B978-008045046-9.00855-X"},{"issue":"3","key":"9575_CR30","doi-asserted-by":"publisher","first-page":"343","DOI":"10.1017\/S0140525X00016393","volume":"6","author":"DC Dennett","year":"1983","unstructured":"Dennett, D. C. (1983). Intentional systems in cognitive ethology: The \u201cpanglossian paradigm\u2019\u2019 defended. Behavioral and Brain Sciences, 6(3), 343\u2013355.","journal-title":"Behavioral and Brain Sciences"},{"key":"9575_CR31","unstructured":"Dewey, D. (2014). 
Reinforcement learning and the reward engineering principle. In 2014 AAAI Spring Symposium Series."},{"issue":"6","key":"9575_CR32","doi-asserted-by":"publisher","first-page":"400","DOI":"10.1177\/1059712308092835","volume":"16","author":"S Elfwing","year":"2008","unstructured":"Elfwing, S., Uchibe, E., Doya, K., & Christensen, H. I. (2008). Co-evolution of shaping rewards and meta-parameters in reinforcement learning. Adaptive Behavior, 16(6), 400\u2013412.","journal-title":"Adaptive Behavior"},{"key":"9575_CR33","doi-asserted-by":"crossref","unstructured":"Everitt, T., Lea, G., & Hutter, M. (2018). AGI safety literature review. arXiv preprint arXiv:1805.01109.","DOI":"10.24963\/ijcai.2018\/768"},{"issue":"1","key":"9575_CR34","doi-asserted-by":"publisher","first-page":"32130","DOI":"10.3402\/snp.v6.32130","volume":"6","author":"DS Fleischman","year":"2016","unstructured":"Fleischman, D. S. (2016). An evolutionary behaviorist perspective on orgasm. Socioaffective neuroscience and psychology, 6(1), 32130.","journal-title":"Socioaffective neuroscience and psychology"},{"key":"9575_CR35","doi-asserted-by":"crossref","unstructured":"Frankfurt, H. (1982). The importance of what we care about. Synthese, pp. 257\u2013272.","DOI":"10.1007\/BF00484902"},{"key":"9575_CR36","doi-asserted-by":"crossref","unstructured":"Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral foundations theory: The pragmatic validity of moral pluralism. In Advances in experimental social psychology (Vol.\u00a047, pp. 55\u2013130). Elsevier.","DOI":"10.1016\/B978-0-12-407236-7.00002-4"},{"key":"9575_CR37","unstructured":"Griffin, D. R. (1976). The Question Of Animal Awareness: Evolutionary Continuity Of Mental Experience. Rockefeller University Press."},{"issue":"4","key":"9575_CR38","doi-asserted-by":"publisher","first-page":"814","DOI":"10.1037\/0033-295X.108.4.814","volume":"108","author":"J Haidt","year":"2001","unstructured":"Haidt, J. (2001). 
The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review, 108(4), 814.","journal-title":"Psychological Review"},{"key":"9575_CR39","unstructured":"Harari, Y. N. (2014). Sapiens: A brief history of humankind. Random House."},{"key":"9575_CR40","unstructured":"Havrylov, S., & Titov, I. (2017). Emergence of language with multi-agent games: Learning to communicate with sequences of symbols. In 31st Conference on Neural Information Processing Systems."},{"key":"9575_CR41","doi-asserted-by":"crossref","unstructured":"Hayes, C.F., R\u0103dulescu, R., Bargiacchi, E., K\u00e4llstr\u00f6m, J., Macfarlane, M., Reymond, M., Verstraeten, T., Zintgraf, L.M., Dazeley, R., Heintz, F., Howley, E., Irissappane, A.A., Mannion, P., Now\u00e9, A., Ramos, G., Restelli, M., Vamplew, P., Roijers, D.M.: A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems 36 (2022)","DOI":"10.1007\/s10458-022-09552-y"},{"key":"9575_CR42","doi-asserted-by":"crossref","unstructured":"Henrich, J. (2015). The secret of our success. Princeton University Press.","DOI":"10.2307\/j.ctvc77f0d"},{"key":"9575_CR43","first-page":"473","volume":"171","author":"B Hibbard","year":"2008","unstructured":"Hibbard, B. (2008). Open source AI. Frontiers in Artificial Intelligence and Applications, 171, 473.","journal-title":"Frontiers in Artificial Intelligence and Applications"},{"key":"9575_CR44","doi-asserted-by":"crossref","unstructured":"Igarashi, A., & Roijers, D. M. (2017). Multi-criteria coalition formation games. In International Conference on Algorithmic DecisionTheory (pp. 197\u2013213). Springer.","DOI":"10.1007\/978-3-319-67504-6_14"},{"issue":"1","key":"9575_CR45","doi-asserted-by":"publisher","first-page":"6","DOI":"10.1016\/S0165-0173(99)00023-5","volume":"31","author":"S Ikemoto","year":"1999","unstructured":"Ikemoto, S., & Panksepp, J. (1999). 
The role of nucleus accumbens dopamine in motivated behavior: a unifying interpretation with special reference to reward-seeking. Brain Research Reviews, 31(1), 6\u201341.","journal-title":"Brain Research Reviews"},{"key":"9575_CR46","doi-asserted-by":"publisher","first-page":"39","DOI":"10.1016\/j.neucom.2017.04.074","volume":"263","author":"TG Karimpanal","year":"2017","unstructured":"Karimpanal, T. G., & Wilhelm, E. (2017). Identification and off-policy learning of multiple objectives using adaptive clustering. Neurocomputing, 263, 39\u201347.","journal-title":"Neurocomputing"},{"key":"9575_CR47","unstructured":"Kilcher, Y. (2021). Reward is enough (machine learning research paper explained). https:\/\/www.youtube.com\/watch?v=dmH1ZpcROMk&t=24s."},{"key":"9575_CR48","unstructured":"Krakovna, V., Orseau, L., Ngo, R., Martic, M., & Legg, S. (2020). Avoiding side effects by considering future tasks. arXiv preprint arXiv:2010.07877."},{"key":"9575_CR49","unstructured":"Kurniawan, B. (2021). Single- and multiobjective reinforcement learning in dynamic adversarial games. Ph.D. thesis, Federation University Australia."},{"key":"9575_CR50","unstructured":"Leike, J., Martic, M., Krakovna, V., Ortega, P.A., Everitt, T., Lefrancq, A., Orseau, L., & Legg, S. (2017). AI safety gridworlds. arXiv preprint arXiv:1711.09883."},{"issue":"6","key":"9575_CR51","doi-asserted-by":"publisher","first-page":"1027","DOI":"10.1016\/j.conb.2012.06.001","volume":"22","author":"DJ Levy","year":"2012","unstructured":"Levy, D. J., & Glimcher, P. W. (2012). The root of all value: A neural common currency for choice. Current Opinion in Neurobiology, 22(6), 1027\u20131038.","journal-title":"Current Opinion in Neurobiology"},{"key":"9575_CR52","doi-asserted-by":"publisher","first-page":"49","DOI":"10.1016\/j.pbb.2013.06.011","volume":"119","author":"TM Love","year":"2014","unstructured":"Love, T. M. (2014). Oxytocin, motivation and the role of dopamine. 
Pharmacology, Biochemistry and Behavior, 119, 49\u201360.","journal-title":"Pharmacology, Biochemistry and Behavior"},{"issue":"1","key":"9575_CR53","doi-asserted-by":"publisher","first-page":"316","DOI":"10.1093\/icb\/icab019","volume":"61","author":"M Macedo-Lima","year":"2021","unstructured":"Macedo-Lima, M., & Remage-Healey, L. (2021). Dopamine modulation of motor and sensory cortical plasticity among vertebrates. Integrative and Comparative Biology, 61(1), 316\u2013336.","journal-title":"Integrative and Comparative Biology"},{"issue":"6","key":"9575_CR54","doi-asserted-by":"publisher","first-page":"972","DOI":"10.1037\/rev0000199","volume":"127","author":"JA Mollick","year":"2020","unstructured":"Mollick, J. A., Hazy, T. E., Krueger, K. A., Nair, A., Mackie, P., Herd, S. A., & O\u2019Reilly, R. C. (2020). A systems-neuroscience model of phasic dopamine. Psychological Review, 127(6), 972.","journal-title":"Psychological Review"},{"issue":"7438","key":"9575_CR55","doi-asserted-by":"publisher","first-page":"472","DOI":"10.1038\/nature11905","volume":"494","author":"Y Oka","year":"2013","unstructured":"Oka, Y., Butnaru, M., von Buchholtz, L., Ryba, N. J., & Zuker, C. S. (2013). High salt recruits aversive taste pathways. Nature, 494(7438), 472\u2013475.","journal-title":"Nature"},{"key":"9575_CR56","unstructured":"Omohundro, S. M. (2008). The basic AI drives. In AGI (Vol. 171, pp. 483\u2013492)."},{"key":"9575_CR57","first-page":"6","volume":"1","author":"PY Oudeyer","year":"2009","unstructured":"Oudeyer, P. Y., & Kaplan, F. (2009). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1, 6.","journal-title":"Frontiers in Neurorobotics"},{"key":"9575_CR58","unstructured":"Ouellette, S. (2021). Reward is enough \u2013 but not efficient. 
https:\/\/www.linkedin.com\/pulse\/reward-enough-efficient-simon-ouellette\/."},{"issue":"3","key":"9575_CR59","doi-asserted-by":"publisher","first-page":"657","DOI":"10.1007\/s10071-015-0834-8","volume":"18","author":"A Perret","year":"2015","unstructured":"Perret, A., Henry, L., Coulon, M., Caudal, J. P., Richard, J. P., Cousillas, H., et al. (2015). Social visual contact, a primary \u201cdrive\u2019\u2019 for social animals? Animal Cognition, 18(3), 657\u2013666.","journal-title":"Animal Cognition"},{"issue":"1","key":"9575_CR60","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/s10458-019-09433-x","volume":"34","author":"R R\u0103dulescu","year":"2020","unstructured":"R\u0103dulescu, R., Mannion, P., Roijers, D. M., & Now\u00e9, A. (2020). Multi-objective multi-agent decision making: A utility-based analysis and survey. Autonomous Agents and Multi-Agent Systems, 34(1), 1\u201352.","journal-title":"Autonomous Agents and Multi-Agent Systems"},{"key":"9575_CR61","doi-asserted-by":"crossref","unstructured":"R\u0103dulescu, R., Mannion, P., Zhang, Y., Roijers, D. M., & Now\u00e9, A. (2020). A utility-based analysis of equilibria in multi-objective normal-form games. The Knowledge Engineering Review,35.","DOI":"10.1017\/S0269888920000351"},{"key":"9575_CR62","doi-asserted-by":"publisher","first-page":"67","DOI":"10.1613\/jair.3987","volume":"48","author":"DM Roijers","year":"2013","unstructured":"Roijers, D. M., Vamplew, P., Whiteson, S., & Dazeley, R. (2013). A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48, 67\u2013113.","journal-title":"Journal of Artificial Intelligence Research"},{"issue":"1","key":"9575_CR63","doi-asserted-by":"publisher","first-page":"1","DOI":"10.1007\/978-3-031-01576-2","volume":"11","author":"DM Roijers","year":"2017","unstructured":"Roijers, D. M., & Whiteson, S. (2017). Multi-objective decision making. 
Synthesis lectures on artificial intelligence and machine learning, 11(1), 1\u2013129.","journal-title":"Synthesis lectures on artificial intelligence and machine learning"},{"key":"9575_CR64","unstructured":"Roitblat, H. (2021). Building artificial intelligence: Reward is not enough. https:\/\/bdtechtalks.com\/2021\/07\/07\/ai-reward-is-not-enough-herbert-roitblat\/."},{"issue":"48","key":"9575_CR65","doi-asserted-by":"publisher","first-page":"15988","DOI":"10.1523\/JNEUROSCI.3192-14.2014","volume":"34","author":"S Rudorf","year":"2014","unstructured":"Rudorf, S., & Hare, T. A. (2014). Interactions between dorsolateral and ventromedial prefrontal cortex underlie context-dependent stimulus valuation in goal-directed choice. Journal of Neuroscience, 34(48), 15988\u201315996.","journal-title":"Journal of Neuroscience"},{"key":"9575_CR66","unstructured":"Schaul, T., Horgan, D., Gregor, K., & Silver, D. (2015). Universal value function approximators. In International conference on machine learning (pp. 1312\u20131320)."},{"issue":"5306","key":"9575_CR67","doi-asserted-by":"publisher","first-page":"1593","DOI":"10.1126\/science.275.5306.1593","volume":"275","author":"W Schultz","year":"1997","unstructured":"Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593\u20131599.","journal-title":"Science"},{"issue":"3","key":"9575_CR68","doi-asserted-by":"publisher","first-page":"230","DOI":"10.1016\/S0092-6566(03)00069-2","volume":"38","author":"SH Schwartz","year":"2004","unstructured":"Schwartz, S. H., & Boehnke, K. (2004). Evaluating the structure of human values with confirmatory factor analysis. Journal of Research in Personality, 38(3), 230\u2013255. https:\/\/doi.org\/10.1016\/S0092-6566(03)00069-2.","journal-title":"Journal of Research in Personality"},{"key":"9575_CR69","unstructured":"Shead, S. (2021). Computer scientists are questioning whether Alphabet\u2019s DeepMind will ever make A.I. 
more human-like. https:\/\/www.cnbc.com\/2021\/06\/18\/computer-scientists-ask-if-deepmind-can-ever-make-ai-human-like.html."},{"key":"9575_CR70","doi-asserted-by":"crossref","unstructured":"Silver, D., Singh, S., Precup, D., & Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, pp. 103535.","DOI":"10.1016\/j.artint.2021.103535"},{"issue":"2","key":"9575_CR71","doi-asserted-by":"publisher","first-page":"70","DOI":"10.1109\/TAMD.2010.2051031","volume":"2","author":"S Singh","year":"2010","unstructured":"Singh, S., Lewis, R. L., Barto, A. G., & Sorg, J. (2010). Intrinsically motivated reinforcement learning: An evolutionary perspective. IEEE Transactions on Autonomous Mental Development, 2(2), 70\u201382.","journal-title":"IEEE Transactions on Autonomous Mental Development"},{"key":"9575_CR72","unstructured":"Smith, B. J., & Read, S. J. (forthcoming). Modeling incentive salience in pavlovian learning more parsimoniously using a multiple attribute model. Cognitive Affective Behavioral Neuroscience."},{"key":"9575_CR73","unstructured":"Taylor, J. (2016). Quantilizers: A safer alternative to maximizers for limited optimization. In: AAAI Workshop: AI, Ethics, and Society."},{"issue":"4","key":"9575_CR74","doi-asserted-by":"publisher","first-page":"697","DOI":"10.3945\/ajcn.114.097543","volume":"101","author":"JM Thomas","year":"2015","unstructured":"Thomas, J. M., Higgs, S., Dourish, C. T., Hansen, P. C., Harmer, C. J., & McCabe, C. (2015). Satiation attenuates bold activity in brain regions involved in reward and increases activity in dorsolateral prefrontal cortex: an fmri study in healthy volunteers. The American Journal of Clinical Nutrition, 101(4), 697\u2013704.","journal-title":"The American Journal of Clinical Nutrition"},{"key":"9575_CR75","doi-asserted-by":"crossref","unstructured":"Triantaphyllou, E. (2000). Multi-criteria decision making methods. In Multi-criteria decision making methods: A comparative study (pp. 5\u201321). 
Springer.","DOI":"10.1007\/978-1-4757-3157-6_2"},{"issue":"10","key":"9575_CR76","doi-asserted-by":"publisher","first-page":"1447","DOI":"10.1016\/j.neunet.2008.09.013","volume":"21","author":"E Uchibe","year":"2008","unstructured":"Uchibe, E., & Doya, K. (2008). Finding intrinsic rewards by embodied evolution and constrained reinforcement learning. Neural Networks, 21(10), 1447\u20131455.","journal-title":"Neural Networks"},{"issue":"1","key":"9575_CR77","doi-asserted-by":"publisher","first-page":"27","DOI":"10.1007\/s10676-017-9440-6","volume":"20","author":"P Vamplew","year":"2018","unstructured":"Vamplew, P., Dazeley, R., Foale, C., Firmin, S., & Mummery, J. (2018). Human-aligned artificial intelligence is a multiobjective problem. Ethics and Information Technology, 20(1), 27\u201340.","journal-title":"Ethics and Information Technology"},{"key":"9575_CR78","doi-asserted-by":"publisher","first-page":"26","DOI":"10.1016\/j.neucom.2016.08.152","volume":"263","author":"P Vamplew","year":"2017","unstructured":"Vamplew, P., Issabekov, R., Dazeley, R., Foale, C., Berry, A., Moore, T., & Creighton, D. (2017). Steering approaches to pareto-optimal multiobjective reinforcement learning. Neurocomputing, 263, 26\u201338.","journal-title":"Neurocomputing"},{"key":"9575_CR79","doi-asserted-by":"crossref","unstructured":"Vamplew, P., Yearwood, J., Dazeley, R., & Berry, A. (2008). On the limitations of scalarisation for multi-objective reinforcement learning of pareto fronts. In Australasian joint conference on artificial intelligence (pp. 372\u2013378). Springer.","DOI":"10.1007\/978-3-540-89378-3_37"},{"issue":"2","key":"9575_CR80","first-page":"56","volume":"10","author":"M Velasquez","year":"2013","unstructured":"Velasquez, M., & Hester, P. T. (2013). An analysis of multi-criteria decision making methods. 
International Journal of Operations Research, 10(2), 56\u201366.","journal-title":"International Journal of Operations Research"},{"key":"9575_CR81","doi-asserted-by":"publisher","first-page":"225","DOI":"10.1016\/j.neunet.2012.11.008","volume":"41","author":"J Weng","year":"2013","unstructured":"Weng, J., Paslaski, S., Daly, J., VanDam, C., & Brown, J. (2013). Modulation for emergent networks: Serotonin and dopamine. Neural Networks, 41, 225\u2013239.","journal-title":"Neural Networks"},{"issue":"4","key":"9575_CR82","doi-asserted-by":"publisher","first-page":"661","DOI":"10.1037\/0735-7044.98.4.661","volume":"98","author":"G Wolf","year":"1984","unstructured":"Wolf, G., Schulkin, J., & Simson, P. E. (1984). Multiple factors in the satiation of salt appetite. Behavioral Neuroscience, 98(4), 661.","journal-title":"Behavioral Neuroscience"},{"key":"9575_CR83","doi-asserted-by":"crossref","unstructured":"Yates, C., Christopher, R., & Tumer, K. (2020). Multi-fitness learning for behavior-driven cooperation. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference (pp. 
453\u2013461).","DOI":"10.1145\/3377930.3390220"}],"container-title":["Autonomous Agents and Multi-Agent Systems"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09575-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s10458-022-09575-5\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s10458-022-09575-5.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2022,10,29]],"date-time":"2022-10-29T12:54:07Z","timestamp":1667048047000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s10458-022-09575-5"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,7,16]]},"references-count":83,"journal-issue":{"issue":"2","published-print":{"date-parts":[[2022,10]]}},"alternative-id":["9575"],"URL":"https:\/\/doi.org\/10.1007\/s10458-022-09575-5","relation":{},"ISSN":["1387-2532","1573-7454"],"issn-type":[{"value":"1387-2532","type":"print"},{"value":"1573-7454","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,7,16]]},"assertion":[{"value":"2 July 2022","order":1,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"16 July 2022","order":2,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}}],"article-number":"41"}}