{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2024,8,12]],"date-time":"2024-08-12T14:31:17Z","timestamp":1723473077643},"reference-count":73,"publisher":"Springer Science and Business Media LLC","issue":"25","license":[{"start":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T00:00:00Z","timestamp":1641945600000},"content-version":"tdm","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"},{"start":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T00:00:00Z","timestamp":1641945600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0"}],"funder":[{"DOI":"10.13039\/501100001778","name":"Deakin University","doi-asserted-by":"crossref","id":[{"id":"10.13039\/501100001778","id-type":"DOI","asserted-by":"crossref"}]}],"content-domain":{"domain":["link.springer.com"],"crossmark-restriction":false},"short-container-title":["Neural Comput & Applic"],"published-print":{"date-parts":[[2023,9]]},"abstract":"Abstract<\/jats:title>Interactive reinforcement learning proposes the use of externally sourced information in order to speed up the learning process. When interacting with a learner agent, humans may provide either evaluative or informative advice. Prior research has focused on the effect of human-sourced advice by including real-time feedback on the interactive reinforcement learning process, specifically aiming to improve the learning speed of the agent, while minimising the time demands on the human. This work focuses on answering which of two approaches, evaluative or informative, is the preferred instructional approach for humans. Moreover, this work presents an experimental setup for a human trial designed to compare the methods people use to deliver advice in terms of human engagement. The results obtained show that users giving informative advice to the learner agents provide more accurate advice, are willing to assist the learner agent for a longer time, and provide more advice per episode. Additionally, self-evaluation from participants using the informative approach has indicated that the agent\u2019s ability to follow the advice is higher, and therefore, they feel their own advice to be of higher accuracy when compared to people providing evaluative advice.<\/jats:p>","DOI":"10.1007\/s00521-021-06850-6","type":"journal-article","created":{"date-parts":[[2022,1,12]],"date-time":"2022-01-12T05:02:52Z","timestamp":1641963772000},"page":"18215-18230","update-policy":"http:\/\/dx.doi.org\/10.1007\/springer_crossmark_policy","source":"Crossref","is-referenced-by-count":8,"title":["Human engagement providing evaluative and informative advice for interactive reinforcement learning"],"prefix":"10.1007","volume":"35","author":[{"given":"Adam","family":"Bignold","sequence":"first","affiliation":[]},{"ORCID":"http:\/\/orcid.org\/0000-0002-1131-3382","authenticated-orcid":false,"given":"Francisco","family":"Cruz","sequence":"additional","affiliation":[]},{"given":"Richard","family":"Dazeley","sequence":"additional","affiliation":[]},{"given":"Peter","family":"Vamplew","sequence":"additional","affiliation":[]},{"given":"Cameron","family":"Foale","sequence":"additional","affiliation":[]}],"member":"297","published-online":{"date-parts":[[2022,1,12]]},"reference":[{"key":"6850_CR1","volume-title":"Reinforcement learning: an introduction","author":"RS Sutton","year":"2018","unstructured":"Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. 
MIT Press, Cambridge"},{"key":"6850_CR2","doi-asserted-by":"publisher","first-page":"237","DOI":"10.1613\/jair.301","volume":"4","author":"LP Kaelbling","year":"1996","unstructured":"Kaelbling LP, Littman ML, Moore AW (1996) Reinforcement learning: a survey. J Artif Intell Res 4:237\u2013285","journal-title":"J Artif Intell Res"},{"issue":"2","key":"6850_CR3","doi-asserted-by":"publisher","first-page":"153","DOI":"10.1016\/S0925-5273(00)00156-0","volume":"78","author":"I Giannoccaro","year":"2002","unstructured":"Giannoccaro I, Pontrandolfo P (2002) Inventory management in supply chains: a reinforcement learning approach. Int J Prod Econ 78(2):153\u2013161","journal-title":"Int J Prod Econ"},{"key":"6850_CR4","doi-asserted-by":"publisher","first-page":"100677","DOI":"10.1109\/ACCESS.2021.3096662","volume":"9","author":"K Lepenioti","year":"2021","unstructured":"Lepenioti K, Bousdekis A, Apostolou D, Mentzas G (2021) Human-augmented prescriptive analytics with interactive multi-objective reinforcement learning. IEEE Access 9:100677\u2013100693","journal-title":"IEEE Access"},{"key":"6850_CR5","doi-asserted-by":"publisher","first-page":"107496","DOI":"10.1016\/j.compchemeng.2021.107496","volume":"155","author":"D Machalek","year":"2021","unstructured":"Machalek D, Quah T, Powell KM (2021) A novel implicit hybrid machine learning model and its application for reinforcement learning. Comput Chem Eng 155:107496","journal-title":"Comput Chem Eng"},{"key":"6850_CR6","doi-asserted-by":"crossref","unstructured":"Cruz F, Acu\u00f1a G, Cubillos F, Moreno V, Bassi D (2007) Indirect training of grey-box models: application to a bioprocess. In: International symposium on neural networks. Springer, pp 391\u2013397","DOI":"10.1007\/978-3-540-72393-6_47"},{"issue":"1","key":"6850_CR7","first-page":"73","volume":"18","author":"H Kitano","year":"1997","unstructured":"Kitano H, Asada M, Kuniyoshi Y, Noda I, Osawa E, Matsubara H (1997) RoboCup: a challenge problem for AI. AI Mag 18(1):73","journal-title":"AI Mag"},{"key":"6850_CR8","unstructured":"Churamani N, Cruz F, Griffiths S, Barros P (2016) iCub: learning emotion expressions using human reward. In: Proceedings of the workshop on bio-inspired social robot learning in home scenarios. IEEE\/RSJ IROS, p\u00a02"},{"key":"6850_CR9","unstructured":"Lee K, Smith LM, Abbeel P (2021) PEBBLE: feedback-efficient interactive reinforcement learning via relabeling experience and unsupervised pre-training. In: Proceedings of the 38th international conference on machine learning. PMLR, pp 6152\u20136163"},{"issue":"2","key":"6850_CR10","doi-asserted-by":"publisher","first-page":"215","DOI":"10.1162\/neco.1994.6.2.215","volume":"6","author":"G Tesauro","year":"1994","unstructured":"Tesauro G (1994) TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput 6(2):215\u2013219","journal-title":"Neural Comput"},{"key":"6850_CR11","doi-asserted-by":"crossref","unstructured":"Barros P, Tanevska A, Cruz F, Sciutti A (2020) Moody learners-explaining competitive behaviour of reinforcement learning agents. In: 2020 joint IEEE 10th international conference on development and learning and epigenetic robotics (ICDL-EpiRob). IEEE, pp 1\u20138","DOI":"10.1109\/ICDL-EpiRob48136.2020.9278125"},{"key":"6850_CR12","unstructured":"Mankowitz DJ, Dulac-Arnold G, Hester T (2019) Challenges of real-world reinforcement learning. 
In: ICML workshop on real-life reinforcement learning, p 14"},{"key":"6850_CR13","doi-asserted-by":"crossref","unstructured":"Cruz F, W\u00fcppen P, Fazrie A, Weber C, Wermter S (2018) Action selection methods in a robotic reinforcement learning scenario. In: 2018 IEEE Latin American conference on computational intelligence (LA-CCI). IEEE, pp 13\u201318","DOI":"10.1109\/LA-CCI.2018.8625243"},{"key":"6850_CR14","unstructured":"Thomaz AL, Hoffman G, Breazeal C (2005) Real-time interactive reinforcement learning for robots. In: Proceedings of association for the advancement of artificial intelligence conference AAAI, workshop on human comprehensible machine learning, pp 9\u201313"},{"issue":"4","key":"6850_CR15","doi-asserted-by":"publisher","first-page":"337","DOI":"10.1109\/THMS.2019.2912447","volume":"49","author":"G Li","year":"2019","unstructured":"Li G, Gomez R, Nakamura K, He B (2019) Human-centered reinforcement learning: a survey. IEEE Trans Hum Mach Syst 49(4):337\u2013349","journal-title":"IEEE Trans Hum Mach Syst"},{"key":"6850_CR16","doi-asserted-by":"publisher","first-page":"139","DOI":"10.3389\/fnbeh.2013.00139","volume":"7","author":"G Brod","year":"2013","unstructured":"Brod G, Werkle-Bergner M, Shing YL (2013) The influence of prior knowledge on memory: a developmental cognitive neuroscience perspective. Front Behav Neurosci 7:139","journal-title":"Front Behav Neurosci"},{"key":"6850_CR17","unstructured":"Subramanian K, Isbell\u00a0Jr CL, Thomaz AL (2016) Exploration from demonstration for interactive reinforcement learning. In: Proceedings of the 2016 international conference on autonomous agents & multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, pp 447\u2013456"},{"issue":"1","key":"6850_CR18","doi-asserted-by":"publisher","first-page":"13","DOI":"10.3390\/biomimetics6010013","volume":"6","author":"A Bignold","year":"2021","unstructured":"Bignold A, Cruz F, Dazeley R, Vamplew P, Foale C (2021) An evaluation methodology for interactive reinforcement learning with simulated users. Biomimetics 6(1):13","journal-title":"Biomimetics"},{"key":"6850_CR19","doi-asserted-by":"crossref","unstructured":"Arzate\u00a0Cruz C, Igarashi T (2020) A survey on interactive reinforcement learning: design principles and open challenges. In: Proceedings of the 2020 ACM designing interactive systems conference, pp 1195\u20131209","DOI":"10.1145\/3357236.3395525"},{"key":"6850_CR20","first-page":"1","volume":"56","author":"A Bignold","year":"2021","unstructured":"Bignold A, Cruz F, Taylor ME, Brys T, Dazeley R, Vamplew P, Foale C (2021) A conceptual framework for externally-influenced agents: an assisted reinforcement learning review. J Amb Intell Hum Comput 56:1\u201324","journal-title":"J Amb Intell Hum Comput"},{"issue":"4","key":"6850_CR21","first-page":"105","volume":"35","author":"S Amershi","year":"2014","unstructured":"Amershi S, Cakmak M, Knox WB, Kulesza T (2014) Power to the people: the role of humans in interactive machine learning. AI Mag 35(4):105\u2013120","journal-title":"AI Mag"},{"issue":"6\u20137","key":"6850_CR22","doi-asserted-by":"publisher","first-page":"716","DOI":"10.1016\/j.artint.2007.09.009","volume":"172","author":"AL Thomaz","year":"2008","unstructured":"Thomaz AL, Breazeal C (2008) Teachable robots: understanding human teaching behavior to build more effective robot learners. 
Artif Intell 172(6\u20137):716\u2013737","journal-title":"Artif Intell"},{"key":"6850_CR23","doi-asserted-by":"crossref","unstructured":"Cakmak M, Thomaz AL (2010) Optimality of human teachers for robot learners. In: 2010 IEEE 9th international conference on development and learning (ICDL). IEEE, pp 64\u201369","DOI":"10.1109\/DEVLRN.2010.5578865"},{"key":"6850_CR24","first-page":"1","volume":"89","author":"A Bignold","year":"2021","unstructured":"Bignold A, Cruz F, Dazeley R, Vamplew P, Foale C (2021) Persistent rule-based interactive reinforcement learning. Neural Comput Appl 89:1\u201318","journal-title":"Neural Comput Appl"},{"issue":"16","key":"6850_CR25","doi-asserted-by":"publisher","first-page":"5574","DOI":"10.3390\/app10165574","volume":"10","author":"I Moreira","year":"2020","unstructured":"Moreira I, Rivas J, Cruz F, Dazeley R, Ayala A, Fernandes B (2020) Deep reinforcement learning with interactive feedback in a human-robot environment. Appl Sci 10(16):5574","journal-title":"Appl Sci"},{"key":"6850_CR26","doi-asserted-by":"crossref","unstructured":"Cruz F, Parisi GI, Wermter S (2018) Multi-modal feedback for affordance-driven interactive reinforcement learning. In: 2018 International joint conference on neural networks (IJCNN). IEEE, pp 1\u20138","DOI":"10.1109\/IJCNN.2018.8489237"},{"key":"6850_CR27","first-page":"1041","volume":"7","author":"M Sharma","year":"2007","unstructured":"Sharma M, Holmes MP, Santamar\u00eda JC, Irani A, Isbell CL Jr, Ram A (2007) Transfer learning in real-time strategy games using hybrid CBR\/RL. IJCAI 7:1041\u20131046","journal-title":"IJCAI"},{"issue":"Sep","key":"6850_CR28","first-page":"2125","volume":"8","author":"ME Taylor","year":"2007","unstructured":"Taylor ME, Stone P, Liu Y (2007) Transfer learning via inter-task mappings for temporal difference learning. J Mach Learn Res 8(Sep):2125\u20132167","journal-title":"J Mach Learn Res"},{"key":"6850_CR29","doi-asserted-by":"crossref","unstructured":"Shin YS, Niv Y (2020) Biased evaluations emerge from inferring hidden causes. PsyArXiv preprint psyarxiv:10.31234","DOI":"10.31234\/osf.io\/tkhwn"},{"key":"6850_CR30","unstructured":"Grzes M (2017) Reward shaping in episodic reinforcement learning. In: Proceedings of the sixteenth international conference on autonomous agents and multiagent systems (AAMAS 2017). ACM, pp 565\u2013573"},{"key":"6850_CR31","doi-asserted-by":"crossref","unstructured":"Marom O, Rosman BS (2018) Belief reward shaping in reinforcement learning. In: Proceedings of the thirty-second AAAI conference on artificial intelligence. AAAI, pp 3762\u20133769","DOI":"10.1609\/aaai.v32i1.11741"},{"key":"6850_CR32","doi-asserted-by":"crossref","unstructured":"Mill\u00e1n-Arias C, Fernandes B, Cruz F, Dazeley R, Fernandes S (2020) A robust approach for continuous interactive reinforcement learning. In: Proceedings of the 8th international conference on human-agent interaction, pp 278\u2013280","DOI":"10.1145\/3406499.3418769"},{"key":"6850_CR33","doi-asserted-by":"publisher","first-page":"104242","DOI":"10.1109\/ACCESS.2021.3099071","volume":"9","author":"C Mill\u00e1n-Arias","year":"2021","unstructured":"Mill\u00e1n-Arias C, Fernandes B, Cruz F, Dazeley R, Fernandes S (2021) A robust approach for continuous interactive actor-critic algorithms. 
IEEE Access 9:104242\u2013104260","journal-title":"IEEE Access"},{"key":"6850_CR34","first-page":"1","volume":"2016","author":"P Shah","year":"2016","unstructured":"Shah P, Hakkani-Tur D, Heck L (2016) Interactive reinforcement learning for task-oriented dialogue management. Workshop on deep learning for action and interaction. Adv Neural Inf Process Syst 2016:1\u201311","journal-title":"Adv Neural Inf Process Syst"},{"key":"6850_CR35","volume-title":"Deep learning","author":"I Goodfellow","year":"2016","unstructured":"Goodfellow I, Bengio Y, Courville A, Bengio Y (2016) Deep learning, vol 1. MIT Press, Cambridge"},{"issue":"7553","key":"6850_CR36","doi-asserted-by":"publisher","first-page":"436","DOI":"10.1038\/nature14539","volume":"521","author":"Y LeCun","year":"2015","unstructured":"LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436","journal-title":"Nature"},{"key":"6850_CR37","unstructured":"Kang B, Compton P, Preston P (1995) Multiple classification ripple down rules: evaluation and possibilities. In: Proceedings 9th Banff knowledge acquisition for knowledge based systems workshop, vol\u00a01, pp\u00a017\u201321"},{"key":"6850_CR38","unstructured":"Compton P, Edwards G, Kang B, Lazarus L, Malor R, Menzies T, Preston P, Srinivasan A, Sammut C (1991) Ripple down rules: possibilities and limitations. In: Proceedings of the sixth AAAI knowledge acquisition for knowledge-based systems workshop, Calgary, Canada, University of Calgary, pp 6\u20131"},{"issue":"Jul","key":"6850_CR39","first-page":"1633","volume":"10","author":"ME Taylor","year":"2009","unstructured":"Taylor ME, Stone P (2009) Transfer learning for reinforcement learning domains: a survey. J Mach Learn Res 10(Jul):1633\u20131685","journal-title":"J Mach Learn Res"},{"key":"6850_CR40","doi-asserted-by":"crossref","unstructured":"Littman ML (1994) Markov games as a framework for multi-agent reinforcement learning. In: Machine learning proceedings 1994. Elsevier, pp\u00a0157\u2013163","DOI":"10.1016\/B978-1-55860-335-6.50027-1"},{"key":"6850_CR41","doi-asserted-by":"crossref","unstructured":"Tan M (1993) Multi-agent reinforcement learning: independent versus cooperative agents. In: Proceedings of the tenth international conference on machine learning, pp 330\u2013337","DOI":"10.1016\/B978-1-55860-307-3.50049-6"},{"key":"6850_CR42","unstructured":"Cruz F, Parisi GI, Wermter S (2016) Learning contextual affordances with an associative neural architecture. In: Proceedings of the European symposium on artificial neural network. Computational intelligence and machine learning ESANN, UCLouvain, pp\u00a0665\u2013670"},{"key":"6850_CR43","doi-asserted-by":"crossref","unstructured":"Argall B, Browning B, Veloso M (2007) Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM\/IEEE international conference on human-robot interaction. ACM, pp 57\u201364","DOI":"10.1145\/1228716.1228725"},{"key":"6850_CR44","unstructured":"Mill\u00e1n C, Fernandes B, Cruz F (2019) Human feedback in continuous actor-critic reinforcement learning. In: Proceedings of the European symposium on artificial neural networks, computational intelligence and machine learning ESANN. ESANN, pp 661\u2013666"},{"key":"6850_CR45","doi-asserted-by":"crossref","unstructured":"Ayala A, Henr\u00edquez C, Cruz F (2019) Reinforcement learning using continuous states and interactive feedback. 
In: Proceedings of the international conference on applications of intelligent systems, pp 1\u20135","DOI":"10.1145\/3309772.3309801"},{"key":"6850_CR46","doi-asserted-by":"crossref","unstructured":"Thomaz AL, Breazeal C (2007) Asymmetric interpretations of positive and negative human feedback for a social learning agent. In: The 16th IEEE international symposium on robot and human interactive communication, 2007. RO-MAN 2007. IEEE, pp 720\u2013725","DOI":"10.1109\/ROMAN.2007.4415180"},{"key":"6850_CR47","unstructured":"Ng AY, Harada D, Russell S (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the international conference on machine learning ICML, vol 99, pp 278\u2013287"},{"key":"6850_CR48","doi-asserted-by":"crossref","unstructured":"Brys T, Now\u00e9 A, Kudenko D, Taylor ME (2014) Combining multiple correlated reward and shaping signals by measuring confidence. In: Proceedings of the association for the advancement of artificial intelligence conference. AAAI, pp 1687\u20131693","DOI":"10.1609\/aaai.v28i1.8998"},{"key":"6850_CR49","doi-asserted-by":"crossref","unstructured":"Marthi B (2007) Automatic shaping and decomposition of reward functions. In: Proceedings of the international conference on machine learning ICML. ACM, pp 601\u2013608","DOI":"10.1145\/1273496.1273572"},{"key":"6850_CR50","doi-asserted-by":"crossref","unstructured":"Rosman B, Ramamoorthy S (2014) Giving advice to agents with hidden goals. In: 2014 IEEE international conference on robotics and automation (ICRA). IEEE, pp 1959\u20131964","DOI":"10.1109\/ICRA.2014.6907118"},{"key":"6850_CR51","unstructured":"Huang J, Juan R, Gomez R, Nakamura K, Sha Q, He B, Li G (2021) Gan-based interactive reinforcement learning from demonstration and human evaluative feedback. arXiv preprint arXiv:2104.06600"},{"key":"6850_CR52","unstructured":"Knox WB, Stone P (2010) Combining manual feedback with subsequent MDP reward signals for reinforcement learning. In: Proceedings of the 9th international conference on autonomous agents and multiagent systems, vol 1. International Foundation for Autonomous Agents and Multiagent Systems, pp\u00a05\u201312"},{"key":"6850_CR53","doi-asserted-by":"crossref","unstructured":"Knox WB, Stone P (2009) Interactively shaping agents via human reinforcement: the TAMER framework. In: Proceedings of the fifth international conference on knowledge capture. ACM, pp 9\u201316","DOI":"10.1145\/1597735.1597738"},{"key":"6850_CR54","unstructured":"MacGlashan J, Ho MK, Loftin R, Peng B, Wang G, Roberts DL, Taylor ME, Littman ML (2017) Interactive learning from policy-dependent human feedback. In: Proceedings of the 34th international conference on machine learning, vol 70, pp 2285\u20132294"},{"key":"6850_CR55","unstructured":"Arumugam D, Lee JK, Saskin S, Littman ML (2019) Deep reinforcement learning from policy-dependent human feedback. arXiv preprint arXiv:1902.04257"},{"key":"6850_CR56","unstructured":"Kessler\u00a0Faulkner T, Gutierrez RA, Short ES, Hoffman G, Thomaz AL (2019) Active attention-modified policy shaping: socially interactive agents track. In: Proceedings of the international conference on autonomous agents and multiagent systems AAMAS. International Foundation for Autonomous Agents and Multiagent Systems, pp 728\u2013736"},{"key":"6850_CR57","unstructured":"Behboudian P, Satsangi Y, Taylor ME, Harutyunyan A, Bowling M (2020) Useful policy invariant shaping from arbitrary advice. 
In: AAMAS adaptive and learning agents workshop ALA 2020, p\u00a09"},{"key":"6850_CR58","doi-asserted-by":"publisher","first-page":"120757","DOI":"10.1109\/ACCESS.2020.3006254","volume":"8","author":"J Lin","year":"2020","unstructured":"Lin J, Ma Z, Gomez R, Nakamura K, He B, Li G (2020) A review on interactive reinforcement learning from human social feedback. IEEE Access 8:120757\u2013120765","journal-title":"IEEE Access"},{"key":"6850_CR59","doi-asserted-by":"crossref","unstructured":"Cruz F, W\u00fcppen P, Magg S, Fazrie A, Wermter S (2017) Agent-advising approaches in an interactive reinforcement learning scenario. In: 2017 joint IEEE international conference on development and learning and epigenetic robotics (ICDL-EpiRob). IEEE, pp\u00a0209\u2013214","DOI":"10.1109\/DEVLRN.2017.8329809"},{"key":"6850_CR60","doi-asserted-by":"crossref","unstructured":"Grizou J, Lopes M, Oudeyer P-Y (2013) Robot learning simultaneously a task and how to interpret human instructions. In: Proceedings of the joint IEEE international conference on development and learning and epigenetic robotics ICDL-EpiRob. IEEE, pp 1\u20138","DOI":"10.1109\/DevLrn.2013.6652523"},{"key":"6850_CR61","unstructured":"Griffith S, Subramanian K, Scholz J, Isbell C, Thomaz AL (2013) Policy shaping: integrating human feedback with reinforcement learning. In: Advances in neural information processing systems, pp 2625\u20132633"},{"key":"6850_CR62","unstructured":"Pilarski PM, Sutton RS (2012) Between instruction and reward: human-prompted switching. In: AAAI fall symposium series: robots learning interactively from human teachers, pp\u00a045\u201352"},{"key":"6850_CR63","unstructured":"Amir O, Kamar E, Kolobov A, Grosz B (2016) Interactive teaching strategies for agent training. In: Proceedings of the international joint conference on artificial intelligence IJCAI, pp\u00a0804\u2013811"},{"issue":"3","key":"6850_CR64","doi-asserted-by":"publisher","first-page":"306","DOI":"10.1080\/09540091.2018.1443318","volume":"30","author":"F Cruz","year":"2018","unstructured":"Cruz F, Magg S, Nagai Y, Wermter S (2018) Improving interactive reinforcement learning: what makes a good teacher? Connect Sci 30(3):306\u2013325","journal-title":"Connect Sci"},{"key":"6850_CR65","unstructured":"Kamar E, Hacker S, Horvitz E (2012) Combining human and machine intelligence in large-scale crowdsourcing. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems. International Foundation for Autonomous Agents and Multiagent Systems, vol 1, pp 467\u2013474"},{"key":"6850_CR66","doi-asserted-by":"crossref","unstructured":"Knox WB, Stone P (2012) Reinforcement learning from human reward: discounting in episodic tasks. In: 2012 IEEE RO-MAN: the 21st IEEE international symposium on robot and human interactive communication. IEEE, pp\u00a0878\u2013885","DOI":"10.1109\/ROMAN.2012.6343862"},{"key":"6850_CR67","doi-asserted-by":"crossref","unstructured":"Knox WB, Stone P (2013) Learning non-myopically from human-generated reward. In: Proceedings of the 2013 international conference on intelligent user interfaces. ACM, pp\u00a0191\u2013202","DOI":"10.1145\/2449396.2449422"},{"issue":"2","key":"6850_CR68","doi-asserted-by":"publisher","first-page":"108","DOI":"10.1109\/TAMD.2010.2051030","volume":"2","author":"M Cakmak","year":"2010","unstructured":"Cakmak M, Chao C, Thomaz AL (2010) Designing interactions for robot active learners. 
IEEE Trans Auton Ment Dev 2(2):108\u2013118","journal-title":"IEEE Trans Auton Ment Dev"},{"key":"6850_CR69","first-page":"369","volume":"11","author":"A Guillory","year":"2011","unstructured":"Guillory A, Bilmes JA (2011) Simultaneous learning and covering with adversarial noise. ICML 11:369\u2013376","journal-title":"ICML"},{"key":"6850_CR70","unstructured":"Guillory A, Bilmes JA (2011) Online submodular set cover, ranking, and repeated active learning. In: Advances in neural information processing systems, pp\u00a01107\u20131115"},{"key":"6850_CR71","doi-asserted-by":"crossref","unstructured":"Moore AW, Birnbaum L, Collins G (1991) Variable resolution dynamic programming: efficiently learning action maps in multivariate real-valued state-spaces. In: Proceedings of the eighth international conference on machine learning, pp\u00a0333\u2013337","DOI":"10.1016\/B978-1-55860-200-7.50069-6"},{"key":"6850_CR72","doi-asserted-by":"crossref","unstructured":"Kessler\u00a0Faulkner TA, Thomaz A (2021) Interactive reinforcement learning from imperfect teachers. In: Companion of the 2021 ACM\/IEEE international conference on human-robot interaction, pp\u00a0577\u2013579","DOI":"10.1145\/3434074.3446361"},{"key":"6850_CR73","doi-asserted-by":"crossref","unstructured":"Fern\u00e1ndez F, Veloso M (2006) Probabilistic policy reuse in a reinforcement learning agent. In: Proceedings of the fifth international joint conference on autonomous agents and multi-agent systems. ACM, pp\u00a0720\u2013727","DOI":"10.1145\/1160633.1160762"}],"container-title":["Neural Computing and Applications"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06850-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/article\/10.1007\/s00521-021-06850-6\/fulltext.html","content-type":"text\/html","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/link.springer.com\/content\/pdf\/10.1007\/s00521-021-06850-6.pdf","content-type":"application\/pdf","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2023,8,22]],"date-time":"2023-08-22T08:14:01Z","timestamp":1692692041000},"score":1,"resource":{"primary":{"URL":"https:\/\/link.springer.com\/10.1007\/s00521-021-06850-6"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2022,1,12]]},"references-count":73,"journal-issue":{"issue":"25","published-print":{"date-parts":[[2023,9]]}},"alternative-id":["6850"],"URL":"https:\/\/doi.org\/10.1007\/s00521-021-06850-6","relation":{},"ISSN":["0941-0643","1433-3058"],"issn-type":[{"value":"0941-0643","type":"print"},{"value":"1433-3058","type":"electronic"}],"subject":[],"published":{"date-parts":[[2022,1,12]]},"assertion":[{"value":"10 February 2021","order":1,"name":"received","label":"Received","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 December 2021","order":2,"name":"accepted","label":"Accepted","group":{"name":"ArticleHistory","label":"Article History"}},{"value":"12 January 2022","order":3,"name":"first_online","label":"First Online","group":{"name":"ArticleHistory","label":"Article History"}},{"order":1,"name":"Ethics","group":{"name":"EthicsHeading","label":"Declarations"}},{"value":"The authors declare that they have no conflict of interest.","order":2,"name":"Ethics","group":{"name":"EthicsHeading","label":"Conflict of interest"}}]}}
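The abstract above contrasts two ways a human can advise a learner agent: evaluative feedback (a judgement on the action just taken, folded into the reward signal) and informative advice (a suggested action that replaces the agent's own choice). A minimal sketch of that distinction around a tabular Q-learning step follows; this is a generic illustration, not the authors' experimental setup, and all names (`select_action`, `human_suggestion`, `human_rating`) are hypothetical.

```python
# Illustrative sketch: where evaluative vs informative human advice
# typically enters a Q-learning step. Not the paper's implementation.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
Q = defaultdict(float)  # Q[(state, action)] -> estimated value

def select_action(state, actions, human_suggestion=None):
    # Informative advice: the human proposes an action directly,
    # overriding the agent's epsilon-greedy exploration.
    if human_suggestion is not None:
        return human_suggestion
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions, human_rating=0.0):
    # Evaluative advice: the human's rating of the executed action is
    # added to the environment reward, acting as a shaping signal.
    shaped = reward + human_rating
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (shaped + GAMMA * best_next - Q[(state, action)])
```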
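The record itself is a Crossref REST API work response, and the same JSON can be retrieved from Crossref's public works endpoint (https://api.crossref.org/works/{doi}). A minimal sketch using only the Python standard library; the field names match those in the record above:

```python
# Fetch and summarise the Crossref metadata record shown above.
import json
import urllib.request

DOI = "10.1007/s00521-021-06850-6"

def fetch_work(doi: str) -> dict:
    """Return the 'message' object of a Crossref work response."""
    url = f"https://api.crossref.org/works/{doi}"
    with urllib.request.urlopen(url) as resp:
        payload = json.load(resp)
    # A well-formed response carries status "ok" and message-type "work".
    assert payload.get("status") == "ok"
    return payload["message"]

work = fetch_work(DOI)
print(work["title"][0])
print(", ".join(f'{a["given"]} {a["family"]}' for a in work["author"]))
print(work["container-title"][0], "vol", work["volume"],
      "issue", work["issue"], "pp", work["page"])
print("references:", work["reference-count"],
      "| cited by:", work["is-referenced-by-count"])
```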