Abstract
Interactive reinforcement learning speeds up the learning process of autonomous agents by including a human trainer who provides extra information to the agent in real time. Research in interactive reinforcement learning to date has been limited to real-time interactions in which the trainer's advice is relevant only to the current state. Moreover, the information provided by each interaction is not retained; the agent discards it after a single use. In this work, we propose a persistent rule-based interactive reinforcement learning approach, i.e., a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant to more than just the current state. Our experimental results show that persistent advice substantially improves the agent's performance while reducing the number of interactions required from the trainer. Moreover, rule-based advice shows a performance impact similar to state-based advice, but with a substantially reduced interaction count.
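The core idea can be illustrated with a short sketch. The following is a minimal, hypothetical Python example, not the authors' implementation: the class name `PersistentRuleAdvisedAgent` and its methods are illustrative, and it assumes a tabular Q-learning agent. Trainer advice is retained as (condition, action) rules, so one general rule applies to every future state it matches instead of being discarded after a single use.

```python
# Illustrative sketch only (hypothetical names, not the paper's code):
# persistent rule-based advice layered over tabular Q-learning.
import random
from collections import defaultdict

class PersistentRuleAdvisedAgent:
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)      # Q-values keyed by (state, action)
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rules = []                  # retained trainer advice: (condition, action)

    def add_rule(self, condition, action):
        """Store general advice: `condition` is a predicate over states."""
        self.rules.append((condition, action))

    def act(self, state):
        # Reuse any retained rule whose condition matches the current
        # state, rather than discarding advice after a single use.
        for condition, action in self.rules:
            if condition(state):
                return action
        if random.random() < self.epsilon:              # otherwise explore...
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])  # ...or exploit

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# One piece of general advice covering many states, e.g.
# "whenever the goal lies to the right, move right":
agent = PersistentRuleAdvisedAgent(actions=["left", "right"])
agent.add_rule(lambda s: s["goal_dx"] > 0, "right")
```

In contrast, state-based advice in this sketch would correspond to a rule whose condition matches exactly one state, which is why rule-based advice can achieve a similar performance impact with far fewer trainer interactions.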
Acknowledgments
This work has been partially supported by the Australian Government Research Training Program (RTP) and the RTP Fee-Offset Scholarship through Federation University Australia.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Bignold, A., Cruz, F., Dazeley, R. et al. Persistent rule-based interactive reinforcement learning. Neural Comput & Applic 35, 23411–23428 (2023). https://doi.org/10.1007/s00521-021-06466-w