Abstract
Human beings are a largely untapped source of in-the-loop knowledge and guidance for computational learning agents, including robots. To effectively design agents that leverage available human expertise, we need to understand how people naturally teach. In this paper, we describe two experiments that ask how differing conditions affect a human teacher’s feedback frequency and the computational agent’s learned performance. The first experiment considers the impact of a self-perceived teaching role in contrast to believing one is merely critiquing a recording. The second considers whether a human trainer will give more frequent feedback if the agent acts less greedily (i.e., choosing actions believed to be worse) when the trainer’s recent feedback frequency decreases. From the results of these experiments, we draw three main conclusions that inform the design of agents. More broadly, these two studies stand as early examples of a nascent technique of using agents as highly specifiable social entities in experiments on human behavior.
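To make the second manipulation concrete, the sketch below illustrates one way an agent could act less greedily as its trainer's recent feedback frequency drops. It is only an illustration of the idea stated above, not the algorithm used in our experiments; the window length, baseline rate, and maximum non-greedy probability are assumed parameters, and in a TAMER-style agent the value function would be the learned model of human reward.

```python
import random
import time

class FeedbackSensitiveActionSelector:
    """Illustrative sketch: choose non-greedy actions more often when the
    trainer's recent feedback frequency falls below a baseline (assumed rule)."""

    def __init__(self, window_sec=10.0, baseline_rate=0.5, max_nongreedy_prob=0.3):
        self.window_sec = window_sec              # how far back "recent" reaches
        self.baseline_rate = baseline_rate        # expected feedback events per second
        self.max_nongreedy_prob = max_nongreedy_prob
        self.feedback_times = []                  # timestamps of trainer feedback

    def record_feedback(self, now=None):
        self.feedback_times.append(now if now is not None else time.time())

    def recent_feedback_rate(self, now=None):
        now = now if now is not None else time.time()
        recent = [t for t in self.feedback_times if now - t <= self.window_sec]
        return len(recent) / self.window_sec

    def choose_action(self, actions, value_of, now=None):
        """Return the greedy action, except with a probability that grows as the
        recent feedback rate falls below the baseline, return a non-greedy one."""
        shortfall = max(0.0, 1.0 - self.recent_feedback_rate(now) / self.baseline_rate)
        greedy = max(actions, key=value_of)
        others = [a for a in actions if a != greedy]
        if others and random.random() < self.max_nongreedy_prob * shortfall:
            return random.choice(others)
        return greedy
```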
Notes
Following common practice in reinforcement learning, we use “reward” to mean both positively and negatively valued feedback.
The trainer’s assessment of return is, of course, dependent on her understanding of the task and expectation of future behavior, both of which may be flawed and will likely become more accurate over time.
The specification of Tetris in RL-Library does not differ from that of traditional Tetris, except that there are no points or levels of increasing speed, omissions that are standard in the Tetris learning literature [4]. We use RL-Library for convenience and its compatibility with RL-Glue, a software specification for reinforcement learning agents and environments.
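For reference, an agent that plugs into RL-Glue implements a small, fixed set of callbacks. The minimal sketch below uses the agent-side method names from the RL-Glue specification [35]; the random-action bodies and the six-action placeholder are illustrative assumptions, not the learning agent trained in these experiments, and in the actual RL-Glue language codecs actions and observations are wrapped in codec-specific types rather than plain integers.

```python
import random

class SketchAgent:
    """Minimal sketch of the agent-side RL-Glue interface; method names follow
    the RL-Glue specification, bodies are illustrative placeholders."""

    def agent_init(self, task_spec):
        # task_spec is a string describing the environment's observation and
        # action spaces; here we assume six discrete Tetris actions.
        self.num_actions = 6

    def agent_start(self, observation):
        # Called at the start of each episode; returns the first action.
        return random.randrange(self.num_actions)

    def agent_step(self, reward, observation):
        # Called once per step with the reward for the previous action.
        return random.randrange(self.num_actions)

    def agent_end(self, reward):
        # Called when the episode terminates (e.g., the Tetris board tops out).
        pass

    def agent_cleanup(self):
        pass

    def agent_message(self, message):
        # Free-form channel for experiment control messages.
        return "no response"
```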
Instructions given to subjects can be found at http://www.cs.utexas.edu/~bradknox/papers/12ijsr.
The Greedy group can be considered similar to the Teaching group from the critique experiment. The two groups’ instructions do contain differences, but both groups have identical TAMER agent algorithms and subjects are aware that they are teaching.
Performance is again tested offline, not during training, and the testing policy is greedy regardless of condition.
Illustrating the bimodality of performance: of the 79 subjects across conditions, in the 9th testing interval 23 agents clear 0–1 lines, 47 clear more than 100, and only 2 clear between 5 and 20 lines.
Though exploration is often considered equivalent to non-greedy action, this definition does not fit all instances of its use in RL. For instance, an agent following an exploratory policy may sometimes select the same action its greedy policy would have chosen. However, this is a semantic point that does not affect our assertion that the explore/exploit dichotomy, treated as comprehensive, is insufficient.
A human opposite the subject could have fully scripted behavior, act naturally except in certain situations (like misbehaving at certain times), or simply act naturally. Additionally, the subject may believe either that this person is a fellow subject or that she is working for the experimenters. We call this human that would potentially be replaced by an agent a “human actor” for simplicity and to differentiate from the subject.
References
Abbeel P, Ng A (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on machine learning. ACM, New York, p 1
Argall B, Browning B, Veloso M (2007) Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM/IEEE international conference on Human-robot interaction. ACM, New York, pp 57–64
Argall B, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483
Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Nashua
Bouton M (2007) Learning and behavior: a contemporary synthesis. Sinauer Associates, Sunderland
Breazeal C (2004) Designing sociable robots. MIT Press, Cambridge
Chernova S, Veloso M (2009) Interactive policy learning through confidence-based autonomy. J Artif Intell Res 34(1):1–25
Chernova S, Veloso M (2009) Teaching collaborative multi-robot tasks through demonstration. In: 8th IEEE-RAS international conference on humanoid robots (Humanoids 2008). IEEE Press, New York, pp 385–390
Dautenhahn K (2007) Methodology and themes of human-robot interaction: a growing research field. Int J Adv Robot Syst 4(1):103–108
Dobbs J, Arnold D, Doctoroff G (2004) Attention in the preschool classroom: the relationships among child gender, child misbehavior, and types of teacher attention. Early Child Dev Care 174(3):281–295
Evers V, Maldonado H, Brodecki T, Hinds P (2008) Relational vs. group self-construal: untangling the role of national culture in hri. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction. ACM, New York, pp 255–262
Fagot B (1973) Influence of teacher behavior in the preschool. Dev Psychol 9(2):198
Grollman D, Jenkins O (2007) Dogged learning for robots. In: IEEE international conference on robotics and automation, 2007. IEEE Press, New York, pp 2483–2488
Hinds P, Roberts T, Jones H (2004) Whose job is it anyway? A study of human-robot interaction in a collaborative task. Hum-Comput Interact 19(1):151–181
Isbell C, Kearns M, Singh S, Shelton C, Stone P, Kormann D (2006) Cobot in LambdaMOO: an adaptive social statistics agent. In: AAMAS
Kaochar T, Peralta R, Morrison C, Fasel I, Walsh T, Cohen P (2011) Towards understanding how humans teach robots. In: User modeling, adaption and personalization, pp 347–352
Kim E, Leyzberg D, Tsui K, Scassellati B (2009) How people talk when teaching a robot. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 23–30
Knox W, Stone P (2009) Interactively shaping agents via human reinforcement: the TAMER framework. In: The 5th international conference on knowledge capture
Knox WB, Breazeal C, Stone P (2012) Learning from feedback on actions past and intended. In: Proceedings of 7th ACM/IEEE international conference on Human-Robot interaction, Late-Breaking reports session (HRI 2012)
Knox WB, Stone P (2012) Reinforcement learning with human and MDP reward. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)
Kuhlmann G, Stone P, Mooney R, Shavlik J (2004) Guiding a reinforcement learner with natural language advice: initial results in RoboCup soccer. In: The AAAI-2004 workshop on supervisory control of learning and adaptive systems
MacDorman K, Ishiguro H (2006) The uncanny advantage of using androids in cognitive and social science research. Interact Stud 7(3):297–337
MacDorman K, Minato T, Shimada M, Itakura S, Cowley S, Ishiguro H (2005) Assessing human likeness by eye contact in an android testbed. In: Proceedings of the XXVII annual meeting of the cognitive science society, pp 21–23
Maclin R, Shavlik J (1996) Creating advice-taking reinforcement learners. Mach Learn 22(1):251–281
Nicolescu M, Mataric M (2002) Learning and interacting in human-robot domains. IEEE Trans Syst Man Cybern, Part A, Syst Hum 31(5):419–430
Nicolescu M, Mataric M (2003) Natural methods for robot task learning: instructive demonstrations, generalization and practice. In: AAMAS. ACM, New York, pp 241–248
Pomerleau D (1989) ALVINN: an autonomous land vehicle in a neural network. In: Advances in neural information processing systems, vol 1. Morgan Kaufmann, San Mateo
Pryor K (2002) Don’t shoot the dog! The new art of teaching and training. Interpet Publishing, Dorking
Ramirez K (1999) Animal training: successful animal management through positive reinforcement. Shedd Aquarium, Chicago
Reed K, Patton J, Peshkin M (2007) Replicating human-human physical interaction. In: IEEE international conference on robotics and automation
Rouder J, Speckman P, Sun D, Morey R, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16(2):225–237
Saunders J, Nehaniv C, Dautenhahn K (2006) Teaching robots by moulding behavior and scaffolding the environment. In: Proceedings of the 1st ACM SIGCHI/SIGART conference on human-robot interaction. ACM, New York, pp 118–125
Sridharan M (2011) Augmented reinforcement learning for interaction with non-expert humans in agent domains. In: Proceedings of IEEE international conference on machine learning applications
Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tanner B, White A (2009) RL-Glue: Language-independent software for reinforcement-learning experiments. J Mach Learn Res 10:2133–2136
Thomaz A (2006) Socially guided machine learning. PhD thesis, Massachusetts Institute of Technology
Thomaz A, Breazeal C (2006) Reinforcement learning with human teachers: evidence of feedback and guidance with implications for learning performance. In: AAAI
Thomaz A, Cakmak M (2009) Learning about objects with human teachers. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 15–22
Wolfgang C (2004) Solving discipline and classroom management problems: methods and models for today’s teachers. Wiley, New York
Woodward M, Wood R (2009) Using Bayesian inference to learn high-level tasks from a human teacher. In: International conference on artificial intelligence and pattern recognition, AIPR-09
Acknowledgements
This research was supported in part by NIH (R01 MH077708 to WTM), NSF (IIS-0917122), AFOSR (FA9550-10-1-0268), ONR (N00014-09-1-0658), and the FHWA (DTFH61-07-H-00030). We thank the research assistants of MaddoxLab for their crucial help gathering data.
Cite this article
Knox, W.B., Glass, B.D., Love, B.C. et al. How Humans Teach Agents. Int J of Soc Robotics 4, 409–421 (2012). https://doi.org/10.1007/s12369-012-0163-x