Abstract
Human beings are a largely untapped source of in-the-loop knowledge and guidance for computational learning agents, including robots. To effectively design agents that leverage available human expertise, we need to understand how people naturally teach. In this paper, we describe two experiments that ask how differing conditions affect a human teacher’s feedback frequency and the computational agent’s learned performance. The first experiment considers the impact of a self-perceived teaching role in contrast to believing one is merely critiquing a recording. The second considers whether a human trainer will give more frequent feedback if the agent acts less greedily (i.e., choosing actions believed to be worse) when the trainer’s recent feedback frequency decreases. From the results of these experiments, we draw three main conclusions that inform the design of agents. More broadly, these two studies stand as early examples of a nascent technique of using agents as highly specifiable social entities in experiments on human behavior.
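To make the second manipulation concrete, the sketch below illustrates one way an agent could act less greedily as its trainer's recent feedback frequency drops. It is only an illustration of the idea stated above, not the algorithm used in our experiments; the window length, baseline rate, and maximum non-greedy probability are assumed parameters, and in a TAMER-style agent the value function would be the learned model of human reward.

```python
import random
import time

class FeedbackSensitiveActionSelector:
    """Illustrative sketch: choose non-greedy actions more often when the
    trainer's recent feedback frequency falls below a baseline (assumed rule)."""

    def __init__(self, window_sec=10.0, baseline_rate=0.5, max_nongreedy_prob=0.3):
        self.window_sec = window_sec              # how far back "recent" reaches
        self.baseline_rate = baseline_rate        # expected feedback events per second
        self.max_nongreedy_prob = max_nongreedy_prob
        self.feedback_times = []                  # timestamps of trainer feedback

    def record_feedback(self, now=None):
        self.feedback_times.append(now if now is not None else time.time())

    def recent_feedback_rate(self, now=None):
        now = now if now is not None else time.time()
        recent = [t for t in self.feedback_times if now - t <= self.window_sec]
        return len(recent) / self.window_sec

    def choose_action(self, actions, value_of, now=None):
        """Return the greedy action, except with a probability that grows as the
        recent feedback rate falls below the baseline, return a non-greedy one."""
        shortfall = max(0.0, 1.0 - self.recent_feedback_rate(now) / self.baseline_rate)
        greedy = max(actions, key=value_of)
        others = [a for a in actions if a != greedy]
        if others and random.random() < self.max_nongreedy_prob * shortfall:
            return random.choice(others)
        return greedy
```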
Notes
Following common practice in reinforcement learning, we use “reward” to mean both positively and negatively valued feedback.
The trainer’s assessment of return is, of course, dependent on her understanding of the task and expectation of future behavior, both of which may be flawed and will likely become more accurate over time.
The specification of Tetris in RL-Library does not differ from that of traditional Tetris, except that there are no points or levels of increasing speed, omissions that are standard in the Tetris learning literature [4]. We use RL-Library for convenience and its compatibility with RL-Glue, a software specification for reinforcement learning agents and environments.
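For reference, an agent that plugs into RL-Glue implements a small, fixed set of callbacks. The minimal sketch below uses the agent-side method names from the RL-Glue specification [35]; the random-action bodies and the six-action placeholder are illustrative assumptions, not the learning agent trained in these experiments, and in the actual RL-Glue language codecs actions and observations are wrapped in codec-specific types rather than plain integers.

```python
import random

class SketchAgent:
    """Minimal sketch of the agent-side RL-Glue interface; method names follow
    the RL-Glue specification, bodies are illustrative placeholders."""

    def agent_init(self, task_spec):
        # task_spec is a string describing the environment's observation and
        # action spaces; here we assume six discrete Tetris actions.
        self.num_actions = 6

    def agent_start(self, observation):
        # Called at the start of each episode; returns the first action.
        return random.randrange(self.num_actions)

    def agent_step(self, reward, observation):
        # Called once per step with the reward for the previous action.
        return random.randrange(self.num_actions)

    def agent_end(self, reward):
        # Called when the episode terminates (e.g., the Tetris board tops out).
        pass

    def agent_cleanup(self):
        pass

    def agent_message(self, message):
        # Free-form channel for experiment control messages.
        return "no response"
```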
Instructions given to subjects can be found at http://www.cs.utexas.edu/~bradknox/papers/12ijsr.
The Greedy group can be considered similar to the Teaching group from the critique experiment. The two groups’ instructions do contain differences, but both groups have identical TAMER agent algorithms and subjects are aware that they are teaching.
Performance is again tested offline, not during training, and the testing policy is greedy regardless of condition.
Illustrating the bimodality of performance: of the 79 subjects across conditions, in the 9th testing interval 23 agents clear 0–1 lines, 47 clear more than 100, and only 2 clear between 5 and 20 lines.
Though exploration is often considered equivalent to non-greedy action, this definition does not fit all instances of its use in RL. For instance, an agent following an exploratory policy may sometimes select the same action its greedy policy would have chosen. However, this is a semantic point that does not affect our assertion that the explore/exploit dichotomy, treated as comprehensive, is insufficient.
A human opposite the subject could have fully scripted behavior, act naturally except in certain situations (like misbehaving at certain times), or simply act naturally. Additionally, the subject may believe either that this person is a fellow subject or that she is working for the experimenters. We call this human that would potentially be replaced by an agent a “human actor” for simplicity and to differentiate from the subject.
References
Abbeel P, Ng A (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the twenty-first international conference on machine learning. ACM, New York, p 1
Argall B, Browning B, Veloso M (2007) Learning by demonstration with critique from a human teacher. In: Proceedings of the ACM/IEEE international conference on Human-robot interaction. ACM, New York, pp 57–64
Argall B, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483
Bertsekas D, Tsitsiklis J (1996) Neuro-dynamic programming. Athena Scientific, Nashua
Bouton M (2007) Learning and behavior: a contemporary synthesis. Sinauer Associates, Sunderland
Breazeal C (2004) Designing sociable robots. MIT Press, Cambridge
Chernova S, Veloso M (2009) Interactive policy learning through confidence-based autonomy. J Artif Intell Res 34(1):1–25
Chernova S, Veloso M (2009) Teaching collaborative multi-robot tasks through demonstration. In: 8th IEEE-RAS international conference on humanoid robots (Humanoids 2008). IEEE Press, New York, pp 385–390
Dautenhahn K (2007) Methodology and themes of human-robot interaction: a growing research field. Int J Adv Robot Syst 4(1):103–108
Dobbs J, Arnold D, Doctoroff G (2004) Attention in the preschool classroom: the relationships among child gender, child misbehavior, and types of teacher attention. Early Child Dev Care 174(3):281–295
Evers V, Maldonado H, Brodecki T, Hinds P (2008) Relational vs. group self-construal: untangling the role of national culture in hri. In: Proceedings of the 3rd ACM/IEEE international conference on human robot interaction. ACM, New York, pp 255–262
Fagot B (1973) Influence of teacher behavior in the preschool. Dev Psychol 9(2):198
Grollman D, Jenkins O (2007) Dogged learning for robots. In: IEEE international conference on robotics and automation, 2007. IEEE Press, New York, pp 2483–2488
Hinds P, Roberts T, Jones H (2004) Whose job is it anyway? A study of human-robot interaction in a collaborative task. Hum-Comput Interact 19(1):151–181
Isbell C, Kearns M, Singh S, Shelton C, Stone P, Kormann D (2006) Cobot in LambdaMOO: an adaptive social statistics agent. In: AAMAS
Kaochar T, Peralta R, Morrison C, Fasel I, Walsh T, Cohen P (2011) Towards understanding how humans teach robots. In: User modeling, adaption and personalization, pp 347–352
Kim E, Leyzberg D, Tsui K, Scassellati B (2009) How people talk when teaching a robot. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 23–30
Knox W, Stone P (2009) Interactively shaping agents via human reinforcement: the TAMER framework. In: The 5th international conference on knowledge capture
Knox WB, Breazeal C, Stone P (2012) Learning from feedback on actions past and intended. In: Proceedings of 7th ACM/IEEE international conference on Human-Robot interaction, Late-Breaking reports session (HRI 2012)
Knox WB, Stone P (2012) Reinforcement learning with human and MDP reward. In: Proceedings of the 11th international conference on autonomous agents and multiagent systems (AAMAS)
Kuhlmann G, Stone P, Mooney R, Shavlik J (2004) Guiding a reinforcement learner with natural language advice: initial results in RoboCup soccer. In: The AAAI-2004 workshop on supervisory control of learning and adaptive systems
MacDorman K, Ishiguro H (2006) The uncanny advantage of using androids in cognitive and social science research. Interact Stud 7(3):297–337
MacDorman K, Minato T, Shimada M, Itakura S, Cowley S, Ishiguro H (2005) Assessing human likeness by eye contact in an android testbed. In: Proceedings of the XXVII annual meeting of the cognitive science society, pp 21–23
Maclin R, Shavlik J (1996) Creating advice-taking reinforcement learners. Mach Learn 22(1):251–281
Nicolescu M, Mataric M (2002) Learning and interacting in human-robot domains. IEEE Trans Syst Man Cybern, Part A, Syst Hum 31(5):419–430
Nicolescu M, Mataric M (2003) Natural methods for robot task learning: instructive demonstrations, generalization and practice. In: AAMAS. ACM, New York, pp 241–248
Pomerleau D (1989) ALVINN: an autonomous land vehicle in a neural network. In: Advances in neural information processing systems, vol 1. Morgan Kaufmann, San Mateo
Pryor K (2002) Don’t shoot the dog! The new art of teaching and training. Interpet Publishing, Dorking
Ramirez K (1999) Animal training: successful animal management through positive reinforcement. Shedd Aquarium, Chicago
Reed K, Patton J, Peshkin M (2007) Replicating human-human physical interaction. In: IEEE international conference on robotics and automation
Rouder J, Speckman P, Sun D, Morey R, Iverson G (2009) Bayesian t tests for accepting and rejecting the null hypothesis. Psychon Bull Rev 16(2):225–237
Saunders J, Nehaniv C, Dautenhahn K (2006) Teaching robots by moulding behavior and scaffolding the environment. In: Proceedings of the 1st ACM SIGCHI/SIGART conference on human-robot interaction. ACM, New York, pp 118–125
Sridharan M (2011) Augmented reinforcement learning for interaction with non-expert humans in agent domains. In: Proceedings of IEEE international conference on machine learning applications
Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge
Tanner B, White A (2009) RL-Glue: Language-independent software for reinforcement-learning experiments. J Mach Learn Res 10:2133–2136
Thomaz A (2006) Socially guided machine learning. PhD thesis, Massachusetts Institute of Technology
Thomaz A, Breazeal C (2006) Reinforcement learning with human teachers: evidence of feedback and guidance with implications for learning performance. In: AAAI
Thomaz A, Cakmak M (2009) Learning about objects with human teachers. In: Proceedings of the 4th ACM/IEEE international conference on human robot interaction. ACM, New York, pp 15–22
Wolfgang C (2004) Solving discipline and classroom management problems: methods and models for today’s teachers. Wiley, New York
Woodward M, Wood R (2009) Using Bayesian inference to learn high-level tasks from a human teacher. In: International conference on artificial intelligence and pattern recognition, AIPR-09
Acknowledgements
This research was supported in part by NIH (R01 MH077708 to WTM), NSF (IIS-0917122), AFOSR (FA9550-10-1-0268), ONR (N00014-09-1-0658), and the FHWA (DTFH61-07-H-00030). We thank the research assistants of MaddoxLab for their crucial help gathering data.
Cite this article
Knox, W.B., Glass, B.D., Love, B.C. et al. How Humans Teach Agents. Int J of Soc Robotics 4, 409–421 (2012). https://doi.org/10.1007/s12369-012-0163-x