Abstract
Motion control is fundamental to mobile robots, and the challenge of developing it can be eased by incorporating execution experience to increase policy robustness. In this work, we present an approach that updates a policy learned from demonstration with human teacher feedback. We contribute advice-operators, a feedback form that provides corrections on state-action pairs produced during a learner execution, and Focused Feedback for Mobile Robot Policies (F3MRP), a framework for providing feedback to rapidly sampled policies. Both are appropriate for mobile robot motion control domains. We present a general feedback algorithm in which multiple types of feedback, including advice-operators, are provided through the F3MRP framework and are shown to improve policies initially derived from a set of behavior examples. A comparison with providing additional behavior examples rather than feedback finds that the two approaches generate data in different areas of the state and action spaces, and that feedback improves policy performance more effectively while producing smaller datasets.
Notes
The F3MRP framework was developed in the GNU Octave scientific computing language [14].
The empirical validations of Sect. 4.2 employ lazy learning regression techniques [6]; specifically, a form of locally weighted averaging. Incremental policy updating is particularly straightforward under lazy learning regression, since explicit rederivation is not required; policy derivation happens at execution time and so a complete policy update is accomplished by simply adding new data to the set.
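As a minimal sketch only (the variable names, toy data, and Gaussian kernel bandwidth are illustrative assumptions, not the exact implementation of [6]), a locally weighted averaging policy prediction and the corresponding incremental update might look as follows in Octave:

    % Toy dataset: demonstrated states Z (N x d) and actions A (N x k).
    Z = [0.0 0.0; 0.5 0.1; 1.0 0.2];              % example 2-D states (placeholder values)
    A = [0.1; 0.3; 0.5];                          % example 1-D actions (placeholder values)
    h = 0.25;                                     % Gaussian kernel bandwidth (assumed)
    z = [0.4 0.1];                                % query state

    % Locally weighted averaging: kernel-weighted average of the stored actions.
    d2 = sum((Z - repmat(z, rows(Z), 1)).^2, 2);  % squared distances to the query
    w  = exp(-d2 ./ (2 * h^2));                   % Gaussian kernel weights
    a  = (w' * A) ./ sum(w)                       % predicted action

    % Incremental policy update under lazy learning: no explicit rederivation,
    % just append the new state-action pair to the dataset.
    Z = [Z; z];
    A = [A; a];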
The positive credit flag adds the execution point, unmodified, to the dataset, and thus may equivalently be viewed as an identity-function advice-operator, i.e. f(z,a)=(z,a).
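For illustration only (the operator and variable names below are hypothetical, not taken from the implementation), an advice-operator can be treated as a function mapping an executed state-action pair to a corrected pair, with the positive credit flag corresponding to the identity mapping:

    % An advice-operator maps an executed (z, a) pair to a corrected pair;
    % the positive credit flag corresponds to the identity operator.
    identity_op = @(z, a) {z, a};            % f(z, a) = (z, a)
    scale_speed = @(z, a) {z, 0.8 * a};      % hypothetical corrective operator

    z = [0.4 0.1];  a = 0.5;                 % an executed state-action pair
    kept = identity_op(z, a);                % added to the dataset unmodified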
This scale becomes finer, and the association with the underlying data trickier, if a single value is intended to be distributed somehow across only a portion of the execution states, akin to the reward back-propagation issue in RL.
A Poisson formulation was chosen because the computed distances never fall below zero and often cluster near it. To estimate λ, frequency counts were computed over k uniformly sized bins of the distance data (k=50).
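A minimal sketch of this estimation step (the placeholder distance data and the use of a bin-count weighted mean are assumptions; the text specifies only the k=50 uniform bins):

    % Estimate the Poisson rate lambda from binned distance data.
    distances = abs(randn(200, 1));           % placeholder for the computed distances
    k = 50;                                   % number of uniformly sized bins
    [counts, centers] = hist(distances, k);   % frequency counts and bin centers
    lambda_hat = sum(counts .* centers) / sum(counts)   % weighted mean of the bin centers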
The traces \(\xi_{d}\) and \(\xi_{p}\) correspond respectively to the “Prediction Data” and “Position Data” in Fig. 1. Similarly, the trace subsets \(\hat{\xi}_{d}=\{x,y,\theta\}_{\varPhi}\) and \(\hat{\xi}_{p}=\{\mathbf{z},\mathbf{a}\}_{\varPhi}\).
Here an earlier version of F3MRP was employed, which did not provide visual dataset support or interactive tagging.
The same teacher (one of the authors) was used to provide both demonstration and feedback.
Full domain and algorithm details may be found in [4].
The exceptions are when the entire learner execution receives a correction, or when the teacher provides a demonstration for only the beginning portion of an execution.
In Table 2, operators 0–5 are the baseline operators and operators 6–8 were built through operator-scaffolding.
Note that operator composition is not transitive.
The limit is the number of unique combinations of the parameters of the child operators.
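As an illustrative sketch only (the child operators shown are hypothetical, not those of Table 2), a scaffolded operator can be built by applying child operators in sequence, so the distinct scaffolded operators are bounded by the unique combinations of the child operators' parameters:

    % Two hypothetical child operators acting on a (state, action) pair.
    scale_speed = @(z, a) {z, 0.8 * a};       % reduce the commanded action
    shift_state = @(z, a) {z + 0.05, a};      % adjust the associated state

    % A scaffolded operator applies one child operator after the other;
    % its parameters are those of the child operators it combines.
    z = 0.4;  a = 0.5;                        % an executed state-action pair
    tmp = scale_speed(z, a);
    out = shift_state(tmp{1}, tmp{2});        % the composed correction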
If a constant value for the rate of change in action dimension j is not defined for the robot system, reasonable options include, for example, the average rate of change observed during the demonstrations, as in the sketch below.
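A sketch with assumed variable names (A_demo holds demonstrated actions sampled at interval dt; the values are placeholders):

    % Average rate of change of action dimension j over a demonstration.
    A_demo = [0.0 0.0; 0.1 0.2; 0.3 0.3; 0.4 0.5];   % demonstrated actions (T x k), placeholder values
    dt = 0.1;                                        % sampling interval in seconds (assumed)
    j = 2;                                           % action dimension of interest
    rates  = abs(diff(A_demo(:, j))) ./ dt;          % per-step rate of change
    rate_j = mean(rates)                             % average observed rate of change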
The value γ_{j,max} is defined either by the physical constraints of the robot or artificially by the control system.
References
Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. In: Proceedings of advances in neural information processing systems
Argall B, Browning B, Veloso M (2008) Learning robot motion control with demonstration and advice-operators. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems
Argall B, Browning B, Veloso M (2009) Automatic weight learning for multiple data sources when learning from demonstration. In: Proceedings of the IEEE international conference on robotics and automation
Argall B, Browning B, Veloso M (2011) Teacher feedback to scaffold and refine demonstrated motion primitives on a mobile robot. Robot Auton Syst 59(3–4):243–255
Argall B, Chernova S, Veloso M, Browning B (2009) A survey of robot learning from demonstration. Robot Auton Syst 57(5):469–483
Atkeson CG, Moore AW, Schaal S (1997) Locally weighted learning. Artif Intell Rev 11:11–73
Atkeson CG, Schaal S (1997) Robot learning from demonstration. In: Proceedings of the fourteenth international conference on machine learning (ICML’97)
Bagnell JA, Schneider JG (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: Proceedings of the IEEE international conference on robotics and automation
Bentivegna DC (2004) Learning from observation using primitives. Ph.D. thesis, College of Computing, Georgia Institute of Technology, Atlanta, GA
Billard A, Calinon S, Dillmann R, Schaal S (2008) Robot programming by demonstration. In: Siciliano B, Khatib O (eds) Handbook of robotics. Springer, New York, Chap. 59
Breazeal C, Scassellati B (2002) Robots that imitate humans. Trends Cogn Sci 6(11):481–487
Calinon S, Billard A (2007) Incremental learning of gestures by imitation in a humanoid robot. In: Proceedings of the 2nd ACM/IEEE international conference on human-robot interactions
Chernova S, Veloso M (2008) Learning equivalent action choices from demonstration. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems
Eaton JW (2002) GNU Octave Manual. Network Theory Limited
Grollman DH, Jenkins OC (2007) Dogged learning for robots. In: Proceedings of the IEEE international conference on robotics and automation
Ijspeert AJ, Nakanishi J, Schaal S (2002) Learning rhythmic movements by demonstration using nonlinear oscillators. In: Proceedings of the IEEE/RSJ international conference on intelligent robots and systems
Kober J, Peters J (2009) Learning motor primitives for robotics. In: Proceedings of the IEEE international conference on robotics and automation
Kolter JZ, Abbeel P, Ng AY (2008) Hierarchical apprenticeship learning with application to quadruped locomotion. In: Proceedings of advances in neural information processing systems
Matarić MJ (2002) Sensory-motor primitives as a basis for learning by imitation: Linking perception to action and biology to robotics. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, Chap. 15
Nehaniv CL, Dautenhahn K (2002) The correspondence problem. In: Dautenhahn K, Nehaniv CL (eds) Imitation in animals and artifacts. MIT Press, Cambridge, Chap. 2
Nicolescu M, Matarić MJ (2003) Methods for robot task learning: Demonstrations, generalization and practice. In: Proceedings of the second international joint conference on autonomous agents and multi-agent systems
Pastor P, Kalakrishnan M, Chitta S, Theodorou E, Schaal S (2011) Skill learning and task outcome prediction for manipulation. In: Proceedings of IEEE international conference on robotics and automation
Peters J, Schaal S (2008) Natural actor-critic. Neurocomputing 71(7–9):1180–1190
Ratliff N, Bradley D, Bagnell JA, Chestnutt J (2007) Boosting structured prediction for imitation learning. In: Proceedings of advances in neural information processing systems
Smart WD (2002) Making reinforcement learning work on real robots. Ph.D. thesis, Department of Computer Science, Brown University, Providence, RI
Acknowledgements
The research is partly sponsored by the Boeing Corporation under Grant No. CMU-BA-GTA-1, BBNT Solutions under subcontract No. 950008572, via prime Air Force contract No. SA-8650-06-C-7606, the United States Department of the Interior under Grant No. NBCH-1040007 and the Qatar Foundation for Education, Science and Community Development. The views and conclusions contained in this document are solely those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.