Learning Reward Functions for Robotic Manipulation by Observing Humans

Alakuijala, Minttu; Dulac-Arnold, Gabriel; Mairal, Julien; Ponce, Jean; Schmid, Cordelia

arXiv:2211.09019 (cs)

[Submitted on 16 Nov 2022 (v1), last revised 7 Mar 2023 (this version, v2)]

Abstract: Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function sufficiently generalizes to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. We propose two methods for scoring states relative to a goal image: through direct temporal regression, and through distances in an embedding space obtained with time-contrastive learning. By conditioning the function on a goal image, we are able to reuse one model across a variety of tasks. Unlike prior work on leveraging human videos to teach robots, our method, Human Offline Learned Distances (HOLD), requires neither a priori data from the robot environment, nor a set of task-specific human demonstrations, nor a predefined notion of correspondence across morphologies, yet it is able to accelerate training of several manipulation tasks on a simulated robot arm compared to using only a sparse reward obtained from task completion.
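The two scoring schemes named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the linear `embed` encoder, the triplet-style hinge loss, the `margin` value, and the "steps remaining until the last frame" regression target are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
import numpy as np

def embed(image, weights):
    """Hypothetical stand-in encoder: flatten the image and apply a linear map."""
    return weights @ image.reshape(-1)

def distance_reward(obs, goal, weights):
    """Goal-conditioned reward: negative Euclidean distance to the goal in embedding space."""
    return -float(np.linalg.norm(embed(obs, weights) - embed(goal, weights)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Assumed time-contrastive objective: a frame close in time to the anchor
    should embed closer to it than a frame far away in time, by at least `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return float(max(0.0, d_pos - d_neg + margin))

def time_to_goal_targets(num_frames):
    """Assumed temporal-regression targets: steps remaining until the last frame,
    which is treated as the goal image of the video."""
    return np.arange(num_frames - 1, -1, -1)

# Toy data: random "images" and a random linear encoder (illustrative only).
rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 16 * 16 * 3))
goal = rng.random((16, 16, 3))
far = rng.random((16, 16, 3))
near = 0.9 * goal + 0.1 * far  # an observation that is mostly the goal image

print(distance_reward(near, goal, weights) > distance_reward(far, goal, weights))
print(time_to_goal_targets(4))
```

Because the same scoring function takes the goal image as an argument, swapping `goal` for a different image changes the task without retraining, which mirrors the reuse across tasks described in the abstract.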

Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as: arXiv:2211.09019 [cs.RO] (or arXiv:2211.09019v2 [cs.RO] for this version)
DOI: https://doi.org/10.48550/arXiv.2211.09019

Submission history

From: Minttu Alakuijala
[v1] Wed, 16 Nov 2022 16:26:48 UTC (7,257 KB)
[v2] Tue, 7 Mar 2023 16:29:49 UTC (7,281 KB)
