Abstract
While learning models of intuitive physics is an active area of research, current approaches fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some approaches sidestep these requirements by building models on top of handcrafted physical simulators. In both cases, however, the methods cannot automatically learn new physical environments and their laws as humans do. In this work, we demonstrate, for the first time, learning unsupervised predictors of physical states, such as the positions of objects in an environment, directly from raw visual observations and without relying on simulators. We do so in two steps: (i) we learn to track dynamically-salient objects in videos using causality and equivariance, two non-generative unsupervised learning principles that require no manual or external supervision; (ii) we demonstrate that the extracted positions are sufficient to train visual motion predictors that take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, Roll4Real, consisting of real objects rolling on complex terrains (a pool table, an elliptical bowl, and a random height-field). We show that reliable object trajectory extrapolators can be learned from raw videos alone, without any external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
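The equivariance principle mentioned in step (i) can be illustrated with a small sketch: a position detector is equivariant if, when the input image is translated, its detected position translates by the same amount. The NumPy sketch below (all names hypothetical; the paper's actual tracker is a neural network trained jointly with a causality principle, which is not shown here) measures the violation of this property for a simple soft-argmax detector:

```python
import numpy as np

def soft_argmax(heatmap, tau=0.01):
    # Differentiable position estimate: softmax-weighted mean of pixel coordinates.
    # A low temperature tau sharpens the softmax around the heatmap peak.
    p = np.exp((heatmap - heatmap.max()) / tau)
    p /= p.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return np.array([(p * ys).sum(), (p * xs).sum()])

def equivariance_loss(detect, image, dy, dx):
    # Equivariance: if the image shifts by (dy, dx), the detected
    # position should shift by exactly the same amount.
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    pos = detect(image)
    pos_shifted = detect(shifted)
    return np.sum((pos_shifted - (pos + np.array([dy, dx]))) ** 2)
```

In an unsupervised training loop, such a discrepancy would serve as a loss driving the detector toward equivariance under known synthetic warps of the input; here it merely scores a fixed detector.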
S. Ehrhardt and A. Monszpart contributed equally.
Acknowledgements
The authors would like to gratefully acknowledge the support of ERC 677195-IDIU and ERC SmartGeometry StG-2013-335373 grants.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Ehrhardt, S., Monszpart, A., Mitra, N., Vedaldi, A. (2019). Unsupervised Intuitive Physics from Visual Observations. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_44
DOI: https://doi.org/10.1007/978-3-030-20893-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20892-9
Online ISBN: 978-3-030-20893-6