Abstract
While learning models of intuitive physics is an active area of research, current approaches fall short of natural intelligences in one important regard: they require external supervision, such as explicit access to physical states, at training and sometimes even at test time. Some approaches sidestep these requirements by building models on top of handcrafted physical simulators. In both cases, however, the methods cannot automatically learn new physical environments and their laws as humans do. In this work, we demonstrate, for the first time, learning unsupervised predictors of physical states, such as the positions of objects in an environment, directly from raw visual observations and without relying on simulators. We do so in two steps: (i) we learn to track dynamically-salient objects in videos using causality and equivariance, two non-generative unsupervised learning principles that require no manual or external supervision; (ii) we demonstrate that the extracted positions are sufficient to train visual motion predictors that take the underlying environment into account. We validate our predictors on synthetic datasets; then, we introduce a new dataset, Roll4Real, consisting of real objects rolling on complex terrains (a pool table, an elliptical bowl, and a random height-field). We show that reliable object trajectory extrapolators can be learned from raw videos alone, without any external supervision and with no more prior knowledge than the choice of a convolutional neural network architecture.
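The equivariance principle mentioned in step (i) can be illustrated with a small sketch: a position detector is equivariant if, when the input image is translated, its detected position translates by the same amount. The NumPy sketch below (all names hypothetical; the paper's actual tracker is a neural network trained jointly with a causality principle, which is not shown here) measures the violation of this property for a simple soft-argmax detector:

```python
import numpy as np

def soft_argmax(heatmap, tau=0.01):
    # Differentiable position estimate: softmax-weighted mean of pixel coordinates.
    # A low temperature tau sharpens the softmax around the heatmap peak.
    p = np.exp((heatmap - heatmap.max()) / tau)
    p /= p.sum()
    ys, xs = np.mgrid[0:heatmap.shape[0], 0:heatmap.shape[1]]
    return np.array([(p * ys).sum(), (p * xs).sum()])

def equivariance_loss(detect, image, dy, dx):
    # Equivariance: if the image shifts by (dy, dx), the detected
    # position should shift by exactly the same amount.
    shifted = np.roll(image, shift=(dy, dx), axis=(0, 1))
    pos = detect(image)
    pos_shifted = detect(shifted)
    return np.sum((pos_shifted - (pos + np.array([dy, dx]))) ** 2)
```

In an unsupervised training loop, such a discrepancy would serve as a loss driving the detector toward equivariance under known synthetic warps of the input; here it merely scores a fixed detector.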
S. Ehrhardt and A. Monszpart contributed equally.
Acknowledgements
The authors would like to gratefully acknowledge the support of ERC 677195-IDIU and ERC SmartGeometry StG-2013-335373 grants.
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Ehrhardt, S., Monszpart, A., Mitra, N., Vedaldi, A. (2019). Unsupervised Intuitive Physics from Visual Observations. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science(), vol 11363. Springer, Cham. https://doi.org/10.1007/978-3-030-20893-6_44
DOI: https://doi.org/10.1007/978-3-030-20893-6_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-20892-9
Online ISBN: 978-3-030-20893-6