Abstract
Humans demonstrate a remarkable ability to parse complicated motion sequences into their constituent structures and motions. We investigate this problem, attempting to learn the structure of one or more articulated objects, given a time series of two-dimensional feature positions. We model the observed sequence in terms of “stick figure” objects, under the assumption that the relative joint angles between sticks can change over time, but their lengths and connectivities are fixed. The problem is formulated as a single probabilistic model that includes multiple sub-components: associating the features with particular sticks, determining the proper number of sticks, and finding which sticks are physically joined. We test the algorithm on challenging datasets of 2D projections of optical human motion capture and feature trajectories from real videos.
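The fixed-length assumption can be illustrated with a small simulation. The sketch below is not the probabilistic model described in the abstract; it is a simplified illustration, assuming motion confined to the image plane, so that distances between features on the same stick stay constant. Under 2D projection of 3D motion this no longer holds exactly. The function names `simulate_two_sticks` and `distance_spread`, and all numeric settings, are hypothetical choices made for this example.

```python
import numpy as np


def simulate_two_sticks(num_frames=50, seed=0):
    """Return a (num_frames, 4, 2) array of 2D feature positions.

    Features 0-1 are attached to stick A and features 2-3 to stick B.
    Stick B is joined to the far end of stick A. Only the angles change
    over time; lengths and attachment offsets are fixed (assumed values).
    """
    rng = np.random.default_rng(seed)
    # Absolute angle of stick A, and of stick B (A's angle plus a joint angle).
    theta_a = np.cumsum(rng.normal(0.0, 0.1, num_frames))
    theta_b = theta_a + np.cumsum(rng.normal(0.0, 0.2, num_frames))

    len_a = 2.0
    offsets_a = np.array([0.5, 1.5])  # feature distances along stick A
    offsets_b = np.array([0.4, 1.2])  # feature distances along stick B

    dir_a = np.stack([np.cos(theta_a), np.sin(theta_a)], axis=1)  # (T, 2)
    dir_b = np.stack([np.cos(theta_b), np.sin(theta_b)], axis=1)  # (T, 2)
    joint = len_a * dir_a  # joint location in every frame

    feats_a = offsets_a[None, :, None] * dir_a[:, None, :]
    feats_b = joint[:, None, :] + offsets_b[None, :, None] * dir_b[:, None, :]
    return np.concatenate([feats_a, feats_b], axis=1)


def distance_spread(tracks):
    """Standard deviation over time of every pairwise feature distance."""
    diffs = tracks[:, :, None, :] - tracks[:, None, :, :]  # (T, P, P, 2)
    dists = np.linalg.norm(diffs, axis=-1)                 # (T, P, P)
    return dists.std(axis=0)                               # (P, P)


tracks = simulate_two_sticks()
spread = distance_spread(tracks)
```

In this toy setting, entries of `spread` for feature pairs on the same stick are zero up to floating-point error, while pairs that straddle the joint vary over time. That contrast is the kind of cue that makes grouping features into rigid sticks possible.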
Cite this article
Ross, D.A., Tarlow, D. & Zemel, R.S. Learning Articulated Structure and Motion. Int J Comput Vis 88, 214–237 (2010). https://doi.org/10.1007/s11263-010-0325-y
DOI: https://doi.org/10.1007/s11263-010-0325-y