Abstract
This chapter discusses the need for standard datasets in the articulated pose estimation and tracking communities. It describes the datasets that are currently available and the performance of state-of-the-art methods on them. We discuss issues of ground-truth collection and quality, complexity of appearance and poses, evaluation metrics, and partitioning of data. We also discuss limitations of current datasets and possible directions for developing new datasets for future use.
Notes
- 1. Note that in some publications this dataset is also referred to as the “Iterative Image Parsing” (IIP) dataset.
- 2. The dataset has been modified since the publication of [16]; see the documentation provided with the dataset for details.
- 3.
- 4. In HumanEva, synchronization was obtained through off-line optimization, and in HumanEva-II, the video frames were synchronized in hardware.
- 5.
- 6.
- 7. Optical motion capture systems are unable to deal with loose clothing that does not drape tightly over the limbs of the body.
- 8.
- 9.
- 10. These observations are abridged from the editorial written by Leonid Sigal and Michael J. Black [46].
References
Agarwal, A., Triggs, B.: 3d human pose from silhouettes by relevance vector regression. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. 882–888 (2004)
Agarwal, A., Triggs, B.: Recovering 3d human pose from monocular images. IEEE Trans. Pattern Anal. Mach. Intell. 28, 44–58 (2006)
Andriluka, M., Roth, S., Schiele, B.: Pictorial structures revisited: People detection and articulated pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)
Andriluka, M., Roth, S., Schiele, B.: Monocular 3d pose estimation and tracking by detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
Belongie, S., Malik, J., Puzicha, J.: Shape context: A new descriptor for shape matching and object recognition. In: Advances in Neural Information Processing Systems (2000)
Bergtholdt, M., Kappes, J.H., Schmidt, S., Schnörr, C.: A study of parts-based object class detection using complete graphs. Int. J. Comput. Vis. 87(1–2), 93–117 (2010)
Bo, L., Sminchisescu, C.: Twin Gaussian processes for structured prediction. Int. J. Comput. Vis. 87(1–2), 28–52 (2010)
Bourdev, L., Malik, J.: Poselets: Body part detectors trained using 3d human pose annotations. In: IEEE International Conference on Computer Vision (2009). http://www.eecs.berkeley.edu/~lbourdev/h3d/
Brubaker, M., Fleet, D., Hertzmann, A.: Physics-based person tracking using the anthropomorphic walker. Int. J. Comput. Vis. 87(1–2), 140–155 (2010)
Corazza, S., Mündermann, L., Gambaretto, E., Ferrigno, G., Andriacchi, T.: Markerless motion capture through visual hull, articulated ICP and subject specific model generation. Int. J. Comput. Vis. 87(1–2), 156–169 (2010)
Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2005)
Eichner, M., Ferrari, V.: Better appearance models for pictorial structures. In: British Machine Vision Conference (2009). http://www.vision.ee.ethz.ch/~calvin/ethz_pascal_stickmen/index.html
Eichner, M., Ferrari, V.: We are family: Joint pose estimation of multiple persons. In: European Conference on Computer Vision (2010)
Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vis. 88(2), 303–338 (2010). http://pascallin.ecs.soton.ac.uk/challenges/VOC/
Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. Int. J. Comput. Vis. 61(1), 55–79 (2005)
Ferrari, V., Marin-Jimenez, M., Zisserman, A.: Progressive search space reduction for human pose estimation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2008). http://www.robots.ox.ac.uk/~vgg/data/stickmen/index.html
Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Trans. Comput. C-22(1), 67–92 (1973)
Fossati, A., Dimitrijevic, M., Lepetit, V., Fua, P.: Bridging the gap between detection and tracking for 3D monocular video-based motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2007)
Freifeld, O., Weiss, A., Zuffi, S., Black, M.J.: Contour people: A parameterized model of 2D articulated human shape. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
Gall, J., Rosenhahn, B., Brox, T., Seidel, H.-P.: Optimization and filtering for human motion capture. Int. J. Comput. Vis. 87(1–2), 75–92 (2010)
Gammeter, S., Ess, A., Jaeggli, T., Schindler, K., Leibe, B., Van Gool, L.: Articulated multi-body tracking under egomotion. In: European Conference on Computer Vision (2008)
Ganapathi, V., Plagemann, C., Koller, D., Thrun, S.: Real time motion capture using a single time-of-flight camera. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~varung/cvpr10/
Gupta, A., Kembhavi, A., Davis, L.S.: Observing human–object interactions: Using spatial and functional compatibility for recognition. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1775–1789 (2009)
Hasler, N., Rosenhahn, B., Thormählen, T., Wand, M., Gall, J., Seidel, H.-P.: Markerless motion capture with unsynchronized moving cameras. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2009)
Hogg, D.: Model-based vision: a program to see a walking person. Image Vis. Comput. 1(1), 5–20 (1983)
Hoiem, D., Efros, A., Hebert, M.: Putting objects in perspective. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
Ionescu, C., Bo, L., Sminchisescu, C.: Structural SVM for visual localization and continuous state estimation. In: IEEE International Conference on Computer Vision (2009)
Jiang, H.: Human pose estimation using consistent max-covering. In: IEEE International Conference on Computer Vision (2009)
Johnson, S., Everingham, M.: Clustered pose and nonlinear appearance models for human pose estimation. In: British Machine Vision Conference (2010)
Johnson, S., Everingham, M.: Learning effective human pose estimation from inaccurate annotation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
Kjellström, H., Kragić, D., Black, M.J.: Tracking people interacting with objects. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
Kumar, M.P., Zisserman, A., Torr, P.H.S.: Efficient discriminative learning of parts-based models. In: IEEE International Conference on Computer Vision (2009)
Lan, X., Huttenlocher, D.P.: Beyond trees: Common-factor models for 2d human pose recovery. In: IEEE International Conference on Computer Vision (2005)
Lee, C.-S., Elgammal, A.: Coupled visual and kinematic manifold models for tracking. Int. J. Comput. Vis. 87(1–2), 118–139 (2010)
Lee, M.W., Cohen, I.: Proposal maps driven MCMC for estimating human body pose in static images. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2004)
Li, R., Tian, T.-P., Sclaroff, S., Yang, M.-H.: 3d human motion tracking with a coordinated mixture of factor analyzers. Int. J. Comput. Vis. 87(1–2), 170–190 (2010)
Liu, Y., Stoll, C., Gall, J., Seidel, H.-P., Theobalt, C.: Markerless motion capture of interacting characters using multi-view image segmentation. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
Ning, H., Xu, W., Gong, Y., Huang, T.: Latent pose estimator for continuous action recognition. In: European Conference on Computer Vision, pp. 419–433 (2008)
Peursum, P., Venkatesh, S., West, G.: A study on smoothing for particle filtered 3d human body tracking. Int. J. Comput. Vis. 87(1–2), 53–74 (2010)
Pons-Moll, G., Baak, A., Helten, T., Müller, M., Seidel, H.-P., Rosenhahn, B.: Multisensor-fusion for 3d full-body human motion capture. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://www.tnt.uni-hannover.de/project/MPI08_Database/
Ramanan, D.: Learning to parse images of articulated bodies. In: Advances in Neural Information Processing Systems (2006). http://www.ics.uci.edu/~dramanan/papers/parse/people.zip
Ren, X., Berg, A.C., Malik, J.: Recovering human body configurations using pairwise constraints between parts. In: IEEE International Conference on Computer Vision (2005)
Sapp, B., Jordan, C., Taskar, B.: Adaptive pose priors for pictorial structures. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
Shakhnarovich, G., Viola, P., Darrell, T.: Fast pose estimation with parameter-sensitive hashing. In: IEEE International Conference on Computer Vision, vol. 2, pp. 750–759 (2003)
Sigal, L., Balan, A.O., Black, M.J.: HumanEva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion. Int. J. Comput. Vis. 87(1–2), 4–27 (2010). http://vision.cs.brown.edu/humaneva/index.html
Sigal, L., Black, M.J.: Guest editorial: State of the art in image- and video-based human pose and motion estimation. Int. J. Comput. Vis. 87(1–2), 1–3 (2010)
Singh, V., Nevatia, R., Huang, C.: Efficient inference with multiple heterogeneous part detectors for human pose estimation. In: European Conference on Computer Vision (2010)
Sminchisescu, C., Kanaujia, A., Li, Z., Metaxas, D.: Conditional visual tracking in kernel space. In: Advances in Neural Information Processing Systems (2005)
Sminchisescu, C., Kanaujia, A., Metaxas, D.: Learning joint top–down and bottom–up processes for 3d visual inference. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
Tian, T.-P., Sclaroff, S.: Fast globally optimal 2d human detection with loopy graph models. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010)
Tran, D., Forsyth, D.: Improved human parsing with a full relational model. In: European Conference on Computer Vision (2010)
Urtasun, R., Darrell, T.: Local probabilistic regression for activity-independent human pose inference. In: IEEE International Conference on Computer Vision (2009)
Vlasic, D., Adelsberger, R., Vannucci, G., Barnwell, J., Gross, M., Matusik, W., Popović, J.: Practical motion capture in everyday surroundings. ACM Trans. Graph. 26(3), 35 (2007)
Wang, P., Rehg, J.M.: A modular approach to the analysis and evaluation of particle filters for figure tracking. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 790–797 (2006). http://www.cc.gatech.edu/~pingwang/Project/FigureTracking.html
Yang, Y., Ramanan, D.: Articulated pose estimation with flexible mixtures-of-parts. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2011)
Yao, B., Fei-Fei, L.: Modeling mutual context of object and human pose in human–object interaction activities. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010). http://ai.stanford.edu/~bangpeng/resource/mutual_context_annotation.rar
Zhang, J., Luo, J., Collins, R., Liu, Y.: Body localization in still images using hierarchical models and hybrid search. In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2006)
Copyright information
© 2011 Springer-Verlag London Limited
About this chapter
Cite this chapter
Andriluka, M., Sigal, L., Black, M.J. (2011). Benchmark Datasets for Pose Estimation and Tracking. In: Moeslund, T., Hilton, A., Krüger, V., Sigal, L. (eds) Visual Analysis of Humans. Springer, London. https://doi.org/10.1007/978-0-85729-997-0_13
DOI: https://doi.org/10.1007/978-0-85729-997-0_13
Publisher Name: Springer, London
Print ISBN: 978-0-85729-996-3
Online ISBN: 978-0-85729-997-0
eBook Packages: Computer Science (R0)