COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation | SpringerLink

COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15074)


Abstract

Estimating global human motion from moving cameras is challenging due to the entanglement of human and camera motions. To mitigate this ambiguity, existing methods leverage learned human motion priors, which, however, often result in oversmoothed motions with misaligned 2D projections. To tackle this problem, we propose COIN, a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions. Although pre-trained motion diffusion models encode rich motion priors, we find it non-trivial to leverage such knowledge to guide global motion estimation from RGB videos. COIN introduces a novel control-inpainting score distillation sampling method to ensure well-aligned, consistent, and high-quality motion from the diffusion prior within a joint optimization framework. Furthermore, we introduce a new human-scene relation loss that alleviates the scale ambiguity by enforcing consistency among the humans, camera, and scene. Experiments on three challenging benchmarks demonstrate the effectiveness of COIN, which outperforms state-of-the-art methods in both global human motion estimation and camera motion estimation. For example, COIN outperforms the state-of-the-art method by 33% in world joint position error (W-MPJPE) on the RICH dataset.
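The abstract's key technical recipe is to steer a joint human-and-camera optimization with a frozen motion diffusion prior via a control-inpainting variant of score distillation sampling (SDS). As a rough illustration of plain SDS guidance only (not the paper's control-inpainting formulation), the sketch below shows how a frozen noise-prediction denoiser can supply a gradient for an optimized motion sequence; the `MotionDenoiser` class, the noise schedule, and all tensor shapes are hypothetical stand-ins introduced here for illustration.

```python
# Minimal sketch of score-distillation-style guidance from a frozen motion
# diffusion prior, assuming a PyTorch setup. MotionDenoiser, the schedule, and
# the shapes are illustrative assumptions, not the authors' code; the paper's
# control-inpainting SDS adds fine-grained controls on top of this basic recipe.
import torch

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)               # assumed linear noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t

class MotionDenoiser(torch.nn.Module):
    """Stand-in for a pre-trained motion diffusion model that predicts added noise."""
    def __init__(self, dim=69):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim + 1, 256), torch.nn.SiLU(), torch.nn.Linear(256, dim))

    def forward(self, x_t, t):
        # append a normalized timestep to every frame as a crude conditioning signal
        t_emb = (t.float() / T).view(-1, 1, 1).expand(-1, x_t.shape[1], 1)
        return self.net(torch.cat([x_t, t_emb], dim=-1))

def sds_loss(motion, denoiser):
    """Surrogate loss whose gradient w.r.t. `motion` equals w(t) * (eps_pred - eps).

    motion: (B, F, D) pose/translation parameters optimized jointly with the
    camera; the diffusion prior itself stays frozen.
    """
    B = motion.shape[0]
    t = torch.randint(20, T, (B,), device=motion.device)        # random diffusion step
    a_bar = alphas_cumprod.to(motion.device)[t].view(B, 1, 1)
    eps = torch.randn_like(motion)
    x_t = a_bar.sqrt() * motion + (1.0 - a_bar).sqrt() * eps    # forward-diffuse the motion
    with torch.no_grad():                                       # no gradients into the prior
        eps_pred = denoiser(x_t, t)
    grad = ((1.0 - a_bar) * (eps_pred - eps)).detach()          # common SDS weighting w(t)
    return (grad * motion).sum()                                # d/d(motion) gives `grad`

# Usage: one optimization step driven by the prior term alone.
motion = torch.zeros(1, 120, 69, requires_grad=True)            # 120-frame parameter sequence
optimizer = torch.optim.Adam([motion], lr=1e-2)
optimizer.zero_grad()
sds_loss(motion, MotionDenoiser()).backward()
optimizer.step()
```

In a full pipeline this prior term would be summed with data terms such as the 2D reprojection loss and the human-scene relation loss described in the abstract before each optimizer step.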



Author information

Corresponding author

Correspondence to Jiefeng Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 461 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Li, J. et al. (2025). COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15074. Springer, Cham. https://doi.org/10.1007/978-3-031-72640-8_24


  • DOI: https://doi.org/10.1007/978-3-031-72640-8_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72639-2

  • Online ISBN: 978-3-031-72640-8

  • eBook Packages: Computer Science, Computer Science (R0)
