Mirror world: creating digital twins of the space and persons from video streamings

The Visual Computer | Original article

Abstract

Creating digital twins of physical scenes has been widely studied for smart city applications, and the growth of 3D virtual worlds has attracted many researchers. However, most existing products are costly, neglect 3D recovery of moving people, and lack structured analysis. In this paper, we present a mirror world based on PTZ cameras that creates digital twins of a space and the persons within it from video streams, combining real-time video registration for PTZ cameras with 3D recovery of human pose and shape. Our goal is to quickly build a digital space that represents and visualizes the physical world, leveraging the two tasks to improve structured scene understanding. We use scene images from the PTZ cameras to create a 3D scene model and propose an image edge alignment method that reduces texture mismatch during real-time video registration. We then propose a human analysis network for 3D recovery of human pose and shape and add a refinement step that improves performance on two datasets. The recovered 3D human bodies are placed at their correct positions in the mirror world, and joint data from specific scenarios drive the optimization of the system. Consequently, the mirror world provides low-cost 3D visualization and structured analysis of the scene, enhancing users' spatial understanding.
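The registration step described above maps each live PTZ frame onto the 3D scene model. As a minimal illustration of the underlying geometry (a sketch, not the authors' implementation), the snippet below projects world points through a pinhole camera whose orientation is driven by pan/tilt angles; the function names, the intrinsics, and the pan/tilt sign conventions are all assumptions for illustration:

```python
import numpy as np

def ptz_rotation(pan, tilt):
    """World-to-camera rotation for a PTZ camera (angles in radians).

    Assumed convention: pan swings the optical axis in the ground plane,
    tilt raises it; real mounts differ in sign and axis order.
    """
    cp, sp = np.cos(pan), np.sin(pan)
    ct, st = np.cos(tilt), np.sin(tilt)
    r_pan = np.array([[cp, 0.0, -sp],
                      [0.0, 1.0, 0.0],
                      [sp, 0.0, cp]])
    r_tilt = np.array([[1.0, 0.0, 0.0],
                       [0.0, ct, st],
                       [0.0, -st, ct]])
    return r_tilt @ r_pan

def project(points, pan, tilt, f, cx, cy):
    """Project Nx3 world points (camera at the origin) to pixel coordinates."""
    K = np.array([[f, 0.0, cx],
                  [0.0, f, cy],
                  [0.0, 0.0, 1.0]])
    cam = points @ ptz_rotation(pan, tilt).T   # rotate into the camera frame
    uv = cam @ K.T                             # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]              # perspective divide

# A point on the current optical axis lands at the principal point:
print(project(np.array([[0.0, 0.0, 5.0]]), 0.0, 0.0, 1000.0, 640.0, 360.0))
# → [[640. 360.]]
```

With such a projection in hand, scene edges rendered from the 3D model can be compared against edges detected in the video frame, which is the kind of mismatch the paper's edge alignment method optimizes.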


[Figures 1–12 appear in the full article.]


Data and code availability

Some data that support the findings of this study are available from publicly accessible websites (LSP: https://sam.johnson.io/research//, MPII: http://human-pose.mpi-inf.mpg.de/, MS COCO: http://cocodataset.org/#download, Human3.6M: http://vision.imar.ro/human3.6m/description.php, MPI-INF-3DHP: https://vcai.mpi-inf.mpg.de/3dhp-dataset/). The remaining data and the code cannot be shared due to informed consent regulations. Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.


Acknowledgements

This work is supported by the Natural Science Foundation of China under Grant No. 62272018.

Author information

Corresponding author

Correspondence to Zhong Zhou.

Ethics declarations

Conflict of interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature or kind in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (mp4 128035 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jiang, L., Cai, L., Wu, W. et al. Mirror world: creating digital twins of the space and persons from video streamings. Vis Comput 40, 6689–6704 (2024). https://doi.org/10.1007/s00371-023-03193-2
