Abstract
Roots play a critical role in the functioning of plants. However, generating detailed 3D models of thin, intricate plant roots remains challenging because of their complex structure and limited surface texture. Owing to the difficulty of the task and the scarcity of labeled training data, little work has explored this problem with deep neural networks. To overcome this limitation, this paper presents a structure-from-motion based deep neural network for plant root reconstruction that is trained in a self-supervised manner and can be deployed on mobile phone platforms. During training of the deep structure-from-motion model, each depth map is constrained by the predicted relative poses of adjacent frames captured by the phone camera, and an LSTM network placed after the CNN for pose estimation is learned from ego-motion constraints by further exploiting the temporal relationship between consecutive frames. The IMU in the mobile phone is further utilized to improve the pose-estimation network by continuously supplying metric scale from gyroscope and accelerometer measurements. The proposed approach resolves the scale ambiguity in recovering the absolute size of real plant roots, thereby jointly improving camera pose estimation and scene reconstruction. Experimental results on both a real plant root dataset and a rendered synthetic root dataset demonstrate the superior performance of our method compared with classical and state-of-the-art learning-based structure-from-motion methods.
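To make the CNN-then-LSTM pose estimation and the IMU scale correction described above concrete, the following is a minimal PyTorch sketch: a small CNN encodes each frame, an LSTM aggregates the sequence, a head regresses a 6-DoF relative pose per adjacent frame pair, and the predicted translations are rescaled to the metric displacement integrated from the IMU. All layer sizes, the `apply_imu_scale` helper, and every name here are illustrative assumptions, not the authors' actual network.

```python
# Sketch of a CNN+LSTM relative-pose network with IMU-based scale correction.
# Architectural details are assumptions for illustration, not the paper's exact model.
import torch
import torch.nn as nn


class PoseCNNLSTM(nn.Module):
    def __init__(self, feat_dim=256, hidden_dim=256):
        super().__init__()
        # Lightweight CNN encoder applied to each RGB frame independently.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                 # -> (B*T, 128, 1, 1)
        )
        self.fc = nn.Linear(128, feat_dim)
        # LSTM exploits the temporal relationship between consecutive frames.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        # 6-DoF relative pose: 3 translation + 3 rotation (axis-angle) parameters.
        self.pose_head = nn.Linear(hidden_dim, 6)

    def forward(self, frames):
        # frames: (B, T, 3, H, W) video clip captured by the phone camera.
        b, t, c, h, w = frames.shape
        feats = self.encoder(frames.view(b * t, c, h, w)).flatten(1)
        feats = self.fc(feats).view(b, t, -1)
        out, _ = self.lstm(feats)
        # One relative pose per adjacent frame pair (drop the first time step).
        return self.pose_head(out[:, 1:])            # (B, T-1, 6)


def apply_imu_scale(pred_poses, imu_translation_norm, eps=1e-6):
    """Rescale predicted translations so their magnitude matches the metric
    displacement integrated from the gyroscope/accelerometer.
    imu_translation_norm: (B, T-1) metric norms, an assumed preprocessing output."""
    trans = pred_poses[..., :3]
    pred_norm = trans.norm(dim=-1, keepdim=True).clamp(min=eps)
    scale = imu_translation_norm.unsqueeze(-1) / pred_norm
    return torch.cat([trans * scale, pred_poses[..., 3:]], dim=-1)


if __name__ == "__main__":
    net = PoseCNNLSTM()
    video = torch.randn(2, 5, 3, 128, 416)           # dummy 5-frame clips
    poses = net(video)                                # (2, 4, 6)
    imu_norm = torch.rand(2, 4)                       # dummy IMU displacement norms
    print(apply_imu_scale(poses, imu_norm).shape)     # torch.Size([2, 4, 6])
```

In a self-supervised setting of this kind, the scaled poses and the predicted depth maps would typically be tied together through a photometric reconstruction loss between adjacent frames; the IMU scaling removes the monocular scale ambiguity the abstract refers to.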
Cite this article
Lu, Y., Wang, Y., Chen, Z. et al. 3D plant root system reconstruction based on fusion of deep structure-from-motion and IMU. Multimed Tools Appl 80, 17315–17331 (2021). https://doi.org/10.1007/s11042-020-10069-3