Abstract
Semantic mapping remains challenging for household collaborative robots. Deep learning models have proven capable of extracting semantics from a scene and of learning robot odometry. To interface semantic information with robot odometry, existing approaches estimate the semantics and the odometry separately and then integrate them with fusion techniques. Such approaches suffer from integration issues, and the mapping procedure demands substantial memory and computational resources. Aiming at accurate semantic mapping on resource-limited devices, this paper proposes an efficient deep learning-based model that simultaneously estimates robot odometry from monocular frame sequences and detects objects in those frames. The proposed model has two main components: a YOLOv3 object detector used as the backbone, and a convolutional long short-term memory (Conv-LSTM) recurrent neural network that models changes in camera pose. A key advantage of the proposed model is that it eliminates the need for data association and for multi-sensor fusion. We conducted experiments on a LoCoBot robot in a laboratory environment, attaining satisfactory results despite its limited computational resources. Additionally, we tested the proposed method on the KITTI dataset, reaching an average test loss of 15.93 across various sequences. The experiments are documented in this video: https://www.youtube.com/watch?v=hnmqwxpaTEw.
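To make the pipeline the abstract outlines more concrete, below is a minimal, hypothetical PyTorch sketch, not the authors' implementation: shared convolutional features feed a Conv-LSTM cell that regresses a 6-DoF relative camera pose per frame. The small convolutional stack stands in for YOLOv3's Darknet-53 backbone, and the layer sizes, hidden width, and pose parameterization (three translation plus three rotation components) are illustrative assumptions.

# Minimal sketch of the idea described in the abstract: a shared CNN
# backbone feeding a convolutional LSTM that regresses the 6-DoF camera
# pose change for each monocular frame. The tiny backbone below is a
# stand-in for YOLOv3's Darknet-53 feature extractor; all layer sizes
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Single Conv-LSTM cell: LSTM gates computed with convolutions."""
    def __init__(self, in_ch, hid_ch, kernel=3):
        super().__init__()
        # One convolution produces all four gates (input, forget, cell, output).
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel, padding=kernel // 2)
        self.hid_ch = hid_ch

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

class MonoOdometryNet(nn.Module):
    """Backbone features -> Conv-LSTM -> 6-DoF relative pose per frame."""
    def __init__(self, hid_ch=64):
        super().__init__()
        # Stand-in for the YOLOv3 (Darknet-53) backbone shared with detection.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
        )
        self.convlstm = ConvLSTMCell(128, hid_ch)
        self.pose_head = nn.Linear(hid_ch, 6)  # translation (3) + rotation (3)

    def forward(self, frames):  # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        h = c = None
        poses = []
        for step in range(t):
            feat = self.backbone(frames[:, step])
            if h is None:  # initialize recurrent state from the feature map size
                h = feat.new_zeros(b, self.convlstm.hid_ch, *feat.shape[2:])
                c = torch.zeros_like(h)
            h, c = self.convlstm(feat, (h, c))
            # Global-average-pool the hidden state, then regress the pose change.
            poses.append(self.pose_head(h.mean(dim=(2, 3))))
        return torch.stack(poses, dim=1)  # (batch, time, 6)

# Usage: five 128x128 RGB frames -> one relative pose per time step.
model = MonoOdometryNet()
out = model(torch.randn(2, 5, 3, 128, 128))
print(out.shape)  # torch.Size([2, 5, 6])

Sharing one backbone between object detection and pose estimation is what lets a single forward pass serve both tasks on a resource-limited robot; the YOLOv3 detection heads are omitted here for brevity.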
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests.