Abstract
3D object recognition has become a hot topic in computer vision as the application scenarios for 3D data multiply, including robotic systems, autonomous driving, and security check systems based on active millimeter waves. Although 3D convolutional neural networks (CNNs) have achieved good results in 3D object recognition, their computational efficiency and real-time capability still need improvement because 3D convolutions carry a very large number of parameters. In this paper, we present LVNet, a lightweight volumetric CNN designed for real-time, high-performance recognition of 3D objects. In LVNet, all standard 3D convolutions are replaced with depthwise separable convolutions to reduce model size and computational complexity. An attention mechanism is combined with the depthwise separable convolutions to compensate for the performance loss caused by the reduced parameter count. To further improve the performance of LVNet, auxiliary methods are also employed, such as data augmentation with multiple rotations of each object and fusion of information from different orientations. Experimental results on public datasets show that the proposed LVNet achieves competitive recognition performance at a lower computational and memory cost.
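To make the parameter saving from depthwise separable 3D convolutions concrete, the following minimal sketch counts the weights of a single layer in both forms. It assumes the usual MobileNets-style factorization (a per-channel k x k x k depthwise filter followed by a 1 x 1 x 1 pointwise convolution) and ignores biases. The function names and the layer sizes are hypothetical, chosen only for illustration, and are not taken from the LVNet architecture.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3D convolution with a k x k x k kernel (bias ignored)."""
    return c_in * c_out * k ** 3


def separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise separable 3D convolution (bias ignored).

    Depthwise step: one k x k x k filter per input channel.
    Pointwise step: a 1 x 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = c_in * k ** 3
    pointwise = c_in * c_out
    return depthwise + pointwise


# Hypothetical layer sizes (assumption, not from the paper).
c_in, c_out, k = 32, 64, 3
standard = conv3d_params(c_in, c_out, k)
separable = separable_conv3d_params(c_in, c_out, k)
ratio = separable / standard  # algebraically equal to 1/c_out + 1/k**3
print(standard, separable, round(ratio, 4))
```

Under these assumptions the ratio of separable to standard weights is 1/c_out + 1/k^3, so the saving grows with both the number of output channels and the kernel size, which is why the factorization is especially attractive for 3D kernels.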
Data Availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Li, L., Qin, S., Yang, N. et al. LVNet: A lightweight volumetric convolutional neural network for real-time and high-performance recognition of 3D objects. Multimed Tools Appl 83, 61047–61063 (2024). https://doi.org/10.1007/s11042-023-17816-2
DOI: https://doi.org/10.1007/s11042-023-17816-2