
LVNet: A lightweight volumetric convolutional neural network for real-time and high-performance recognition of 3D objects

Published in: Multimedia Tools and Applications

Abstract

3D object recognition has become a hot topic in computer vision as application scenarios for 3D data continue to grow, such as robotic systems, autonomous driving, and security screening systems based on active millimeter-wave imaging. Although 3D convolutional neural networks (CNNs) have achieved good results in 3D object recognition, key aspects such as computational efficiency and real-time performance still need to be improved because of the huge number of parameters in 3D convolutions. In this paper, we present LVNet, a lightweight volumetric CNN designed for real-time and high-performance recognition of 3D objects. In LVNet, all standard 3D convolutions are replaced with depthwise separable convolutions to reduce model size and computational complexity. Furthermore, an attention mechanism is combined with the depthwise separable convolutions to compensate for the performance loss caused by the reduced number of parameters. To further improve the performance of LVNet, auxiliary methods are also employed, such as data augmentation with multiple rotations of objects and information fusion across different orientations. A series of experiments on public datasets shows that the proposed LVNet achieves competitive recognition performance with a lower computation and memory burden.
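To make the core idea concrete, the following is a minimal PyTorch sketch of a depthwise separable 3D convolution block followed by channel attention, the combination described in the abstract. The layer sizes, the 32x32x32 voxel resolution, and the squeeze-and-excitation form of the attention are illustrative assumptions only, not the authors' exact LVNet architecture.

```python
# Sketch of a depthwise separable 3D convolution block with SE-style
# channel attention on voxelized 3D objects. Hyperparameters and the
# choice of SE attention are assumptions for illustration, not the
# published LVNet configuration.
import torch
import torch.nn as nn


class SepConv3dSE(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, reduction: int = 4):
        super().__init__()
        # Depthwise 3D convolution: one 3x3x3 filter per input channel.
        self.depthwise = nn.Conv3d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise 1x1x1 convolution mixes information across channels.
        self.pointwise = nn.Conv3d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm3d(out_ch)
        self.act = nn.ReLU(inplace=True)
        # SE-style channel attention: global pooling -> bottleneck -> sigmoid gate.
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool3d(1),
            nn.Conv3d(out_ch, out_ch // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch // reduction, out_ch, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn(self.pointwise(self.depthwise(x))))
        return x * self.se(x)  # reweight channels with the attention gate


if __name__ == "__main__":
    # A batch of 8 single-channel occupancy grids at 32^3 resolution.
    voxels = torch.randn(8, 1, 32, 32, 32)
    block = SepConv3dSE(in_ch=1, out_ch=32)
    print(block(voxels).shape)  # torch.Size([8, 32, 32, 32, 32])
```

The factorization into depthwise and pointwise convolutions is what reduces the parameter count relative to a standard 3D convolution, and the channel-attention gate is the kind of lightweight add-on the abstract describes for recovering some of the lost accuracy.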



Data Availability

The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.


Author information

Corresponding author

Correspondence to Lianwei Li.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, L., Qin, S., Yang, N. et al. LVNet: A lightweight volumetric convolutional neural network for real-time and high-performance recognition of 3D objects. Multimed Tools Appl 83, 61047–61063 (2024). https://doi.org/10.1007/s11042-023-17816-2

