
PointCSE: Context-sensitive encoders for efficient 3D object detection from point cloud

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Few modern 3D object detectors achieve fast inference and high accuracy at the same time. To reach high accuracy, they typically operate directly on raw point clouds, or convert point clouds to a 3D representation and apply 3D convolutions; both choices incur sizable computation overhead and complex operations. High-speed 2D-representation-based 3D detectors, in contrast, remain limited in accuracy. In this paper, we investigate how to leverage context knowledge to empower the 2D representation of point clouds for computation- and memory-efficient 3D object detection with state-of-the-art performance. The proposed encoder has two parts: a context-sensitive point sampling network and a point set learning network. The point sampling network samples points that carry dense localization information. With these high-quality sampled points, a deeper point set learning network can aggregate semantic details in a lightweight manner. The encoder is lightweight and well suited to hardware acceleration frameworks such as TensorRT and TVM. Extensive experiments on the KITTI benchmark show that the proposed encoder, called PointCSE, outperforms prior real-time encoders by a large margin with a 1.5× memory reduction; it also achieves state-of-the-art performance at 49 FPS inference speed (a 4× average speedup over the previous best methods).
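To make the two-part structure described above concrete, the following is a minimal sketch (not the authors' released code) of such an encoder: a context-sensitive sampling network that scores points and keeps the most informative ones, followed by a PointNet-style point set learning network that aggregates per-point features. The scoring head, top-k selection, and layer sizes are illustrative assumptions, not the paper's actual design.

```python
# Hedged sketch of a two-part point-cloud encoder: sample informative points,
# then aggregate their features with a shared MLP and max pooling.
import torch
import torch.nn as nn


class ContextSensitiveSampler(nn.Module):
    """Scores each point with a small MLP and keeps the top-k highest-scoring points.
    (Illustrative assumption; the paper's sampler may differ.)"""

    def __init__(self, in_dim: int = 4, hidden: int = 32, num_keep: int = 2048):
        super().__init__()
        self.num_keep = num_keep
        self.score_mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True), nn.Linear(hidden, 1)
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, C) raw point features, e.g. (x, y, z, intensity)
        scores = self.score_mlp(points).squeeze(-1)           # (B, N)
        idx = scores.topk(self.num_keep, dim=1).indices       # (B, num_keep)
        idx = idx.unsqueeze(-1).expand(-1, -1, points.size(-1))
        return torch.gather(points, 1, idx)                   # (B, num_keep, C)


class PointSetLearner(nn.Module):
    """Shared MLP over the sampled points, followed by max pooling (PointNet-style)."""

    def __init__(self, in_dim: int = 4, feat_dim: int = 64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
        )

    def forward(self, sampled: torch.Tensor) -> torch.Tensor:
        feats = self.mlp(sampled)                              # (B, K, feat_dim)
        return feats.max(dim=1).values                         # (B, feat_dim)


if __name__ == "__main__":
    pts = torch.rand(2, 16384, 4)                              # two toy LiDAR scans
    encoder = nn.Sequential(ContextSensitiveSampler(), PointSetLearner())
    print(encoder(pts).shape)                                  # torch.Size([2, 64])
```

Because the heavy lifting is done by plain linear layers, gather, and pooling, a design along these lines exports cleanly to TensorRT or TVM, which is consistent with the hardware-acceleration claim in the abstract.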



Author information


Corresponding author

Correspondence to Kuoliang Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported in part by the National Key Research and Development Program of China (Grant No. 2018AAA0101400), and in part by the National Natural Science Foundation of China (Grant Nos. 62036009, U1909203, 61936006, 61973271).


About this article


Cite this article

Wu, K., Xu, G., Liu, Z. et al. PointCSE: Context-sensitive encoders for efficient 3D object detection from point cloud. Int. J. Mach. Learn. & Cyber. 13, 39–47 (2022). https://doi.org/10.1007/s13042-021-01342-4

