Road object detection: a comparative study of deep learning-based algorithms | Multimedia Tools and Applications

Abstract

Deep learning has advanced vision-based surround perception and has become one of the most active research areas in Intelligent Transportation Systems (ITS). Deep learning-based algorithms that operate on two-dimensional images have become essential tools for autonomous vehicles, supporting object detection, tracking, and segmentation of road targets, primarily pedestrians, vehicles, traffic lights, and traffic signs. Autonomous vehicles rely heavily on visual data to classify target objects and to generalize across them, so that the safety requirements of pedestrians and other vehicles in their environment can be met. Deep learning-based algorithms achieve outstanding object detection results in real time. While several studies have thoroughly examined different types of deep learning-based object detection methods, only a few comparative studies exist, and these test either the detection speed or the accuracy of the algorithms. Beyond speed and accuracy, autonomous driving also depends on model size and energy efficiency, yet existing deep learning-based methods have rarely been compared on these metrics. This article provides a detailed and systematic comparative analysis of five independent mainstream deep learning-based algorithms for road object detection, namely R-FCN, Mask R-CNN, SSD, RetinaNet, and YOLOv4, on the large-scale Berkeley DeepDrive (BDD100K) dataset. The experimental results are analyzed using the mean Average Precision (mAP) value and inference time. Additionally, practical metrics such as model size, computational complexity, and energy efficiency of the models are computed precisely. Furthermore, the performance of each algorithm is evaluated under different road environmental conditions at various times of day and night. The comparison presented in this article gives insight into the strengths and limitations of popular deep learning-based algorithms under practical constraints and into the feasibility of deploying them in real time. Code is publicly available at: https://github.com/bharatmahaur/ComparativeStudy
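To make the mAP metric named in the abstract concrete, the following is a minimal sketch of one common way to compute Average Precision for a single class and then average it over classes. It is not taken from the article or its repository: the function names, the all-point interpolation scheme, and the assumption that each detection has already been matched to ground truth (for example by an IoU threshold) are illustrative choices, and the article's exact evaluation protocol is not stated in this abstract.

```python
from typing import Dict, List, Tuple


def average_precision(scored_hits: List[Tuple[float, bool]],
                      num_ground_truth: int) -> float:
    """All-point interpolated AP for one class (hypothetical helper).

    scored_hits holds one (confidence, is_true_positive) pair per detection;
    the matching of detections to ground truth is assumed to be done already.
    """
    if num_ground_truth == 0:
        return 0.0
    # Rank detections from most to least confident.
    ranked = sorted(scored_hits, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the area under the interpolated precision-recall curve.
    ap = 0.0
    prev_recall = 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """Unweighted mean of per-class AP values."""
    if not per_class_ap:
        raise ValueError("at least one class is required")
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For example, `average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)` ranks a true positive, a false positive, and a second true positive, and the per-class values can then be passed to `mean_average_precision` keyed by class name.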

References

  1. Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI), vol 16, pp 265–283

  2. Aziz L, bin Haji Salam S, Ayub S (2020) Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection: a comprehensive review. IEEE Access

  3. Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934

  4. Braun M, Krebs S, Flohr F, Gavrila DM (2019) Eurocity persons: a novel benchmark for person detection in traffic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(8):1844–1861

  5. Broggi A, Cardarelli E, Cattani S, Medici P, Sabbatelli M (2014) Vehicle detection for autonomous parking using a soft-cascade adaboost classifier. In: IEEE Intelligent vehicles symposium proceedings. IEEE, pp 912–917

  6. Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O (2020) nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11621–11631

  7. Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6154–6162

  8. Chen Z, Chen K, Chen J (2013) Vehicle and pedestrian detection using support vector machine and histogram of oriented gradients features. In: International conference on computer sciences and applications. IEEE, pp 365–368

  9. Chen L, Lin S, Lu X, Cao D, Wu H, Guo C, Liu C, Wang FY (2021) Deep neural network based vehicle and pedestrian detection for autonomous driving: a survey. IEEE Transactions on Intelligent Transportation Systems

  10. Coppola P, Silvestri F (2019) Autonomous vehicles and future mobility solutions. In: Autonomous vehicles and future mobility

  11. Dai J, Li Y, He K, Sun J (2016) R-fcn: object detection via region-based fully convolutional networks. arXiv:1605.06409

  12. Devi S, Malarvezhi P, Dayana R, Vadivukkarasi K (2020) A comprehensive survey on autonomous driving cars: a perspective view. Wirel Pers Commun 114:3

  13. Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, Wiesbeck W, Dietmayer K (2020) Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans Intell Transp Syst 22(3):1341–1360

  14. Feng D, Harakeh A, Waslander S, Dietmayer K (2020) A review and comparative study on probabilistic object detection in autonomous driving. arXiv:2011.10671

  15. Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65

  16. Geiger A, Lenz P, Stiller C, Urtasun R (2012) Are we ready for autonomous driving? The Kitti vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)

  17. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587

  18. Gupta A, Mahaur B (2021) An improved dv-maxhop localization algorithm for wireless sensor networks. Wirel Pers Commun 117(3):2341–2357

  19. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969

  20. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  21. He X, Zhao K, Chu X (2021) Automl: a survey of the state-of-the-art. Knowl-Based Syst 212:106622

  22. Hnewa M, Radha H (2020) Object detection under rainy conditions for autonomous vehicles: a review of state-of-the-art and emerging techniques. IEEE Signal Process Mag 38(1):53–67

  23. Huang Y, Chen Y (2020) Autonomous driving with deep learning: a survey of state-of-art technologies. arXiv:2006.06091

  24. Huang X, Cheng X, Geng Q, Cao B, Zhou D, Wang P, Lin Y, Yang R (2018) The apolloscape dataset for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 954–960

  25. Husain AA, Maity T, Yadav RK (2019) Vehicle detection in intelligent transport system under a hazy environment: a survey. IET Image Process 14(1):1–10

  26. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, pp 675–678

  27. Jocher G et al (2021) ultralytics/yolov3: v9.5.0 - YOLOv5 v5.0 release compatibility update for YOLOv3. https://doi.org/10.5281/zenodo.4681234

  28. Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 25:1097–1105

  29. Kuutti S, Bowden R, Jin Y, Barber P, Fallah S (2020) A survey of deep learning applications to autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems

  30. Lee Y, Hwang Jw, Lee S, Bae Y, Park J (2019) An energy and gpu-computation efficient backbone network for real-time object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops, pp 0–0

  31. Li H, Wang J, Xu L, Zhang S, Tao Y (2021) Efficient and accurate object detection for 3d point clouds in intelligent visual internet of things. Multimed Tools Appl, pp –38

  32. Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: Proceedings international conference on image processing. IEEE, vol 1, pp I–I

  33. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125

  34. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988

  35. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37

  36. Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318

  37. Majid Azimi S (2018) Shuffledet: real-time vehicle detection network in on-board embedded uav imagery. In: Proceedings of the European Conference on Computer Vision (ECCV) workshops, pp 0–0

  38. Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning: a survey. IEEE Transactions on Pattern Analysis and Machine Intelligence

  39. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788

  40. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271

  41. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767

  42. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. arXiv:1506.01497

  43. Rybski PE, Huber D, Morris DD, Hoffman R (2010) Visual classification of coarse vehicle orientation using histogram of oriented gradients features. In: IEEE Intelligent vehicles symposium. IEEE, pp 921–928

  44. Silva PB, Andrade M, Ferreira S (2020) Machine learning applied to road safety modeling: a systematic literature review. Journal of traffic and transportation engineering (English edition)

  45. Simhambhatla R, Okiah K, Kuchkula S, Slater R (2019) Self-driving cars: evaluation of deep learning techniques for object detection in different driving conditions. SMU Data Sci Rev 2(1):23

  46. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556

  47. Sivaraman S, Trivedi MM (2013) Looking at vehicles on the road: a survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans Intell Transp Syst 14(4):1773–1795

  48. Srivastava S, Narayan S, Mittal S (2021) A survey of deep learning techniques for vehicle detection from images. Journal of Systems Architecture, 102152

  49. Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: International conference on machine learning. PMLR, pp 1139–1147

  50. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  51. Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790

  52. Tang Y, Zhang C, Gu R, Li P, Yang B (2017) Vehicle detection and recognition for intelligent traffic surveillance system. Multimed Tools Applic 76(4):5817–5832

  53. Wang H, Yu Y, Cai Y, Chen X, Chen L, Liu Q (2019) A comparative study of state-of-the-art deep learning algorithms for vehicle detection. IEEE Intell Transp Syst Mag 11(2):82–95

  54. Wu B, Iandola F, Jin PH, Keutzer K (2017) Squeezedet: unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 129–137

  55. Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791

  56. Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500

  57. Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2636–2645

  58. Zaidi SSA, Ansari MS, Aslam A, Kanwal N, Asghar M, Lee B (2021) A survey of modern deep learning based object detection models. arXiv:2104.11892

  59. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212

  60. Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv:1905.05055

Author information

Corresponding author

Correspondence to Navjot Singh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix:

As stated in the related works section of this article, there is some overlap with [53]. This article differs from [53] in eight main ways. (1) This article compares more recent algorithms; for instance, Mask R-CNN [19] was released in early 2017, but the authors of [53] focused on Faster R-CNN [42]. (2) This article considers four primary classes for road object detection, namely pedestrians, vehicles, traffic signs, and traffic lights, on the diverse, large-scale BDD100K dataset [57], whereas [53] considered only a single class, vehicles, on a small dataset. (3) In addition to basic augmentation methods, and unlike [53], this article applies Mosaic data augmentation [27] during the training of each model. (4) This article computes inference time and speed on both CPU and GPU, while [53] evaluated frames per second. (5) This article provides a detailed comparison on practical metrics such as model size, memory footprint, and energy efficiency, whereas the authors of [53] provide no comparison on such metrics. (6) This article uses precise numerical values for all metrics, including sensitivity, specificity, and computational complexity. In contrast, [53] gave a subjective representation of sensitivity, specificity, and complexity, and it is unclear what each indicator level represents or how these metrics were evaluated. (7) This article computes generalization ability from mAP values in different road weather scenarios at various times of day and night, whereas [53] provided a subjective analysis of the generalization ability of the models. (8) Lastly, unlike [53], this article provides a more in-depth and systematic analysis with supporting tables and figures.
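Point (6) contrasts numerical values with subjective levels for sensitivity and specificity. As a hedged illustration of what a numerical value means here, the sketch below computes both from confusion counts using their standard definitions. The function names are hypothetical, and how the article counts true negatives for an object detector is an assumption left open, since this appendix does not describe it.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Sensitivity (recall): fraction of actual positives that were detected."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def specificity(true_negatives: int, false_positives: int) -> float:
    """Specificity: fraction of actual negatives that were correctly rejected."""
    total = true_negatives + false_positives
    return true_negatives / total if total else 0.0
```

With 8 detected objects and 2 missed ones, `sensitivity(8, 2)` is 0.8; with 9 correct rejections and 1 false alarm, `specificity(9, 1)` is 0.9.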

About this article

Cite this article

Mahaur, B., Singh, N. & Mishra, K.K. Road object detection: a comparative study of deep learning-based algorithms. Multimed Tools Appl 81, 14247–14282 (2022). https://doi.org/10.1007/s11042-022-12447-5

  • DOI: https://doi.org/10.1007/s11042-022-12447-5
