Abstract
Deep learning has advanced vision-based surround perception, which has become one of the most active research areas in Intelligent Transportation Systems (ITS). Many deep learning-based algorithms that operate on two-dimensional images have become essential tools for autonomous vehicles, supporting object detection, tracking, and segmentation of road targets, primarily pedestrians, vehicles, traffic lights, and traffic signs. Autonomous vehicles rely heavily on visual data to classify target objects and to generalize across them, so that the safety requirements of pedestrians and other vehicles in their environment are met. Deep learning-based algorithms achieve outstanding results for real-time object detection. While several studies have thoroughly examined different types of deep learning-based object detection methods, only a few comparative studies exist, and they test either the detection speed or the accuracy of the algorithms. In addition to speed and accuracy, autonomous driving also depends on model size and energy efficiency, yet existing deep learning-based methods have rarely been compared on such metrics. This article provides a detailed and systematic comparative analysis of five independent mainstream deep learning-based algorithms for road object detection, namely R-FCN, Mask R-CNN, SSD, RetinaNet, and YOLOv4, on the large-scale Berkeley DeepDrive (BDD100K) dataset. The experimental results are analyzed using the mean Average Precision (mAP) value and inference time. Additionally, practical metrics such as model size, computational complexity, and energy efficiency of the deep learning-based models are computed. Furthermore, the performance of each algorithm is evaluated under different road environmental conditions at various times of day and night.
The comparison presented in this article helps to gain insight into the strengths and limitations of the popular deep learning-based algorithms under practical constraints with their real-time deployment feasibility. Code is publicly available at: https://github.com/bharatmahaur/ComparativeStudy
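As a rough illustration of the mAP metric named in the abstract, the following Python sketch computes Average Precision for one class from ranked detections and averages it over classes. It is a minimal sketch under stated assumptions, not the evaluation code used in the article: the all-point interpolation scheme, the function names, and the assumption that each detection has already been matched to ground truth (for example by an IoU threshold) are illustrative choices that this section does not specify.

```python
from typing import Dict, List, Tuple

Detection = Tuple[float, bool]  # (confidence, is_true_positive); matching is assumed done upstream


def average_precision(detections: List[Detection], num_gt: int) -> float:
    """Area under the interpolated precision-recall curve for one class."""
    if num_gt == 0:
        return 0.0
    # Rank predictions from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    precisions: List[float] = []
    recalls: List[float] = []
    tp = fp = 0
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Interpolation: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas over the steps where recall increases.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class: Dict[str, Tuple[List[Detection], int]]) -> float:
    """Mean of per-class AP values; per_class maps class name to (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


if __name__ == "__main__":
    # Hypothetical toy data for two of the road object classes.
    toy = {
        "vehicle": ([(0.9, True), (0.8, True)], 2),
        "pedestrian": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    }
    print(mean_average_precision(toy))
```

Benchmark protocols differ in the IoU threshold and interpolation they use, so values from this sketch are not directly comparable to the numbers reported in the article.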
References
Abadi M, Barham P, Chen J, Chen Z, Davis A, Dean J, Devin M, Ghemawat S, Irving G, Isard M et al (2016) Tensorflow: a system for large-scale machine learning. In: 12th USENIX symposium on operating systems design and implementation (OSDI), vol 16, pp 265–283
Aziz L, bin Haji Salam S, Ayub S (2020) Exploring deep learning-based architecture, strategies, applications and current trends in generic object detection. A comprehensive review. IEEE Access
Bochkovskiy A, Wang CY, Liao HYM (2020) Yolov4: optimal speed and accuracy of object detection. arXiv:2004.10934
Braun M, Krebs S, Flohr F, Gavrila DM (2019) Eurocity persons: a novel benchmark for person detection in traffic scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(8):1844–1861
Broggi A, Cardarelli E, Cattani S, Medici P, Sabbatelli M (2014) Vehicle detection for autonomous parking using a soft-cascade adaboost classifier. In: IEEE Intelligent vehicles symposium proceedings. IEEE, pp 912–917
Caesar H, Bankiti V, Lang AH, Vora S, Liong VE, Xu Q, Krishnan A, Pan Y, Baldan G, Beijbom O (2020) nuscenes: a multimodal dataset for autonomous driving. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 11621–11631
Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6154–6162
Chen Z, Chen K, Chen J (2013) Vehicle and pedestrian detection using support vector machine and histogram of oriented gradients features. In: International conference on computer sciences and applications. IEEE, pp 365–368
Chen L, Lin S, Lu X, Cao D, Wu H, Guo C, Liu C, Wang FY (2021) Deep neural network based vehicle and pedestrian detection for autonomous driving. A survey. IEEE Transactions on Intelligent Transportation Systems
Coppola P, Silvestri F (2019) Autonomous vehicles and future mobility solutions. In: Autonomous vehicles and future mobility
Dai J, Li Y, He K, Sun J (2016) R-fcn: object detection via region-based fully convolutional networks. arXiv:1605.06409
Devi S, Malarvezhi P, Dayana R, Vadivukkarasi K (2020) A comprehensive survey on autonomous driving cars: a perspective view. Wirel Pers Commun 114:3
Feng D, Haase-Schütz C, Rosenbaum L, Hertlein H, Glaeser C, Timm F, Wiesbeck W, Dietmayer K (2020) Deep multi-modal object detection and semantic segmentation for autonomous driving: datasets, methods, and challenges. IEEE Trans Intell Transp Syst 22(3):1341–1360
Feng D, Harakeh A, Waslander S, Dietmayer K (2020) A review and comparative study on probabilistic object detection in autonomous driving. arXiv:2011.10671
Garcia-Garcia A, Orts-Escolano S, Oprea S, Villena-Martinez V, Martinez-Gonzalez P, Garcia-Rodriguez J (2018) A survey on deep learning techniques for image and video semantic segmentation. Appl Soft Comput 70:41–65
Geiger A, Lenz P, Stiller C, Urtasun R (2012) Are we ready for autonomous driving? The Kitti vision benchmark suite. In: Conference on computer vision and pattern recognition (CVPR)
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Gupta A, Mahaur B (2021) An improved dv-maxhop localization algorithm for wireless sensor networks. Wirel Pers Commun 117(3):2341–2357
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
He X, Zhao K, Chu X (2021) Automl: a survey of the state-of-the-art. Knowl-Based Syst 212:106622
Hnewa M, Radha H (2020) Object detection under rainy conditions for autonomous vehicles: a review of state-of-the-art and emerging techniques. IEEE Signal Process Mag 38(1):53–67
Huang Y, Chen Y (2020) Autonomous driving with deep learning: a survey of state-of-art technologies. arXiv:2006.06091
Huang X, Cheng X, Geng Q, Cao B, Zhou D, Wang P, Lin Y, Yang R (2018) The apolloscape dataset for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 954–960
Husain AA, Maity T, Yadav RK (2019) Vehicle detection in intelligent transport system under a hazy environment: a survey. IET Image Process 14(1):1–10
Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM international conference on multimedia, pp 675–678
Jocher G et al (2021) ultralytics/yolov3: v9.5.0 - YOLOv5 v5.0 release compatibility update for YOLOv3. https://doi.org/10.5281/zenodo.4681234
Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inform Process Syst 25:1097–1105
Kuutti S, Bowden R, Jin Y, Barber P, Fallah S (2020) A survey of deep learning applications to autonomous vehicle control. IEEE Transactions on Intelligent Transportation Systems
Lee Y, Hwang JW, Lee S, Bae Y, Park J (2019) An energy and gpu-computation efficient backbone network for real-time object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops
Li H, Wang J, Xu L, Zhang S, Tao Y (2021) Efficient and accurate object detection for 3d point clouds in intelligent visual internet of things. Multimed Tools Appl, pp –38
Lienhart R, Maydt J (2002) An extended set of haar-like features for rapid object detection. In: Proceedings international conference on image processing. IEEE, vol 1, pp I–I
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
Liu L, Ouyang W, Wang X, Fieguth P, Chen J, Liu X, Pietikäinen M (2020) Deep learning for generic object detection: a survey. Int J Comput Vis 128(2):261–318
Majid Azimi S (2018) Shuffledet: real-time vehicle detection network in on-board embedded uav imagery. In: Proceedings of the European Conference on Computer Vision (ECCV) workshops
Minaee S, Boykov YY, Porikli F, Plaza AJ, Kehtarnavaz N, Terzopoulos D (2021) Image segmentation using deep learning. A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv:1804.02767
Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. arXiv:1506.01497
Rybski PE, Huber D, Morris DD, Hoffman R (2010) Visual classification of coarse vehicle orientation using histogram of oriented gradients features. In: IEEE Intelligent vehicles symposium. IEEE, pp 921–928
Silva PB, Andrade M, Ferreira S (2020) Machine learning applied to road safety modeling: a systematic literature review. Journal of traffic and transportation engineering (English edition)
Simhambhatla R, Okiah K, Kuchkula S, Slater R (2019) Self-driving cars: evaluation of deep learning techniques for object detection in different driving conditions. SMU Data Sci Rev 2(1):23
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv:1409.1556
Sivaraman S, Trivedi MM (2013) Looking at vehicles on the road: a survey of vision-based vehicle detection, tracking, and behavior analysis. IEEE Trans Intell Transp Syst 14(4):1773–1795
Srivastava S, Narayan S, Mittal S (2021) A survey of deep learning techniques for vehicle detection from images. Journal of Systems Architecture, 102152
Sutskever I, Martens J, Dahl G, Hinton G (2013) On the importance of initialization and momentum in deep learning. In: International conference on machine learning. PMLR, pp 1139–1147
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9
Tan M, Pang R, Le QV (2020) Efficientdet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10781–10790
Tang Y, Zhang C, Gu R, Li P, Yang B (2017) Vehicle detection and recognition for intelligent traffic surveillance system. Multimed Tools Applic 76(4):5817–5832
Wang H, Yu Y, Cai Y, Chen X, Chen L, Liu Q (2019) A comparative study of state-of-the-art deep learning algorithms for vehicle detection. IEEE Intell Transp Syst Mag 11(2):82–95
Wu B, Iandola F, Jin PH, Keutzer K (2017) Squeezedet: unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 129–137
Xiao Y, Tian Z, Yu J, Zhang Y, Liu S, Du S, Lan X (2020) A review of object detection based on deep learning. Multimed Tools Appl 79(33):23729–23791
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500
Yu F, Chen H, Wang X, Xian W, Chen Y, Liu F, Madhavan V, Darrell T (2020) Bdd100k: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 2636–2645
Zaidi SSA, Ansari MS, Aslam A, Kanwal N, Asghar M, Lee B (2021) A survey of modern deep learning based object detection models. arXiv:2104.11892
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212
Zou Z, Shi Z, Guo Y, Ye J (2019) Object detection in 20 years: a survey. arXiv:1905.05055
Appendix:
As stated in the related works section of this article, there is some overlap with [53]. There are eight main differences between this article and [53]. (1) This article compares more recent algorithms; for instance, Mask R-CNN [19] was released in early 2017, but the authors of [53] focused on Faster R-CNN [42]. (2) This article considers four primary classes for road object detection, namely pedestrians, vehicles, traffic signs, and traffic lights, on the diverse, large-scale BDD100K dataset [57]. In contrast, [53] considered only a single class, vehicles, on a small dataset. (3) In addition to basic augmentation methods, and unlike [53], this article applies Mosaic data augmentation [27] during the training of each model. (4) This article computes inference time and speed on both CPU and GPU, while [53] evaluated frames per second. (5) This article provides a detailed comparison on various practical metrics, such as model size, memory footprint, and energy efficiency, whereas the authors of [53] do not compare such metrics. (6) This article reports precise numerical values for all metrics, including sensitivity, specificity, and computational complexity. In contrast, [53] gave only a subjective rating of sensitivity, specificity, and complexity, and it is unclear what each rating level represents or how these metrics were evaluated. (7) This article assesses generalization ability using mAP values under different road weather scenarios at various times of day and night, whereas [53] provided only a subjective analysis of the generalization ability of the models. (8) Lastly, unlike [53], this article provides a more in-depth and systematic analysis with supporting tables and figures.
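To make the numerical metrics in points (4) and (6) concrete, the Python sketch below shows how sensitivity, specificity, mean inference time, and the corresponding frames per second could be computed. It is an illustrative sketch only: the helper names, the warm-up count, and the use of raw confusion counts are assumptions, and the appendix does not state how the article defines true negatives for detection or how it timed the models.

```python
import time
from typing import Callable, Iterable


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def frames_per_second(inference_time_ms: float) -> float:
    """Convert a per-image inference time in milliseconds to frames per second."""
    if inference_time_ms <= 0:
        raise ValueError("inference time must be positive")
    return 1000.0 / inference_time_ms


def mean_latency_ms(infer: Callable[[object], object],
                    inputs: Iterable[object],
                    warmup: int = 2) -> float:
    """Average wall-clock time per call of `infer`, skipping the first `warmup` calls."""
    timings = []
    for i, x in enumerate(inputs):
        start = time.perf_counter()
        infer(x)  # hypothetical model call; any callable works here
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed * 1000.0)
    if not timings:
        raise ValueError("need more inputs than warm-up calls")
    return sum(timings) / len(timings)


if __name__ == "__main__":
    print(sensitivity(tp=8, fn=2), specificity(tn=9, fp=1))
    print(frames_per_second(25.0))
```

On a GPU, asynchronous execution typically means the device must be synchronized before the timer is read, which this CPU-style timer does not do.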
Cite this article
Mahaur, B., Singh, N. & Mishra, K.K. Road object detection: a comparative study of deep learning-based algorithms. Multimed Tools Appl 81, 14247–14282 (2022). https://doi.org/10.1007/s11042-022-12447-5