Abstract
Vehicle detection is a critical task that involves identifying and localizing vehicles in traffic scenes. However, the one-to-one set matching traditionally used for label assignment, in which each ground-truth bounding box is assigned to a single query, yields sparse positive samples. To address this issue, we draw inspiration from contrastive learning and employ contrastive samples generated by feature augmentation, rather than resorting to complex one-to-many matching for label assignment. We evaluated the proposed approach on the publicly available GM traffic dataset and the Hangzhou traffic dataset, and the results demonstrate that our approach outperforms other state-of-the-art methods, with average precision (AP) improvements of 1.0% and 1.1%, respectively. Overall, our approach effectively mitigates the sparsity of positive samples in vehicle detection and achieves better performance than existing methods.
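The sparsity issue the abstract describes can be made concrete with a minimal sketch (not the authors' code) of one-to-one set matching as used in DETR-style detectors: with N queries and M ground-truth boxes (M much smaller than N), Hungarian matching marks only M queries as positives, and every remaining query is supervised as "no object". The cost matrix here is random purely for illustration.

```python
# Illustrative sketch of one-to-one label assignment (Hungarian matching),
# showing why positive samples are sparse in DETR-style detectors.
import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_one_assign(cost):
    """cost: (num_queries, num_gt) matching-cost matrix.
    Returns the indices of matched (query, ground-truth) pairs."""
    q_idx, gt_idx = linear_sum_assignment(cost)
    return q_idx, gt_idx

rng = np.random.default_rng(0)
num_queries, num_gt = 100, 5          # typically far more queries than boxes
cost = rng.random((num_queries, num_gt))
q_idx, gt_idx = one_to_one_assign(cost)

# Only num_gt queries become positives; the other 95 are negatives.
print(f"{len(q_idx)} positives out of {num_queries} queries")
```

One-to-many schemes relax this by matching several queries per box; the paper instead keeps one-to-one matching and enriches supervision with contrastive samples from feature augmentation.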
Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to thank AJE (www.aje.com) for its language editing assistance during the preparation of this manuscript.
Funding
This work was supported by the National Natural Science Foundation of China (61976188, 61972351 and 62111530300), the Special Project for Basic Business Expenses of Zhejiang Provincial Colleges and Universities (JRK22003), and the Opening Foundation of State Key Laboratory of Virtual Reality Technology and System of Beihang University (VRLAB2023B02).
Author information
Contributions
Erjun Sun: Formal analysis, Writing - original draft preparation. Di Zhou: Conceptualization, Methodology, Writing - review & editing. Zhaocheng Xu: Software, Data curation, Writing - review & editing. Jie Sun: Writing - review & editing. Xun Wang: Writing - review & editing.
Ethics declarations
Conflict of interest/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sun, E., Zhou, D., Xu, Z. et al. Contrastive label assignment in vehicle detection. Appl Intell 53, 29713–29722 (2023). https://doi.org/10.1007/s10489-023-05023-3