On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Reliable use of object detectors requires them to be calibrated, a crucial problem that demands careful attention. Recent approaches to this involve (1) designing new loss functions to obtain calibrated detectors by training them from scratch, and (2) post-hoc Temperature Scaling (TS), which learns to scale the likelihood of a trained detector so that it outputs calibrated predictions. These approaches are then evaluated using a combination of Detection Expected Calibration Error (D-ECE) and Average Precision. In this work, via extensive analysis and insights, we highlight that these recent evaluation frameworks, evaluation metrics, and the use of TS have notable drawbacks that lead to incorrect conclusions. As a step towards fixing these issues, we propose a principled evaluation framework to jointly measure the calibration and accuracy of object detectors. We also tailor efficient and easy-to-use post-hoc calibration approaches, such as Platt Scaling and Isotonic Regression, specifically for the object detection task. Contrary to the common notion, our experiments show that, once designed and evaluated properly, post-hoc calibrators, which are extremely cheap to build and use, are much more powerful and effective than the recent train-time calibration methods. To illustrate, D-DETR with our post-hoc Isotonic Regression calibrator outperforms the recent train-time state-of-the-art calibration method Cal-DETR by more than 7 D-ECE on the COCO dataset. Additionally, we propose improved versions of the recently proposed Localization-aware ECE and show the efficacy of our method on these metrics. Code is available at: https://github.com/fiveai/detection_calibration.
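To make the idea of post-hoc calibration concrete, the following is a minimal sketch in Python, not the authors' implementation (their code is in the repository linked above). It assumes each detection has already been reduced to a confidence score in [0, 1] and a binary label (1 for a true positive, 0 otherwise); how detections are matched to ground truth, and the exact definition of D-ECE, are not given in this abstract. The function names binned_ece, fit_isotonic and apply_isotonic are hypothetical. The error measure is a generic binned calibration error, and the calibrator is a plain pool-adjacent-violators isotonic fit that returns a step function.

```python
from bisect import bisect_left


def binned_ece(confidences, labels, n_bins=10):
    """Generic binned calibration error (an assumption, not the paper's D-ECE):
    the size-weighted mean of |precision - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, labels):
        idx = min(int(c * n_bins), n_bins - 1)  # a confidence of 1.0 goes to the last bin
        bins[idx].append((c, y))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        precision = sum(y for _, y in b) / len(b)
        ece += (len(b) / total) * abs(precision - mean_conf)
    return ece


def _mean(block):
    return block[0] / block[1]


def fit_isotonic(confidences, labels):
    """Pool-adjacent-violators: fit a non-decreasing step map from score to target."""
    blocks = []  # each block is [sum_of_targets, count, largest_score_in_block]
    for c, y in sorted(zip(confidences, labels)):
        if blocks and blocks[-1][2] == c:
            # Equal scores must share one calibrated value, so pool them directly.
            blocks[-1][0] += y
            blocks[-1][1] += 1
        else:
            blocks.append([float(y), 1, c])
        # Merge backwards while the fitted values would decrease.
        while len(blocks) > 1 and _mean(blocks[-2]) > _mean(blocks[-1]):
            s, n, upper = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
            blocks[-1][2] = upper
    uppers = [b[2] for b in blocks]
    values = [_mean(b) for b in blocks]
    return uppers, values


def apply_isotonic(model, confidence):
    """Map a raw confidence to the value of the first block that covers it."""
    uppers, values = model
    idx = min(bisect_left(uppers, confidence), len(values) - 1)
    return values[idx]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    scores = [0.2, 0.4, 0.6, 0.6, 0.8, 0.9, 0.9, 0.95]
    labels = [0, 0, 1, 0, 0, 1, 1, 1]
    model = fit_isotonic(scores, labels)
    calibrated = [apply_isotonic(model, s) for s in scores]
    print("error before:", binned_ece(scores, labels))
    print("error after: ", binned_ece(calibrated, labels))
```

The demo fits and evaluates on the same toy data purely to show the mechanics; in practice the calibrator would be fitted on a held-out set and evaluated on separate test data. Library implementations of isotonic regression typically interpolate between fitted points rather than returning a pure step function.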

S. Kuzucu and K. Oksuz: equal contributions. SK contributed during his internship at the Five AI Oxford team.

Author information

Corresponding author

Correspondence to Kemal Oksuz.

Electronic supplementary material

Supplementary material 1 (pdf 647 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Kuzucu, S., Oksuz, K., Sadeghi, J., Dokania, P.K. (2025). On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15079. Springer, Cham. https://doi.org/10.1007/978-3-031-72664-4_11

  • DOI: https://doi.org/10.1007/978-3-031-72664-4_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72663-7

  • Online ISBN: 978-3-031-72664-4

  • eBook Packages: Computer Science, Computer Science (R0)
