{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,2,21]],"date-time":"2025-02-21T14:50:50Z","timestamp":1740149450230,"version":"3.37.3"},"reference-count":37,"publisher":"MDPI AG","issue":"23","license":[{"start":{"date-parts":[[2021,12,3]],"date-time":"2021-12-03T00:00:00Z","timestamp":1638489600000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/creativecommons.org\/licenses\/by\/4.0\/"}],"funder":[{"DOI":"10.13039\/501100001809","name":"National Natural Science Foundation of China","doi-asserted-by":"publisher","award":["61772527, 61976210, 62076235, 62002356"],"id":[{"id":"10.13039\/501100001809","id-type":"DOI","asserted-by":"publisher"}]},{"name":"Key-Areas Research and Development Program of \tGuangdong Province","award":["2020B010165001"]}],"content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["Sensors"],"abstract":"Bounding box estimation by overlap maximization has improved the state of the art of visual tracking significantly, yet the improvement in robustness and accuracy is restricted by the limited reference information, i.e., the initial target. In this paper, we present DCOM, a novel bounding box estimation method for visual tracking, based on distribution calibration and overlap maximization. We assume every dimension in the modulation vector follows a Gaussian distribution, so that the mean and the variance can borrow from those of similar targets in large-scale training datasets. As such, sufficient and reliable reference information can be obtained from the calibrated distribution, leading to a more robust and accurate target estimation. Additionally, an updating strategy for the modulation vector is proposed to adapt the variation of the target object. Our method can be built on top of off-the-shelf networks without finetuning and extra parameters. 
It yields state-of-the-art performance on three popular benchmarks, including GOT-10k, LaSOT, and NfS, while running at around 40 FPS, confirming its effectiveness and efficiency.","DOI":"10.3390\/s21238100","type":"journal-article","created":{"date-parts":[[2021,12,6]],"date-time":"2021-12-06T08:10:38Z","timestamp":1638778238000},"page":"8100","source":"Crossref","is-referenced-by-count":2,"title":["Enhanced Bounding Box Estimation with Distribution Calibration for Visual Tracking"],"prefix":"10.3390","volume":"21","author":[{"ORCID":"https:\/\/orcid.org\/0000-0003-3097-1433","authenticated-orcid":false,"given":"Bin","family":"Yu","sequence":"first","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, China"}]},{"given":"Ming","family":"Tang","sequence":"additional","affiliation":[{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, China"}]},{"given":"Guibo","family":"Zhu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, China"}]},{"given":"Jinqiao","family":"Wang","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, China"},{"name":"ObjectEye Inc., Beijing 100078, China"}]},{"given":"Hanqing","family":"Lu","sequence":"additional","affiliation":[{"name":"School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China"},{"name":"National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95 Zhongguancun East Road, Beijing 100190, China"}]}],"member":"1968","published-online":{"date-parts":[[2021,12,3]]},"reference":[{"doi-asserted-by":"crossref","unstructured":"Wang, G., Luo, C., Sun, X., Xiong, Z., and Zeng, W. (2020, January 14\u201319). Tracking by instance detection: A meta-learning approach. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_1","DOI":"10.1109\/CVPR42600.2020.00632"},{"doi-asserted-by":"crossref","unstructured":"Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 15\u201320). Siamrpn++: Evolution of siamese visual tracking with very deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_2","DOI":"10.1109\/CVPR.2019.00441"},{"doi-asserted-by":"crossref","unstructured":"Ma, C., Huang, J.B., Yang, X., and Yang, M.H. (2015, January 7\u201313). Hierarchical convolutional features for visual tracking. Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile.","key":"ref_3","DOI":"10.1109\/ICCV.2015.352"},{"unstructured":"Zheng, L., Tang, M., Chen, Y., Wang, J., and Lu, H. (November, January 27). Fast-deepKCF Without Boundary Effect. 
Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","key":"ref_4"},{"doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Shahbaz Khan, F., and Felsberg, M. (2017, January 21\u201326). Eco: Efficient convolution operators for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA.","key":"ref_5","DOI":"10.1109\/CVPR.2017.733"},{"doi-asserted-by":"crossref","unstructured":"Song, Y., Ma, C., Wu, X., Gong, L., Bao, L., Zuo, W., Shen, C., Lau, R.W., and Yang, M.H. (2018, January 18\u201323). Vital: Visual tracking via adversarial learning. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_6","DOI":"10.1109\/CVPR.2018.00937"},{"doi-asserted-by":"crossref","unstructured":"Tang, M., Yu, B., Zhang, F., and Wang, J. (2018, January 18\u201323). High-speed tracking with multi-kernel correlation filters. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_7","DOI":"10.1109\/CVPR.2018.00512"},{"doi-asserted-by":"crossref","unstructured":"Yu, B., Tang, M., Zheng, L., Zhu, G., Wang, J., Feng, H., Feng, X., and Lu, H. (2021, January 20\u201323). High-Performance Discriminative Tracking With Transformers. Proceedings of the IEEE International Conference on Computer Vision, Cambridge, MA, USA.","key":"ref_8","DOI":"10.1109\/ICCV48922.2021.00971"},{"unstructured":"Yu, B., Tang, M., Zheng, L., Zhu, G., Wang, J., and Lu, H. (November, January 29). High-Performance Discriminative Tracking with Target-Aware Feature Embeddings. Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision, Beijing, China.","key":"ref_9"},{"doi-asserted-by":"crossref","unstructured":"Wang, D., Wang, J.G., and Xu, K. (2021). Deep Learning for Object Detection, Classification and Tracking in Industry Applications. Sensors, 21.","key":"ref_10","DOI":"10.3390\/s21217349"},{"doi-asserted-by":"crossref","unstructured":"Auguste, A., Kaddah, W., Elbouz, M., Oudinet, G., and Alfalou, A. (2021). Behavioral Analysis and Individual Tracking Based on Kalman Filter: Application in an Urban Environment. Sensors, 21.","key":"ref_11","DOI":"10.3390\/s21217234"},{"doi-asserted-by":"crossref","unstructured":"Li, Y., and Zhu, J. (2014, January 6\u20137). A scale adaptive kernel correlation filter tracker with feature integration. Proceedings of the European Conference on Computer Vision, Zurich, Switzerland.","key":"ref_12","DOI":"10.1007\/978-3-319-16181-5_18"},{"key":"ref_13","doi-asserted-by":"crossref","first-page":"1561","DOI":"10.1109\/TPAMI.2016.2609928","article-title":"Discriminative scale space tracking","volume":"39","author":"Danelljan","year":"2016","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (2016, January 11\u201314). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands.","key":"ref_14","DOI":"10.1007\/978-3-319-48881-3_56"},{"doi-asserted-by":"crossref","unstructured":"Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18\u201323). High performance visual tracking with siamese region proposal network. 
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.","key":"ref_15","DOI":"10.1109\/CVPR.2018.00935"},{"doi-asserted-by":"crossref","unstructured":"Chen, Z., Zhong, B., Li, G., Zhang, S., and Ji, R. (2020). Siamese Box Adaptive Network for Visual Tracking. arXiv.","key":"ref_16","DOI":"10.1109\/CVPR42600.2020.00670"},{"doi-asserted-by":"crossref","unstructured":"Danelljan, M., Bhat, G., Khan, F.S., and Felsberg, M. (2019, January 15\u201320). Atom: Accurate tracking by overlap maximization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_17","DOI":"10.1109\/CVPR.2019.00479"},{"unstructured":"Bhat, G., Danelljan, M., Gool, L.V., and Timofte, R. (November, January 27). Learning discriminative model prediction for tracking. Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea.","key":"ref_18"},{"doi-asserted-by":"crossref","unstructured":"Danelljan, M., Gool, L.V., and Timofte, R. (2020, January 13\u201319). Probabilistic regression for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_19","DOI":"10.1109\/CVPR42600.2020.00721"},{"doi-asserted-by":"crossref","unstructured":"Zheng, L., Tang, M., Chen, Y., Wang, J., and Lu, H. (2020, January 23\u201328). Learning Feature Embeddings for Discriminant Model based Tracking. Proceedings of the European Conference on Computer Vision (ECCV), Glasgow, UK.","key":"ref_20","DOI":"10.1007\/978-3-030-58555-6_45"},{"key":"ref_21","doi-asserted-by":"crossref","first-page":"1562","DOI":"10.1109\/TPAMI.2019.2957464","article-title":"Got-10k: A large high-diversity benchmark for generic object tracking in the wild","volume":"43","author":"Huang","year":"2019","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"unstructured":"Fan, H., Lin, L., Yang, F., Chu, P., Deng, G., Yu, S., Bai, H., Xu, Y., Liao, C., and Ling, H. (2019, January 15\u201320). Lasot: A high-quality benchmark for large-scale single object tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_22"},{"doi-asserted-by":"crossref","unstructured":"Kiani Galoogahi, H., Fagg, A., Huang, C., Ramanan, D., and Lucey, S. (2017, January 22\u201329). Need for speed: A benchmark for higher frame rate object tracking. Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_23","DOI":"10.1109\/ICCV.2017.128"},{"unstructured":"Yang, S., Liu, L., and Xu, M. (2021). Free lunch for few-shot learning: Distribution calibration. arXiv.","key":"ref_24"},{"key":"ref_25","doi-asserted-by":"crossref","first-page":"583","DOI":"10.1109\/TPAMI.2014.2345390","article-title":"High-speed tracking with kernelized correlation filters","volume":"37","author":"Henriques","year":"2014","journal-title":"IEEE Trans. Pattern Anal. Mach. Intell."},{"doi-asserted-by":"crossref","unstructured":"Nam, H., and Han, B. (2016, January 27\u201330). Learning multi-domain convolutional neural networks for visual tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_26","DOI":"10.1109\/CVPR.2016.465"},{"doi-asserted-by":"crossref","unstructured":"Huang, C., Lucey, S., and Ramanan, D. (2017, January 22\u201329). Learning policies for adaptive tracking with deep feature cascades. 
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy.","key":"ref_27","DOI":"10.1109\/ICCV.2017.21"},{"doi-asserted-by":"crossref","unstructured":"Wang, Q., Zhang, L., Bertinetto, L., Hu, W., and Torr, P.H. (2019, January 15\u201320). Fast online object tracking and segmentation: A unifying approach. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.","key":"ref_28","DOI":"10.1109\/CVPR.2019.00142"},{"doi-asserted-by":"crossref","unstructured":"Bhat, G., Danelljan, M., Van Gool, L., and Timofte, R. (2020, January 23\u201328). Know your surroundings: Exploiting scene information for object tracking. Proceedings of the European Conference on Computer Vision, Glasgow, UK.","key":"ref_29","DOI":"10.1007\/978-3-030-58592-1_13"},{"doi-asserted-by":"crossref","unstructured":"Guo, D., Wang, J., Cui, Y., Wang, Z., and Chen, S. (2020, January 13\u201319). SiamCAR: Siamese Fully Convolutional Classification and Regression for Visual Tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_30","DOI":"10.1109\/CVPR42600.2020.00630"},{"doi-asserted-by":"crossref","unstructured":"Voigtlaender, P., Luiten, J., Torr, P.H., and Leibe, B. (2020, January 13\u201319). Siam r-cnn: Visual tracking by re-detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.","key":"ref_31","DOI":"10.1109\/CVPR42600.2020.00661"},{"doi-asserted-by":"crossref","unstructured":"Yan, B., Zhang, X., Wang, D., Lu, H., and Yang, X. (2021, January 19\u201325). Alpha-refine: Boosting tracking performance by precise bounding box estimation. Proceedings of the IEEE\/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA.","key":"ref_32","DOI":"10.1109\/CVPR46437.2021.00525"},{"doi-asserted-by":"crossref","unstructured":"He, K., Zhang, X., Ren, S., and Sun, J. (2016, January 27\u201330). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA.","key":"ref_33","DOI":"10.1109\/CVPR.2016.90"},{"key":"ref_34","doi-asserted-by":"crossref","first-page":"1","DOI":"10.1007\/s11432-019-2710-7","article-title":"Progressive rectification network for irregular text recognition","volume":"63","author":"Gao","year":"2020","journal-title":"Sci. China Inf. Sci."},{"doi-asserted-by":"crossref","unstructured":"Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23\u201328). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.","key":"ref_35","DOI":"10.1109\/CVPR.2014.81"},{"doi-asserted-by":"crossref","unstructured":"Jiang, B., Luo, R., Mao, J., Xiao, T., and Jiang, Y. (2018, January 8\u201314). Acquisition of localization confidence for accurate object detection. Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany.","key":"ref_36","DOI":"10.1007\/978-3-030-01264-9_48"},{"unstructured":"Tukey, J.W. (1977). 
Exploratory Data Analysis, Addison-Wesley Publishing Company: Reading, MA, USA.","key":"ref_37"}],"container-title":["Sensors"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/8100\/pdf","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,1,7]],"date-time":"2025-01-07T13:59:02Z","timestamp":1736258342000},"score":1,"resource":{"primary":{"URL":"https:\/\/www.mdpi.com\/1424-8220\/21\/23\/8100"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2021,12,3]]},"references-count":37,"journal-issue":{"issue":"23","published-online":{"date-parts":[[2021,12]]}},"alternative-id":["s21238100"],"URL":"https:\/\/doi.org\/10.3390\/s21238100","relation":{},"ISSN":["1424-8220"],"issn-type":[{"type":"electronic","value":"1424-8220"}],"subject":[],"published":{"date-parts":[[2021,12,3]]}}}
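
Reader's note on the method summarized in the abstract: the distribution-calibration step borrows the mean and variance of a target's modulation vector from similar targets in large-scale training data, in the spirit of the cited few-shot distribution calibration work (ref_24) and Tukey's power transform (ref_37). The sketch below is a minimal, hypothetical illustration of that idea in Python/NumPy; none of the names (tukey_transform, calibrate_modulation_vector, base_means, base_covs, k, alpha) come from the paper, and the real DCOM pipeline embeds this step inside an overlap-maximization (IoU prediction) network rather than operating on random vectors.

```python
# Illustrative sketch only (not the authors' code): calibrate the Gaussian statistics
# of a "modulation vector" by borrowing mean/variance from similar base targets.
import numpy as np

def tukey_transform(x, beta=0.5):
    """Tukey's ladder-of-powers transform, used to make features more Gaussian-like."""
    return np.power(x, beta) if beta != 0 else np.log(x)

def calibrate_modulation_vector(query, base_means, base_covs, k=2, alpha=0.3):
    """Borrow statistics from the k most similar base targets (hypothetical helper).

    query      : (d,) modulation vector of the current target (assumed non-negative here)
    base_means : (n, d) per-target means collected from a large training set
    base_covs  : (n, d) per-target diagonal variances
    Returns a calibrated mean and diagonal variance (in the transformed feature space).
    """
    q = tukey_transform(np.maximum(query, 1e-12))
    m = tukey_transform(np.maximum(base_means, 1e-12))
    # pick the k nearest base targets by Euclidean distance in the transformed space
    idx = np.argsort(np.linalg.norm(m - q, axis=1))[:k]
    calib_mean = (m[idx].sum(axis=0) + q) / (k + 1)
    calib_var = base_covs[idx].mean(axis=0) + alpha  # alpha adds extra spread
    return calib_mean, calib_var

# toy usage with random numbers, just to show the shapes involved
rng = np.random.default_rng(0)
base_means = rng.random((100, 256))
base_covs = rng.random((100, 256)) * 0.1
query = rng.random(256)
mean, var = calibrate_modulation_vector(query, base_means, base_covs)
samples = rng.normal(mean, np.sqrt(var), size=(10, 256))  # extra reference vectors
```

Sampling additional reference vectors from the calibrated Gaussian is what supplies the "sufficient and reliable reference information" the abstract refers to, beyond the single initial target.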