
A framework of specialized knowledge distillation for Siamese tracker on challenging attributes

  • Research
  • Published:
Machine Vision and Applications

Abstract

In recent years, Siamese network-based trackers have achieved significant improvements in real-time tracking. Despite this success, performance bottlenecks caused by the unavoidably complex scenarios of target-tracking tasks are becoming increasingly hard to ignore. For example, occlusion and fast motion easily cause tracking failures and are labeled as challenging attributes in many high-quality tracking databases. In addition, Siamese trackers tend to incur high memory costs, which restricts their applicability to mobile devices with tight memory budgets. To address these issues, we propose a Specialized-teacher Distilled Siamese Tracker (SDST) framework that learns a student tracker which is small, fast, and performs better on challenging attributes. SDST introduces two types of teachers for multi-teacher distillation: a general teacher and specialized teachers. The former imparts basic knowledge to the student; the latter transfer specialized knowledge that improves the student's performance on challenging attributes. To help the student efficiently capture critical knowledge from both types of teachers, SDST is equipped with a carefully designed multi-teacher knowledge distillation model comprising two processes: general teacher-student knowledge transfer and specialized teachers-student knowledge transfer. Extensive empirical evaluations of several popular Siamese trackers demonstrate the generality and effectiveness of our framework. Moreover, results on the Large-scale Single Object Tracking (LaSOT) benchmark show that the proposed method achieves a significant improvement of 2-4% or more on most challenging attributes. SDST also maintains high overall performance while achieving compression rates of up to 8x and frame rates of up to 252 FPS, and it obtains outstanding accuracy on all challenging attributes.
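The multi-teacher transfer described in the abstract can be sketched as a weighted distillation objective: one soft-target term from the general teacher plus an averaged term over the specialized teachers. The sketch below is an illustrative assumption only; the function names, temperature, and fixed weighting scheme are placeholders and do not reproduce the paper's actual loss formulation.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_div(p, q, eps=1e-12):
    """KL(p || q) between two probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q)))

def multi_teacher_distill_loss(student_logits, general_logits,
                               specialized_logits_list,
                               alpha=0.5, temperature=4.0):
    """Hypothetical combined loss: a KL term toward the general teacher's
    soft targets, plus the mean KL term toward each specialized teacher's
    soft targets, mixed by a scalar weight alpha."""
    s = softmax(student_logits, temperature)
    general_term = kl_div(softmax(general_logits, temperature), s)
    spec_terms = [kl_div(softmax(g, temperature), s)
                  for g in specialized_logits_list]
    spec_term = sum(spec_terms) / len(spec_terms)
    return alpha * general_term + (1.0 - alpha) * spec_term
```

When the student already matches all teachers, the loss is zero; any divergence from either teacher type makes it positive, so minimizing it pulls the student toward both the general and the attribute-specialized soft targets.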


Figs. 1-6



Acknowledgements

This work was supported by JST CREST Grant Number JPMJCR22D1 and JSPS KAKENHI Grant Number JP22H00551, Japan.

Author information


Corresponding author

Correspondence to Yiding Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, Y., Shimada, A., Minematsu, T. et al. A framework of specialized knowledge distillation for Siamese tracker on challenging attributes. Machine Vision and Applications 35, 94 (2024). https://doi.org/10.1007/s00138-024-01578-4

