
COMET: Context-Aware IoU-Guided Network for Small Object Tracking

  • Conference paper
  • Computer Vision – ACCV 2020 (ACCV 2020)

Abstract

We consider the problem of tracking an unknown small target in aerial videos captured from medium to high altitudes. This is a challenging problem, which becomes even more pronounced in unavoidable scenarios of drastic camera motion and high object density. To address this problem, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy. The proposed network fully exploits target-related information through multi-scale feature learning and attention modules. The proposed strategy provides an efficient sampling scheme that generalizes the network to the target and its parts without imposing extra computational complexity during online tracking. Together, these components contribute considerably to handling significant occlusions and viewpoint changes. Empirically, COMET outperforms state-of-the-art trackers on a range of aerial-view datasets focusing on tracking small objects. Specifically, COMET outperforms the celebrated ATOM tracker by an average margin of \(6.2\%\) (and \(7\%\)) in precision (and success) score on the challenging UAVDT, VisDrone-2019, and Small-90 benchmarks.

S. M. Marvasti-Zadeh and J. Khaghani—Equal contribution.
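The "IoU-guided" component named in the abstract follows the overlap-maximization idea of ATOM [22], on which COMET builds: candidate boxes around the previous target location are refined to maximize an IoU score with a target-conditioned reference. As a minimal sketch of that refinement mechanic, assuming axis-aligned boxes in (x1, y1, x2, y2) format and substituting the analytic IoU for the learned IoU predictor, the Python snippet below is illustrative only and is not the authors' code:

```python
# Minimal sketch of IoU-guided box refinement (illustrative; not the
# authors' implementation). COMET/ATOM predict IoU with a trained network
# head; here the analytic IoU stands in for that predictor.
import torch

def box_iou(boxes_a: torch.Tensor, boxes_b: torch.Tensor) -> torch.Tensor:
    """Pairwise IoU for boxes in (x1, y1, x2, y2) format.

    boxes_a: (N, 4), boxes_b: (M, 4) -> (N, M) IoU matrix.
    """
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])
    lt = torch.max(boxes_a[:, None, :2], boxes_b[None, :, :2])  # intersection top-left
    rb = torch.min(boxes_a[:, None, 2:], boxes_b[None, :, 2:])  # intersection bottom-right
    wh = (rb - lt).clamp(min=0)                                 # zero extent if no overlap
    inter = wh[..., 0] * wh[..., 1]
    union = area_a[:, None] + area_b[None, :] - inter
    return inter / union.clamp(min=1e-6)

# Refine a candidate box by gradient ascent on its IoU with a reference box,
# mimicking the test-time refinement step of IoU-guided trackers.
reference = torch.tensor([[10.0, 10.0, 50.0, 50.0]])
candidate = torch.tensor([[14.0, 8.0, 46.0, 54.0]], requires_grad=True)
for _ in range(10):
    iou = box_iou(candidate, reference).sum()
    iou.backward()
    with torch.no_grad():
        candidate += 1.0 * candidate.grad  # ascent step; step size is arbitrary here
        candidate.grad.zero_()
```

In COMET itself, the predicted IoU is additionally modulated by the context-aware features and offline reference proposals described in the abstract; the analytic IoU above only makes the "maximize overlap" mechanics concrete.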


References

  1. Du, D., Zhu, P., Wen, L., Bian, X., Ling, H., et al.: VisDrone-SOT2019: the vision meets drone single object tracking challenge results. In: Proceedings of ICCVW (2019)

  2. Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of ECCV, pp. 375–391 (2018)

  3. Marvasti-Zadeh, S.M., Cheng, L., Ghanei-Yakhdan, H., Kasaei, S.: Deep learning for visual tracking: a comprehensive survey. IEEE Trans. Intell. Transp. Syst. 1–26 (2021). https://doi.org/10.1109/TITS.2020.3046478

  4. Bonatti, R., Ho, C., Wang, W., Choudhury, S., Scherer, S.: Towards a robust aerial cinematography platform: localizing and tracking moving targets in unstructured environments. In: Proceedings of IROS, pp. 229–236 (2019)

  5. Zhang, H., Wang, G., Lei, Z., Hwang, J.: Eye in the sky: drone-based object tracking and 3D localization. In: Proceedings of ACM Multimedia, pp. 899–907 (2019)

  6. Du, D., Zhu, P., et al.: VisDrone-SOT2019: the vision meets drone single object tracking challenge results. In: Proceedings of ICCVW (2019)

  7. Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., Ling, H.: Vision meets drones: past, present and future (2020)

  8. Zhu, P., Wen, L., Du, D., et al.: VisDrone-VDT2018: the vision meets drone video detection and tracking challenge results. In: Proceedings of ECCVW, pp. 496–518 (2018)

  9. Yu, H., Li, G., Zhang, W., et al.: The unmanned aerial vehicle benchmark: object detection, tracking and baseline. Int. J. Comput. Vis. 128(5), 1141–1159 (2019)

  10. Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27

  11. Liu, C., Ding, W., Yang, J., et al.: Aggregation signature for small object tracking. IEEE Trans. Image Process. 29, 1738–1747 (2020)

  12. Wu, Y., Lim, J., Yang, M.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1834–1848 (2015)

  13. Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: algorithms and benchmark. IEEE Trans. Image Process. 24, 5630–5644 (2015)

  14. Tong, K., Wu, Y., Zhou, F.: Recent advances in small object detection based on deep learning: a review. Image Vis. Comput. 97, 103910 (2020)

  15. LaLonde, R., Zhang, D., Shah, M.: ClusterNet: detecting small objects in large scenes by exploiting spatio-temporal information. In: Proceedings of CVPR (2018)

  16. Bai, Y., Zhang, Y., Ding, M., Ghanem, B.: SOD-MTGAN: small object detection via multi-task generative adversarial network. In: Proceedings of ECCV (2018)

  17. Huang, Z., Fu, C., Li, Y., Lin, F., Lu, P.: Learning aberrance repressed correlation filters for real-time UAV tracking. In: Proceedings of IEEE ICCV, pp. 2891–2900 (2019)

  18. Fu, C., Huang, Z., Li, Y., Duan, R., Lu, P.: Boundary effect-aware visual tracking for UAV with online enhanced background learning and multi-frame consensus verification. In: Proceedings of IROS, pp. 4415–4422 (2019)

  19. Li, F., Fu, C., Lin, F., Li, Y., Lu, P.: Training-set distillation for real-time UAV object tracking. In: Proceedings of ICRA, pp. 1–7 (2020)

  20. Li, Y., Fu, C., Huang, Z., Zhang, Y., Pan, J.: Keyfilter-aware real-time UAV object tracking. In: Proceedings of ICRA (2020)

  21. Li, Y., Fu, C., Ding, F., Huang, Z., Lu, G.: AutoTrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. In: Proceedings of IEEE CVPR (2020)

  22. Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: Proceedings of CVPR (2019)

  23. Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45

  24. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional Siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56

  25. Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., Torr, P.H.: End-to-end representation learning for correlation filter based tracking. In: Proceedings of IEEE CVPR, pp. 5000–5008 (2017)

  26. Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: Proceedings of ECCV, pp. 472–488 (2018)

  27. Danelljan, M., Robinson, A., Khan, F.S., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: Proceedings of ECCV, pp. 472–488 (2016)

  28. Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: Proceedings of IEEE CVPR, pp. 6931–6939 (2017)

  29. Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Proceedings of ECCV, pp. 103–119 (2018)

  30. Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking (2019)

  31. Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking (2018)

  32. Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: Proceedings of IEEE CVPR (2019)

  33. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: Proceedings of IEEE CVPR (2019)

  34. Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: Proceedings of IEEE CVPR, pp. 8971–8980 (2018)

  35. Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of IEEE ICCV (2019)

  36. Liu, W., et al.: SSD: single shot MultiBox detector. In: Proceedings of ECCV, pp. 21–37 (2016)

  37. Fu, C., Liu, W., Ranga, A., Tyagi, A., Berg, A.: DSSD: deconvolutional single shot detector (2017)

  38. Cui, L., et al.: MDSSD: multi-scale deconvolutional single shot detector for small objects (2018)

  39. Lim, J.S., Astrid, M., Yoon, H.J., Lee, S.I.: Small object detection using context and attention (2019)

  40. Yang, X., et al.: SCRDet: towards more robust detection for small, cluttered and rotated objects. In: Proceedings of IEEE ICCV (2019)

  41. Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018)

  42. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE CVPR, pp. 580–587 (2014)

  43. Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: Proceedings of ECCV, pp. 816–832 (2018)

  44. Park, J., Woo, S., Lee, J.Y., Kweon, I.S.: BAM: bottleneck attention module. In: Proceedings of BMVC, pp. 147–161 (2018)

  45. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE CVPR, pp. 770–778 (2016)

  46. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of CVPR, pp. 2818–2826 (2016)

  47. Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011–2023 (2020). https://doi.org/10.1109/TPAMI.2019.2913372

  48. Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: Proceedings of IEEE CVPR (2019)

  49. Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2019). https://doi.org/10.1109/TPAMI.2019.2957464

  50. He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of ICCV, pp. 1026–1034 (2015)

  51. Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: a benchmark for higher frame rate object tracking. In: Proceedings of IEEE ICCV, pp. 1134–1143 (2017)

  52. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of ICLR (2014)

  53. Danelljan, M., Gool, L.V., Timofte, R.: Probabilistic regression for visual tracking. In: Proceedings of IEEE CVPR (2020)

  54. Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Proceedings of ECCV (2020)

  55. Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R.W., Yang, M.H.: CREST: convolutional residual learning for visual tracking. In: Proceedings of ICCV, pp. 2574–2583 (2017)

  56. Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of IEEE CVPR, pp. 4293–4302 (2016)

  57. Fan, H., Ling, H.: Parallel tracking and verifying. IEEE Trans. Image Process. 28, 4130–4144 (2019)

  58. Zhang, T., Xu, C., Yang, M.H.: Multi-task correlation particle filter for robust object tracking. In: Proceedings of IEEE CVPR, pp. 4819–4827 (2017)


Author information


Correspondence to Seyed Mojtaba Marvasti-Zadeh.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (mp4 93098 KB)


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Marvasti-Zadeh, S.M., Khaghani, J., Ghanei-Yakhdan, H., Kasaei, S., Cheng, L. (2021). COMET: Context-Aware IoU-Guided Network for Small Object Tracking. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. Lecture Notes in Computer Science, vol. 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_36


  • DOI: https://doi.org/10.1007/978-3-030-69532-3_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science, Computer Science (R0)
