Abstract
We consider the problem of tracking an unknown small target in aerial videos captured from medium to high altitudes. This is a challenging problem, made even harder by unavoidable drastic camera motion and high-density scenes. To address it, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy. The network fully exploits target-related information through multi-scale feature learning and attention modules. The proposed strategy provides efficient sampling that generalizes the network to the target and its parts without imposing extra computational complexity during online tracking. Together, these components contribute considerably to handling significant occlusions and viewpoint changes. Empirically, COMET outperforms the state of the art on a range of aerial-view datasets that focus on tracking small objects. Specifically, COMET outperforms the celebrated ATOM tracker by an average margin of \(6.2\%\) (and \(7\%\)) in precision (and success) score on the challenging UAVDT, VisDrone-2019, and Small-90 benchmarks.
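To make the "IoU-guided" idea concrete: ATOM-style trackers (which COMET builds on) refine candidate bounding boxes by gradient ascent on a predicted overlap score. The sketch below is a toy, self-contained illustration of that refinement loop; it substitutes the exact geometric IoU with finite-difference gradients for the learned, differentiable IoU-prediction network that the actual trackers use, and the box values are made up for the example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as [x, y, w, h]."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def refine(box, target, steps=300, lr=2.0, eps=1e-3):
    """Gradient-ascent refinement of `box` to maximize IoU with `target`.

    Central finite differences stand in for backpropagation through a
    learned IoU-predictor network, as used in ATOM-style trackers."""
    box = list(box)
    for _ in range(steps):
        grad = []
        for i in range(4):
            hi, lo = box[:], box[:]
            hi[i] += eps
            lo[i] -= eps
            grad.append((iou(hi, target) - iou(lo, target)) / (2 * eps))
        box = [c + lr * g for c, g in zip(box, grad)]
    return box

# Hypothetical example: a coarse proposal around a 20x20 target.
target = [50.0, 50.0, 20.0, 20.0]
proposal = [44.0, 46.0, 26.0, 24.0]
print(iou(proposal, target))                  # coarse overlap
print(iou(refine(proposal, target), target))  # refined, closer to 1.0
```

In the real trackers the IoU predictor is conditioned on image features of the target, so the same ascent step simultaneously localizes and scales the box; the toy version only captures the optimization structure.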
S. M. Marvasti-Zadeh and J. Khaghani—Equal contribution.
References
Du, D., Zhu, P., Wen, L., Bian, X., Ling, H., et al.: VisDrone-SOT2019: the vision meets drone single object tracking challenge results. In: Proceedings of ICCVW (2019)
Du, D., et al.: The unmanned aerial vehicle benchmark: object detection and tracking. In: Proceedings of ECCV, pp. 375–391 (2018)
Marvasti-Zadeh, S.M., Cheng, L., Ghanei-Yakhdan, H., Kasaei, S.: Deep learning for visual tracking: a comprehensive survey. IEEE Trans. Intell. Trans. Syst. 1–26 (2021). https://doi.org/10.1109/TITS.2020.3046478
Bonatti, R., Ho, C., Wang, W., Choudhury, S., Scherer, S.: Towards a robust aerial cinematography platform: localizing and tracking moving targets in unstructured environments. In: Proceedings of IROS, pp. 229–236 (2019)
Zhang, H., Wang, G., Lei, Z., Hwang, J.: Eye in the sky: drone-based object tracking and 3D localization. In: Proceedings of Multimedia, pp. 899–907 (2019)
Zhu, P., Wen, L., Du, D., Bian, X., Hu, Q., Ling, H.: Vision meets drones: past, present and future (2020)
Zhu, P., Wen, L., Du, D., et al.: VisDrone-VDT2018: the vision meets drone video detection and tracking challenge results. In: Proceedings of ECCVW, pp. 496–518 (2018)
Yu, H., Li, G., Zhang, W., et al.: The unmanned aerial vehicle benchmark: object detection, tracking and baseline. Int. J. Comput. Vis. 128(5), 1141–1159 (2019)
Mueller, M., Smith, N., Ghanem, B.: A benchmark and simulator for UAV tracking. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 445–461. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_27
Liu, C., Ding, W., Yang, J., et al.: Aggregation signature for small object tracking. IEEE Trans. Image Process. 29, 1738–1747 (2020)
Wu, Y., Lim, J., Yang, M.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1834–1848 (2015)
Liang, P., Blasch, E., Ling, H.: Encoding color information for visual tracking: algorithms and benchmark. IEEE Trans. Image Process. 24, 5630–5644 (2015)
Tong, K., Wu, Y., Zhou, F.: Recent advances in small object detection based on deep learning: a review. Image Vis. Comput. 97, 103910 (2020)
LaLonde, R., Zhang, D., Shah, M.: ClusterNet: detecting small objects in large scenes by exploiting spatio-temporal information. In: Proceedings of CVPR (2018)
Bai, Y., Zhang, Y., Ding, M., Ghanem, B.: SOD-MTGAN: small object detection via multi-task generative adversarial network. In: Proceedings of ECCV (2018)
Huang, Z., Fu, C., Li, Y., Lin, F., Lu, P.: Learning aberrance repressed correlation filters for real-time UAV tracking. In: Proceedings of IEEE ICCV, pp. 2891–2900 (2019)
Fu, C., Huang, Z., Li, Y., Duan, R., Lu, P.: Boundary effect-aware visual tracking for UAV with online enhanced background learning and multi-frame consensus verification. In: Proceedings of IROS, pp. 4415–4422 (2019)
Li, F., Fu, C., Lin, F., Li, Y., Lu, P.: Training-set distillation for real-time UAV object tracking. In: Proceedings of ICRA, pp. 1–7 (2020)
Li, Y., Fu, C., Huang, Z., Zhang, Y., Pan, J.: Keyfilter-aware real-time UAV object tracking. In: Proceedings of ICRA (2020)
Li, Y., Fu, C., Ding, F., Huang, Z., Lu, G.: AutoTrack: towards high-performance visual tracking for UAV with automatic spatio-temporal regularization. In: Proceedings of IEEE CVPR (2020)
Danelljan, M., Bhat, G., Khan, F.S., Felsberg, M.: ATOM: accurate tracking by overlap maximization. In: Proceedings of CVPR (2019)
Held, D., Thrun, S., Savarese, S.: Learning to track at 100 FPS with deep regression networks. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 749–765. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_45
Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., Torr, P.H.S.: Fully-convolutional siamese networks for object tracking. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 850–865. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_56
Valmadre, J., Bertinetto, L., Henriques, J., Vedaldi, A., Torr, P.H.: End-to-end representation learning for correlation filter based tracking. In: Proceedings of IEEE CVPR, pp. 5000–5008 (2017)
Dong, X., Shen, J.: Triplet loss in Siamese network for object tracking. In: Proceedings of ECCV, pp. 472–488 (2018)
Danelljan, M., Robinson, A., Khan, F.S., Felsberg, M.: Beyond correlation filters: learning continuous convolution operators for visual tracking. In: Proceedings of ECCV, pp. 472–488 (2016)
Danelljan, M., Bhat, G., Shahbaz Khan, F., Felsberg, M.: ECO: efficient convolution operators for tracking. In: Proceedings of IEEE CVPR, pp. 6931–6939 (2017)
Zhu, Z., Wang, Q., Li, B., Wu, W., Yan, J., Hu, W.: Distractor-aware Siamese networks for visual object tracking. In: Proceedings of ECCV, pp. 103–119 (2018)
Zhang, Z., Peng, H.: Deeper and wider Siamese networks for real-time visual tracking (2019)
Fan, H., Ling, H.: Siamese cascaded region proposal networks for real-time visual tracking (2018)
Wang, Q., Zhang, L., Bertinetto, L., Hu, W., Torr, P.H.: Fast online object tracking and segmentation: a unifying approach. In: Proceedings of IEEE CVPR (2019)
Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., Yan, J.: SiamRPN++: evolution of Siamese visual tracking with very deep networks. In: Proceedings of IEEE CVPR (2019)
Li, B., Yan, J., Wu, W., Zhu, Z., Hu, X.: High performance visual tracking with Siamese region proposal network. In: Proceedings of IEEE CVPR, pp. 8971–8980 (2018)
Bhat, G., Danelljan, M., Gool, L.V., Timofte, R.: Learning discriminative model prediction for tracking. In: Proceedings of IEEE ICCV (2019)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Proceedings of ECCV, pp. 21–37 (2016)
Fu, C., Liu, W., Ranga, A., Tyagi, A., Berg, A.: DSSD: deconvolutional single shot detector (2017)
Cui, L., et al.: MDSSD: multi-scale deconvolutional single shot detector for small objects (2018)
Lim, J.S., Astrid, M., Yoon, H.J., Lee, S.I.: Small object detection using context and attention (2019)
Yang, X., et al.: SCRDet: towards more robust detection for small, cluttered and rotated objects. In: Proceedings IEEE ICCV (2019)
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement (2018)
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of IEEE CVPR, pp. 580–587 (2014)
Jiang, B., Luo, R., Mao, J., Xiao, T., Jiang, Y.: Acquisition of localization confidence for accurate object detection. In: Proceedings of ECCV, pp. 816–832 (2018)
Park, J., Woo, S., Lee, J.Y., Kweon, I.S.: BAM: bottleneck attention module. In: Proceedings of BMVC, pp. 147–161 (2018)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of IEEE CVPR, pp. 770–778 (2016)
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of CVPR, pp. 2818–2826 (2016)
Hu, J., Shen, L., Albanie, S., Sun, G., Wu, E.: Squeeze-and-excitation networks. IEEE Trans. Pattern Anal. Mach. Intell. 42(8), 2011–2023 (2020). https://doi.org/10.1109/TPAMI.2019.2913372
Fan, H., et al.: LaSOT: a high-quality benchmark for large-scale single object tracking. In: Proceedings of IEEE CVPR (2019)
Huang, L., Zhao, X., Huang, K.: GOT-10k: a large high-diversity benchmark for generic object tracking in the wild. IEEE Trans. Pattern Anal. Mach. Intell. 1 (2019). https://doi.org/10.1109/TPAMI.2019.2957464
He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: Proceedings of ICCV, pp. 1026–1034 (2015)
Galoogahi, H.K., Fagg, A., Huang, C., Ramanan, D., Lucey, S.: Need for speed: a benchmark for higher frame rate object tracking. In: Proceedings of IEEE ICCV, pp. 1134–1143 (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: Proceedings of ICLR (2014)
Danelljan, M., Gool, L.V., Timofte, R.: Probabilistic regression for visual tracking. In: Proceedings of IEEE CVPR (2020)
Zhang, Z., Peng, H., Fu, J., Li, B., Hu, W.: Ocean: object-aware anchor-free tracking. In: Proceedings of ECCV (2020)
Song, Y., Ma, C., Gong, L., Zhang, J., Lau, R.W., Yang, M.H.: CREST: convolutional residual learning for visual tracking. In: Proceedings of ICCV, pp. 2574–2583 (2017)
Nam, H., Han, B.: Learning multi-domain convolutional neural networks for visual tracking. In: Proceedings of IEEE CVPR, pp. 4293–4302 (2016)
Fan, H., Ling, H.: Parallel tracking and verifying. IEEE Trans. Image Process. 28, 4130–4144 (2019)
Zhang, T., Xu, C., Yang, M.H.: Multi-task correlation particle filter for robust object tracking. In: Proceedings of IEEE CVPR, pp. 4819–4827 (2017)
Electronic supplementary material
Supplementary material 1 (mp4 93098 KB)
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Marvasti-Zadeh, S.M., Khaghani, J., Ghanei-Yakhdan, H., Kasaei, S., Cheng, L. (2021). COMET: Context-Aware IoU-Guided Network for Small Object Tracking. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_36
Print ISBN: 978-3-030-69531-6
Online ISBN: 978-3-030-69532-3