Spatio-Temporal Object Detection from UAV On-Board Cameras

  • Conference paper
Computer Analysis of Images and Patterns (CAIP 2021)

Abstract

We propose a new two-stage spatio-temporal object detection framework that improves detection precision by exploiting temporal information. First, a short-term proposal linking and aggregation method improves box features. Then, a long-term attention module further enhances the short-term aggregated features with long-term spatio-temporal information. This module takes object trajectories into account to exploit long-term relationships between proposals in arbitrarily distant frames. Many videos recorded from UAV on-board cameras contain a high density of small objects, which makes detection very challenging. Our method uses spatio-temporal information to address this issue and increase detection robustness. We compared our method with state-of-the-art video object detectors on two publicly available datasets of UAV-recorded videos. Our approach outperforms previous methods on both datasets.
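The abstract does not give implementation details, so the following is only a rough sketch of the two stages it names. It assumes greedy IoU matching for short-term proposal linking, mean pooling for aggregation, and single-head scaled dot-product attention with a residual connection for the long-term step. None of these choices is confirmed by the source, and all function names (`iou`, `link_proposals`, `mean_features`, `attend`) are hypothetical.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def link_proposals(prev_boxes, next_boxes, min_iou=0.5):
    """Assumed linking rule: greedily pair boxes of consecutive frames
    in order of descending IoU. Returns (prev_index, next_index) pairs."""
    candidates = sorted(
        ((iou(p, n), i, j)
         for i, p in enumerate(prev_boxes)
         for j, n in enumerate(next_boxes)),
        reverse=True,
    )
    used_prev, used_next, links = set(), set(), []
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if i not in used_prev and j not in used_next:
            used_prev.add(i)
            used_next.add(j)
            links.append((i, j))
    return links

def mean_features(features):
    """Assumed aggregation: element-wise mean of linked feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def attend(query, memory):
    """Assumed long-term step: scaled dot-product attention of one query
    over memory vectors, added back to the query as a residual."""
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, mem)) / scale for mem in memory]
    peak = max(scores)  # subtract the maximum for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    context = [sum(w * mem[d] for w, mem in zip(weights, memory)) / total
               for d in range(len(query))]
    return [q + c for q, c in zip(query, context)]

# Toy usage: two boxes that each shift one pixel to the right between frames.
prev_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
next_boxes = [(51, 50, 61, 60), (1, 0, 11, 10)]
links = link_proposals(prev_boxes, next_boxes)
short_term = mean_features([[1.0, 2.0], [3.0, 4.0]])
enhanced = attend(short_term, [[0.5, 0.5], [2.0, 3.0]])
```

In this sketch the memory passed to `attend` stands in for features of proposals from distant frames along the same trajectory; a full detector would learn query, key and value projections rather than use raw features.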

Acknowledgements

This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016–2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).

Author information

Corresponding author

Correspondence to Daniel Cores.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Cores, D., Brea, V., Mucientes, M. (2021). Spatio-Temporal Object Detection from UAV On-Board Cameras. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_13

  • DOI: https://doi.org/10.1007/978-3-030-89131-2_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89130-5

  • Online ISBN: 978-3-030-89131-2

  • eBook Packages: Computer Science, Computer Science (R0)
