Abstract
We propose a new two-stage spatio-temporal object detection framework that improves detection precision by exploiting temporal information. First, a short-term proposal linking and aggregation method improves box features. Then, a long-term attention module further enhances the short-term aggregated features by adding long-term spatio-temporal information. This module takes object trajectories into account to exploit long-term relationships between proposals in arbitrarily distant frames. Many videos recorded from UAV on-board cameras contain a high density of small objects, which makes detection very challenging. Our method uses spatio-temporal information to address these issues and increase detection robustness. We compare our method with state-of-the-art video object detectors on two publicly available datasets of UAV-recorded videos, and it outperforms previous methods on both.
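The two ideas named in the abstract, linking proposals across nearby frames and attending over features from distant frames, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the linking rule or the attention form, so the greedy IoU matching, the `min_iou` threshold, the scaled dot-product weighting, and all function names below are assumptions chosen for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def link_proposals(
    prev: Sequence[Box], curr: Sequence[Box], min_iou: float = 0.5
) -> List[Tuple[int, int]]:
    """Pair boxes of two consecutive frames, greedily by descending IoU.

    Assumed linking rule; returns (index in prev, index in curr) pairs.
    """
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev) for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr, links = set(), set(), []
    for score, i, j in candidates:
        if score < min_iou:
            break  # remaining candidates overlap even less
        if i not in used_prev and j not in used_curr:
            links.append((i, j))
            used_prev.add(i)
            used_curr.add(j)
    return links


def attend(query: Sequence[float], memory: Sequence[Sequence[float]]) -> List[float]:
    """Softmax-weighted sum of memory features.

    Weights come from the scaled dot product between the query feature and
    each memory feature. Assumes a non-empty memory whose features have the
    same length as the query.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, feat)) / scale for feat in memory]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * feat[k] for w, feat in zip(weights, memory)) / total
        for k in range(len(query))
    ]
```

In this sketch, `link_proposals` stands in for the short-term stage (features of linked boxes could then be aggregated), while `attend` stands in for the long-term stage, where `memory` would hold features of proposals from arbitrarily distant frames.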
Acknowledgements
This research was partially funded by the Spanish Ministry of Science, Innovation and Universities under grants TIN2017-84796-C2-1-R and RTI2018-097088-B-C32, and the Galician Ministry of Education, Culture and Universities under grants ED431C 2018/29, ED431C 2017/69 and accreditation 2016–2019, ED431G/08. These grants are co-funded by the European Regional Development Fund (ERDF/FEDER program).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Cores, D., Brea, V., Mucientes, M. (2021). Spatio-Temporal Object Detection from UAV On-Board Cameras. In: Tsapatsoulis, N., Panayides, A., Theocharides, T., Lanitis, A., Pattichis, C., Vento, M. (eds) Computer Analysis of Images and Patterns. CAIP 2021. Lecture Notes in Computer Science, vol 13053. Springer, Cham. https://doi.org/10.1007/978-3-030-89131-2_13
DOI: https://doi.org/10.1007/978-3-030-89131-2_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-89130-5
Online ISBN: 978-3-030-89131-2
eBook Packages: Computer Science (R0)