Learning Local Feature Descriptors for Multiple Object Tracking

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12623)

Abstract

The present study aims at learning a class-agnostic embedding suitable for Multiple Object Tracking (MOT). We demonstrate that learning local feature descriptors can provide a sufficient level of generalization. The proposed embedding function performs on par with dedicated person re-identification counterparts in their target domain and outperforms them in other domains. By employing it, our solutions achieve state-of-the-art performance on a number of MOT benchmarks, including the CVPR’19 Tracking Challenge.
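
To make the role of such an embedding concrete, the sketch below shows a typical appearance-based association step in tracking-by-detection: detections are matched to existing tracks by minimizing cosine distance with the Hungarian algorithm [52]. This is a simplified illustration rather than the authors' exact pipeline; the embeddings are assumed to be L2-normalized, and the gating threshold max_cos_dist is an illustrative placeholder, not a value from the paper.

```python
# Minimal association sketch (assumptions: unit-norm embeddings, purely
# appearance-based matching; the gating threshold is illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_embs: np.ndarray, det_embs: np.ndarray,
              max_cos_dist: float = 0.4):
    """Match existing tracks to current detections by appearance.

    track_embs: (T, D) unit-norm embeddings of tracked objects.
    det_embs:   (N, D) unit-norm embeddings of new detections.
    Returns (track_idx, det_idx) pairs that pass the gating threshold.
    """
    cost = 1.0 - track_embs @ det_embs.T      # cosine distance, in [0, 2]
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cos_dist]
```

In practice such appearance matching is combined with motion cues, e.g. Kalman-filter gating [53], as in DeepSORT-style trackers [40].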

Notes

  1. The corresponding VLFeat [46] implementations were employed, similarly to Ref. [26].

  2. See the processing time comparison in the supplementary material.

References

  1. Grant, J.M., Flynn, P.J.: Crowd scene understanding from video: a survey. ACM Trans. Multimedia Comput. Commun. Appl. 13, 1–23 (2017)

  2. Choi, W., Savarese, S.: A unified framework for multi-target tracking and collective activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 215–230. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_16

  3. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 34, 334–352 (2004)

  4. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  5. Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016)

  6. Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136 (2015)

  7. Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1532–1545 (2014)

  8. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. arXiv preprint arXiv:1703.06870 (2017)

  9. Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  10. Liu, W., et al.: SSD: single shot multibox detector. arXiv preprint arXiv:1512.02325 (2015)

  11. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)

  12. Tan, M., Pang, R., Le, Q.: EfficientDet: scalable and efficient object detection. arXiv preprint arXiv:1911.09070 (2019)

  13. Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144 (2016)

  14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

  15. Xie, S., Girshick, R.B., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431 (2016)

  16. Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. arXiv preprint arXiv:1906.09756 (2019)

  17. Dai, J., et al.: Deformable convolutional networks. arXiv preprint arXiv:1703.06211 (2017)

  18. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets V2: more deformable, better results. arXiv preprint arXiv:1811.11168 (2018)

  19. Luo, W., et al.: Multiple object tracking: a literature review. arXiv preprint arXiv:1409.7618v4 (2017)

  20. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468 (2016)

  21. Yu, F., Li, W., Li, Q., Liu, Yu., Shi, X., Yan, J.: POI: multiple object tracking with high performance detection and appearance feature. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 36–42. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_3

  22. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32, 1231–1237 (2013)

  23. Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro (2007)

  24. Zheng, L., Zhang, H., Sun, S., Chandraker, M., Tian, Q.: Person re-identification in the wild. arXiv preprint arXiv:1604.02531 (2016)

  25. Wang, Z., Zheng, L., Liu, Y., Wang, S.: Towards real-time multi-object tracking. arXiv preprint arXiv:1909.12605v1 (2019)

  26. Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: a benchmark and evaluation of handcrafted and learned local descriptors. In: CVPR (2017)

  27. Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. arXiv preprint arXiv:1705.10872 (2017)

  28. Tian, Y., Fan, B., Wu, F.: L2-Net: deep learning of discriminative patch descriptor in Euclidean space. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6128–6136 (2017)

  29. Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 119.1–119.11. BMVA Press (2016)

  30. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. arXiv preprint arXiv:2001.04193 (2020)

  31. Winder, S., Brown, M.: Learning local image descriptors. In: CVPR (2007)

  32. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person re-identification. In: ICCV (2019)

  33. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. arXiv preprint arXiv:1802.08122 (2018)

  34. Song, C., Huang, Y., Ouyang, W., Wang, L.: Mask-guided contrastive attention model for person re-identification. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  35. Keller, M., Chen, Z., Maffra, F., Schmuck, P., Chli, M.: Learning deep descriptors with scale-aware triplet networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  36. Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., Balntas, V.: SOSNet: second order similarity regularization for local descriptor learning. In: CVPR (2019)

  37. Zhang, L., Rusinkiewicz, S.: Learning local descriptors with a CDF-based dynamic soft margin. In: International Conference on Computer Vision (ICCV) (2019)

  38. Zhang, X.Y., Zhang, L., Zheng, Z.Y., Liu, Y., Bian, J.W., Cheng, M.M.: AdaSample: adaptive sampling of hard positives for descriptor learning. arXiv preprint arXiv:1911.12110 (2019)

  39. Zhang, X., Yu, F.X., Kumar, S., Chang, S.F.: Learning spread-out local feature descriptors. arXiv preprint arXiv:1708.06320 (2017)

  40. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE (2017)

  41. Cao, Y., Xu, J., Lin, S., Wei, F., Hu, H.: GCNet: non-local networks meet squeeze-excitation networks and beyond. arXiv preprint arXiv:1904.11492 (2019)

  42. Chen, K., et al.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)

  43. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575 (2014)

  44. Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: towards balanced learning for object detection. arXiv preprint arXiv:1904.02701 (2019)

  45. Yu, F., et al.: BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018)

  46. Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008)

  47. Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: learning local features from images. arXiv preprint arXiv:1805.09662 (2018)

  48. Dusmanu, M., et al.: D2-Net: a trainable CNN for joint detection and description of local features. arXiv preprint arXiv:1905.03561 (2019)

  49. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, D.M. (eds.) AISTATS. Volume 9 of JMLR Proceedings, pp. 249–256. JMLR.org (2010)

  50. Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: a benchmark. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  51. Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  52. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2, 83–97 (1955)

  53. Kalman, R.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960)

  54. Welch, G., Bishop, G.: An introduction to the Kalman filter. University of North Carolina at Chapel Hill, Chapel Hill (1995)

  55. Choi, W.: Near-online multi-target tracking with aggregated local flow descriptor. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3029–3037 (2015)

  56. Pang, J., Qiu, L., Chen, H., Li, Q., Darrell, T., Yu, F.: Quasi-dense instance similarity learning. arXiv preprint arXiv:2006.06664 (2020)

  57. Chang, X., Hospedales, T.M., Xiang, T.: Multi-level factorisation net for person re-identification. arXiv preprint arXiv:1803.09132 (2018)

  58. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision (2015)

  59. Dendorfer, P., et al.: CVPR19 tracking and detection challenge: how crowded can it get? arXiv preprint arXiv:1906.04567 (2019)

  60. Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. arXiv preprint arXiv:1903.05625 (2019)

  61. Bochinski, E., Eiselein, V., Sikora, T.: High-speed tracking-by-detection without using image information. In: International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2017, Lecce, Italy (2017)

  62. Yoon, Y., Kim, D.Y., Yoon, K., Song, Y., Jeon, M.: Online multiple pedestrian tracking using deep temporal appearance matching association. arXiv preprint arXiv:1907.00831 (2019)

  63. Geusebroek, J.M., Burghouts, G.J., Smeulders, A.W.M.: ALOI: Amsterdam library of object images. Int. J. Comput. Vis. 61(1), 103–112 (2005)

Author information

Corresponding author

Correspondence to Viktor Porokhonskyy.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3342 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mykheievskyi, D., Borysenko, D., Porokhonskyy, V. (2021). Learning Local Feature Descriptors for Multiple Object Tracking. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol. 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-69532-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science, Computer Science (R0)
