Learning Local Feature Descriptors for Multiple Object Tracking

  • Conference paper
Computer Vision – ACCV 2020 (ACCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12623)

Abstract

The present study aims at learning a class-agnostic embedding suitable for Multiple Object Tracking (MOT). We demonstrate that learning local feature descriptors can provide a sufficient level of generalization. The proposed embedding function performs on par with dedicated person re-identification counterparts in their target domain and outperforms them in other domains. By employing it, our solutions achieve state-of-the-art performance on a number of MOT benchmarks, including the CVPR’19 Tracking Challenge.
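
To make the role of such an embedding concrete, the sketch below shows a typical appearance-based association step in tracking-by-detection: detections are matched to existing tracks by minimizing cosine distance with the Hungarian algorithm [52]. This is a simplified illustration rather than the authors' exact pipeline; the embeddings are assumed to be L2-normalized, and the gating threshold max_cos_dist is an illustrative placeholder, not a value from the paper.

```python
# Minimal association sketch (assumptions: unit-norm embeddings, purely
# appearance-based matching; the gating threshold is illustrative only).
import numpy as np
from scipy.optimize import linear_sum_assignment


def associate(track_embs: np.ndarray, det_embs: np.ndarray,
              max_cos_dist: float = 0.4):
    """Match existing tracks to current detections by appearance.

    track_embs: (T, D) unit-norm embeddings of tracked objects.
    det_embs:   (N, D) unit-norm embeddings of new detections.
    Returns (track_idx, det_idx) pairs that pass the gating threshold.
    """
    cost = 1.0 - track_embs @ det_embs.T      # cosine distance, in [0, 2]
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_cos_dist]
```

In practice such appearance matching is combined with motion cues, e.g. Kalman-filter gating [53], as in DeepSORT-style trackers [40].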

Notes

  1. The corresponding VLFeat [46] implementations were employed, similarly to Ref. [26].

  2. See the processing time comparison in the supplementary material.

References

  1. Grant, J.M., Flynn, P.J.: Crowd scene understanding from video: a survey. ACM Trans. Multimedia Comput. Commun. Appl. 13, 1–23 (2017)

  2. Choi, W., Savarese, S.: A unified framework for multi-target tracking and collective activity recognition. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7575, pp. 215–230. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33765-9_16

  3. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 34, 334–352 (2004)

  4. Geiger, A., Lenz, P., Urtasun, R.: Are we ready for autonomous driving? The KITTI vision benchmark suite. In: Conference on Computer Vision and Pattern Recognition (CVPR) (2012)

  5. Milan, A., Leal-Taixé, L., Reid, I., Roth, S., Schindler, K.: MOT16: a benchmark for multi-object tracking. arXiv preprint arXiv:1603.00831 (2016)

  6. Wen, L., et al.: UA-DETRAC: a new benchmark and protocol for multi-object detection and tracking. arXiv preprint arXiv:1511.04136 (2015)

  7. Dollár, P., Appel, R., Belongie, S., Perona, P.: Fast feature pyramids for object detection. IEEE Trans. Pattern Anal. Mach. Intell. 36, 1532–1545 (2014)

  8. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. arXiv preprint arXiv:1703.06870 (2017)

  9. Li, Y., Qi, H., Dai, J., Ji, X., Wei, Y.: Fully convolutional instance-aware semantic segmentation. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  10. Liu, W., et al.: SSD: single shot multibox detector. arXiv preprint arXiv:1512.02325 (2015)

  11. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497 (2015)

  12. Tan, M., Pang, R., Le, Q.: EfficientDet: scalable and efficient object detection. arXiv preprint arXiv:1911.09070 (2019)

  13. Lin, T., Dollár, P., Girshick, R.B., He, K., Hariharan, B., Belongie, S.J.: Feature pyramid networks for object detection. arXiv preprint arXiv:1612.03144 (2016)

  14. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385 (2015)

  15. Xie, S., Girshick, R.B., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. arXiv preprint arXiv:1611.05431 (2016)

  16. Cai, Z., Vasconcelos, N.: Cascade R-CNN: high quality object detection and instance segmentation. arXiv preprint arXiv:1906.09756 (2019)

  17. Dai, J., et al.: Deformable convolutional networks. arXiv preprint arXiv:1703.06211 (2017)

  18. Zhu, X., Hu, H., Lin, S., Dai, J.: Deformable ConvNets V2: more deformable, better results. arXiv preprint arXiv:1811.11168 (2018)

  19. Luo, W., et al.: Multiple object tracking: a literature review. arXiv preprint arXiv:1409.7618v4 (2017)

  20. Bewley, A., Ge, Z., Ott, L., Ramos, F., Upcroft, B.: Simple online and realtime tracking. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3464–3468 (2016)

  21. Yu, F., Li, W., Li, Q., Liu, Yu., Shi, X., Yan, J.: POI: multiple object tracking with high performance detection and appearance feature. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9914, pp. 36–42. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-48881-3_3

  22. Geiger, A., Lenz, P., Stiller, C., Urtasun, R.: Vision meets robotics: the KITTI dataset. Int. J. Robot. Res. (IJRR) 32, 1231–1237 (2013)

  23. Gray, D., Brennan, S., Tao, H.: Evaluating appearance models for recognition, reacquisition, and tracking. In: IEEE International Workshop on Performance Evaluation for Tracking and Surveillance, Rio de Janeiro (2007)

  24. Zheng, L., Zhang, H., Sun, S., Chandraker, M., Tian, Q.: Person re-identification in the wild. arXiv preprint arXiv:1604.02531 (2016)

  25. Wang, Z., Zheng, L., Liu, Y., Wang, S.: Towards real-time multi-object tracking. arXiv preprint arXiv:1909.12605v1 (2019)

  26. Balntas, V., Lenc, K., Vedaldi, A., Mikolajczyk, K.: HPatches: a benchmark and evaluation of handcrafted and learned local descriptors. In: CVPR (2017)

  27. Mishchuk, A., Mishkin, D., Radenovic, F., Matas, J.: Working hard to know your neighbor’s margins: local descriptor learning loss. arXiv preprint arXiv:1705.10872 (2017)

  28. Tian, Y., Fan, B., Wu, F.: L2-Net: deep learning of discriminative patch descriptor in Euclidean space. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6128–6136 (2017)

  29. Balntas, V., Riba, E., Ponsa, D., Mikolajczyk, K.: Learning local feature descriptors with triplets and shallow convolutional neural networks. In: Wilson, R.C., Hancock, E.R., Smith, W.A.P. (eds.) Proceedings of the British Machine Vision Conference (BMVC), pp. 119.1–119.11. BMVA Press (2016)

  30. Ye, M., Shen, J., Lin, G., Xiang, T., Shao, L., Hoi, S.C.H.: Deep learning for person re-identification: a survey and outlook. arXiv preprint arXiv:2001.04193 (2020)

  31. Winder, S., Brown, M.: Learning local image descriptors. In: CVPR (2007)

  32. Zhou, K., Yang, Y., Cavallaro, A., Xiang, T.: Omni-scale feature learning for person re-identification. In: ICCV (2019)

  33. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. arXiv preprint arXiv:1802.08122 (2018)

  34. Song, C., Huang, Y., Ouyang, W., Wang, L.: Mask-guided contrastive attention model for person re-identification. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  35. Keller, M., Chen, Z., Maffra, F., Schmuck, P., Chli, M.: Learning deep descriptors with scale-aware triplet networks. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2018)

  36. Tian, Y., Yu, X., Fan, B., Wu, F., Heijnen, H., Balntas, V.: SOSNet: second order similarity regularization for local descriptor learning. In: CVPR (2019)

  37. Zhang, L., Rusinkiewicz, S.: Learning local descriptors with a CDF-based dynamic soft margin. In: International Conference on Computer Vision (ICCV) (2019)

  38. Zhang, X.Y., Zhang, L., Zheng, Z.Y., Liu, Y., Bian, J.W., Cheng, M.M.: AdaSample: adaptive sampling of hard positives for descriptor learning. arXiv preprint arXiv:1911.12110 (2019)

  39. Zhang, X., Yu, F.X., Kumar, S., Chang, S.F.: Learning spread-out local feature descriptors. arXiv preprint arXiv:1708.06320 (2017)

  40. Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 3645–3649. IEEE (2017)

  41. Cao, Y., Xu, J., Lin, S., Wei, F., Hu, H.: GCNet: non-local networks meet squeeze-excitation networks and beyond. arXiv preprint arXiv:1904.11492 (2019)

  42. Chen, K., et al.: MMDetection: Open MMLab detection toolbox and benchmark. arXiv preprint arXiv:1906.07155 (2019)

  43. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. arXiv preprint arXiv:1409.0575 (2014)

  44. Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: towards balanced learning for object detection. arXiv preprint arXiv:1904.02701 (2019)

  45. Yu, F., et al.: BDD100K: a diverse driving video database with scalable annotation tooling. arXiv preprint arXiv:1805.04687 (2018)

  46. Vedaldi, A., Fulkerson, B.: VLFeat: an open and portable library of computer vision algorithms (2008)

  47. Ono, Y., Trulls, E., Fua, P., Yi, K.M.: LF-Net: learning local features from images. arXiv preprint arXiv:1805.09662 (2018)

  48. Dusmanu, M., et al.: D2-Net: a trainable CNN for joint detection and description of local features. arXiv preprint arXiv:1905.03561 (2019)

  49. Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Teh, Y.W., Titterington, D.M. (eds.) AISTATS. Volume 9 of JMLR Proceedings, pp. 249–256. JMLR.org (2010)

  50. Dollár, P., Wojek, C., Schiele, B., Perona, P.: Pedestrian detection: a benchmark. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2009)

  51. Xiao, T., Li, S., Wang, B., Lin, L., Wang, X.: Joint detection and identification feature learning for person search. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)

  52. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Res. Logist. Q. 2, 83–97 (1955)

  53. Kalman, R.: A new approach to linear filtering and prediction problems. J. Basic Eng. 82, 35–45 (1960)

  54. Welch, G., Bishop, G.: An introduction to the Kalman filter. University of North Carolina at Chapel Hill, Chapel Hill (1995)

  55. Choi, W.: Near-online multi-target tracking with aggregated local flow descriptor. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3029–3037 (2015)

  56. Pang, J., Qiu, L., Chen, H., Li, Q., Darrell, T., Yu, F.: Quasi-dense instance similarity learning. arXiv preprint arXiv:2006.06664 (2020)

  57. Chang, X., Hospedales, T.M., Xiang, T.: Multi-level factorisation net for person re-identification. arXiv preprint arXiv:1803.09132 (2018)

  58. Zheng, L., Shen, L., Tian, L., Wang, S., Wang, J., Tian, Q.: Scalable person re-identification: a benchmark. In: IEEE International Conference on Computer Vision (2015)

  59. Dendorfer, P., et al.: CVPR19 tracking and detection challenge: how crowded can it get? arXiv preprint arXiv:1906.04567 (2019)

  60. Bergmann, P., Meinhardt, T., Leal-Taixé, L.: Tracking without bells and whistles. arXiv preprint arXiv:1903.05625 (2019)

  61. Bochinski, E., Eiselein, V., Sikora, T.: High-speed tracking-by-detection without using image information. In: International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2017, Lecce, Italy (2017)

  62. Yoon, Y., Kim, D.Y., Yoon, K., Song, Y., Jeon, M.: Online multiple pedestrian tracking using deep temporal appearance matching association. arXiv preprint arXiv:1907.00831 (2019)

  63. Geusebroek, J.M., Burghouts, G.J., Smeulders, A.W.M.: ALOI: Amsterdam library of object images. Int. J. Comput. Vis. 61(1), 103–112 (2005)

Author information

Corresponding author

Correspondence to Viktor Porokhonskyy.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 3342 KB)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Mykheievskyi, D., Borysenko, D., Porokhonskyy, V. (2021). Learning Local Feature Descriptors for Multiple Object Tracking. In: Ishikawa, H., Liu, C.L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol. 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_34

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-69532-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-69531-6

  • Online ISBN: 978-3-030-69532-3

  • eBook Packages: Computer Science, Computer Science (R0)
