Abstract
The present study aims to learn a class-agnostic embedding suitable for Multiple Object Tracking (MOT). We demonstrate that learning local feature descriptors can provide a sufficient level of generalization. The proposed embedding function performs on par with dedicated person re-identification counterparts in their target domain and outperforms them in other domains. Using this embedding, our solutions achieve state-of-the-art performance on a number of MOT benchmarks, including the CVPR’19 Tracking Challenge.
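To illustrate how an appearance embedding is typically used in tracking-by-detection, the sketch below matches existing tracks to new detections by cosine distance between their embedding vectors. It is a minimal illustration under stated assumptions, not the method of this paper: the function names, the greedy matching strategy, and the `max_dist` threshold are all hypothetical, and the embeddings are toy vectors rather than outputs of a learned descriptor network.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def greedy_associate(track_embs, det_embs, max_dist=0.4):
    """Greedily pair tracks with detections, closest pairs first.

    max_dist is a hypothetical gating threshold: pairs farther apart
    than this are left unmatched.
    """
    # Enumerate all (distance, track index, detection index) candidates.
    pairs = sorted(
        (cosine_distance(t, d), ti, di)
        for ti, t in enumerate(track_embs)
        for di, d in enumerate(det_embs)
    )
    used_tracks, used_dets, matches = set(), set(), []
    for dist, ti, di in pairs:
        if dist > max_dist:
            break  # remaining candidates are even farther apart
        if ti in used_tracks or di in used_dets:
            continue  # each track and detection is matched at most once
        used_tracks.add(ti)
        used_dets.add(di)
        matches.append((ti, di))
    return sorted(matches)


# Toy embeddings: two tracks, three detections (the third is a new object).
tracks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
dets = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(greedy_associate(tracks, dets))
```

Greedy matching is used here only to keep the example dependency-free; trackers often solve the same cost matrix with an optimal assignment method instead, and the unmatched third detection would typically start a new track.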
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Mykheievskyi, D., Borysenko, D., Porokhonskyy, V. (2021). Learning Local Feature Descriptors for Multiple Object Tracking. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science, vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_34
DOI: https://doi.org/10.1007/978-3-030-69532-3_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-69531-6
Online ISBN: 978-3-030-69532-3
eBook Packages: Computer Science (R0)