Abstract
Multi-camera tracking (MCT) aims to track people across multiple cameras. To match tracks across cameras, existing MCT solutions rely primarily on Person Re-Identification (Re-ID), which compares people's visual appearance. However, this approach fails to match people whose appearances are very similar, such as workers wearing uniforms. In this paper, we propose a method based on spatio-temporal association (STA) to overcome the limitations of visual Re-ID in similar-appearance MCT. The proposed method operates effectively when the camera views overlap (even slightly) and a moderate number of people (a maximum of 4 to 7 individuals) move close to each other in each overlapping region. We evaluate the method on a private dataset we prepared and on the public PETS2009 dataset. The experimental results show that the proposed method correctly matches people appearing in multiple cameras, outperforms visual Re-ID-based MCT when people have similar appearances, and works well even when the overlapping region is small. To further strengthen the method, we perform error analysis and introduce three extensions that mitigate missing detections and inaccurate footpoint interpolation; these extensions further improve the frame-level matching accuracy of the baseline method.
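The method itself is detailed later in the chapter; as a rough illustration of the spatio-temporal association idea summarized above, the sketch below projects footpoints from two overlapping cameras onto a common ground plane using precomputed homographies and matches tracks that are co-visible in the overlap with the Hungarian algorithm. The function names, distance threshold, and data layout are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of spatio-temporal association across two overlapping cameras.
# All names and thresholds are illustrative assumptions: footpoints from each
# camera are projected onto a shared ground plane via a precomputed homography,
# and tracks seen in the overlap during the same frames are matched by the
# Hungarian algorithm on their mean ground-plane distance.
import numpy as np
from scipy.optimize import linear_sum_assignment


def to_ground_plane(footpoints, H):
    """Project Nx2 image footpoints to ground-plane coordinates via 3x3 homography H."""
    pts = np.hstack([footpoints, np.ones((len(footpoints), 1))])  # homogeneous coordinates
    proj = pts @ H.T
    return proj[:, :2] / proj[:, 2:3]


def match_tracks(tracks_a, tracks_b, H_a, H_b, max_dist=0.8):
    """tracks_*: dict {track_id: {frame: (x_foot, y_foot)}}, restricted to the overlap region.
    Returns (id_a, id_b) pairs whose mean ground-plane distance is below max_dist (assumed metres)."""
    ids_a, ids_b = list(tracks_a), list(tracks_b)
    cost = np.full((len(ids_a), len(ids_b)), np.inf)
    for i, ta in enumerate(ids_a):
        for j, tb in enumerate(ids_b):
            frames = sorted(set(tracks_a[ta]) & set(tracks_b[tb]))  # frames seen by both cameras
            if not frames:
                continue
            pa = to_ground_plane(np.array([tracks_a[ta][f] for f in frames]), H_a)
            pb = to_ground_plane(np.array([tracks_b[tb][f] for f in frames]), H_b)
            cost[i, j] = np.linalg.norm(pa - pb, axis=1).mean()
    # Hungarian assignment; infinite costs are replaced by a large finite penalty
    finite = np.where(np.isinf(cost), max_dist * 10, cost)
    rows, cols = linear_sum_assignment(finite)
    return [(ids_a[r], ids_b[c]) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```

Restricting each pairwise cost to frames in which both cameras observe the tracks is what makes the association temporal as well as spatial: two tracks are only candidates for merging if they occupy nearly the same ground-plane position at the same times.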
Acknowledgment
We gratefully acknowledge the support of AWL, Inc. for this research. We also thank our colleagues at AWL Vietnam for their helpful support and discussions.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Tran, L.Q., Pham, M.C., Nguyen, Q.N. (2024). Multi-camera Tracking Based on Spatio-Temporal Association in Small Overlapping Regions. In: Arai, K. (eds) Intelligent Computing. SAI 2024. Lecture Notes in Networks and Systems, vol 1018. Springer, Cham. https://doi.org/10.1007/978-3-031-62269-4_33
DOI: https://doi.org/10.1007/978-3-031-62269-4_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-62268-7
Online ISBN: 978-3-031-62269-4