Abstract
Neural networks for multi-domain learning enable an effective combination of information from different domains by sharing and co-learning parameters. In visual tracking, the features that emerge in the shared layers of a multi-domain tracker, trained on various sequences, are crucial for tracking in unseen videos. Yet, in a fully shared architecture, some of the emerging features are useful only in a specific domain, which reduces the generalization of the learned feature representation. We propose a semi-supervised learning scheme that separates domain-invariant and domain-specific features using adversarial learning, encourages mutual exclusion between them, and leverages self-supervised learning to enhance the shared features with a reservoir of unlabeled data. By employing these features and training dedicated layers for each sequence, we build a tracker that performs exceptionally well on different types of videos.
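To make the scheme concrete, the following is a minimal sketch (not the authors' implementation) of the two core ingredients the abstract describes: a shared feature branch made domain-invariant by training adversarially against a domain classifier through gradient reversal, and an orthogonality penalty that encourages mutual exclusion between the shared and domain-specific features. All module names, dimensions, and loss weights here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class Separator(nn.Module):
    """Splits backbone features into shared (invariant) and private parts."""
    def __init__(self, in_dim=512, feat_dim=128, n_domains=10):
        super().__init__()
        self.shared = nn.Linear(in_dim, feat_dim)    # domain-invariant branch
        self.private = nn.Linear(in_dim, feat_dim)   # domain-specific branch
        self.domain_clf = nn.Linear(feat_dim, n_domains)

    def forward(self, x, lam=1.0):
        zs, zp = self.shared(x), self.private(x)
        # Adversarial head: the classifier tries to identify the source
        # domain, while the reversed gradient pushes `shared` to hide it.
        dom_logits = self.domain_clf(grad_reverse(zs, lam))
        return zs, zp, dom_logits

def orthogonality_loss(zs, zp):
    # Mutual exclusion: penalize overlap between shared and private
    # features via the squared Frobenius norm of their correlation.
    zs = F.normalize(zs, dim=1)
    zp = F.normalize(zp, dim=1)
    return (zs.t() @ zp).pow(2).sum()

# Toy training step on fabricated tensors, for illustration only.
model = Separator()
x = torch.randn(32, 512)            # backbone features of 32 image patches
dom = torch.randint(0, 10, (32,))   # index of the training sequence (domain)
zs, zp, logits = model(x, lam=0.5)
loss = F.cross_entropy(logits, dom) + 0.1 * orthogonality_loss(zs, zp)
loss.backward()

In a full tracker along these lines, the shared and private features would feed dedicated per-sequence classification layers (one domain-specific head per training video), with self-supervised losses applied to the shared branch on the unlabeled reservoir.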
Cite this paper
Meshgi, K., Mirzaei, M.S.: Adversarial Semi-supervised Multi-domain Tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. LNCS, vol. 12623. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_37