Abstract
Neural networks for multi-domain learning enable an effective combination of information from different domains by sharing and co-learning parameters. In visual tracking, the features that emerge in the shared layers of a multi-domain tracker, trained on various sequences, are crucial for tracking in unseen videos. Yet, in a fully shared architecture, some of the emerging features are useful only in a specific domain, which reduces the generalization of the learned feature representation. We propose a semi-supervised learning scheme that separates domain-invariant and domain-specific features using adversarial learning, encourages mutual exclusion between them, and leverages self-supervised learning to enhance the shared features with a reservoir of unlabeled data. By employing these features and training dedicated layers for each sequence, we build a tracker that performs exceptionally well on different types of videos.
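To make the scheme concrete, the following is a minimal sketch (not the authors' implementation) of the two core ingredients the abstract describes: a shared feature branch made domain-invariant by training adversarially against a domain classifier through gradient reversal, and an orthogonality penalty that encourages mutual exclusion between the shared and domain-specific features. All module names, dimensions, and loss weights here are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated (scaled) gradient backward."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

class Separator(nn.Module):
    """Splits backbone features into shared (invariant) and private parts."""
    def __init__(self, in_dim=512, feat_dim=128, n_domains=10):
        super().__init__()
        self.shared = nn.Linear(in_dim, feat_dim)    # domain-invariant branch
        self.private = nn.Linear(in_dim, feat_dim)   # domain-specific branch
        self.domain_clf = nn.Linear(feat_dim, n_domains)

    def forward(self, x, lam=1.0):
        zs, zp = self.shared(x), self.private(x)
        # Adversarial head: the classifier tries to identify the source
        # domain, while the reversed gradient pushes `shared` to hide it.
        dom_logits = self.domain_clf(grad_reverse(zs, lam))
        return zs, zp, dom_logits

def orthogonality_loss(zs, zp):
    # Mutual exclusion: penalize overlap between shared and private
    # features via the squared Frobenius norm of their correlation.
    zs = F.normalize(zs, dim=1)
    zp = F.normalize(zp, dim=1)
    return (zs.t() @ zp).pow(2).sum()

# Toy training step on fabricated tensors, for illustration only.
model = Separator()
x = torch.randn(32, 512)            # backbone features of 32 image patches
dom = torch.randint(0, 10, (32,))   # index of the training sequence (domain)
zs, zp, logits = model(x, lam=0.5)
loss = F.cross_entropy(logits, dom) + 0.1 * orthogonality_loss(zs, zp)
loss.backward()

In a full tracker along these lines, the shared and private features would feed dedicated per-sequence classification layers (one domain-specific head per training video), with self-supervised losses applied to the shared branch on the unlabeled reservoir.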
Cite this paper
Meshgi, K., Mirzaei, M.S.: Adversarial Semi-supervised Multi-domain Tracking. In: Ishikawa, H., Liu, C.-L., Pajdla, T., Shi, J. (eds.) Computer Vision – ACCV 2020. LNCS, vol. 12623. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-69532-3_37