Abstract
This paper presents a video object segmentation method that jointly uses motion boundary maps and convolutional neural network (CNN)-based class-level saliency maps to co-segment the frames of a video. The key characteristic of the proposed approach is the combination of these two sources of information to create initial object and background regions, which are then employed within the co-segmentation energy function. The motion boundary map detects the areas that contain object movement, and the CNN-based class saliency map identifies the regions that contribute most to the network’s correct classification. The proposed approach can be applied to unconstrained natural videos that include changes in an object’s appearance, rapidly moving backgrounds, non-rigid object deformation, rapid camera motion and even static objects. Experimental results on two challenging datasets (DAVIS and SegTrack v2) demonstrate that the proposed method performs competitively with state-of-the-art approaches.
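The idea of fusing the two maps into initial object and background regions can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so everything specific here is an assumption made for illustration: the function names `normalize` and `initial_regions`, the averaging of the two normalized maps, and the thresholds `hi` and `lo` are hypothetical and are not taken from the paper.

```python
import numpy as np


def normalize(m):
    """Rescale a map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def initial_regions(motion_map, saliency_map, hi=0.6, lo=0.2):
    """Label each pixel 1 (object seed), 0 (background seed) or -1 (undecided).

    Assumed fusion rule (not from the paper): average the two normalized
    maps, then threshold the result with hypothetical values hi and lo.
    """
    combined = 0.5 * (normalize(motion_map) + normalize(saliency_map))
    seeds = np.full(combined.shape, -1, dtype=int)
    seeds[combined >= hi] = 1   # strong joint evidence: object seed
    seeds[combined <= lo] = 0   # weak joint evidence: background seed
    return seeds


# Toy 2x3 frame: top row mostly background, bottom row mostly object.
motion = np.array([[0.0, 0.2, 0.5], [0.8, 1.0, 0.4]])
saliency = np.array([[0.0, 0.1, 0.3], [0.9, 1.0, 0.5]])
seeds = initial_regions(motion, saliency)
```

In a pipeline of this kind, the pixels left undecided would be resolved by minimizing the co-segmentation energy function, with the seeded pixels acting as initial object and background regions.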
Ethics declarations
Conflict of interest
All the authors declare that they have no conflict of interest regarding the publication of this paper.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Cite this article
Kamranian, Z., Naghsh Nilchi, A.R., Sadeghian, H. et al. Joint motion boundary detection and CNN-based feature visualization for video object segmentation. Neural Comput & Applic 32, 4073–4091 (2020). https://doi.org/10.1007/s00521-019-04448-7
DOI: https://doi.org/10.1007/s00521-019-04448-7