Abstract
This paper proposes a feed-forward algorithm for semantic segmentation that fuses both features and classifiers. The algorithm consists of three phases. First, features from a hierarchical convolutional neural network (CNN) and region-based features are extracted and fused at the superpixel level. Second, multiple classifiers (Softmax, XGBoost and Random Forest) are combined into an ensemble to compute the per-pixel class probabilities. Finally, a fully connected conditional random field is employed to refine the final labeling. The hierarchical features carry more global evidence, whereas the region features carry more local evidence, so fusing the two is expected to enhance the feature representation. In the classification phase, integrating multiple classifiers aims to improve generalization. Experiments on the SIFT Flow dataset show that the proposed method achieves competitive labeling accuracy.
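The first two phases can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state how features are pooled or how classifier outputs are combined, so mean pooling within superpixels, plain concatenation, and weighted probability averaging are assumptions made here for illustration. All function names are hypothetical, and the conditional random field refinement is omitted.

```python
import numpy as np


def pool_to_superpixels(pixel_feats, sp_labels, n_sp):
    """Average per-pixel features of shape (H, W, D) inside each superpixel.

    sp_labels is an (H, W) integer map with values in [0, n_sp).
    Returns an (n_sp, D) array. Mean pooling is an assumption.
    """
    d = pixel_feats.shape[-1]
    flat_feats = pixel_feats.reshape(-1, d)
    flat_labels = sp_labels.reshape(-1)
    sums = np.zeros((n_sp, d))
    # Unbuffered scatter-add: accumulate each pixel's features into its superpixel row.
    np.add.at(sums, flat_labels, flat_feats)
    counts = np.bincount(flat_labels, minlength=n_sp).astype(float)
    # Guard against empty superpixels to avoid division by zero.
    return sums / np.maximum(counts, 1.0)[:, None]


def fuse_features(cnn_sp_feats, region_feats):
    """Fuse CNN and region features per superpixel by concatenation (assumed)."""
    return np.concatenate([cnn_sp_feats, region_feats], axis=1)


def ensemble_probabilities(prob_list, weights=None):
    """Combine class probabilities from several classifiers by weighted averaging.

    Each entry of prob_list has shape (n_sp, n_classes). Uniform weights are
    used when none are given; the combination rule itself is an assumption.
    """
    probs = np.stack(prob_list, axis=0)  # (n_classifiers, n_sp, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, probs, axes=1)  # (n_sp, n_classes)


def superpixel_to_pixel(sp_probs, sp_labels):
    """Broadcast superpixel probabilities back to an (H, W, n_classes) map."""
    return sp_probs[sp_labels]
```

In use, the fused matrix from `fuse_features` would be passed to each trained classifier, the resulting probability arrays to `ensemble_probabilities`, and the per-pixel map from `superpixel_to_pixel` would then serve as the unary term for the fully connected conditional random field.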
Acknowledgements
This research was supported by the National Natural Science Foundation of China (U1509207, 61472278, 61403281 and 61572357).
Additional information
Yanbing Xue and Huiqiang Geng should be considered as joint first authors.
Cite this article
Xue, Y., Geng, H., Zhang, H. et al. Semantic segmentation based on fusion of features and classifiers. Multimed Tools Appl 77, 22199–22211 (2018). https://doi.org/10.1007/s11042-018-5858-z
DOI: https://doi.org/10.1007/s11042-018-5858-z