Abstract
In recent years, multimedia content on the web has grown explosively, and multi-modal examples such as images associated with tags can be easily accessed from social websites such as Flickr. In this paper, we consider two classification tasks, supervised and semi-supervised multi-modal image classification, to take advantage of the growing number of multi-modal examples on the web. We first propose a Markov random field (MRF) based fusion method, discriminative probabilistic graphical fusion (DPGF), for supervised multi-modal image classification; it uses the associated tags to enhance classification performance. Based on DPGF, we then propose a three-step learning procedure, DPGF+RLS+SVM, for semi-supervised multi-modal image classification, which uses both labeled and unlabeled examples for training. Experimental results on two datasets, PASCAL VOC’07 and MIR Flickr, show that our methods effectively exploit the multi-modal data and unlabeled examples, and that they outperform previous state-of-the-art methods on both multi-modal image classification tasks. Finally, we consider the weakly supervised condition in which class labels are derived from noisy image tags. Our semi-supervised approach also improves classification performance in this case.
Cite this article
Xie, L., Pan, P. & Lu, Y. Markov random field based fusion for supervised and semi-supervised multi-modal image classification. Multimed Tools Appl 74, 613–634 (2015). https://doi.org/10.1007/s11042-014-2018-y
DOI: https://doi.org/10.1007/s11042-014-2018-y