Markov random field based fusion for supervised and semi-supervised multi-modal image classification

Multimedia Tools and Applications

Abstract

In recent years, multimedia content on the web has grown explosively, and multi-modal examples such as images with associated tags can be easily obtained from social websites such as Flickr. In this paper, we consider two classification tasks, supervised and semi-supervised multi-modal image classification, to take advantage of the growing number of multi-modal examples on the web. We first propose a Markov random field (MRF) based fusion method, discriminative probabilistic graphical fusion (DPGF), for supervised multi-modal image classification; it uses the associated tags to improve classification performance. Building on DPGF, we then propose a three-step learning procedure, DPGF+RLS+SVM, for semi-supervised multi-modal image classification, which uses both labeled and unlabeled examples for training. Experimental results on two datasets, PASCAL VOC’07 and MIR Flickr, show that our methods exploit the multi-modal data and the unlabeled examples well and outperform previous state-of-the-art methods on both multi-modal image classification tasks. Finally, we consider the weakly supervised condition in which class labels are derived from noisy image tags. Our semi-supervised approach also improves classification performance in this case.
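
The abstract does not give the form of the DPGF model, so the following is only a generic illustration of what MRF based fusion of two modalities can look like, not the authors' method. It is a minimal sketch that assumes a three-node pairwise MRF: a class node `y` linked to one node per modality (`y_v` for the image, `y_t` for the tags). The function name `mrf_fused_posterior`, the `coupling` parameter, and the use of per-modality classifier probabilities as unary potentials are all hypothetical choices made for this example.

```python
import itertools
import math


def mrf_fused_posterior(p_visual, p_tags, coupling=1.0):
    """Return the marginal P(y = 1) of a toy 3-node pairwise MRF.

    Assumed graph (illustrative, not the DPGF model): y -- y_v and y -- y_t.
    p_visual and p_tags are hypothetical per-modality probabilities of the
    positive class; they act as unary potentials on y_v and y_t.
    """
    unary_v = (1.0 - p_visual, p_visual)
    unary_t = (1.0 - p_tags, p_tags)

    def agree(a, b):
        # Pairwise potential: rewards equal labels, penalises different ones.
        return math.exp(coupling if a == b else -coupling)

    # Exact inference by enumerating all 2**3 joint label assignments.
    totals = [0.0, 0.0]
    for y, y_v, y_t in itertools.product((0, 1), repeat=3):
        weight = unary_v[y_v] * unary_t[y_t] * agree(y, y_v) * agree(y, y_t)
        totals[y] += weight

    # Normalise the unnormalised marginals of y.
    return totals[1] / (totals[0] + totals[1])


if __name__ == "__main__":
    # Both modalities favour the positive class.
    print(mrf_fused_posterior(0.9, 0.8))
    # The tag modality disagrees with the visual modality.
    print(mrf_fused_posterior(0.9, 0.2))
```

With `coupling=0.0` the modality nodes are disconnected from `y` and the fused posterior falls back to 0.5; larger values let agreement between the image and tag evidence push the posterior further from 0.5. Enumeration is only practical for a graph this small; larger MRFs need approximate inference.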




Corresponding author

Correspondence to Peng Pan.


Cite this article

Xie, L., Pan, P. & Lu, Y. Markov random field based fusion for supervised and semi-supervised multi-modal image classification. Multimed Tools Appl 74, 613–634 (2015). https://doi.org/10.1007/s11042-014-2018-y
