Scene recognition by semantic visual words | Signal, Image and Video Processing

Scene recognition by semantic visual words

  • Original Paper
Signal, Image and Video Processing

Abstract

In this paper, we propose a novel approach for introducing semantic relations into the bag-of-words framework. We use latent semantic models, such as latent semantic analysis (LSA) and probabilistic latent semantic analysis (pLSA), to define semantically rich features and embed the visual features into a semantic space. The semantic features used in the LSA technique are derived from the low-rank approximation of the word–image occurrence matrix by singular value decomposition. Similarly, in the pLSA approach, the topic-specific distributions of words can be considered dimensions of a concept space. In the proposed space, the distances between words represent semantic distances, which are used to construct a discriminative and semantically meaningful vocabulary. Because position information significantly improves scene recognition accuracy, we also bring position information into the proposed semantic vocabulary frameworks. We have tested our approach on the 15-Scene and 67-MIT Indoor datasets and have achieved very promising results.
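
As a rough illustration of the LSA step described in the abstract, the sketch below takes a small word–image occurrence matrix, computes a truncated singular value decomposition, and measures distances between visual words in the resulting low-rank space. It is not the authors' implementation: the function names `lsa_word_embedding` and `pairwise_distances`, the toy counts, the choice of rank 2, and the use of Euclidean distance on the scaled left singular vectors are all assumptions made for illustration.

```python
import numpy as np


def lsa_word_embedding(occurrence, rank):
    """Embed visual words in a latent space via truncated SVD (illustrative).

    occurrence: (n_words, n_images) matrix of word-image counts.
    rank: number of singular components to keep (assumed hyperparameter).
    Returns an (n_words, rank) array; row i is the latent coordinate of word i.
    """
    # Thin SVD: occurrence = U @ diag(s) @ Vt, singular values sorted descending.
    U, s, _ = np.linalg.svd(occurrence, full_matrices=False)
    # Keep the leading components and scale each axis by its singular value.
    return U[:, :rank] * s[:rank]


def pairwise_distances(X):
    """Euclidean distance between every pair of rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Toy data (hypothetical): words 0 and 1 occur together in images 0 and 1,
# words 2 and 3 occur together in images 2 and 3.
counts = np.array([
    [3.0, 2.0, 0.0, 0.0],
    [2.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.0, 1.0, 2.0],
])

embedding = lsa_word_embedding(counts, rank=2)
distances = pairwise_distances(embedding)

print("latent coordinates:\n", embedding)
print("word-to-word distances:\n", distances)
```

In this toy setting, words that co-occur in the same images end up close together in the latent space, while words from the other group lie farther away. A vocabulary could then be formed by grouping words on these distances, although the grouping procedure itself is not specified in the abstract.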

Notes

  1. Polysemy is the existence of words that convey different concepts in different images. For instance, in the text domain, the word "table" can be interpreted either as a piece of furniture or as an arrangement of data.

Author information

Corresponding author

Correspondence to Elahe Farahzadeh.

About this article

Cite this article

Farahzadeh, E., Cham, TJ. & Sluzek, A. Scene recognition by semantic visual words. SIViP 9, 1935–1944 (2015). https://doi.org/10.1007/s11760-014-0687-7

  • DOI: https://doi.org/10.1007/s11760-014-0687-7