Using Word Mover’s Distance with Spatial Constraints for Measuring Similarity Between Mongolian Word Images

Wei, Hongxi; Zhang, Hui; Gao, Guanglai; Su, Xiangdong

doi:10.1007/978-3-319-70093-9_20

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10637)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

In the bag-of-visual-words framework, visual words are treated as independent of each other, so the spatial relations among them are discarded and their semantic information is lost. To capture semantic information, a deep learning procedure similar to word embedding is used to map visual words to embedding vectors in a semantic space. Word mover’s distance (WMD) is then used to measure the similarity between two word images; it computes the minimum traveling distance from the visual word embeddings of one word image to those of the other. To restore spatial information, each word image is first partitioned into several equal-sized sub-regions along rows and columns. A WMD is then computed separately for each pair of corresponding sub-regions of the two word images, and the similarity between the two word images is the sum of these WMDs. Experimental results show that the proposed method outperforms several baseline and state-of-the-art methods, including spatial pyramid matching, latent Dirichlet allocation, average visual word embeddings and the original word mover’s distance.
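The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes Euclidean distance between embeddings as the transport cost and per-sub-region normalised histograms (as in the original WMD formulation), and it solves the transport problem with SciPy's general-purpose `linprog`. The function names `wmd`, `region_histograms` and `spatial_wmd`, and the `empty_penalty` parameter for sub-regions that contain no visual words in one of the two images, are hypothetical; the abstract does not say how such cases are handled.

```python
import numpy as np
from scipy.optimize import linprog


def wmd(hist_a, hist_b, embeddings, empty_penalty=0.0):
    """WMD between two visual-word histograms over the same codebook.

    embeddings has shape (V, D): one embedding vector per visual word.
    empty_penalty is an assumption of this sketch, returned when exactly
    one of the two histograms is empty.
    """
    ia, ib = np.flatnonzero(hist_a), np.flatnonzero(hist_b)
    if ia.size == 0 and ib.size == 0:
        return 0.0
    if ia.size == 0 or ib.size == 0:
        return empty_penalty
    # Normalise counts so that each histogram carries a total mass of 1.
    a = hist_a[ia] / hist_a[ia].sum()
    b = hist_b[ib] / hist_b[ib].sum()
    # Assumed cost: Euclidean distance between visual word embeddings.
    cost = np.linalg.norm(
        embeddings[ia][:, None, :] - embeddings[ib][None, :, :], axis=2
    )
    n, m = cost.shape
    # Flow matrix T is flattened row-major: row sums equal a, column sums equal b.
    a_eq = np.vstack([
        np.kron(np.eye(n), np.ones((1, m))),
        np.kron(np.ones((1, n)), np.eye(m)),
    ])
    b_eq = np.concatenate([a, b])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return float(res.fun)


def region_histograms(points, words, width, height, rows, cols, vocab_size):
    """Count visual words per cell of an equal-sized rows x cols grid.

    points is an (N, 2) array of (x, y) keypoint positions and words the
    matching array of N integer visual-word indices.
    """
    hists = np.zeros((rows, cols, vocab_size))
    r = np.minimum((points[:, 1] * rows / height).astype(int), rows - 1)
    c = np.minimum((points[:, 0] * cols / width).astype(int), cols - 1)
    np.add.at(hists, (r, c, words), 1.0)
    return hists


def spatial_wmd(hists_a, hists_b, embeddings, empty_penalty=0.0):
    """Sum of the WMDs over corresponding sub-regions of two word images."""
    vocab = embeddings.shape[0]
    pairs = zip(hists_a.reshape(-1, vocab), hists_b.reshape(-1, vocab))
    return sum(wmd(ha, hb, embeddings, empty_penalty) for ha, hb in pairs)


# Toy data: two visual words whose embeddings are 5 units apart, placed in
# opposite corners of a 10 x 10 image, with their positions swapped in image B.
toy_embeddings = np.array([[0.0, 0.0], [3.0, 4.0]])
toy_points = np.array([[1.0, 1.0], [9.0, 9.0]])
words_a, words_b = np.array([0, 1]), np.array([1, 0])

for grid in (1, 2):
    h_a = region_histograms(toy_points, words_a, 10, 10, grid, grid, 2)
    h_b = region_histograms(toy_points, words_b, 10, 10, grid, grid, 2)
    print(grid, spatial_wmd(h_a, h_b, toy_embeddings))
```

In the toy data both images contain the same two visual words, so a single-region comparison (the plain WMD) cannot tell them apart, while the 2 x 2 grid compares each corner separately and reports a non-zero distance. This is the effect that the spatial constraint is meant to add.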

Acknowledgement

This paper is supported by the National Natural Science Foundation of China under Grant 61463038.

Author information

Authors and Affiliations

School of Computer Science, Inner Mongolia University, Hohhot, 010021, China
Hongxi Wei, Hui Zhang, Guanglai Gao & Xiangdong Su

Corresponding author

Correspondence to Hongxi Wei.

Editor information

Editors and Affiliations

Guangdong University of Technology, Guangzhou, China
Derong Liu
Guangdong University of Technology, Guangzhou, China
Shengli Xie
South China University of Technology, Guangzhou, China
Yuanqing Li
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Dongbin Zhao
King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
El-Sayed M. El-Alfy

About this paper

Cite this paper

Wei, H., Zhang, H., Gao, G., Su, X. (2017). Using Word Mover’s Distance with Spatial Constraints for Measuring Similarity Between Mongolian Word Images. In: Liu, D., Xie, S., Li, Y., Zhao, D., El-Alfy, ES. (eds) Neural Information Processing. ICONIP 2017. Lecture Notes in Computer Science, vol 10637. Springer, Cham. https://doi.org/10.1007/978-3-319-70093-9_20

DOI: https://doi.org/10.1007/978-3-319-70093-9_20
Published: 24 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70092-2
Online ISBN: 978-3-319-70093-9
eBook Packages: Computer Science, Computer Science (R0)
