Abstract
Hashing has been widely used as an efficient approach to large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are ineffective for short texts because of their sparseness and brevity. Recently, some researchers have tried to construct semantic relationships using topics of a certain granularity. However, topics of a single granularity are insufficient to preserve optimal semantic similarity across different types of datasets. Moreover, tag information should be fully exploited to strengthen the similarity between related texts. We therefore propose a novel unified hashing approach in which the optimal topic features are selected automatically and integrated with the original features to preserve similarity, while tags are fully utilized to improve hash code learning. We carried out extensive experiments on one short text dataset and one normal text dataset. The results demonstrate that our approach is effective and significantly outperforms baseline methods on several evaluation metrics.
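To illustrate the general idea of hashing for similarity search that the abstract builds on (not the paper's proposed method), the sketch below uses classic random-hyperplane hashing: each text's feature vector is mapped to a short binary code, and similar vectors tend to agree on more bits, so Hamming distance becomes a cheap proxy for similarity. The feature vectors here are hypothetical toy data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bits, n_features = 16, 8

# Hypothetical toy feature vectors (e.g., keyword weights for short texts).
doc_a = rng.random(n_features)
doc_b = doc_a + 0.05 * rng.random(n_features)   # near-duplicate of doc_a
doc_c = rng.random(n_features)                  # unrelated text

# One random hyperplane per hash bit.
hyperplanes = rng.standard_normal((n_bits, n_features))

def hash_code(vec):
    """Binary code: bit is 1 where the projection onto a hyperplane is positive."""
    return (hyperplanes @ vec > 0).astype(np.uint8)

def hamming(code_x, code_y):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(code_x != code_y))

# Similar texts should end up closer in Hamming space than dissimilar ones.
d_ab = hamming(hash_code(doc_a), hash_code(doc_b))
d_ac = hamming(hash_code(doc_a), hash_code(doc_c))
print(d_ab, d_ac)
```

Because the codes are short binary strings, nearest-neighbor queries over millions of documents reduce to fast bitwise comparisons; the paper's contribution lies in *learning* such codes from topic features and tags rather than using random projections.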
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Xu, J., Xu, B., Zhao, J., Tian, G., Zhang, H., Hao, H. (2014). Short Text Hashing Improved by Integrating Topic Features and Tags. In: Loo, C.K., Yap, K.S., Wong, K.W., Teoh, A., Huang, K. (eds) Neural Information Processing. ICONIP 2014. Lecture Notes in Computer Science, vol 8835. Springer, Cham. https://doi.org/10.1007/978-3-319-12640-1_36
Print ISBN: 978-3-319-12639-5
Online ISBN: 978-3-319-12640-1
eBook Packages: Computer Science