Abstract
Document clustering is widely used in text mining, information retrieval systems, and related applications. When a neural network is used to identify topical concepts, the key problem is how to construct the network's input. This paper presents a neural network input model that uses a statistical method to calculate the mutual information between an ambiguous word and its contextual words, taking a fixed number of contextual words on either side of the topical concept according to a window (-M, +N). We introduce a novel topical document clustering method called Document Characters Indexing Clustering (DCIC), which can identify topics accurately and cluster documents according to these topics. In DCIC, “topic elements” are defined and extracted for indexing base clusters, and document characters are also investigated and exploited. Experimental results show that DCIC based on BP neural network models achieves higher precision (92.76%) than some widely used traditional clustering methods.
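The windowed mutual information computation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `window_pmi`, the pre-tokenised input, and the simple count-based pointwise mutual information estimate are all assumptions made for illustration. The window takes up to M tokens to the left and N tokens to the right of each occurrence of the target word.

```python
import math
from collections import Counter


def window_pmi(sentences, target, m=2, n=2):
    """Pointwise mutual information between `target` and its context words.

    Hypothetical helper: context words are those found up to `m` tokens to
    the left and `n` tokens to the right of each occurrence of `target`.
    Probabilities are estimated from raw counts (an assumption, not the
    paper's stated estimator).
    """
    word_counts = Counter()   # occurrences of every word in the corpus
    cooc_counts = Counter()   # occurrences of each word inside the window
    total_tokens = 0

    for tokens in sentences:
        word_counts.update(tokens)
        total_tokens += len(tokens)
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - m)
            end = min(len(tokens), i + n + 1)
            for j in range(start, end):
                if j != i:
                    cooc_counts[tokens[j]] += 1

    scores = {}
    for word, joint in cooc_counts.items():
        # PMI = log2( p(word, target) / (p(word) * p(target)) ),
        # with each probability taken as count / total_tokens.
        ratio = joint * total_tokens / (word_counts[word] * word_counts[target])
        scores[word] = math.log2(ratio)
    return scores


# Toy corpus, invented purely for illustration.
corpus = [
    ["the", "bank", "river"],
    ["the", "bank", "money"],
    ["the", "cat", "sat"],
]
for word, score in sorted(window_pmi(corpus, "bank", m=1, n=1).items()):
    print(word, round(score, 3))
```

Scores of this kind could then be assembled into a fixed-length input vector for a network, although how the paper encodes them is not stated in the abstract.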
This work was supported by the S&T plan projects of the Hubei Provincial Education Department of China (No. Q20122207).
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Fu, X., Ding, Y. (2014). The Research of Document Clustering Topical Concept Based on Neural Networks. In: Zeng, Z., Li, Y., King, I. (eds) Advances in Neural Networks – ISNN 2014. ISNN 2014. Lecture Notes in Computer Science, vol 8866. Springer, Cham. https://doi.org/10.1007/978-3-319-12436-0_69
DOI: https://doi.org/10.1007/978-3-319-12436-0_69
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12435-3
Online ISBN: 978-3-319-12436-0
eBook Packages: Computer Science, Computer Science (R0)