Abstract
Text classification is a key technique for handling and organizing text data. Among well-known classification methods, the support vector machine (SVM) has been shown to perform better than the others. In this paper, a method that groups similar words is proposed for document classification. Applied to Reuters news articles, word grouping is shown to achieve classification accuracy equivalent to that of Latent Semantic Analysis (LSA). Further, a new combined method is proposed that applies word grouping and LSA, followed by k-nearest neighbor (kNN) classification. The proposed combined method achieves higher classification accuracy than both the conventional kNN method and LSA followed by kNN, and its accuracy is almost the same as that of the SVM.
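The pipeline named in the abstract (word grouping, then LSA, then kNN) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how similar words are grouped, so the word-to-group mapping here is a hypothetical, hand-supplied dictionary. LSA is taken to be a truncated SVD of a raw term-frequency document-term matrix, and kNN uses cosine similarity with majority voting. The function names, the toy vocabulary, and the labels `earn` and `grain` are illustrative only.

```python
import numpy as np
from collections import Counter


def group_terms(X, vocab, groups):
    """Merge the columns of terms that share a group label.

    X      : (n_docs, n_terms) term-frequency matrix.
    vocab  : list of the n_terms terms, in column order.
    groups : hypothetical dict term -> group label; unmapped terms form their own group.
    Returns the grouped matrix and the sorted list of group labels (its columns).
    """
    labels = [groups.get(t, t) for t in vocab]
    group_names = sorted(set(labels))
    col = {g: i for i, g in enumerate(group_names)}
    # G[j, i] = 1 when term j belongs to group i, so X @ G sums frequencies per group.
    G = np.zeros((len(vocab), len(group_names)))
    for j, g in enumerate(labels):
        G[j, col[g]] = 1.0
    return X @ G, group_names


def lsa_fit(X, k):
    """Truncated SVD X ~ U_k S_k V_k^T; returns V_k with shape (n_features, k)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T


def lsa_transform(X, V_k):
    """Project documents (rows of X) onto the k latent dimensions."""
    return X @ V_k


def knn_predict(train, train_labels, queries, k):
    """Cosine-similarity kNN with majority voting over the k nearest documents."""
    eps = 1e-12  # guards against division by zero for empty documents
    t = train / (np.linalg.norm(train, axis=1, keepdims=True) + eps)
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + eps)
    sims = q @ t.T  # (n_queries, n_train) cosine similarities
    preds = []
    for row in sims:
        nearest = np.argsort(-row)[:k]
        votes = Counter(train_labels[i] for i in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Toy example (illustrative data): columns are raw term frequencies.
vocab = ["stock", "shares", "market", "wheat", "grain", "crop"]
groups = {"shares": "stock", "grain": "wheat"}  # hypothetical similar-word groups
X_train = np.array([[3, 1, 2, 0, 0, 0],
                    [1, 2, 1, 0, 0, 0],
                    [0, 0, 0, 2, 3, 1],
                    [0, 0, 1, 1, 2, 2]], dtype=float)
y_train = ["earn", "earn", "grain", "grain"]
X_test = np.array([[0, 2, 1, 0, 0, 0],
                   [0, 0, 0, 1, 0, 2]], dtype=float)

Xg_train, group_names = group_terms(X_train, vocab, groups)
Xg_test, _ = group_terms(X_test, vocab, groups)
V_k = lsa_fit(Xg_train, k=2)
preds = knn_predict(lsa_transform(Xg_train, V_k), y_train,
                    lsa_transform(Xg_test, V_k), k=3)
print(group_names, preds)
```

Grouping merges similar-word columns before the SVD, so the latent space is computed over fewer, denser features. In this toy data, the first test document becomes an exact scaled copy of the first training document once `shares` is merged into `stock`. Its nearest neighbors in the latent space are therefore the two `earn` documents.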
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ishii, N., Murai, T., Yamada, T., Bao, Y., Suzuki, S. (2006). Text Classification: Combining Grouping, LSA and kNN vs Support Vector Machine. In: Gabrys, B., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2006. Lecture Notes in Computer Science, vol 4252. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11893004_51
DOI: https://doi.org/10.1007/11893004_51
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46537-9
Online ISBN: 978-3-540-46539-3
eBook Packages: Computer Science, Computer Science (R0)