Abstract
Most existing text classification methods (and text mining methods at large) represent documents using the traditional vector space model. We argue that this representation loses important information, such as the relationships among words. We propose a term graph model that represents not only the content of a document but also the relationships among its keywords. We demonstrate that the new model enables us to define new similarity functions for classification, such as rank correlation based on PageRank-style algorithms. Our preliminary results for the new model are promising.
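To make the idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how edges are formed or which rank correlation is used, so this sketch assumes that two terms are linked when they co-occur in a sentence, scores terms with a basic PageRank power iteration, and compares two documents with Spearman's rank correlation over their shared terms. The function names (build_term_graph, pagerank, spearman) and the parameter values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_term_graph(sentences):
    """Undirected graph: nodes are terms, edges link terms sharing a sentence.

    Assumption: sentence-level co-occurrence defines an edge; the paper may
    build its term graph differently.
    """
    graph = defaultdict(set)
    for sentence in sentences:
        terms = set(sentence.lower().split())
        for term in terms:
            graph[term]  # register the term even if it has no neighbours
        for a, b in combinations(sorted(terms), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected term graph."""
    nodes = sorted(graph)
    n = len(nodes)
    score = {t: 1.0 / n for t in nodes}
    for _ in range(iterations):
        # Terms without neighbours spread their score uniformly.
        dangling = sum(score[t] for t in nodes if not graph[t])
        updated = {}
        for t in nodes:
            incoming = sum(score[u] / len(graph[u]) for u in graph[t])
            updated[t] = (1 - damping) / n + damping * (incoming + dangling / n)
        score = updated
    return score


def ranks(scores, terms):
    """Rank terms from highest to lowest score; ties broken alphabetically."""
    ordered = sorted(terms, key=lambda t: (-scores[t], t))
    return {t: i for i, t in enumerate(ordered, start=1)}


def spearman(scores_a, scores_b):
    """Spearman rank correlation over the terms both documents share."""
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        return 0.0  # too few shared terms to compare rankings
    ra, rb = ranks(scores_a, shared), ranks(scores_b, shared)
    d2 = sum((ra[t] - rb[t]) ** 2 for t in shared)
    return 1 - 6 * d2 / (n * (n * n - 1))


doc_a = ["term graph model", "graph model for text"]
doc_b = ["vector space model for text", "text model"]
pr_a = pagerank(build_term_graph(doc_a))
pr_b = pagerank(build_term_graph(doc_b))
similarity = spearman(pr_a, pr_b)
```

Under these assumptions, a classifier could assign a document to the class whose term graph yields the highest rank correlation with it. Breaking ties alphabetically is a simplification; a fuller treatment would use averaged ranks.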
This work was partially supported by ARC Discovery Grant – DP0346004.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, W., Do, D.B., Lin, X. (2005). Term Graph Model for Text Classification. In: Li, X., Wang, S., Dong, Z.Y. (eds) Advanced Data Mining and Applications. ADMA 2005. Lecture Notes in Computer Science, vol 3584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11527503_5
DOI: https://doi.org/10.1007/11527503_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27894-8
Online ISBN: 978-3-540-31877-4
eBook Packages: Computer Science, Computer Science (R0)