Abstract
Previous Bayesian document classification methods have a shortcoming: they do not accurately reflect semantic relations when expressing the characteristics of a document. To resolve this problem, this paper proposes a Bayesian document classification method based on mining and refining association words. The Apriori algorithm extracts the characteristics of a test document in the form of association words that reflect semantic relations, and it also mines association words from the learning documents. If association words are mined from the learning documents with the Apriori algorithm alone, inappropriate association words are included among them, which lowers the accuracy of document classification. To compensate for this, we refine the association words using a genetic algorithm. A naïve Bayes classifier then classifies test documents based on the refined association words.
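The pipeline described in the abstract (Apriori mining of association words, followed by naïve Bayes classification) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. Each sentence of a document is treated as one Apriori transaction whose items are words. The support threshold `min_support=0.5` and the itemset size cap `max_len=3` are hypothetical values. The genetic-algorithm refinement step is omitted, so every mined association word is kept. The names `apriori`, `association_words`, and `AssociationWordNB` are illustrative only.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math


def apriori(transactions, min_support, max_len=3):
    """Return all frequent itemsets (as frozensets) up to size max_len.

    Support is the fraction of transactions that contain the itemset.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    # Level 1: frequent single words.
    words = {w for t in sets for w in t}
    current = {frozenset([w]) for w in words if support(frozenset([w])) >= min_support}
    frequent = set(current)
    k = 2
    while current and k <= max_len:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): drop candidates with an infrequent subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if support(c) >= min_support}
        frequent |= current
        k += 1
    return frequent


def association_words(document, min_support=0.5):
    """Association words: frequent itemsets of two or more words.

    Assumption: each sentence (a list of words) is one transaction.
    Note: the paper's genetic-algorithm refinement is NOT performed here;
    all mined association words are kept.
    """
    return {s for s in apriori(document, min_support) if len(s) >= 2}


class AssociationWordNB:
    """Naive Bayes over association-word features with add-one smoothing."""

    def __init__(self, min_support=0.5):
        self.min_support = min_support  # hypothetical threshold

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            feats = association_words(doc, self.min_support)
            self.feature_counts[label].update(feats)
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict(self, document):
        # Features never seen in training carry no class evidence; skip them.
        feats = association_words(document, self.min_support) & self.vocab
        total = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total)  # log prior
            for f in feats:
                score += math.log((counts[f] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best, best_score = label, score
        return best


if __name__ == "__main__":
    train = [
        [["game", "team", "score"], ["team", "score", "win"], ["game", "team", "coach"]],
        [["computer", "software", "data"], ["software", "data", "network"],
         ["computer", "software", "program"]],
    ]
    model = AssociationWordNB().fit(train, ["sports", "tech"])
    test = [["team", "score", "goal"], ["team", "score", "game"], ["coach", "team", "win"]]
    print(model.predict(test))  # sports
```

In a fuller version, a refinement step such as the genetic algorithm described in the abstract would presumably sit between `association_words` and `fit`, filtering the association words mined from the learning documents before they are counted, while the scoring in `predict` stays unchanged.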
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ko, S.J., Choi, J.H., Lee, J.H. (2003). Bayesian Web Document Classification through Optimizing Association Word. In: Chung, P.W.H., Hinde, C., Ali, M. (eds) Developments in Applied Artificial Intelligence. IEA/AIE 2003. Lecture Notes in Computer Science, vol 2718. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45034-3_57
DOI: https://doi.org/10.1007/3-540-45034-3_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40455-2
Online ISBN: 978-3-540-45034-4
eBook Packages: Springer Book Archive