Abstract
Naive Bayes is widely used in text classification applications and experiments because of its simplicity and effectiveness. However, most variants of the naive Bayes model consider only one representation of a given word. In this paper we define an information criterion, Projective Information Gain, to decide which representation is appropriate for a specific word. Based on this criterion, we extend the conditional independence assumption to make it more efficient and feasible, and we then propose a novel Bayes model, General Naive Bayes (GNB), which can handle two representations concurrently. We present experimental results and a theoretical justification that demonstrate the feasibility of our approach.
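To make the idea concrete, below is a minimal, hypothetical sketch of the per-word representation choice the abstract describes: for each word, compare the information gain of its binary (presence/absence) view against that of its frequency view, and keep the more informative one. The paper's Projective Information Gain is not defined in this abstract, so plain information gain computed under each discretized representation stands in for it here; the function names, the count-capping discretization, and the n_bins parameter are all illustrative assumptions, not the authors' method.

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of the class distribution.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature_values, labels):
    # Information gain of a discrete feature with respect to the class:
    # H(C) minus the expected conditional entropy H(C | feature).
    gain = entropy(labels)
    for v in np.unique(feature_values):
        mask = feature_values == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def choose_representations(X_counts, y, n_bins=3):
    # For each word (column of the doc-by-word count matrix), pick the
    # representation whose view of the word is more informative about y.
    choices = []
    for j in range(X_counts.shape[1]):
        col = X_counts[:, j]
        binary_gain = info_gain((col > 0).astype(int), y)
        # Crude stand-in discretization: cap counts at n_bins.
        freq_gain = info_gain(np.minimum(col, n_bins), y)
        choices.append("frequency" if freq_gain > binary_gain else "binary")
    return choices

# Example: four tiny documents over a two-word vocabulary, two classes.
X = np.array([[0, 3], [0, 1], [2, 0], [1, 0]])
y = np.array([0, 0, 1, 1])
print(choose_representations(X, y))

A hybrid classifier in the spirit of GNB could then score each word under its chosen representation (e.g., Bernoulli-style likelihoods for binary words, multinomial-style for frequency words) within one naive Bayes model; the abstract's extended conditional independence assumption is what licenses mixing the two.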