Abstract
Naive Bayes is widely used in text classification applications and experiments because of its simplicity and effectiveness. However, most variants of the naive Bayes model consider only one representation of a given word. In this paper we define an information criterion, Projective Information Gain, to decide which representation is appropriate for a specific word. Based on this criterion, we extend the conditional independence assumption to make it more efficient and feasible, and we then propose a novel Bayes model, General Naive Bayes (GNB), which can handle two representations concurrently. We present experimental results and a theoretical justification that demonstrate the feasibility of our approach.
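To make the idea concrete, below is a minimal, hypothetical sketch of the per-word representation choice the abstract describes: for each word, compare the information gain of its binary (presence/absence) view against that of its frequency view, and keep the more informative one. The paper's Projective Information Gain is not defined in this abstract, so plain information gain computed under each discretized representation stands in for it here; the function names, the count-capping discretization, and the n_bins parameter are all illustrative assumptions, not the authors' method.

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of the class distribution.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature_values, labels):
    # Information gain of a discrete feature with respect to the class:
    # H(C) minus the expected conditional entropy H(C | feature).
    gain = entropy(labels)
    for v in np.unique(feature_values):
        mask = feature_values == v
        gain -= mask.mean() * entropy(labels[mask])
    return gain

def choose_representations(X_counts, y, n_bins=3):
    # For each word (column of the doc-by-word count matrix), pick the
    # representation whose view of the word is more informative about y.
    choices = []
    for j in range(X_counts.shape[1]):
        col = X_counts[:, j]
        binary_gain = info_gain((col > 0).astype(int), y)
        # Crude stand-in discretization: cap counts at n_bins.
        freq_gain = info_gain(np.minimum(col, n_bins), y)
        choices.append("frequency" if freq_gain > binary_gain else "binary")
    return choices

# Example: four tiny documents over a two-word vocabulary, two classes.
X = np.array([[0, 3], [0, 1], [2, 0], [1, 0]])
y = np.array([0, 0, 1, 1])
print(choose_representations(X, y))

A hybrid classifier in the spirit of GNB could then score each word under its chosen representation (e.g., Bernoulli-style likelihoods for binary words, multinomial-style for frequency words) within one naive Bayes model; the abstract's extended conditional independence assumption is what licenses mixing the two.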