Abstract
Lsquare and k-NN are two machine learning classifiers for text classification, and Rocchio is the classic text classification method in information retrieval. Our approach is supervised: the list of categories must be defined in advance, and a set of training data must be provided to train the system. Documents are represented as vectors in which each component is associated with a particular word. To combine classifiers, we propose a voting method, the OWA operator, and the Decision Template method. For feature selection we use a new, effective and efficient method called variance-mean based feature filtering, which performs feature reduction in the representation phase of text classification. Using this feature selection method together with the best classifier combination method, we improve text classification performance.
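As a rough illustration of the ideas named in the abstract, the following sketch shows a word-count vector representation and two of the combination rules (majority voting and ordered weighted averaging). It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example weights, and the score layout are all hypothetical, the base classifiers (Lsquare, k-NN, Rocchio) are replaced by hand-written outputs, and neither the variance-mean feature filter nor the Decision Template method is shown because the abstract does not define them.

```python
from collections import Counter


def bag_of_words(text, vocabulary):
    """Represent a document as a term-count vector over a fixed vocabulary.

    Each vector component corresponds to one word, as in the abstract.
    Whitespace tokenization is an assumption made for brevity.
    """
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def majority_vote(labels):
    """Return the category predicted by the largest number of classifiers."""
    return Counter(labels).most_common(1)[0][0]


def owa(scores, weights):
    """Ordered weighted averaging (OWA) of per-classifier scores.

    The weights attach to rank positions, not to particular classifiers:
    scores are sorted in descending order before the weighted sum.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("OWA weights must sum to 1")
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))


def combine_with_owa(score_table, weights):
    """Pick the category whose OWA-fused score is highest.

    score_table maps each category to a list of scores, one per classifier
    (a hypothetical layout chosen for this example).
    """
    fused = {category: owa(scores, weights)
             for category, scores in score_table.items()}
    return max(fused, key=fused.get)


if __name__ == "__main__":
    vocabulary = ["text", "classification", "vector"]
    print(bag_of_words("Text classification of text", vocabulary))

    # Hypothetical outputs of three base classifiers for one document.
    print(majority_vote(["sports", "politics", "sports"]))
    print(combine_with_owa(
        {"sports": [0.2, 0.9, 0.5], "politics": [0.6, 0.6, 0.6]},
        weights=[0.5, 0.3, 0.2],
    ))
```

The choice of OWA weights controls how optimistic the fusion is: weights concentrated on the first positions favor the most confident classifier, while uniform weights reduce OWA to a plain average.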
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Srinivas, M., Supreethi, K.P., Prasad, E.V., Kumari, S.A. (2009). Efficient Text Classification Using Best Feature Selection and Combination of Methods. In: Smith, M.J., Salvendy, G. (eds) Human Interface and the Management of Information. Designing Information Environments. Human Interface 2009. Lecture Notes in Computer Science, vol 5617. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02556-3_50
DOI: https://doi.org/10.1007/978-3-642-02556-3_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02555-6
Online ISBN: 978-3-642-02556-3
eBook Packages: Computer Science (R0)