Abstract
“The curse of dimensionality” affects many learning algorithms: it denotes the drastic rise in computational complexity and classification error in high dimensions. In this paper, different feature extraction techniques, as means of (1) dimensionality reduction and (2) constructive induction, are analyzed with respect to the performance of a classifier. Three commonly used classifiers are considered: kNN, Naïve Bayes, and the C4.5 decision tree. One of the main goals of this paper is to show the importance of using class information in feature extraction for classification, and the (in)appropriateness of random projection or conventional PCA for feature extraction for classification on some data sets. Two eigenvector-based approaches that take the class information into account are analyzed. The first approach is parametric and optimizes the ratio of between-class variance to within-class variance of the transformed data. The second is a nonparametric modification of the first, based on local calculation of the between-class covariance matrix. In experiments on benchmark data sets, these two approaches are compared with each other, with conventional PCA, with random projection, and with plain classification without feature extraction, for each classifier.
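The parametric approach described in the abstract amounts to a Fisher-style criterion: find a transformation w that maximizes the ratio J(w) = wᵀS_B w / wᵀS_W w of between-class scatter S_B to within-class scatter S_W, which reduces to a generalized eigenvalue problem. Below is a minimal NumPy sketch of this idea, assuming the standard scatter-matrix formulation; the function name, interface, and the use of a pseudoinverse to handle a singular S_W are illustrative choices, not details taken from the paper.

```python
import numpy as np

def parametric_feature_extraction(X, y, n_components):
    """Illustrative sketch: project X onto the leading eigenvectors of
    pinv(S_w) @ S_b, maximizing between-class vs. within-class variance."""
    classes = np.unique(y)
    n_features = X.shape[1]
    mean_total = X.mean(axis=0)
    S_w = np.zeros((n_features, n_features))  # within-class scatter
    S_b = np.zeros((n_features, n_features))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        S_w += (Xc - mean_c).T @ (Xc - mean_c)
        d = (mean_c - mean_total).reshape(-1, 1)
        S_b += len(Xc) * (d @ d.T)
    # Solve the generalized eigenproblem S_b v = lambda S_w v via pinv(S_w) @ S_b;
    # the pseudoinverse guards against a singular within-class scatter matrix.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_w) @ S_b)
    order = np.argsort(eigvals.real)[::-1]
    W = eigvecs[:, order[:n_components]].real
    return X @ W  # transformed (extracted) features
```

The nonparametric variant studied in the paper differs by computing the between-class scatter locally (from neighboring samples of other classes) rather than from global class means; the global construction above is shown only because it is the simpler of the two.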
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Pechenizkiy, M. (2005). The Impact of Feature Extraction on the Performance of a Classifier: kNN, Naïve Bayes and C4.5. In: Kégl, B., Lapalme, G. (eds) Advances in Artificial Intelligence. Canadian AI 2005. Lecture Notes in Computer Science, vol. 3501. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11424918_28
Print ISBN: 978-3-540-25864-3
Online ISBN: 978-3-540-31952-8