Abstract
Feature selection methods are often used to determine a small set of informative features that guarantees good classification results. Such procedures usually consist of two components: a separability criterion and a selection strategy. The most basic choices for the latter are individual ranking, forward search and backward search; many intermediate methods, such as floating search, are also available. Both forward and backward selection may cause a lossy evaluation of the criterion and/or overtraining of the final classifier in high-dimensional spaces and in small sample size problems. Backward selection may also become computationally prohibitive. Individual ranking, on the other hand, suffers because it neglects dependencies between features. A new strategy based on a pairwise evaluation has recently been proposed by Bo and Jonassen (Genome Biol 3, 2002) and Pękalska et al. (International Conference on Computer Recognition Systems, Poland, pp 271–278, 2005). Since it considers interactions between features, but always restricted to two-dimensional spaces, it may circumvent the small sample size problem. In this paper, we evaluate this idea in a more general framework for the selection of features as well as prototypes. Our finding is that such a pairwise selection may improve over traditional procedures, and we present artificial and real-world examples to support this claim. We also find, however, that the set of problems for which the pairwise selection is effective is small.
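The pairwise strategy described above can be sketched as follows: evaluate a separability criterion on every two-dimensional feature subspace, rank the pairs, and greedily collect features from the best-scoring pairs. This is a minimal illustration only, not the authors' implementation; the Fisher-like per-pair score used here is an assumed stand-in for whatever criterion one prefers.

```python
import numpy as np
from itertools import combinations

def pair_score(X, y, i, j):
    """Hypothetical 2-D separability criterion: a Fisher-like ratio of
    between-class mean distance to within-class scatter, computed in the
    subspace spanned by features i and j (binary labels 0/1 assumed)."""
    Xij = X[:, [i, j]]
    m0 = Xij[y == 0].mean(axis=0)
    m1 = Xij[y == 1].mean(axis=0)
    scatter = Xij[y == 0].var(axis=0).sum() + Xij[y == 1].var(axis=0).sum()
    return np.sum((m0 - m1) ** 2) / (scatter + 1e-12)

def pairwise_select(X, y, k):
    """Rank all feature pairs by the 2-D criterion, then greedily add
    features from the top pairs until k distinct features are chosen."""
    d = X.shape[1]
    ranked = sorted(combinations(range(d), 2),
                    key=lambda p: pair_score(X, y, *p), reverse=True)
    selected = []
    for i, j in ranked:
        for f in (i, j):
            if f not in selected:
                selected.append(f)
            if len(selected) == k:
                return selected
    return selected
```

Because the criterion is only ever evaluated in two dimensions, each evaluation stays well-conditioned even when the number of samples is small relative to the full dimensionality, which is the motivation given in the abstract.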
References
Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96(12):6745–6750
Bennett CH, Gacs P, Li M, Vitányi PMB, Zurek W (1998) Information distance. IEEE Trans Inf Theory IT-44(4):1407–1423
Bo T, Jonassen I (2002) New feature subset selection procedures for classification of expression profiles. Genome Biol 3
Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. Wadsworth, California
Brodatz P (1996) Textures: a photographic album for artists and designers. Dover, New York
Bunke H, Sanfeliu A (1990) Syntactic and structural pattern recognition theory and applications. World Scientific
Cover TM, van Campenhout JM (1977) On the possible ordering in the measurement selection problem. IEEE Trans Syst Man Cybern SMC-7(9):657–661
Das S (2001) Filters, wrappers and a boosting-based hybrid for feature selection. In: International Conference on Machine Learning, pp 74–81
Dubuisson MP, Jain AK (1994) Modified Hausdorff distance for object matching. In: International Conference on Pattern Recognition, vol 1, pp 566–568
Duda RO, Hart PE, Stork DG (2001) Pattern classification, 2nd edn. Wiley, New York
Duin RPW, Juszczak P, de Ridder D, Paclík P, Pękalska E, Tax DMJ (2004) PR-Tools, Pattern Recognition Tools. http://www.prtools.org
Fukunaga K (1990) Introduction to statistical pattern recognition, 2nd edn. Academic Press
Hall M (2000) Correlation-based feature selection for machine learning. PhD thesis, University of Waikato
Jain AK, Zongker D (1997) Feature selection—evaluation, application, and small sample performance. IEEE Trans Pattern Anal Mach Intell 19(2):153–158
Jain AK, Zongker D (1997) Representation and recognition of handwritten digits using deformable templates. IEEE Trans Pattern Anal Mach Intell 19(12):1386–1391
Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22:4–37
John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the subset selection problem. In: Machine learning: Proceedings of the Ninth International Conference. Morgan Kaufmann
Kohavi R (1995) The power of decision tables. In: Proceedings of the Eighth European Conference on Machine Learning ECML95, Lecture Notes in Artificial Intelligence, 914, pp 174–189. Springer, Berlin Heidelberg New York
Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97:273–324
Li L, Weinberg CR, Darden TA, Pedersen LG (2001) Gene selection for sample classification based on gene expression data: study of sensitivity to choice of parameters of the GA/KNN method. Bioinformatics 17:1131–1142
Lozano M, Sotoca JM, Sanchez JS, Pla F, Pękalska E, Duin RPW (2006) Experimental study on prototype optimisation algorithms for dissimilarity based classifiers. Pattern Recognit 39(10):1827–1838
Paclík P, Novovičová J, Somol P, Pudil P (2000) Road sign classification using Laplace Kernel classifier. Pattern Recognit Lett 21(13–14):1165–1173
Pękalska E, Duin RPW (2005) The dissimilarity representation for pattern recognition. Foundations and applications. World Scientific, Singapore
Pękalska E, Harol A, Lai C, Duin RPW (2005) Pairwise selection of features and prototypes. In: International Conference on Computer Recognition Systems, Poland, pp 271–278
Pękalska E, Duin RPW, Paclík P (2002) A generalized Kernel approach to dissimilarity based classification. J Mach Learn Res 2(2):175–211
Pękalska E, Duin RPW, Paclík P (2006) Prototype selection for dissimilarity-based classifiers. Pattern Recognit 39(2):189–208
Pudil P, Novovičová J, Kittler J (1994) Floating search methods in feature selection. Pattern Recognit Lett 15:1119–1125
Vapnik V (1998) Statistical learning theory. Wiley, New York
Veltkamp RC, Hagedoorn M (2000) Shape similarity measures, properties, and constructions. Advances in visual information systems, pp 467–476
Wilson CL, Garris MD (1992) Handprinted character database 3. Technical Report, National Institute of Standards and Technology
Xing E, Jordan M, Karp R (2001) Feature selection for high-dimensional genomic microarray data. In: International Conference on Machine Learning, pp 601–608
Yu L, Liu H (2003) Feature selection for high-dimensional data: a fast correlation-based filter solution. In: International Conference on Machine Learning, Washington
Acknowledgments
This work is supported by the Dutch Organization for Scientific Research (NWO) and the Dutch Cancer Institute (NKI). The authors thank Prof. Anil Jain and Dr. Douglas Zongker for providing the Digit dissimilarity data and Dr. Pavel Paclík for providing the RoadSign dissimilarity data.
Harol, A., Lai, C., Pękalska, E. et al. Pairwise feature evaluation for constructing reduced representations. Pattern Anal Applic 10, 55–68 (2007). https://doi.org/10.1007/s10044-006-0050-x