Abstract
This chapter provides the reader with the tools required for a statistically significant assessment of feature relevance and of the outcome of feature selection. The methods presented here can be integrated into feature selection wrappers and can serve to select the number of features for filters or feature ranking methods; they can also be used for hyperparameter selection or model selection. Finally, they can help assess the confidence in predictions made by learning machines on fresh data. The concept of model complexity is ubiquitous throughout the chapter. Readers with little or outdated knowledge of basic statistics should first consult Appendix A; for others, it may serve as a quick reference for useful definitions and properties. The first section of the chapter is devoted to the basic statistical tools for feature selection: it puts the task of feature selection into the appropriate statistical perspective and describes important tools such as hypothesis tests, which are of general use, and random probes, which are more specifically dedicated to feature selection. The use of hypothesis tests is exemplified, and caveats are given about the reliability of the results of multiple tests, leading to the Bonferroni correction and to the definition of the false discovery rate. The use of random probes is exemplified in conjunction with forward selection. The second section is devoted to validation and cross-validation, which are general tools for assessing the generalization ability of models; we show how they can be used specifically in the context of feature selection, and we draw attention to the limitations of these methods.
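To make the two multiple-testing corrections mentioned above concrete, the following sketch (in Python with NumPy; it is illustrative and not part of the original chapter) applies the Bonferroni correction and the Benjamini-Hochberg false-discovery-rate procedure to a vector of p-values, one per candidate feature. The p-values and the level alpha = 0.05 are hypothetical.

    import numpy as np

    def bonferroni(p_values, alpha=0.05):
        # Over m simultaneous tests, reject H0_i only when p_i <= alpha / m;
        # this controls the probability of making even one false rejection
        # (the family-wise error rate) at level alpha.
        p = np.asarray(p_values)
        return p <= alpha / p.size

    def benjamini_hochberg(p_values, alpha=0.05):
        # Step-up procedure of Benjamini and Hochberg: controls the false
        # discovery rate (the expected fraction of false rejections among
        # all rejections) at level alpha.
        p = np.asarray(p_values)
        m = p.size
        order = np.argsort(p)               # ranks of the p-values, ascending
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        rejected = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()  # largest rank k with p_(k) <= alpha*k/m
            rejected[order[:k + 1]] = True  # reject all hypotheses up to rank k
        return rejected

    # Ten hypothetical p-values, e.g. one per candidate feature:
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.590]
    print(bonferroni(p).sum())           # 1: only p <= 0.05/10 = 0.005 survives
    print(benjamini_hochberg(p).sum())   # 2: the FDR procedure is less conservative

The random-probe idea can be sketched in the same spirit. The scoring criterion below (absolute Pearson correlation with the target) and the probe count are assumptions made for illustration, not the chapter's prescription: pure-noise "probe" columns are appended to the data, real features and probes are scored with the same criterion, and only the real features that outrank every probe are kept.

    def random_probe_filter(X, y, n_probes=100, seed=0):
        # Append n_probes pure-noise columns to X, score every column by
        # |correlation with y|, and keep the real features that score
        # strictly above the best-scoring probe.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        Z = np.hstack([X, rng.standard_normal((n, n_probes))])
        Zc = Z - Z.mean(axis=0)             # center each column
        yc = y - y.mean()
        scores = np.abs(Zc.T @ yc) / (np.linalg.norm(Zc, axis=0) * np.linalg.norm(yc))
        return np.nonzero(scores[:d] > scores[d:].max())[0]

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 20))   # 20 candidate features, 3 informative
    y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(200)
    print(random_probe_filter(X, y))     # typically recovers features 0, 1 and 2

In a forward-selection wrapper, the same device yields a stopping rule: selection halts as soon as the next feature the wrapper would pick ranks below a random probe.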
Cite this chapter
Dreyfus, G., Guyon, I. (2006). Assessment Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_3