Abstract
This chapter provides the reader with the tools required for a statistically significant assessment of feature relevance and of the outcome of feature selection. The methods presented here can be integrated into feature selection wrappers and can serve to select the number of features for filters or feature ranking methods; they can also be used for hyperparameter selection or model selection. Finally, they can help assess the confidence in predictions made by learning machines on fresh data. The concept of model complexity is ubiquitous throughout the chapter. Readers with little or outdated knowledge of basic statistics should first consult Appendix A; for others, it may serve as a quick reference for useful definitions and properties. The first section of the chapter is devoted to the basic statistical tools for feature selection: it puts the task of feature selection into the appropriate statistical perspective and describes important tools such as hypothesis tests, which are of general use, and random probes, which are more specifically dedicated to feature selection. The use of hypothesis tests is exemplified, and caveats are given about the reliability of the results of multiple tests, leading to the Bonferroni correction and to the definition of the false discovery rate. The use of random probes is exemplified in conjunction with forward selection. The second section is devoted to validation and cross-validation, which are general tools for assessing the generalization ability of models; we show how they can be used specifically in the context of feature selection, and we draw attention to the limitations of these methods.
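To make the two multiple-testing corrections mentioned above concrete, the following sketch (in Python with NumPy; it is illustrative and not part of the original chapter) applies the Bonferroni correction and the Benjamini-Hochberg false-discovery-rate procedure to a vector of p-values, one per candidate feature. The p-values and the level alpha = 0.05 are hypothetical.

    import numpy as np

    def bonferroni(p_values, alpha=0.05):
        # Over m simultaneous tests, reject H0_i only when p_i <= alpha / m;
        # this controls the probability of making even one false rejection
        # (the family-wise error rate) at level alpha.
        p = np.asarray(p_values)
        return p <= alpha / p.size

    def benjamini_hochberg(p_values, alpha=0.05):
        # Step-up procedure of Benjamini and Hochberg: controls the false
        # discovery rate (the expected fraction of false rejections among
        # all rejections) at level alpha.
        p = np.asarray(p_values)
        m = p.size
        order = np.argsort(p)               # ranks of the p-values, ascending
        below = p[order] <= alpha * np.arange(1, m + 1) / m
        rejected = np.zeros(m, dtype=bool)
        if below.any():
            k = np.nonzero(below)[0].max()  # largest rank k with p_(k) <= alpha*k/m
            rejected[order[:k + 1]] = True  # reject all hypotheses up to rank k
        return rejected

    # Ten hypothetical p-values, e.g. one per candidate feature:
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.590]
    print(bonferroni(p).sum())           # 1: only p <= 0.05/10 = 0.005 survives
    print(benjamini_hochberg(p).sum())   # 2: the FDR procedure is less conservative

The random-probe idea can be sketched in the same spirit. The scoring criterion below (absolute Pearson correlation with the target) and the probe count are assumptions made for illustration, not the chapter's prescription: pure-noise "probe" columns are appended to the data, real features and probes are scored with the same criterion, and only the real features that outrank every probe are kept.

    def random_probe_filter(X, y, n_probes=100, seed=0):
        # Append n_probes pure-noise columns to X, score every column by
        # |correlation with y|, and keep the real features that score
        # strictly above the best-scoring probe.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        Z = np.hstack([X, rng.standard_normal((n, n_probes))])
        Zc = Z - Z.mean(axis=0)             # center each column
        yc = y - y.mean()
        scores = np.abs(Zc.T @ yc) / (np.linalg.norm(Zc, axis=0) * np.linalg.norm(yc))
        return np.nonzero(scores[:d] > scores[d:].max())[0]

    rng = np.random.default_rng(1)
    X = rng.standard_normal((200, 20))   # 20 candidate features, 3 informative
    y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.standard_normal(200)
    print(random_probe_filter(X, y))     # typically recovers features 0, 1 and 2

In a forward-selection wrapper, the same device yields a stopping rule: selection halts as soon as the next feature the wrapper would pick ranks below a random probe.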
Cite this chapter
Dreyfus, G., Guyon, I. (2006). Assessment Methods. In: Guyon, I., Nikravesh, M., Gunn, S., Zadeh, L.A. (eds) Feature Extraction. Studies in Fuzziness and Soft Computing, vol 207. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-35488-8_3