Abstract
This paper develops and illustrates a new algorithm for ranking input features and selecting the best feature subset. The algorithm is built on an asymptotic formula for mutual information and the expectation maximisation (EM) algorithm. We consider not only the dependence between each feature and the class but also the dependence among the features themselves. The algorithm continues to work well on noisy data. An empirical study compares the proposed algorithm with existing algorithms, and its use is illustrated on a variety of problems.
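The idea of scoring features by their mutual information with the class while penalising mutual information among already-selected features can be sketched as below. This is a minimal illustration, not the paper's method: it swaps in plain equal-width histogram estimates of mutual information in place of the asymptotic formula and EM-fitted densities, and uses a Battiti-style (MIFS) greedy criterion I(f;C) - beta * sum over selected s of I(f;s). The function names `discretize`, `mutual_information` and `mifs_rank`, the redundancy weight `beta`, the bin count and the synthetic data are all hypothetical choices made for this sketch.

```python
import numpy as np


def discretize(v, bins):
    """Map a continuous feature to integer codes 0..bins-1 using equal-width bins."""
    interior_edges = np.linspace(v.min(), v.max(), bins + 1)[1:-1]
    return np.digitize(v, interior_edges)


def mutual_information(a, b):
    """Plug-in estimate of I(A;B) in nats for two discrete (integer-coded) variables."""
    _, a_idx = np.unique(a, return_inverse=True)
    _, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((a_idx.max() + 1, b_idx.max() + 1))
    np.add.at(joint, (a_idx, b_idx), 1)          # contingency table of counts
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of A (column vector)
    py = pxy.sum(axis=0, keepdims=True)          # marginal of B (row vector)
    nz = pxy > 0                                 # skip empty cells (0 * log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))


def mifs_rank(X, y, beta=0.5, bins=10):
    """Greedily rank all columns of X by relevance to y minus beta * redundancy."""
    codes = [discretize(X[:, j], bins) for j in range(X.shape[1])]
    relevance = [mutual_information(c, y) for c in codes]   # I(f;C) per feature
    remaining = list(range(X.shape[1]))
    selected = []
    while remaining:
        # Score each candidate against the features already chosen.
        scores = {
            j: relevance[j]
            - beta * sum(mutual_information(codes[j], codes[s]) for s in selected)
            for j in remaining
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical synthetic data: one informative feature, a near-copy of it, and pure noise.
rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)
informative = y + 0.3 * rng.standard_normal(n)
duplicate = informative + 0.01 * rng.standard_normal(n)
noise = rng.standard_normal(n)
X = np.column_stack([noise, informative, duplicate])

ranking = mifs_rank(X, y)
print(ranking)
```

With the redundancy term, one of the two correlated informative columns is chosen first and its near-copy is pushed below the noise feature, because the copy adds almost no information beyond what is already selected. Setting `beta=0` would rank features by relevance alone and place both copies ahead of the noise.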
Acknowledgements
We wish to thank Julia Sonander and Harri Howells of National Air-Traffic Services for the STCA data, and the Engineering and Physical Sciences Research Council of the UK for supporting this work (grant no. GR/M75143).
About this article
Cite this article
Cang, S., Partridge, D. Feature ranking and best feature subset using mutual information. Neural Comput & Applic 13, 175–184 (2004). https://doi.org/10.1007/s00521-004-0400-9