Abstract
Feature Selection (FS) algorithms are applied in bioinformatics to identify disease-causing genes. The performance of such algorithms is measured in terms of the accuracy of the resulting model and the stability of the FS algorithm. Stability evaluates how closely the feature sets obtained from repeated executions replicate one another. Recent research has shown that a stability measure must satisfy a set of properties: it must be fully defined, monotonic, bounded, attain its maximum deterministically, and be corrected for chance. Among the existing stability measures, only Nogueira's frequency-based stability measure satisfies all the required properties. However, frequency-based stability measures fail to discriminate among cases in which the overall frequencies of the features are the same. To address this issue, the paper proposes a hybrid similarity-based stability measure that satisfies all of the desirable properties listed above. It is the first similarity-based stability measure to do so, and each of these properties is established mathematically. The paper further proposes a combination of the frequency-based and similarity-based measures that preserves the aspects of both approaches. Finally, the work analyzes the stability of LASSO and Elastic Net on synthetic and microarray gene expression datasets. Of the two, Elastic Net exhibits higher stability and better selection of relevant features.
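The limitation of frequency-based measures noted in the abstract can be illustrated with a small sketch. It assumes the standard formulation of Nogueira's measure from Nogueira et al. (2017), and it uses the mean pairwise Jaccard index purely as a stand-in similarity-based score; this is not the hybrid measure proposed in the paper, whose definition is not given in the abstract. The function names and the two toy selection matrices `A` and `B` are hypothetical.

```python
from itertools import combinations


def nogueira_stability(Z):
    """Frequency-based stability of a 0/1 selection matrix Z.

    Z has one row per FS run and one column per feature
    (formulation assumed from Nogueira et al., 2017).
    """
    M, d = len(Z), len(Z[0])
    # Selection frequency of each feature across the M runs.
    p_hat = [sum(row[f] for row in Z) / M for f in range(d)]
    # Average number of features selected per run.
    k_bar = sum(sum(row) for row in Z) / M
    # Mean unbiased sample variance of the per-feature selection indicators.
    mean_var = sum(M / (M - 1) * p * (1 - p) for p in p_hat) / d
    return 1 - mean_var / ((k_bar / d) * (1 - k_bar / d))


def mean_pairwise_jaccard(Z):
    """Illustrative similarity-based score: average Jaccard index over all run pairs."""
    subsets = [{f for f, z in enumerate(row) if z} for row in Z]
    sims = [
        len(a & b) / len(a | b) if (a | b) else 1.0  # two empty sets count as identical
        for a, b in combinations(subsets, 2)
    ]
    return sum(sims) / len(sims)


# Two hypothetical outcomes of 4 runs over 4 features.
# Every feature is selected in exactly 2 of the 4 runs in both cases.
A = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
B = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]]

for name, Z in (("A", A), ("B", B)):
    print(name, f"{nogueira_stability(Z):.3f}", f"{mean_pairwise_jaccard(Z):.3f}")
# A -0.333 0.333
# B -0.333 0.222
```

Because the per-feature frequencies are identical in `A` and `B`, the frequency-based score cannot tell them apart, whereas the pairwise similarity differs. This is the kind of case a similarity-based component is meant to resolve.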











Cite this article
Naik, A.K., Kuppili, V. & Edla, D.R. A new hybrid stability measure for feature selection. Appl Intell 50, 3471–3486 (2020). https://doi.org/10.1007/s10489-020-01731-2
DOI: https://doi.org/10.1007/s10489-020-01731-2