
Using Data Complexity Measures for Thresholding in Feature Selection Rankers

  • Conference paper
  • First Online:
Advances in Artificial Intelligence (CAEPIA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9868)

Abstract

In recent years, feature selection has become essential for confronting the dimensionality problem, removing irrelevant and redundant information. For this purpose, ranker methods have become a commonly used approach, since they do not compromise computational efficiency. Ranker methods return an ordered ranking of all the features, so a threshold must be established to reduce the number of features to deal with. In this work, a practical subset of features is selected according to three different data complexity measures, releasing the user from the task of choosing a fixed threshold in advance. The proposed approach was tested on six different DNA microarray datasets, which pose a difficult challenge for researchers due to their large numbers of gene expression levels and small numbers of patients. The adequacy of the proposed approach in terms of classification error was checked using an ensemble of ranker methods with a Support Vector Machine as the classifier. This study shows that our approach achieved competitive results compared with those obtained by the fixed-threshold approach, which is the standard in most research works.
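The general idea described in the abstract, ranking all features and then letting a data complexity measure choose the cut-off, can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact procedure: it assumes features are ranked by absolute correlation with a binary class label, and it uses a single complexity measure (the per-feature Fisher discriminant ratio, often denoted F1) instead of the paper's three measures and ranker ensemble. The function names, the 95% tolerance, and the candidate subset sizes are hypothetical choices for the example.

```python
import numpy as np

def fisher_ratio(X, y):
    """Per-feature Fisher discriminant ratio for a binary problem:
    (mean0 - mean1)^2 / (var0 + var1). Higher values mean the classes
    separate more easily along that feature (lower data complexity)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return num / den

def rank_features(X, y):
    """Toy ranker: order features by absolute correlation with the label."""
    yc = y - y.mean()
    Xc = X - X.mean(axis=0)
    corr = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc) + 1e-12)
    return np.argsort(-corr)  # best-ranked feature first

def complexity_threshold(X, y, candidates=(5, 10, 25, 50, 100)):
    """Pick the smallest candidate subset size whose complexity score
    (here: the best per-feature Fisher ratio inside the subset) reaches
    at least 95% of the best score seen over all candidates."""
    order = rank_features(X, y)
    scores = {k: fisher_ratio(X[:, order[:k]], y).max()
              for k in candidates if k <= X.shape[1]}
    best = max(scores.values())
    return min(k for k, s in scores.items() if s >= 0.95 * best)
```

On DNA microarray data (thousands of genes, tens of patients), a scheme like this replaces a fixed top-k cut with a data-driven one: the threshold adapts to how hard each dataset is, which is the point the abstract makes about releasing the user from choosing a threshold in advance.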



Acknowledgments

This research has been financially supported in part by the Spanish Ministerio de Economía y Competitividad (research project TIN2015-65069-C2-1-R), by European Union FEDER funds and by the Consellería de Industria of the Xunta de Galicia (research project GRC2014/035).

Author information

Corresponding author

Correspondence to Borja Seijo-Pardo.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Seijo-Pardo, B., Bolón-Canedo, V., Alonso-Betanzos, A. (2016). Using Data Complexity Measures for Thresholding in Feature Selection Rankers. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_12

  • DOI: https://doi.org/10.1007/978-3-319-44636-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44635-6

  • Online ISBN: 978-3-319-44636-3

  • eBook Packages: Computer Science, Computer Science (R0)
