Abstract
Feature maximization (F-max) is an unbiased metric for estimating the quality of unsupervised classification (clustering) that favours clusters with a maximal feature F-measure value. In this article we show that an adaptation of this metric within the framework of supervised classification allows efficient feature selection and feature contrasting to be performed. We experiment with the method on different types of textual data. In this context, we demonstrate that this technique significantly improves the performance of classification methods as compared with the use of state-of-the-art feature selection techniques, notably in the case of the classification of unbalanced, highly multidimensional and noisy textual data gathered in similar classes.
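To make the idea concrete, the sketch below implements one common formulation of F-max-style selection: for each feature and class, a feature recall (the feature's weight in the class relative to its weight over all classes) and a feature precision (its weight relative to the total feature weight of the class) are combined into a feature F-measure, and features whose best class-wise F-measure exceeds the average are retained. This is a minimal illustration under those assumptions, not the authors' exact procedure; the function name `fmax_select` and the selection threshold (the mean of the best F-measures) are illustrative choices.

```python
import numpy as np

def fmax_select(X, y):
    """Select features by a class-wise feature F-measure (F-max sketch).

    X: (n_samples, n_features) array of non-negative feature weights.
    y: (n_samples,) array of class labels.
    Returns the indices of features whose best class-wise F-measure
    exceeds the average best F-measure over all features.
    """
    classes = np.unique(y)
    # Per-class summed feature weights: shape (n_classes, n_features).
    W = np.vstack([X[y == c].sum(axis=0) for c in classes])
    # Feature recall: weight of the feature in the class relative to
    # its total weight across all classes (columns sum to 1).
    recall = W / np.clip(W.sum(axis=0, keepdims=True), 1e-12, None)
    # Feature precision: weight of the feature in the class relative to
    # the total weight of all features in that class (rows sum to 1).
    precision = W / np.clip(W.sum(axis=1, keepdims=True), 1e-12, None)
    # Harmonic mean of recall and precision per (class, feature) pair.
    f_measure = 2 * recall * precision / np.clip(recall + precision, 1e-12, None)
    best = f_measure.max(axis=0)  # best class-wise F-measure per feature
    return np.where(best > best.mean())[0]
```

On data where some features are concentrated in one class and others are spread uniformly, the uniformly spread features score a low F-measure in every class and are discarded, which is the intended contrasting effect.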
Acknowledgments
This work was carried out in the context of the QUAERO program (http://www.quaero.org) supported by OSEO (http://www.oseo.fr/), the French agency for research development.
Copyright information
© 2017 Springer International Publishing Switzerland
Cite this chapter
Lamirel, JC., Cuxac, P., Hajlaoui, K. (2017). A Novel Approach to Feature Selection Based on Quality Estimation Metrics. In: Guillet, F., Pinaud, B., Venturini, G. (eds) Advances in Knowledge Discovery and Management. Studies in Computational Intelligence, vol 665. Springer, Cham. https://doi.org/10.1007/978-3-319-45763-5_7
DOI: https://doi.org/10.1007/978-3-319-45763-5_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45762-8
Online ISBN: 978-3-319-45763-5
eBook Packages: Engineering (R0)