Abstract
Feature selection plays a crucial role in scientific research and practical applications. In real-world applications, labeling data is time-consuming and labor-intensive, so unsupervised feature selection methods are desirable in practice. Linear discriminant analysis (LDA) with the trace ratio criterion is a supervised dimensionality reduction method that has shown good performance in improving classification. In this paper, we first propose a unified objective that seamlessly accommodates the trace ratio formulation and the K-means clustering procedure, extending the trace ratio criterion to an unsupervised model. We then propose a novel unsupervised feature selection method that integrates the unsupervised trace ratio formulation with structured sparsity-inducing norm regularization. The proposed method harnesses the discriminative power of the trace ratio criterion and thus tends to select discriminative features. We also provide two important theorems that guarantee the unsupervised feature selection process. Empirical results on four benchmark data sets show that the proposed method outperforms other state-of-the-art unsupervised feature selection algorithms on all three clustering evaluation metrics.
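As context for the formulation above, the supervised trace ratio criterion in LDA seeks a projection $W$ that maximizes between-class scatter relative to within-class scatter. Below is a minimal sketch in standard notation, where $S_b$ and $S_w$ denote the between-class and within-class scatter matrices, $d$ is the number of features, and $m$ the projected dimension; the structured sparsity regularizer is assumed here to be the $\ell_{2,1}$-norm, the common choice for feature selection, and the paper's exact unified objective may differ:

$$
\max_{W^{\top} W = I} \; \frac{\operatorname{Tr}\!\left(W^{\top} S_b W\right)}{\operatorname{Tr}\!\left(W^{\top} S_w W\right)},
\qquad
\|W\|_{2,1} = \sum_{i=1}^{d} \sqrt{\sum_{j=1}^{m} W_{ij}^{2}}.
$$

In the unsupervised setting described above, the class labels that define $S_b$ and $S_w$ would be replaced by a K-means cluster indicator matrix learned jointly with $W$; rows of $W$ driven toward zero by the $\ell_{2,1}$-norm then correspond to discarded features.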
Keywords
- Feature Selection
- Linear Discriminant Analysis
- Clustering Performance
- Normalized Mutual Information
- Feature Selection Algorithm