Ensembles of Representative Prototype Sets for Classification and Data Set Analysis

Müssel, Christoph; Lausser, Ludwig; Kestler, Hans A.

doi:10.1007/978-3-662-44983-7_29

Christoph Müssel²¹,
Ludwig Lausser²¹ &
Hans A. Kestler^21,22

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization ((STUDIES CLASS))

2986 Accesses
1 Citations

Abstract

The drawback of many state-of-the-art classifiers is that their models are not easily interpretable. We recently introduced Representative Prototype Sets (RPS), which are simple base classifiers that allow for a systematic description of data sets by exhaustive enumeration of all possible classifiers.

The major focus of the previous work was on a descriptive characterization of low-cardinality data sets. In the context of prediction, a lack of accuracy of the simple RPS model can be compensated by accumulating the decisions of several classifiers. Here, we now investigate ensembles of RPS base classifiers in a predictive setting on data sets of high dimensionality and low cardinality. The performance of several selection and fusion strategies is evaluated. We visualize the decisions of the ensembles in an exemplary scenario and illustrate links between visual data set inspection and prediction.

Christoph Müssel and Ludwig Lausser have contributed equally to this work.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

¥17,985 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: JPY 3498; Price includes VAT (Japan)

eBook: JPY 17159; Price includes VAT (Japan)

Softcover Book: JPY 21449; Price includes VAT (Japan)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Ensemble feature selection for high dimensional data: a new method and a comparative study

Article 24 April 2017

Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies

Dynamic Selection of Classifiers Applied to High-Dimensional Small-Instance Data Sets: Problems and Challenges

References

Bittner, M., Meltzer, P., Chen, Y., Jiang, Y., Seftor, E., Hendrix, M., et al. (2000). Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature, 406, 536–540.
Article Google Scholar
Breiman, L. (1996). Bagging predictors. Machine Learning, 24, 123–140.
MATH MathSciNet Google Scholar
Brighton, H., & Mellish, C. (2002). Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery, 6, 153–172.
Article MATH MathSciNet Google Scholar
Dasarathy, B. (1991). Nearest neighbor (NN) norms: NN pattern classification techniques. Los Alamitos: IEEE Computer Society.
Google Scholar
Fix, E., & Hodges, J. (1951). Discriminatory analysis: Nonparametric discrimination: Consistency properties. Technical Report Project 21-49-004, Report Number 4, USAF School of Aviation Medicine, Randolf Field, TX.
Google Scholar
Freund, Y., & Schapire, R. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. In P. Vitányi (Ed.), Computational learning theory. Lecture notes in artificial intelligence (Vol. 904, pp. 23–37). Berlin: Springer.
Google Scholar
Hart, P. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516.
Article Google Scholar
Huang, Y., & Suen, C. (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 90–94.
Article Google Scholar
Kohonen, T. (1988). Learning vector quantization. Neural Networks, 1, 303.
Article Google Scholar
Lausser, L., Müssel, C., & Kestler, H.A. (2012). Representative prototype sets for data characterization and classification. In N. Mana, F. Schwenker, & E. Trentin (Eds.), Artificial neural networks in pattern recognition (ANNPR12). Lecture notes in artificial intelligence (Vol. 7477, pp. 36–47). Berlin: Springer.
Google Scholar
Lausser, L., Müssel, C., & Kestler, H. A. (2014). Identifying predictive hubs to condense the training set of k-nearest neighbour classifiers. Computational Statistics, 29(1–2), 81–95.
Article MATH MathSciNet Google Scholar
Müssel, C., Lausser, L., Maucher, M., & Kestler H. A. (2012). Multi-objective parameter selection for classifiers. Journal of Statistical Software, 46(5), 1–27.
Google Scholar
Notterman, D., Alon, U., Sierk, A., & Levine, A. (2001). Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research, 61, 3124–3130.
Google Scholar
Schwenker, F., & Kestler, H. A. (2002) Analysis of support vectors helps to identify borderline patients in classification studies. In Computers in cardiology (pp. 305–308). Piscataway: IEEE Press.
Google Scholar
Shipp, M., Ross, K., Tamayo, P., Weng, A., Kutok, J., Aguiar, R. C., et al. (2002). Diffuse large b-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 8, 68–74.
Article Google Scholar
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences, 99, 6567–6572.
Google Scholar
Vapnik, V. (1998). Statistical learning theory. New York: Wiley.
MATH Google Scholar
Wang, Q., Wen, Y.-G., Li, D.-P., Xia, J., Zhou, C.-Z., Yan, D.-W., et al. (2012). Upregulated INHBA expression is associated with poor survival in gastric cancer. Medical Oncology, 29, 77–83.
Article Google Scholar
Webb, A. R. (2002). Statistical pattern recognition (2nd ed.). Chichester: Wiley.
Book MATH Google Scholar

Download references

Acknowledgements

This work is supported by the German Science Foundation (SFB 1074, Project Z1) to HAK, and the Federal Ministry of Education and Research (BMBF, Gerontosys II, Forschungskern SyStaR, project ID 0315894A) to HAK.

Author information

Authors and Affiliations

Medical Systems Biology and Institute of Neural Information Processing, Ulm University, 89069, Ulm, Germany
Christoph Müssel, Ludwig Lausser & Hans A. Kestler
Leibniz Institute for Age Research – Fritz-Lipman Institute, 07745, Jena, Germany
Hans A. Kestler

Authors

Christoph Müssel
View author publications
You can also search for this author in PubMed Google Scholar
Ludwig Lausser
View author publications
You can also search for this author in PubMed Google Scholar
Hans A. Kestler
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Hans A. Kestler .

Editor information

Editors and Affiliations

University of Essex, Colchester, United Kingdom
Berthold Lausen
University of Luxembourg, Walferdange, Luxembourg
Sabine Krolak-Schwerdt
University of Luxembourg, Walferdange, Luxembourg
Matthias Böhmer

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Müssel, C., Lausser, L., Kestler, H.A. (2015). Ensembles of Representative Prototype Sets for Classification and Data Set Analysis. In: Lausen, B., Krolak-Schwerdt, S., Böhmer, M. (eds) Data Science, Learning by Latent Structures, and Knowledge Discovery. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-44983-7_29

Download citation

DOI: https://doi.org/10.1007/978-3-662-44983-7_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-44982-0
Online ISBN: 978-3-662-44983-7
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Ensembles of Representative Prototype Sets for Classification and Data Set Analysis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Ensemble feature selection for high dimensional data: a new method and a comparative study

Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies

Dynamic Selection of Classifiers Applied to High-Dimensional Small-Instance Data Sets: Problems and Challenges

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Ensembles of Representative Prototype Sets for Classification and Data Set Analysis

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Ensemble feature selection for high dimensional data: a new method and a comparative study

Wisdom of Crowds: An Empirical Study of Ensemble-Based Feature Selection Strategies

Dynamic Selection of Classifiers Applied to High-Dimensional Small-Instance Data Sets: Problems and Challenges

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation